From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qv1-xf35.google.com (mail-qv1-xf35.google.com [IPv6:2607:f8b0:4864:20::f35]) by sourceware.org (Postfix) with ESMTPS id BE4DB385E03D for ; Mon, 30 May 2022 12:14:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BE4DB385E03D Received: by mail-qv1-xf35.google.com with SMTP id cv1so9919442qvb.5 for ; Mon, 30 May 2022 05:14:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=FbREDtVMW6xOAIxP7z8u+ocy4M5KpKEMIar6dp8ImAA=; b=tiP0/c4azadarB9sIiLXMOZDsV2y8s6L213mKxO5qLyiyqazo+xKtapDra5zLIgc9T su+inFmp8jnmGpgEjJRtmqqJUsuu5Pa6xIaMS7YUqhQ6mmoFTn5xiauc96LBgrSle1Ek 6oCToNf0GoXKZiGZkP6PIKvp06kSzrTB5mqym44ZmaZn0pPOT20AzatebGiJSPsosf9U QAA56apdLrUCUfxrLe1he0heh0U4zj48eHPEaKl22dzYchRtihE1ssI8w9fIOIlUYBfx AV9akMVrvyrkKV+YzxGupsyAATUVe6QsoTpqUv7NuQCTkaNuULKDK/ZTlf7sjhj8UKdm Inew== X-Gm-Message-State: AOAM532OLfqCFIdEoG3DwoUGUK07Zl6VkU/bj8xANqS7XgTAIGr8v4mc IVnvgBhYTj7U/DzBwCmLvZbUy9895owVgp2igAs= X-Google-Smtp-Source: ABdhPJyuCcDM9dPC3wUhFZOJQb+b17s9V2/WEgVPh1VLIcTwmuyN5MZXe716rHcp9g0gIw7TYqcetQY/hpUmwUv52B4= X-Received: by 2002:a05:6214:2307:b0:432:e753:e0c4 with SMTP id gc7-20020a056214230700b00432e753e0c4mr44641477qvb.55.1653912855072; Mon, 30 May 2022 05:14:15 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Mon, 30 May 2022 14:14:03 +0200 Message-ID: Subject: Re: Loop splitting based on constant prefix of an array To: Laleh Beni Cc: GCC Development Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 May 2022 12:14:17 -0000 On Fri, May 27, 2022 at 11:13 PM Laleh Beni via Gcc wrote= : > > GCC compiler is able to understand if the prefix of an array holds > constant/static data and apply compiler optimizations on that partial > constant part of the array, however, it seems that it is not leveraging > this information in all cases. > > On understanding the behavior of compiler optimization for partially > constant arrays and especially how the loop splitting pass could have an > influence on the potential constant related optimizations such as constan= t > folding I am using the following example: > > > > Considering an array where the prefix of that array is compile-time > constant data, and the rest of the array is runtime data, should the > compiler be able to optimize the calculation for the first part of the > array? > > Let's look at the below example: > > > > You can see the code and its assembly here: https://godbolt.org/z/xjxbz43= 1b > > > > #include > > inline int sum(const int array[], size_t len) { > > int res =3D 0; > > for (size_t i =3D 0; i < len; i++) { > > res +=3D array[i]; > > } > > return res; > > } > > int main(int argc, char** argv) > > { > > int arr1[6] =3D {200,2,3, argc, argc+1, argc+2}; > > return sum(arr1, 6); > > } > > > > > > In our sum function we are measuring the some of the array elements, wher= e > the first half of it is static compile-time constants and the second hal= f > are dynamic data. > > When we compile this with the "x86-64 GCC 12.1" compiler with "-O3 > -std=3Dc++2a " flags, we get the following assembly code: > > > > main: > > mov rax, QWORD PTR .LC0[rip] > > mov DWORD PTR [rsp-28], edi > > mov DWORD PTR [rsp-32], 3 > > movq xmm1, QWORD PTR [rsp-32] > > mov QWORD PTR [rsp-40], rax > > movq xmm0, QWORD PTR [rsp-40] > > lea eax, [rdi+1] > > add edi, 2 > > mov DWORD PTR [rsp-24], eax > > paddd xmm0, xmm1 > > mov DWORD PTR [rsp-20], edi > > movq xmm1, QWORD PTR [rsp-24] > > paddd xmm0, xmm1 > > movd eax, xmm0 > > pshufd xmm2, xmm0, 0xe5 > > movd edx, xmm2 > > add eax, edx > > ret > > .LC0: > > .long 200 > > .long 2 > > > > > > However, if we add an =E2=80=9Cif=E2=80=9D condition in the loop for calc= ulating the result > of the sum, the if condition seems to enable the loop splitting pass: > > > > You can see the code and its assembly here: https://godbolt.org/z/ejecbj= MKG > > > > #include > > inline int sum(const int array[], size_t len) { > > int res =3D 0; > > for (size_t i =3D 0; i < len; i++) { > > if (i < 1) > > res +=3D array[i]; > > else > > res +=3D array[i]; > > } > > return res; > > } > > int main(int argc, char** argv) > > { > > int arr1[6] =3D {200,2,3, argc, argc+1, argc+2}; > > return sum(arr1, 6); > > } > > > > > > we get the following assembly code: > > > > main: > > lea eax, [rdi+208+rdi*2] > > ret > > > > > > As you can see the =E2=80=9Cif=E2=80=9D condition has the same calculatio= n for both the > =E2=80=9Cif=E2=80=9D and =E2=80=9Celse=E2=80=9D branch in calculating the= sum over the array, however, it > seems that it is triggering the =E2=80=9Cloop splitting pass=E2=80=9D whi= ch results in > further optimizations such as constant folding of the whole computation a= nd > resulting in such a smaller and faster assembly code. > > My question is, why the compiler is not able to take advantage of > constantans in the prefix of the array in the first place? > > Also adding a not necessary =E2=80=9Cif condition=E2=80=9D which is just = repeating the same > code for "if" and "else", doesn=E2=80=99t seem to be the best way to hint= the > compiler to take advantage of this optimization; so is there another way = to > make the compiler aware of this? ( I used the -fsplit-loops flag and it > didn't have any effect for this example.) > > > > As a next step if we use an array that has some constant values in the > prefix but not a compile time constant length such as the following examp= le: > > Code link is here: https://godbolt.org/z/3qGqshzn9 > > > > #include > > inline int sum(const int array[], size_t len) { > > int res =3D 0; > > for (size_t i =3D 0; i < len; i++) { > > if (i < 1) > > res +=3D array[i]; > > else > > res +=3D array[i]; > > } > > return res; > > } > > int main(int argc, char** argv) > > { > > size_t len =3D argc+3; > > int arr3[len] =3D {600,10,1}; > > for (unsigned int i =3D 3; i < len; i++) arr3[i] =3D argc+i; > > return sum(arr3, 2); > > } > > > > In this case the GCC compiler is not able to apply constant folding on th= e > first part of the array! > > In general is there anyway that the GCC compiler would understand this an= d > apply constant folding optimizations here? GCC can currently only constant fold this when it decides to unroll the loop portions operating in constant data. Richard.