From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-qv1-xf35.google.com (mail-qv1-xf35.google.com
 [IPv6:2607:f8b0:4864:20::f35])
 by sourceware.org (Postfix) with ESMTPS id BE4DB385E03D
 for <gcc@gcc.gnu.org>; Mon, 30 May 2022 12:14:15 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BE4DB385E03D
Received: by mail-qv1-xf35.google.com with SMTP id cv1so9919442qvb.5
 for <gcc@gcc.gnu.org>; Mon, 30 May 2022 05:14:15 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=FbREDtVMW6xOAIxP7z8u+ocy4M5KpKEMIar6dp8ImAA=;
 b=tiP0/c4azadarB9sIiLXMOZDsV2y8s6L213mKxO5qLyiyqazo+xKtapDra5zLIgc9T
 su+inFmp8jnmGpgEjJRtmqqJUsuu5Pa6xIaMS7YUqhQ6mmoFTn5xiauc96LBgrSle1Ek
 6oCToNf0GoXKZiGZkP6PIKvp06kSzrTB5mqym44ZmaZn0pPOT20AzatebGiJSPsosf9U
 QAA56apdLrUCUfxrLe1he0heh0U4zj48eHPEaKl22dzYchRtihE1ssI8w9fIOIlUYBfx
 AV9akMVrvyrkKV+YzxGupsyAATUVe6QsoTpqUv7NuQCTkaNuULKDK/ZTlf7sjhj8UKdm
 Inew==
X-Gm-Message-State: AOAM532OLfqCFIdEoG3DwoUGUK07Zl6VkU/bj8xANqS7XgTAIGr8v4mc
 IVnvgBhYTj7U/DzBwCmLvZbUy9895owVgp2igAs=
X-Google-Smtp-Source: ABdhPJyuCcDM9dPC3wUhFZOJQb+b17s9V2/WEgVPh1VLIcTwmuyN5MZXe716rHcp9g0gIw7TYqcetQY/hpUmwUv52B4=
X-Received: by 2002:a05:6214:2307:b0:432:e753:e0c4 with SMTP id
 gc7-20020a056214230700b00432e753e0c4mr44641477qvb.55.1653912855072; Mon, 30
 May 2022 05:14:15 -0700 (PDT)
MIME-Version: 1.0
References: <CAOEUB2C-uTpZGisBzszYFGXRBd56GaaYWRp2mfRWuUYFoGJ9wA@mail.gmail.com>
In-Reply-To: <CAOEUB2C-uTpZGisBzszYFGXRBd56GaaYWRp2mfRWuUYFoGJ9wA@mail.gmail.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 30 May 2022 14:14:03 +0200
Message-ID: <CAFiYyc3NzgV1XHXrJHy=Bwe=8=wws0sJrCUrp-ku-et5P5Sntg@mail.gmail.com>
Subject: Re: Loop splitting based on constant prefix of an array
To: Laleh Beni <laleh.beni@gmail.com>
Cc: GCC Development <gcc@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 30 May 2022 12:14:17 -0000

On Fri, May 27, 2022 at 11:13 PM Laleh Beni via Gcc <gcc@gcc.gnu.org> wrote=
:
>
> GCC compiler is able to understand if the prefix of an array holds
> constant/static data and apply compiler optimizations on that partial
> constant part of the array, however, it seems that it is not leveraging
> this information in all cases.
>
> On understanding the behavior of compiler optimization for partially
> constant arrays and especially how the loop splitting pass could have an
> influence on the potential constant related optimizations such as constan=
t
> folding I am using  the following example:
>
>
>
> Considering an array where the prefix of that array is compile-time
> constant data, and the rest of the array is runtime data, should the
> compiler be able to optimize the calculation for the first part of the
> array?
>
> Let's look at the below example:
>
>
>
> You can see the code and its assembly here: https://godbolt.org/z/xjxbz43=
1b
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res =3D 0;
>
>   for (size_t i =3D 0; i < len; i++) {
>
>         res +=3D array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     int arr1[6] =3D {200,2,3, argc, argc+1, argc+2};
>
>     return  sum(arr1, 6);
>
> }
>
>
>
>
>
> In our sum function we are measuring the some of the array elements, wher=
e
> the first half of it is  static compile-time constants and the second hal=
f
> are dynamic data.
>
> When we compile this with the "x86-64 GCC 12.1" compiler with "-O3
> -std=3Dc++2a " flags, we get the following assembly code:
>
>
>
>  main:
>
>         mov     rax, QWORD PTR .LC0[rip]
>
>         mov     DWORD PTR [rsp-28], edi
>
>         mov     DWORD PTR [rsp-32], 3
>
>         movq    xmm1, QWORD PTR [rsp-32]
>
>         mov     QWORD PTR [rsp-40], rax
>
>         movq    xmm0, QWORD PTR [rsp-40]
>
>         lea     eax, [rdi+1]
>
>         add     edi, 2
>
>         mov     DWORD PTR [rsp-24], eax
>
>         paddd   xmm0, xmm1
>
>         mov     DWORD PTR [rsp-20], edi
>
>         movq    xmm1, QWORD PTR [rsp-24]
>
>         paddd   xmm0, xmm1
>
>         movd    eax, xmm0
>
>         pshufd  xmm2, xmm0, 0xe5
>
>         movd    edx, xmm2
>
>         add     eax, edx
>
>         ret
>
> .LC0:
>
>         .long   200
>
>         .long   2
>
>
>
>
>
> However, if we add an =E2=80=9Cif=E2=80=9D condition in the loop for calc=
ulating the result
> of the sum, the if condition seems to enable the loop splitting pass:
>
>
>
> You can see the code and its assembly here:  https://godbolt.org/z/ejecbj=
MKG
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res =3D 0;
>
>   for (size_t i =3D 0; i < len; i++) {
>
>     if (i < 1)
>
>         res +=3D array[i];
>
>     else
>
>         res +=3D array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     int arr1[6] =3D {200,2,3, argc, argc+1, argc+2};
>
>     return  sum(arr1, 6);
>
> }
>
>
>
>
>
> we get the following assembly code:
>
>
>
> main:
>
>         lea     eax, [rdi+208+rdi*2]
>
>         ret
>
>
>
>
>
> As you can see the =E2=80=9Cif=E2=80=9D condition has the same calculatio=
n for both the
> =E2=80=9Cif=E2=80=9D and =E2=80=9Celse=E2=80=9D branch in calculating the=
 sum over the array, however, it
> seems that it is triggering the =E2=80=9Cloop splitting pass=E2=80=9D whi=
ch results in
> further optimizations such as constant folding of the whole computation a=
nd
> resulting in such a smaller and faster assembly code.
>
> My question is, why the compiler is not able to take advantage of
> constantans in the prefix of the array in the first place?
>
> Also adding a not necessary =E2=80=9Cif condition=E2=80=9D which is just =
repeating the same
> code for "if" and "else", doesn=E2=80=99t seem to be the best way to hint=
 the
> compiler to take advantage of this optimization; so is there another way =
to
> make the compiler aware of this? ( I used the -fsplit-loops flag and it
> didn't have any effect for this example.)
>
>
>
> As a next step if we use an array that has some constant values in the
> prefix but not a compile time constant length such as the following examp=
le:
>
> Code link is here: https://godbolt.org/z/3qGqshzn9
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res =3D 0;
>
>   for (size_t i =3D 0; i < len; i++) {
>
>     if (i < 1)
>
>         res +=3D array[i];
>
>     else
>
>         res +=3D array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     size_t len =3D argc+3;
>
>     int arr3[len] =3D {600,10,1};
>
>     for (unsigned int i =3D 3; i < len; i++) arr3[i] =3D argc+i;
>
>     return sum(arr3, 2);
>
> }
>
>
>
> In this case the GCC compiler is not able to apply constant folding on th=
e
> first part of the array!
>
> In general is there anyway that the GCC compiler would understand this an=
d
> apply constant folding optimizations here?

GCC can currently only constant fold this when it decides to unroll the
loop portions operating in constant data.

Richard.