public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Loop splitting based on constant prefix of an array
@ 2022-05-27 21:12 Laleh Beni
  2022-05-30 12:14 ` Richard Biener
  0 siblings, 1 reply; 2+ messages in thread
From: Laleh Beni @ 2022-05-27 21:12 UTC (permalink / raw)
  To: gcc

GCC compiler is able to understand if the prefix of an array holds
constant/static data and apply compiler optimizations on that partial
constant part of the array, however, it seems that it is not leveraging
this information in all cases.

On understanding the behavior of compiler optimization for partially
constant arrays and especially how the loop splitting pass could have an
influence on the potential constant related optimizations such as constant
folding I am using  the following example:



Considering an array where the prefix of that array is compile-time
constant data, and the rest of the array is runtime data, should the
compiler be able to optimize the calculation for the first part of the
array?

Let's look at the below example:



You can see the code and its assembly here: https://godbolt.org/z/xjxbz431b



#include <cstddef>

inline int sum(const int array[], size_t len) {

  int res = 0;

  for (size_t i = 0; i < len; i++) {

        res += array[i];

  }

  return res;

}

int main(int argc, char** argv)

{

    int arr1[6] = {200,2,3, argc, argc+1, argc+2};

    return  sum(arr1, 6);

}





In our sum function we are measuring the some of the array elements, where
the first half of it is  static compile-time constants and the second half
are dynamic data.

When we compile this with the "x86-64 GCC 12.1" compiler with "-O3
-std=c++2a " flags, we get the following assembly code:



 main:

        mov     rax, QWORD PTR .LC0[rip]

        mov     DWORD PTR [rsp-28], edi

        mov     DWORD PTR [rsp-32], 3

        movq    xmm1, QWORD PTR [rsp-32]

        mov     QWORD PTR [rsp-40], rax

        movq    xmm0, QWORD PTR [rsp-40]

        lea     eax, [rdi+1]

        add     edi, 2

        mov     DWORD PTR [rsp-24], eax

        paddd   xmm0, xmm1

        mov     DWORD PTR [rsp-20], edi

        movq    xmm1, QWORD PTR [rsp-24]

        paddd   xmm0, xmm1

        movd    eax, xmm0

        pshufd  xmm2, xmm0, 0xe5

        movd    edx, xmm2

        add     eax, edx

        ret

.LC0:

        .long   200

        .long   2





However, if we add an “if” condition in the loop for calculating the result
of the sum, the if condition seems to enable the loop splitting pass:



You can see the code and its assembly here:  https://godbolt.org/z/ejecbjMKG



#include <cstddef>

inline int sum(const int array[], size_t len) {

  int res = 0;

  for (size_t i = 0; i < len; i++) {

    if (i < 1)

        res += array[i];

    else

        res += array[i];

  }

  return res;

}

int main(int argc, char** argv)

{

    int arr1[6] = {200,2,3, argc, argc+1, argc+2};

    return  sum(arr1, 6);

}





we get the following assembly code:



main:

        lea     eax, [rdi+208+rdi*2]

        ret





As you can see the “if” condition has the same calculation for both the
“if” and “else” branch in calculating the sum over the array, however, it
seems that it is triggering the “loop splitting pass” which results in
further optimizations such as constant folding of the whole computation and
resulting in such a smaller and faster assembly code.

My question is, why the compiler is not able to take advantage of
constantans in the prefix of the array in the first place?

Also adding a not necessary “if condition” which is just repeating the same
code for "if" and "else", doesn’t seem to be the best way to hint the
compiler to take advantage of this optimization; so is there another way to
make the compiler aware of this? ( I used the -fsplit-loops flag and it
didn't have any effect for this example.)



As a next step if we use an array that has some constant values in the
prefix but not a compile time constant length such as the following example:

Code link is here: https://godbolt.org/z/3qGqshzn9



#include <cstddef>

inline int sum(const int array[], size_t len) {

  int res = 0;

  for (size_t i = 0; i < len; i++) {

    if (i < 1)

        res += array[i];

    else

        res += array[i];

  }

  return res;

}

int main(int argc, char** argv)

{

    size_t len = argc+3;

    int arr3[len] = {600,10,1};

    for (unsigned int i = 3; i < len; i++) arr3[i] = argc+i;

    return sum(arr3, 2);

}



In this case the GCC compiler is not able to apply constant folding on the
first part of the array!

In general is there anyway that the GCC compiler would understand this and
apply constant folding optimizations here?

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Loop splitting based on constant prefix of an array
  2022-05-27 21:12 Loop splitting based on constant prefix of an array Laleh Beni
@ 2022-05-30 12:14 ` Richard Biener
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2022-05-30 12:14 UTC (permalink / raw)
  To: Laleh Beni; +Cc: GCC Development

On Fri, May 27, 2022 at 11:13 PM Laleh Beni via Gcc <gcc@gcc.gnu.org> wrote:
>
> GCC compiler is able to understand if the prefix of an array holds
> constant/static data and apply compiler optimizations on that partial
> constant part of the array, however, it seems that it is not leveraging
> this information in all cases.
>
> On understanding the behavior of compiler optimization for partially
> constant arrays and especially how the loop splitting pass could have an
> influence on the potential constant related optimizations such as constant
> folding I am using  the following example:
>
>
>
> Considering an array where the prefix of that array is compile-time
> constant data, and the rest of the array is runtime data, should the
> compiler be able to optimize the calculation for the first part of the
> array?
>
> Let's look at the below example:
>
>
>
> You can see the code and its assembly here: https://godbolt.org/z/xjxbz431b
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>         res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     int arr1[6] = {200,2,3, argc, argc+1, argc+2};
>
>     return  sum(arr1, 6);
>
> }
>
>
>
>
>
> In our sum function we are measuring the some of the array elements, where
> the first half of it is  static compile-time constants and the second half
> are dynamic data.
>
> When we compile this with the "x86-64 GCC 12.1" compiler with "-O3
> -std=c++2a " flags, we get the following assembly code:
>
>
>
>  main:
>
>         mov     rax, QWORD PTR .LC0[rip]
>
>         mov     DWORD PTR [rsp-28], edi
>
>         mov     DWORD PTR [rsp-32], 3
>
>         movq    xmm1, QWORD PTR [rsp-32]
>
>         mov     QWORD PTR [rsp-40], rax
>
>         movq    xmm0, QWORD PTR [rsp-40]
>
>         lea     eax, [rdi+1]
>
>         add     edi, 2
>
>         mov     DWORD PTR [rsp-24], eax
>
>         paddd   xmm0, xmm1
>
>         mov     DWORD PTR [rsp-20], edi
>
>         movq    xmm1, QWORD PTR [rsp-24]
>
>         paddd   xmm0, xmm1
>
>         movd    eax, xmm0
>
>         pshufd  xmm2, xmm0, 0xe5
>
>         movd    edx, xmm2
>
>         add     eax, edx
>
>         ret
>
> .LC0:
>
>         .long   200
>
>         .long   2
>
>
>
>
>
> However, if we add an “if” condition in the loop for calculating the result
> of the sum, the if condition seems to enable the loop splitting pass:
>
>
>
> You can see the code and its assembly here:  https://godbolt.org/z/ejecbjMKG
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>     if (i < 1)
>
>         res += array[i];
>
>     else
>
>         res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     int arr1[6] = {200,2,3, argc, argc+1, argc+2};
>
>     return  sum(arr1, 6);
>
> }
>
>
>
>
>
> we get the following assembly code:
>
>
>
> main:
>
>         lea     eax, [rdi+208+rdi*2]
>
>         ret
>
>
>
>
>
> As you can see the “if” condition has the same calculation for both the
> “if” and “else” branch in calculating the sum over the array, however, it
> seems that it is triggering the “loop splitting pass” which results in
> further optimizations such as constant folding of the whole computation and
> resulting in such a smaller and faster assembly code.
>
> My question is, why the compiler is not able to take advantage of
> constantans in the prefix of the array in the first place?
>
> Also adding a not necessary “if condition” which is just repeating the same
> code for "if" and "else", doesn’t seem to be the best way to hint the
> compiler to take advantage of this optimization; so is there another way to
> make the compiler aware of this? ( I used the -fsplit-loops flag and it
> didn't have any effect for this example.)
>
>
>
> As a next step if we use an array that has some constant values in the
> prefix but not a compile time constant length such as the following example:
>
> Code link is here: https://godbolt.org/z/3qGqshzn9
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>     if (i < 1)
>
>         res += array[i];
>
>     else
>
>         res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     size_t len = argc+3;
>
>     int arr3[len] = {600,10,1};
>
>     for (unsigned int i = 3; i < len; i++) arr3[i] = argc+i;
>
>     return sum(arr3, 2);
>
> }
>
>
>
> In this case the GCC compiler is not able to apply constant folding on the
> first part of the array!
>
> In general is there anyway that the GCC compiler would understand this and
> apply constant folding optimizations here?

GCC can currently only constant fold this when it decides to unroll the
loop portions operating in constant data.

Richard.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-05-30 12:14 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-27 21:12 Loop splitting based on constant prefix of an array Laleh Beni
2022-05-30 12:14 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).