[Bug tree-optimization/111563] New: Missed optimization of LICM

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/111563] New: Missed optimization of LICM
@ 2023-09-24  3:42 652023330028 at smail dot nju.edu.cn
  2023-09-24  3:54 ` [Bug tree-optimization/111563] " pinskia at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: 652023330028 at smail dot nju.edu.cn @ 2023-09-24  3:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563

            Bug ID: 111563
           Summary: Missed optimization of LICM
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: 652023330028 at smail dot nju.edu.cn
  Target Milestone: ---

Hello, we found some optimizations (regarding Loop Invariant Code Motion) that
GCC may have missed. We would greatly appreicate if you can take a look and let
us know what you think.
Here are two examples.


Example 1:
https://godbolt.org/z/cn9v45vdh

Given the following code: 
```c++
extern int var_12;
extern int var_13;
extern int var_17;
void test(int var_0, int var_1, int var_5, int var_6, int var_9) {

    for (int i_4 = 1; i_4 < var_0-2147483638; i_4 += 1) 
    {
        var_17 += 1774623354;
        if (var_9)
        {
            var_13 += var_1;
            var_17 += var_6 + var_0;
            var_12 += var_5;
        }
        else
        {
            var_17 += var_6;
        }
        i_4+=i_4/3;;
    }
}
```

In fact, `var_6 + var_0` is a loop invariant, but gcc-trunk -O3:
```asm
test(int, int, int, int, int):
        cmp     edi, 2147483639
        jle     .L15
        mov     r9d, DWORD PTR var_17[rip]
        lea     r10d, [rdi-2147483638]
        test    r8d, r8d
        jne     .L4
        lea     edi, [rcx+1774623354]
        mov     edx, 1
        lea     ecx, [r9+1774623354+rcx]
.L5:
        movsx   rax, edx
        mov     esi, edx
        imul    rax, rax, 1431655766
        sar     esi, 31
        shr     rax, 32
        sub     eax, esi
        lea     edx, [rdx+1+rax]
        mov     eax, ecx
        add     ecx, edi
        cmp     r10d, edx
        jg      .L5
        mov     DWORD PTR var_17[rip], eax
        ret
.L15:
        ret
.L4:
        mov     eax, edi
        mov     edi, DWORD PTR var_13[rip]
        push    rbx
        mov     ebx, esi
        mov     r11d, edx
        lea     r8d, [rax+1774623354+rcx]
        add     edi, esi
        mov     esi, DWORD PTR var_12[rip]
        add     esi, edx
        mov     edx, 1
.L7:
        movsx   rax, edx
        mov     ecx, edx
        add     r9d, r8d
        imul    rax, rax, 1431655766
        sar     ecx, 31
        shr     rax, 32
        sub     eax, ecx
        mov     ecx, esi
        add     esi, r11d
        lea     edx, [rax+1+rdx]
        mov     eax, edi
        add     edi, ebx
        cmp     edx, r10d
        jl      .L7
        mov     DWORD PTR var_17[rip], r9d
        pop     rbx
        mov     DWORD PTR var_12[rip], ecx
        mov     DWORD PTR var_13[rip], eax
        ret
```


Example 2:
https://godbolt.org/z/418K59eoE

Given the following code: 
```c++
extern int var_24;
void test(int var_2, int var_3, int var_8, int var_10, int var_14) {

    for (int i_2 = -3247424; i_2 < 19; i_2 += var_3 + 1056714155) 
    {
        if(var_3){
            var_24 += (-(200 / var_10)) + (-var_8);
            var_24 += var_14 + var_2;
        }

        i_2+=i_2/3;
    }
}
```

We note that `(-(200 / var_10)) + (-var_8) + var_14 + var_2` as a whole can be
treated as a loop invariant, but gcc-trunk -O3 does not:
```asm
test(int, int, int, int, int):
        mov     r10d, edi
        mov     r11d, edx
        test    esi, esi
        je      .L1
        mov     eax, -200
        mov     edx, -1
        mov     edi, DWORD PTR var_24[rip]
        add     r8d, r10d
        idiv    ecx
        lea     r9d, [rsi+1056714155]
        mov     ecx, -3247424
        sub     eax, r11d
.L3:
        movsx   rdx, ecx
        mov     esi, ecx
        add     edi, eax
        imul    rdx, rdx, 1431655766
        sar     esi, 31
        add     edi, r8d
        shr     rdx, 32
        sub     edx, esi
        add     ecx, edx
        add     ecx, r9d
        cmp     ecx, 18
        jle     .L3
        mov     DWORD PTR var_24[rip], edi
.L1:
        ret
```

Thank you very much for your time and effort! We look forward to hearing from
you.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/111563] Missed optimization of LICM
  2023-09-24  3:42 [Bug tree-optimization/111563] New: Missed optimization of LICM 652023330028 at smail dot nju.edu.cn
@ 2023-09-24  3:54 ` pinskia at gcc dot gnu.org
  2023-09-24 16:01 ` 652023330028 at smail dot nju.edu.cn
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-09-24  3:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
_5 = var_0_16(D) + var_6_18(D);
  invariant up to level 1, cost 1.

Basically because the cost is not high enough ...

If you use   --param=lim-expensive=1. then it will pull it out of the loop.

So the cost model is doing the correct thing here ...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/111563] Missed optimization of LICM
  2023-09-24  3:42 [Bug tree-optimization/111563] New: Missed optimization of LICM 652023330028 at smail dot nju.edu.cn
  2023-09-24  3:54 ` [Bug tree-optimization/111563] " pinskia at gcc dot gnu.org
@ 2023-09-24 16:01 ` 652023330028 at smail dot nju.edu.cn
  2023-09-24 16:33 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: 652023330028 at smail dot nju.edu.cn @ 2023-09-24 16:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563

--- Comment #2 from Yi <652023330028 at smail dot nju.edu.cn> ---
(In reply to Andrew Pinski from comment #1)
> _5 = var_0_16(D) + var_6_18(D);
>   invariant up to level 1, cost 1.
> 
> Basically because the cost is not high enough ...
> 
> If you use   --param=lim-expensive=1. then it will pull it out of the loop.
> 
> So the cost model is doing the correct thing here ...


Thank you very much for your prompt reply! It took me some time to confirm our
work. For Example 1, GCC does exactly what you say it does. But for Example 2,
it doesn't seem to work as expected.

https://godbolt.org/z/eeWThnqWs

Given the following code: 
```c++
extern int var_24;
void test(int var_2, int var_3, int var_8, int var_10, int var_14) {

    for (int i_2 = -3247424; i_2 < 19; i_2 += var_3 + 1056714155) 
    {
        if(var_3){
            var_24 += (-(200 / var_10)) + (-var_8);
            var_24 += var_14 + var_2;
        }

        i_2+=i_2/3;
    }
}
```

We note that `(-(200 / var_10)) + (-var_8) + var_14 + var_2` as a whole can be
treated as a loop invariant, but gcc-trunk -O3 --param=lim-expensive=1 does
not:

```asm
test(int, int, int, int, int):
        mov     r10d, edx
        test    esi, esi
        je      .L1
        mov     eax, -200
        mov     edx, -1
        mov     r9d, DWORD PTR var_24[rip]
        add     r8d, edi
        idiv    ecx
        lea     edi, [rsi+1056714155]
        mov     ecx, -3247424
        sub     eax, r10d
.L3:
        movsx   rdx, ecx
        mov     esi, ecx
        add     r9d, eax                # ... + (-(200 / var_10)) + (-var_8)
        imul    rdx, rdx, 1431655766
        sar     esi, 31
        add     r9d, r8d                # ... + var_14 + var_2
        shr     rdx, 32
        sub     edx, esi
        add     ecx, edx
        add     ecx, edi
        cmp     ecx, 18
        jle     .L3
        mov     DWORD PTR var_24[rip], r9d
.L1:
        ret
```

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/111563] Missed optimization of LICM
  2023-09-24  3:42 [Bug tree-optimization/111563] New: Missed optimization of LICM 652023330028 at smail dot nju.edu.cn
  2023-09-24  3:54 ` [Bug tree-optimization/111563] " pinskia at gcc dot gnu.org
  2023-09-24 16:01 ` 652023330028 at smail dot nju.edu.cn
@ 2023-09-24 16:33 ` pinskia at gcc dot gnu.org
  2023-09-24 16:38 ` 652023330028 at smail dot nju.edu.cn
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-09-24 16:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Yi from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > _5 = var_0_16(D) + var_6_18(D);
> >   invariant up to level 1, cost 1.
> > 
> > Basically because the cost is not high enough ...
> > 
> > If you use   --param=lim-expensive=1. then it will pull it out of the loop.
> > 
> > So the cost model is doing the correct thing here ...
> 
> 
> Thank you very much for your prompt reply! It took me some time to confirm
> our work. For Example 1, GCC does exactly what you say it does. But for
> Example 2, it doesn't seem to work as expected.
> 
> We note that `(-(200 / var_10)) + (-var_8) + var_14 + var_2` as a whole can
> be treated as a loop invariant, but gcc-trunk -O3 --param=lim-expensive=1
> does not:

  _4 = _2 + var_24_lsm.5_21;
  _6 = _4 + _5;

So this is again reassociation with LIM, the same issue as PR 111560.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/111563] Missed optimization of LICM
  2023-09-24  3:42 [Bug tree-optimization/111563] New: Missed optimization of LICM 652023330028 at smail dot nju.edu.cn
                   ` (2 preceding siblings ...)
  2023-09-24 16:33 ` pinskia at gcc dot gnu.org
@ 2023-09-24 16:38 ` 652023330028 at smail dot nju.edu.cn
  2023-09-25 10:22 ` 652023330028 at smail dot nju.edu.cn
  2023-09-29  0:40 ` 652023330028 at smail dot nju.edu.cn
  5 siblings, 0 replies; 7+ messages in thread
From: 652023330028 at smail dot nju.edu.cn @ 2023-09-24 16:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563

--- Comment #4 from Yi <652023330028 at smail dot nju.edu.cn> ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Yi from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > _5 = var_0_16(D) + var_6_18(D);
> > >   invariant up to level 1, cost 1.
> > > 
> > > Basically because the cost is not high enough ...
> > > 
> > > If you use   --param=lim-expensive=1. then it will pull it out of the loop.
> > > 
> > > So the cost model is doing the correct thing here ...
> > 
> > 
> > Thank you very much for your prompt reply! It took me some time to confirm
> > our work. For Example 1, GCC does exactly what you say it does. But for
> > Example 2, it doesn't seem to work as expected.
> > 
> > We note that `(-(200 / var_10)) + (-var_8) + var_14 + var_2` as a whole can
> > be treated as a loop invariant, but gcc-trunk -O3 --param=lim-expensive=1
> > does not:
> 
>   _4 = _2 + var_24_lsm.5_21;
>   _6 = _4 + _5;
> 
> So this is again reassociation with LIM, the same issue as PR 111560.

I see. Thank you very much for your reply!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/111563] Missed optimization of LICM
  2023-09-24  3:42 [Bug tree-optimization/111563] New: Missed optimization of LICM 652023330028 at smail dot nju.edu.cn
                   ` (3 preceding siblings ...)
  2023-09-24 16:38 ` 652023330028 at smail dot nju.edu.cn
@ 2023-09-25 10:22 ` 652023330028 at smail dot nju.edu.cn
  2023-09-29  0:40 ` 652023330028 at smail dot nju.edu.cn
  5 siblings, 0 replies; 7+ messages in thread
From: 652023330028 at smail dot nju.edu.cn @ 2023-09-25 10:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563

--- Comment #5 from Yi <652023330028 at smail dot nju.edu.cn> ---
(In reply to Andrew Pinski from comment #3)

> So this is again reassociation with LIM, the same issue as PR 111560.

For this similar code, GCC works as expected:
https://godbolt.org/z/3TaqfeTqb

```c++
extern int var_24;
int t;
void test(int var_2, int var_3, int var_8, int var_10, int var_14) {

    for (int i_2 = -3247424; i_2 < 19; i_2 += var_3 + 1056714155) 
    {
        var_24 += (-(200 / var_10)) + (-var_8);
        var_24 += var_14 + var_2;

        i_2+=i_2/3;
    }
}
```
So it seems that this and PR 111560 may not be due to the same cause.

Because it doesn't seem to be relevant to the statement, "Our re-association
only produces a canonical order within a single expression."



Meanwhile, in Example 2, 'if(var_3)' is actually optimized out of the Loop by
Loop Unswitch. So maybe the rest of the loop should be optimized as expected
like this similar code?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/111563] Missed optimization of LICM
  2023-09-24  3:42 [Bug tree-optimization/111563] New: Missed optimization of LICM 652023330028 at smail dot nju.edu.cn
                   ` (4 preceding siblings ...)
  2023-09-25 10:22 ` 652023330028 at smail dot nju.edu.cn
@ 2023-09-29  0:40 ` 652023330028 at smail dot nju.edu.cn
  5 siblings, 0 replies; 7+ messages in thread
From: 652023330028 at smail dot nju.edu.cn @ 2023-09-29  0:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111563

--- Comment #6 from Yi <652023330028 at smail dot nju.edu.cn> ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Yi from comment #2)
> > (In reply to Andrew Pinski from comment #1)
> > > _5 = var_0_16(D) + var_6_18(D);
> > >   invariant up to level 1, cost 1.
> > > 
> > > Basically because the cost is not high enough ...
> > > 
> > > If you use   --param=lim-expensive=1. then it will pull it out of the loop.
> > > 
> > > So the cost model is doing the correct thing here ...
> > 
> > 
> > Thank you very much for your prompt reply! It took me some time to confirm
> > our work. For Example 1, GCC does exactly what you say it does. But for
> > Example 2, it doesn't seem to work as expected.
> > 
> > We note that `(-(200 / var_10)) + (-var_8) + var_14 + var_2` as a whole can
> > be treated as a loop invariant, but gcc-trunk -O3 --param=lim-expensive=1
> > does not:
> 
>   _4 = _2 + var_24_lsm.5_21;
>   _6 = _4 + _5;
> 
> So this is again reassociation with LIM, the same issue as PR 111560.



Hi Andrew, 

We would greatly appreicate if you can help us here. 

extern int var_24;
void test(int var_2, int var_3, int var_8, int var_10, int var_14) {

    for (int i_2 = -3247424; i_2 < 19; i_2 += var_3 + 1056714155) 
    {
        if(var_3){
            var_24 += (-(200 / var_10)) + (-var_8);
            var_24 += var_14 + var_2;
        }

        i_2+=i_2/3;
    }
}

Again, for the above code, if it were the reassociation issue that causes the
loop invariant not being moved out of the loop, 
the following code should have the same problem. However, the code below is
being optimized properly, thereby, there
seems to be some other factors at play here. It would be great if you can
confirm it or (if possible) fix it. Thank you
so much!

extern int var_24;
int t;
void test(int var_2, int var_3, int var_8, int var_10, int var_14) {

    for (int i_2 = -3247424; i_2 < 19; i_2 += var_3 + 1056714155) 
    {
        var_24 += (-(200 / var_10)) + (-var_8);
        var_24 += var_14 + var_2;

        i_2+=i_2/3;
    }
}

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-09-29  0:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-24  3:42 [Bug tree-optimization/111563] New: Missed optimization of LICM 652023330028 at smail dot nju.edu.cn
2023-09-24  3:54 ` [Bug tree-optimization/111563] " pinskia at gcc dot gnu.org
2023-09-24 16:01 ` 652023330028 at smail dot nju.edu.cn
2023-09-24 16:33 ` pinskia at gcc dot gnu.org
2023-09-24 16:38 ` 652023330028 at smail dot nju.edu.cn
2023-09-25 10:22 ` 652023330028 at smail dot nju.edu.cn
2023-09-29  0:40 ` 652023330028 at smail dot nju.edu.cn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).