public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/110215] New: RA fails to allocate register when loop invariant lives through EH region
From: wwwhhhyyy333 at gmail dot com @ 2023-06-12  6:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

            Bug ID: 110215
           Summary: RA fails to allocate register when loop invariant
                    lives through EH region
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wwwhhhyyy333 at gmail dot com
  Target Milestone: ---

Created attachment 55305
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55305&action=edit
A Testcase

Compiled with -Ofast, the innermost loop is:

.L41:
        movups  (%rax), %xmm3
        movaps  (%rsp), %xmm0
        addq    $16, %rax
        subps   %xmm3, %xmm0
        andps   %xmm2, %xmm0
        movups  %xmm0, -16(%rax)
        addps   %xmm0, %xmm1
        cmpq    %rax, %rdx
        jne     .L41

Clang, by contrast, produces:

.LBB0_14:                               #   Parent Loop BB0_3 Depth=1
        movups  (%rbp,%rax), %xmm1
        movaps  %xmm3, %xmm2
        subps   %xmm1, %xmm2
        andps   %xmm4, %xmm2
        movups  %xmm2, (%rbp,%rax)
        addps   %xmm2, %xmm0
        addq    $16, %rax
        cmpq    %rax, %r12
        jne     .LBB0_14

GCC spills the loop invariant `base` to the stack and reloads it on every
iteration (`movaps (%rsp), %xmm0`), whereas Clang keeps it in an SSE register
(`%xmm3`).

Godbolt: https://godbolt.org/z/TTvG8M6E8

* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls
From: pinskia at gcc dot gnu.org @ 2023-06-12  6:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|RA fails to allocate        |RA fails to allocate
                   |register when loop          |register when loop
                   |invariant lives through EH  |invariant lives across
                   |region                      |calls
           Keywords|ra                          |

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This happens on aarch64 too:
```
.L41:
        ldr     q31, [x0]
        ldr     q29, [sp, 112]
        fabd    v31.4s, v29.4s, v31.4s
        fadd    v30.4s, v30.4s, v31.4s
        str     q31, [x0], 16
        cmp     x1, x0
        bne     .L41
```


At the GIMPLE level it looks like:
  <bb 19> [local count: 372044713]:
  # vect_sum_lsm.128_11.134_88 = PHI <vect__7.142_39(19), { 0.0, 0.0, 0.0, 0.0 }(18)>
  # ivtmp.155_176 = PHI <ivtmp.155_175(19), ivtmp.155_174(18)>
  _173 = (void *) ivtmp.155_176;
  vect__4.137_85 = MEM <vector(4) float> [(value_type &)_173];
  vect__5.138_74 = vect_cst__84 - vect__4.137_85;
  vect__38.139_68 = ABS_EXPR <vect__5.138_74>;
  MEM <vector(4) float> [(value_type &)_173] = vect__38.139_68;
  vect__7.142_39 = vect__38.139_68 + vect_sum_lsm.128_11.134_88;
  ivtmp.155_175 = ivtmp.155_176 + 16;
  if (_6 != ivtmp.155_175)
    goto <bb 19>; [83.33%]
  else
    goto <bb 20>; [16.67%]

The loop invariant here is vect_cst__84.

So I think the spilling is not related to EH at all, but rather to a call.
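
To make the roles in this GIMPLE concrete, here is a hedged C++ sketch of the same loop body. It is written with the GCC/Clang `vector_size` extension, which the reduced testcase below also uses. It is not the reporter's attached testcase, which is not reproduced here. The names `v4sf`, `v4su`, `vabs` and `abs_diff_sum` are hypothetical.

`d` stands in for `vect_cst__84`: it is computed once outside the loop and only read inside it. That is the value GCC reloads from the stack on every iteration. The absolute value is taken by clearing the sign bit, which is presumably what the `andps` with a mask register does in the x86 loop. The aarch64 `fabd` instruction computes the absolute difference directly.

```cpp
#include <cstdint>
#include <cstring>

// 4 x float in one 16-byte vector (GCC/Clang vector extension).
typedef float v4sf __attribute__((vector_size(16)));
typedef std::uint32_t v4su __attribute__((vector_size(16)));

// Lane-wise |x|: clear each sign bit, like `andps` with a 0x7fffffff mask.
static inline v4sf vabs(v4sf x) {
  const v4su mask = {0x7fffffffu, 0x7fffffffu, 0x7fffffffu, 0x7fffffffu};
  v4su bits;
  std::memcpy(&bits, &x, sizeof bits);  // reinterpret float lanes as bits
  bits &= mask;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// Same shape as the GIMPLE loop above; `d` plays the role of vect_cst__84.
v4sf abs_diff_sum(v4sf *p, int n, v4sf d) {
  v4sf sum = {0.0f, 0.0f, 0.0f, 0.0f};  // PHI start value { 0.0, ... }
  for (int i = 0; i < n; ++i) {
    v4sf x = vabs(d - p[i]);  // vect__5 = vect_cst__84 - vect__4; ABS_EXPR
    p[i] = x;                 // store back through the same pointer
    sum += x;                 // vect__7 = vect__38 + vect_sum
  }
  return sum;
}
```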

* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: pinskia at gcc dot gnu.org @ 2023-06-12  6:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
            Summary|RA fails to allocate        |RA fails to allocate
                   |register when loop          |register when loop
                   |invariant lives across      |invariant lives across
                   |calls                       |calls and eh
     Ever confirmed|0                           |1
           Keywords|                            |ra
   Last reconfirmed|                            |2023-06-12

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reduced testcase for both x86_64 and aarch64:
```
#define vec __attribute__((vector_size(4*sizeof(float))))
struct s1
{
 s1();
 ~s1();
};
void g();
void g(float);
void f(float a, float b, vec float **c, int n, int j)
{
        s1 t2;
        float t = a/b;
        vec float d = {t, t, t, t};
        for (int l = 0; l < j; l++)
        {
                vec float s = {};
                for(int i =0;i<n;i++)
                {
                        c[l][i]+=d;
                        s+=c[l][i];
                }
                float sum = s[0]+s[1]+s[2]+s[3];
                g(sum);
        }
        g();
}
```

* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: rguenth at gcc dot gnu.org @ 2023-06-12  8:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org
           Keywords|EH                          |

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we fail to sink

 d_29 = {t_28, t_28, t_28, t_28};

we compute a good place in select_best_block but then since it is at the
same loop depth as the original place we apply

  /* If BEST_BB is at the same nesting level, then require it to have
     significantly lower execution frequency to avoid gratuitous movement.  */
  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
      /* If result of comparsion is unknown, prefer EARLY_BB.
         Thus use !(...>=..) rather than (...<...)  */
      && !(best_bb->count * 100 >= early_bb->count * threshold))
    return best_bb;

and fail to sink.  I'm not exactly sure why we do the above; we probably
should do so when best_bb post-dominates early_bb, and also if the sunk stmt
possibly (or provably) enlarges the lifetime of its uses (but that's also
hard to guess, since we process sinking of the defs of the uses only
afterwards).  In this case we have a single use and a single def, so
sinking shouldn't make things worse.  We could also weigh in the
spill class of the register here.

In our case the dominated block has a higher(!) count than the
dominating block, which means the profile is corrupt.
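
The heuristic quoted above can be checked with a few numbers. Below is a hypothetical stand-alone restatement of just the same-loop-depth condition, using plain integer counts. GCC's real `profile_count` also models unknown or uninitialized counts, which this sketch ignores.

It makes the corrupt-profile point concrete. If the dominated block's count is at least the dominating block's, then `best * 100 >= early * threshold` holds for any threshold up to 100, so the statement is never sunk at the same depth.

```cpp
#include <cstdint>

// Hypothetical restatement of the same-loop-depth check in select_best_block:
// true means "return BEST_BB" (sink), false means "fall back to EARLY_BB".
// Counts are plain integers; unknown profile counts are not modelled.
bool sink_at_same_depth(std::uint64_t best_count, std::uint64_t early_count,
                        std::uint64_t threshold_percent) {
  return !(best_count * 100 >= early_count * threshold_percent);
}
```

For example, with a threshold of 75 (an arbitrary value chosen for illustration), a target block executed 50 times against 100 for the original block qualifies (5000 < 7500). One executed 80 times does not (8000 >= 7500).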

With --param sink-frequency-threshold we sink the ctor and the feeding
division, but still get:

.L5:
        movq    (%rbx), %rax
        pxor    %xmm1, %xmm1
        leaq    0(%rbp,%rax), %rdx
        .p2align 4,,10
        .p2align 3
.L4:
        movaps  (%rsp), %xmm0
        addps   (%rax), %xmm0
        addq    $16, %rax
        movaps  %xmm0, -16(%rax)
        addps   %xmm0, %xmm1
        cmpq    %rax, %rdx
        jne     .L4
        movaps  %xmm1, %xmm0
        movhlps %xmm1, %xmm0
        addps   %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        shufps  $85, %xmm1, %xmm0
        addps   %xmm1, %xmm0
.LEHB1:
        call    _Z1gf
        addq    $8, %rbx
        cmpq    %rbx, %r12
        jne     .L5

because we (rightfully so) refuse to sink into the outer loop.  What we
fail to do is hoist the reload out of the inner loop (I suppose
clang does exactly that).

We don't have any pass after reload that would perform loop invariant motion.
I'm not sure how this situation is handled in general in RA: is a post-RA
pass optimizing the spill/reload placement "globally" usually done?
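
A rough count of what hoisting the reload would save. It assumes, as in the reduced testcase, an outer loop of j iterations and an inner loop of n iterations. It also assumes the spilled invariant must be reloaded at least once after each call to g(sum), since under the x86-64 System V ABI that call clobbers every XMM register:

```latex
\underbrace{\sum_{l=0}^{j-1} n}_{\text{reload inside the inner loop}} = j\,n
\qquad\text{vs.}\qquad
\underbrace{\sum_{l=0}^{j-1} 1}_{\text{reload once after each call}} = j
```

The inner-loop placement therefore executes n times as many loads from the spill slot.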

* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: vmakarov at gcc dot gnu.org @ 2023-06-14 13:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #4 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> 
> 
> We don't have any pass after reload that would perform loop invariant motion.
> I'm not sure how this situation is handled in general in RA: is a post-RA
> pass optimizing the spill/reload placement "globally" usually done?

LRA does not do placement of reload insns.  Global RA is supposed to do this
when it forms regions for the allocation.

I've been working on this issue.  I hope the fix will be ready this week.

* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: cvs-commit at gcc dot gnu.org @ 2023-06-16 15:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>:

https://gcc.gnu.org/g:154c69039571c66b3a6d16ecfa9e6ff22942f59f

commit r14-1891-g154c69039571c66b3a6d16ecfa9e6ff22942f59f
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Fri Jun 16 11:12:32 2023 -0400

    RA: Ignore conflicts for some pseudos from insns throwing a final exception

    IRA adds conflicts to the pseudos from insns that can throw exceptions
    internally even if the exception code is final for the function and
    the pseudo value is not used in the exception code.  This results in
    spilling a pseudo in a loop (see
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215).

    The following patch fixes the problem.

            PR rtl-optimization/110215

    gcc/ChangeLog:

            * ira-lives.cc: Include except.h.
            (process_bb_node_lives): Ignore conflicts from cleanup exceptions
            when the pseudo does not live at the exception landing pad.
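
A toy model of why these extra conflicts are so costly here, under a stated assumption. We assume the conflicts recorded for an insn that can throw internally cover every register the call clobbers in the pseudo's mode; this is our reading, not something the commit message states.

Under the x86-64 System V ABI all XMM registers are call-clobbered, so a pseudo like `d` in the reduced testcase would be left with no register at all. AArch64 behaves similarly for 128-bit vectors, because only the low 64 bits of v8-v15 are preserved across calls.

Without the hard conflict, the allocator can presumably pick a vector register and save/restore it around the call. That is consistent with comment #6 reporting no loads left in the main loop. `RegSet`, `eh_conflicts` and `allowed_regs` are hypothetical, not IRA's data structures.

```cpp
#include <bitset>
#include <cstddef>

// Toy model: one bit per vector register, xmm0..xmm15 on x86-64.
constexpr std::size_t kNumVecRegs = 16;
using RegSet = std::bitset<kNumVecRegs>;

// x86-64 System V: every XMM register is call-clobbered.
const RegSet kCallClobbered = RegSet().set();

// Hypothetical: hard-register conflicts recorded for a pseudo living across
// a call that can throw internally.  Before the fix they are added
// unconditionally; after it, they are skipped when the pseudo is dead at
// the (cleanup) landing pad.
RegSet eh_conflicts(bool live_at_landing_pad, bool before_fix) {
  if (before_fix || live_at_landing_pad)
    return kCallClobbered;
  return RegSet();
}

// Registers the pseudo may still be assigned: all it does not conflict with.
RegSet allowed_regs(const RegSet &conflicts) { return ~conflicts; }
```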

* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: wwwhhhyyy333 at gmail dot com @ 2023-06-27  6:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #6 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
Thanks for the fix. For the attached test, the main loop no longer contains
any load.

One issue remains: the loop epilogue still loads from the stack and the
constant pool:

.L9:
        movslq  %edx, %rax
        movss   72(%rsp), %xmm5
        salq    $2, %rax
        leaq    (%rbx,%rax), %rcx
        movaps  %xmm5, %xmm1
        subss   (%rcx), %xmm1
        andps   .LC4(%rip), %xmm1
        movss   %xmm1, (%rcx)
        leal    1(%rdx), %ecx
        addss   %xmm1, %xmm0
        cmpl    %ecx, %r12d
        jle     .L8

The IRA dump shows that the pseudos have no conflicts, but they still fail to
be allocated to registers. This issue does not exist on aarch64.

* [Bug rtl-optimization/110215] RA fails to allocate register when loop invariant lives across calls and eh
From: cvs-commit at gcc dot gnu.org @ 2023-11-09 18:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>:

https://gcc.gnu.org/g:a99f6bb142bc4506dcb8aa2b7722310ad92e4528

commit r14-5294-ga99f6bb142bc4506dcb8aa2b7722310ad92e4528
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Thu Nov 9 08:51:15 2023 -0500

    [IRA]: Fixing conflict calculation from region landing pads.

    The following patch fixes conflict calculation from exception landing
    pads.  The previous patch processed only one newly created landing pad.
    Besides being wrong, it also resulted in large memory consumption by IRA.

    gcc/ChangeLog:

            PR rtl-optimization/110215
            * ira-lives.cc (add_conflict_from_region_landing_pads): New
            function.
            (process_bb_node_lives): Use it.
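
A simplified, hypothetical model of the rule the two commits describe. For a cleanup region, EH conflicts are dropped unless the pseudo is live on entry to a landing pad, and the second commit makes the check consider every landing pad of the region rather than only one.

`Region`, `LandingPad` and both functions are illustrative stand-ins, not GCC's `eh_region`/`eh_landing_pad` API. Treating non-cleanup regions as always conflicting is our reading of the first ChangeLog entry. The single-pad variant exists only to show how such a check can miss a live pseudo; it is not claimed to be the previous patch's code.

```cpp
#include <set>
#include <vector>

// Illustrative stand-ins for an EH region and its landing pads.
enum class RegionKind { Cleanup, Other };

struct LandingPad {
  std::set<int> live_in;  // pseudos live on entry to the landing pad
};

struct Region {
  RegionKind kind;
  std::vector<LandingPad> pads;
};

// Does a pseudo living across a throwing insn in region `r` need EH conflicts?
bool needs_eh_conflicts(const Region &r, int pseudo) {
  if (r.kind != RegionKind::Cleanup)
    return true;                       // only cleanup regions are relaxed here
  for (const LandingPad &lp : r.pads)  // examine every landing pad
    if (lp.live_in.count(pseudo) != 0)
      return true;
  return false;
}

// For contrast: a check that looks at a single landing pad only.
bool needs_eh_conflicts_one_pad(const Region &r, int pseudo) {
  if (r.kind != RegionKind::Cleanup)
    return true;
  return !r.pads.empty() && r.pads.front().live_in.count(pseudo) != 0;
}
```

With a cleanup region whose second landing pad uses pseudo 7, only the all-pads check keeps the conflicts for pseudo 7.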
