[Bug middle-end/109849] New: suboptimal code for vector walking loop

public inbox for gcc-bugs@sourceware.org

* [Bug middle-end/109849] New: suboptimal code for vector walking loop
@ 2023-05-13 22:26 hubicka at gcc dot gnu.org
  2023-05-13 22:32 ` [Bug middle-end/109849] " pinskia at gcc dot gnu.org
                   ` (36 more replies)
  0 siblings, 37 replies; 38+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-05-13 22:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

            Bug ID: 109849
           Summary: suboptimal code for vector walking loop
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

jan@localhost:/tmp> cat t.C
#include <vector>
typedef unsigned int uint32_t;
std::vector<std::pair<uint32_t, uint32_t>> stack;
void
test()
{
        while (!stack.empty()) {
                std::pair<uint32_t, uint32_t> cur = stack.back();
                stack.pop_back();
                if (cur.second)
                        break;
        }
}
jan@localhost:/tmp> gcc t.C -O3 -S 

yields:

_Z4testv:
.LFB1264:
        .cfi_startproc
        movq    stack(%rip), %rcx
        movq    stack+8(%rip), %rax
        jmp     .L5
        .p2align 4,,10
        .p2align 3
.L6:
        movl    -4(%rax), %edx
        subq    $8, %rax
        movq    %rax, stack+8(%rip)
        testl   %edx, %edx
        jne     .L4
.L5:
        cmpq    %rax, %rcx
        jne     .L6
.L4:
        ret

We really should order the basic blocks so that the cmpq comes before .L6,
saving a jump.  Moreover, clang produces:

        .p2align        4, 0x90
.LBB1_1:                                # =>This Inner Loop Header: Depth=1
        cmpq    %rax, %rcx
        je      .LBB1_3
# %bb.2:                                #   in Loop: Header=BB1_1 Depth=1
        cmpl    $0, -4(%rcx)
        leaq    -8(%rcx), %rcx
        movq    %rcx, stack+8(%rip)
        je      .LBB1_1
.LBB1_3:
        retq

saving an instruction.  Why do we not move the update of stack+8 out of the
loop?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
@ 2023-05-13 22:32 ` pinskia at gcc dot gnu.org
  2023-05-13 22:40 ` pinskia at gcc dot gnu.org
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-13 22:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually why didn't we copy the loop header in the first place?


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
  2023-05-13 22:32 ` [Bug middle-end/109849] " pinskia at gcc dot gnu.org
@ 2023-05-13 22:40 ` pinskia at gcc dot gnu.org
  2023-05-14  5:57 ` amonakov at gcc dot gnu.org
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-13 22:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #0)
> saving an instruction. Why we do not move stack+8 updating out of the loop?

Maybe because of a clobber:
  cur$second_5 = MEM[(const struct pairD.26349 &)_7 + 18446744073709551608].secondD.27577;
  # PT = nonlocal escaped 
  _4 = _7 + 18446744073709551608;
  # .MEM_9 = VDEF <.MEM_1>
  stackD.26352.D.27437._M_implD.26667.D.26744._M_finishD.26670 = _4;
  # .MEM_10 = VDEF <.MEM_9>
  MEM[(struct pairD.26349 *)_7 + -8B] ={v} {CLOBBER};


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
  2023-05-13 22:32 ` [Bug middle-end/109849] " pinskia at gcc dot gnu.org
  2023-05-13 22:40 ` pinskia at gcc dot gnu.org
@ 2023-05-14  5:57 ` amonakov at gcc dot gnu.org
  2023-05-14  9:58 ` hubicka at ucw dot cz
                   ` (33 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-05-14  5:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Rather, because store-motion out of a loop that might iterate zero times would
create a data race.


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-05-14  5:57 ` amonakov at gcc dot gnu.org
@ 2023-05-14  9:58 ` hubicka at ucw dot cz
  2023-05-14 10:01 ` hubicka at ucw dot cz
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at ucw dot cz @ 2023-05-14  9:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #4 from Jan Hubicka <hubicka at ucw dot cz> ---
> Rather, because store-motion out of a loop that might iterate zero times would
> create a data race.
Good point.  If we copied loop headers all the way to the store, the
problem would go away.  Also, I assume we can still add a flag which is
set to true if the loop iterates and then make the store conditional...
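
The flag idea can be sketched by hand in C++.  This is only an illustration
under stated assumptions, not what any GCC pass emits: the Stack struct and
both function names are hypothetical stand-ins for the vector's start and
finish pointers.  The second function keeps the pointer in a local and
performs a single store after the loop, guarded by a flag, so a loop that
iterates zero times performs no store at all.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using Pair = std::pair<std::uint32_t, std::uint32_t>;

// Hypothetical stand-in for the vector's pointers; 'finish' plays the
// role of the stack+8 slot from the report.
struct Stack {
    Pair *start;
    Pair *finish;
};

// Original shape: the store to s.finish happens on every iteration.
inline void pop_until_nonzero(Stack &s) {
    assert(s.start <= s.finish);
    while (s.start != s.finish) {
        Pair cur = *(s.finish - 1);
        s.finish = s.finish - 1;   // store inside the loop
        if (cur.second)
            break;
    }
}

// Hand-applied store motion: the pointer lives in a local, and the one
// store after the loop is guarded by a flag, so no store is introduced
// on the path where the loop body never runs.
inline void pop_until_nonzero_flagged(Stack &s) {
    assert(s.start <= s.finish);
    Pair *finish = s.finish;
    bool iterated = false;
    while (s.start != finish) {
        Pair cur = *(finish - 1);
        --finish;
        iterated = true;
        if (cur.second)
            break;
    }
    if (iterated)
        s.finish = finish;         // single conditional store
}
```

Both functions leave the stack in the same state; they differ only in how
many stores to s.finish are executed.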


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-05-14  9:58 ` hubicka at ucw dot cz
@ 2023-05-14 10:01 ` hubicka at ucw dot cz
  2023-05-15  6:56 ` rguenth at gcc dot gnu.org
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at ucw dot cz @ 2023-05-14 10:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #5 from Jan Hubicka <hubicka at ucw dot cz> ---
> Actually why didn't we copy the loop header in the first place?
Because it is already considered to be a do-while loop (thanks to
the in-loop conditional, do_while_loop_p is happy).
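
For readers unfamiliar with loop header copying, here is a hand-written
sketch of the transformation in general.  It is not output of GCC's pass, and
the function names are made up for illustration.  The exit test of a while
loop is duplicated in front of the loop, which turns the loop itself into a
do-while with the test at the bottom.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// while form: the exit test sits at the top of the loop.
inline std::size_t trailing_zeros_while(const std::vector<unsigned> &v) {
    std::size_t i = v.size(), n = 0;
    while (i != 0) {
        if (v[i - 1] != 0)
            break;
        --i;
        ++n;
    }
    return n;
}

// After header copying: the copied test guards entry, and the loop
// proper is a do-while whose test sits at the bottom.
inline std::size_t trailing_zeros_rotated(const std::vector<unsigned> &v) {
    std::size_t i = v.size(), n = 0;
    if (i != 0) {              // copied header
        do {
            if (v[i - 1] != 0)
                break;
            --i;
            ++n;
        } while (i != 0);      // bottom test
    }
    return n;
}
```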


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-05-14 10:01 ` hubicka at ucw dot cz
@ 2023-05-15  6:56 ` rguenth at gcc dot gnu.org
  2023-05-17 14:53 ` hubicka at gcc dot gnu.org
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-15  6:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-05-15
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-05-15  6:56 ` rguenth at gcc dot gnu.org
@ 2023-05-17 14:53 ` hubicka at gcc dot gnu.org
  2023-05-17 20:36 ` rguenth at gcc dot gnu.org
                   ` (29 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-05-17 14:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |109811
                 CC|                            |mjambor at suse dot cz

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Here is a slightly improved testcase which actually pushes onto the stack and
measures something.  test() loops about 10000 times and returns.  It also makes
stack a local variable so race conditions are not a problem.

#include <vector>
typedef unsigned int uint32_t;
std::pair<uint32_t, uint32_t> pair;
void
test()
{
        std::vector<std::pair<uint32_t, uint32_t>> stack;
        stack.push_back (pair);
        while (!stack.empty()) {
                std::pair<uint32_t, uint32_t> cur = stack.back();
                stack.pop_back();
                if (!cur.first)
                {
                        cur.second++;
                        stack.push_back (cur);
                }
                if (cur.second > 10000)
                        break;
        }
}
int
main()
{
        for (int i = 0; i < 10000; i++)
          test();
}

Clang code is more than twice as fast (about 0.43 s versus 1.20 s elapsed):

jan@localhost:/tmp> clang++ -O2 tt.C  -fno-exceptions
jan@localhost:/tmp> g++ -O2 tt.C  -fno-exceptions -o a.out-gcc
jan@localhost:/tmp> perf stat ./a.out

 Performance counter stats for './a.out':

            434.24 msec task-clock:u                     #    0.997 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               129      page-faults:u                    #  297.073 /sec
     1,003,191,657      cycles:u                         #    2.310 GHz
            68,927      stalled-cycles-frontend:u        #    0.01% frontend cycles idle
       800,792,619      stalled-cycles-backend:u         #   79.82% backend cycles idle
     1,904,682,933      instructions:u                   #    1.90  insn per cycle
                                                  #    0.42  stalled cycles per insn
       500,912,196      branches:u                       #    1.154 G/sec
            23,144      branch-misses:u                  #    0.00% of all branches

       0.435340389 seconds time elapsed

       0.431409000 seconds user
       0.003994000 seconds sys


jan@localhost:/tmp> perf stat ./a.out-gcc

 Performance counter stats for './a.out-gcc':

          1,197.28 msec task-clock:u                     #    0.999 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               131      page-faults:u                    #  109.415 /sec
     2,903,995,656      cycles:u                         #    2.425 GHz
            86,204      stalled-cycles-frontend:u        #    0.00% frontend cycles idle
     2,690,907,052      stalled-cycles-backend:u         #   92.66% backend cycles idle
     2,005,212,311      instructions:u                   #    0.69  insn per cycle
                                                  #    1.34  stalled cycles per insn
       401,007,320      branches:u                       #  334.932 M/sec
            23,290      branch-misses:u                  #    0.01% of all branches

       1.198388186 seconds time elapsed

       1.198450000 seconds user
       0.000000000 seconds sys


The problem seems to be, as in the first example, that we keep updating the
in-memory stack in the main loop:

.L39:
        movl    12(%rsp), %ebx
.L30:
        movq    16(%rsp), %rax
        cmpl    $10000, %ebx
        ja      .L33
.L40:
        movq    24(%rsp), %rdi
        cmpq    %rdi, %rax
        je      .L28
.L34:
        movq    -8(%rdi), %rax
        leaq    -8(%rdi), %rsi
        movq    %rsi, 24(%rsp)
        movq    %rax, 8(%rsp)
        testl   %eax, %eax
        jne     .L39

While clang does:

.LBB0_1:                                #   in Loop: Header=BB0_4 Depth=1
        movq    %rax, %r14
.LBB0_2:                                #   in Loop: Header=BB0_4 Depth=1
        movq    %rbx, %r12
        movq    %r12, %rbx
        cmpl    $10001, %r13d                   # imm = 0x2711
        jae     .LBB0_27
.LBB0_4:                                # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_16 Depth 2
                                        #     Child Loop BB0_21 Depth 2
        cmpq    %r14, %rbx
        je      .LBB0_26
# %bb.5:                                #   in Loop: Header=BB0_4 Depth=1
        leaq    -8(%r14), %rax
        movq    -8(%r14), %rcx
        movq    %rcx, %r13
        shrq    $32, %r13
        testl   %ecx, %ecx
        jne     .LBB0_1


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
[Bug 109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-05-17 14:53 ` hubicka at gcc dot gnu.org
@ 2023-05-17 20:36 ` rguenth at gcc dot gnu.org
  2023-05-18  9:35 ` hubicka at gcc dot gnu.org
                   ` (28 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-17 20:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
There is nothing to sink, really: loop header copying introduces a PHI, and
there are no partial redundancies, only partial-partial ones, and those are not
obvious to CSE because of the introduced PHI.

I believe we have to teach SRA to decompose 'cur' and maybe also 'stack';
no scalar optimization is going to do it, also because we have aggregate
copies involved.
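
As a rough picture of what decomposing 'cur' would mean, the following
hand-written sketch splits the aggregate into two scalars.  It is an
assumption-laden illustration, not SRA output: the function names, the 'seed'
and 'limit' parameters, and the returned 'last' value are all added here so
the two variants can be compared.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Pair = std::pair<std::uint32_t, std::uint32_t>;

// Aggregate form, close to the testcase from comment #6: 'cur' is
// copied in and out of the vector as a whole object.
inline std::uint32_t run_aggregate(Pair seed, std::uint32_t limit) {
    std::vector<Pair> stack;
    stack.push_back(seed);
    std::uint32_t last = 0;
    while (!stack.empty()) {
        Pair cur = stack.back();
        stack.pop_back();
        if (!cur.first) {
            cur.second++;
            stack.push_back(cur);
        }
        last = cur.second;
        if (cur.second > limit)
            break;
    }
    return last;
}

// Hand-decomposed form: the two members live in separate scalars and a
// Pair is only materialized at the point where it is pushed.
inline std::uint32_t run_scalarized(Pair seed, std::uint32_t limit) {
    std::vector<Pair> stack;
    stack.push_back(seed);
    std::uint32_t last = 0;
    while (!stack.empty()) {
        const std::uint32_t first = stack.back().first;
        std::uint32_t second = stack.back().second;
        stack.pop_back();
        if (!first) {
            ++second;
            stack.push_back(Pair(first, second));
        }
        last = second;
        if (second > limit)
            break;
    }
    return last;
}
```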


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-05-17 20:36 ` rguenth at gcc dot gnu.org
@ 2023-05-18  9:35 ` hubicka at gcc dot gnu.org
  2023-05-18 11:54 ` rguenth at gcc dot gnu.org
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-05-18  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We can only SRA if the address is non-escaping.  Clang does not seem to need it
to optimize better:

jan@localhost:~> cat t.c
extern void q(int *);
__attribute__ ((noinline))
void
test()
{
        for (int a = 0; a < 1000;a++)
                if (!(a%100))
                        q(&a);
}
int
main()
{
        for (int a = 0; a < 1000000;a++)
                test ();
}
jan@localhost:~> cat t2.c
void q(int *a)
{
}
jan@localhost:~> gcc -O2 t.c t2.c ; perf stat ./a.out

 Performance counter stats for './a.out':

          2,916.73 msec task-clock:u                     #    0.999 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                52      page-faults:u                    #   17.828 /sec
     8,344,719,833      cycles:u                         #    2.861 GHz
        13,561,375      stalled-cycles-frontend:u        #    0.16% frontend cycles idle
     5,128,112,757      stalled-cycles-backend:u         #   61.45% backend cycles idle
    10,050,172,242      instructions:u                   #    1.20  insn per cycle
                                                  #    0.51  stalled cycles per insn
     2,034,043,082      branches:u                       #  697.370 M/sec
        11,186,312      branch-misses:u                  #    0.55% of all branches

       2.918344737 seconds time elapsed

       2.917844000 seconds user
       0.000000000 seconds sys


jan@localhost:~> clang -O2 t.c t2.c ; perf stat ./a.out

 Performance counter stats for './a.out':

            664.40 msec task-clock:u                     #    0.999 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                54      page-faults:u                    #   81.276 /sec
     2,318,095,848      cycles:u                         #    3.489 GHz
        10,417,694      stalled-cycles-frontend:u        #    0.45% frontend cycles idle
     1,057,731,301      stalled-cycles-backend:u         #   45.63% backend cycles idle
    10,062,172,840      instructions:u                   #    4.34  insn per cycle
                                                  #    0.11  stalled cycles per insn
     2,034,042,724      branches:u                       #    3.061 G/sec
        10,003,620      branch-misses:u                  #    0.49% of all branches

       0.665267996 seconds time elapsed

       0.665247000 seconds user
       0.000000000 seconds sys


We do:

        jmp     .L3
        .p2align 4,,10
        .p2align 3
.L2:
        movl    12(%rsp), %eax
        addl    $1, %eax
        movl    %eax, 12(%rsp)
        cmpl    $999, %eax
        jg      .L7
.L3:
        imull   $-1030792151, %eax, %eax
        addl    $85899344, %eax
        rorl    $2, %eax
        cmpl    $42949672, %eax
        ja      .L2
        leaq    12(%rsp), %rdi
        call    q
        jmp     .L2

This has a stupid store-to-load dependency in the inner loop.  Clang keeps the
store but optimizes away the load:

        jmp     .LBB0_1
        .p2align        4, 0x90
.LBB0_3:                                #   in Loop: Header=BB0_1 Depth=1
        leal    1(%rax), %ecx
        movl    %ecx, 12(%rsp)
        cmpl    $999, %eax                      # imm = 0x3E7
        movl    %ecx, %eax
        jge     .LBB0_4
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        imull   $-1030792151, %eax, %ecx        # imm = 0xC28F5C29
        addl    $85899344, %ecx                 # imm = 0x51EB850
        rorl    $2, %ecx
        cmpl    $42949672, %ecx                 # imm = 0x28F5C28
        ja      .LBB0_3
# %bb.2:                                #   in Loop: Header=BB0_1 Depth=1
        movq    %rbx, %rdi
        callq   q@PLT
        movl    12(%rsp), %eax
        jmp     .LBB0_3

I wonder what makes clang think it needs @PLT, though.
Why do we not consider the load partially redundant with itself?
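
The shape of the clang code can be written out by hand as follows.  This is a
sketch under assumptions, not compiler output: q is passed in as a callback
so the example is self-contained, and the a_mem/a_reg names and the call
counter are invented for illustration.  The store to memory stays on every
iteration, but the reload happens only on the path that actually calls q.

```cpp
#include <cassert>

using Callback = void (*)(int *);

// Reference loop from t.c, with q passed in and calls counted.
inline int loop_reference(Callback q) {
    int calls = 0;
    for (int a = 0; a < 1000; a++)
        if (!(a % 100)) {
            q(&a);
            ++calls;
        }
    return calls;
}

// Hand-written equivalent: 'a_reg' models the register copy and 'a_mem'
// the address-taken stack slot.
inline int loop_register_copy(Callback q) {
    int a_mem = 0;
    int a_reg = a_mem;
    int calls = 0;
    while (a_reg < 1000) {
        if (a_reg % 100 == 0) {
            q(&a_mem);        // q may write through the pointer...
            a_reg = a_mem;    // ...so reload, but only on this path
            ++calls;
        }
        ++a_reg;
        a_mem = a_reg;        // the store is kept on every iteration
    }
    return calls;
}
```

With a callback that leaves *p alone both loops call q ten times; with one
that adds 100 to *p both call it five times, so the reload after the call is
what keeps the two versions in agreement.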


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-05-18  9:35 ` hubicka at gcc dot gnu.org
@ 2023-05-18 11:54 ` rguenth at gcc dot gnu.org
  2023-05-18 13:00 ` hubicka at gcc dot gnu.org
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-18 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 55110
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55110&action=edit
patch for the missed hoisting

For the testcase in comment#6 there is a missing code hoisting from PRE
which is caused by do_hoist_insertion doing

  /* A hoistable value must be in ANTIC_IN(block)
     but not in AVAIL_OUT(BLOCK).  */
  bitmap_initialize (&hoistable_set.values, &grand_bitmap_obstack);
  bitmap_and_compl (&hoistable_set.values,
                    &ANTIC_IN (block)->values, &AVAIL_OUT (block)->values);

but in reality we want to check ANTIC_OUT(block), not ANTIC_IN(block).
cur.second is killed by the aggregate assignment to cur at the beginning
of the block we should hoist to and that's reflected in ANTIC_IN.

The attached patch properly re-computes ANTIC_OUT and uses that.
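
A toy model may help show why the choice of set matters.  The sketch below
is not GCC code: it uses std::set of strings instead of bitmaps, assumes a
block with exactly two successors, ignores phi translation, and simplifies
ANTIC_IN to "ANTIC_OUT minus what the block kills" (expressions generated in
the block are left out).  All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using ValueSet = std::set<std::string>;

inline ValueSet set_minus(const ValueSet &a, const ValueSet &b) {
    ValueSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

inline ValueSet set_and(const ValueSet &a, const ValueSet &b) {
    ValueSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

// Toy per-block facts; every field name here is an assumption.
struct Block {
    ValueSet antic_in_succ0;  // ANTIC_IN of the first successor
    ValueSet antic_in_succ1;  // ANTIC_IN of the second successor
    ValueSet killed;          // values clobbered inside the block
    ValueSet avail_out;       // values available at the end of the block
};

// Anticipated at the end of the block: needed on every outgoing path.
inline ValueSet antic_out(const Block &b) {
    return set_and(b.antic_in_succ0, b.antic_in_succ1);
}

// Simplified ANTIC_IN: a value killed in the block is not anticipated
// at the block's entry.
inline ValueSet antic_in(const Block &b) {
    return set_minus(antic_out(b), b.killed);
}

// Check based on ANTIC_IN, as in the code quoted above.
inline ValueSet hoistable_from_antic_in(const Block &b) {
    return set_minus(antic_in(b), b.avail_out);
}

// Check based on ANTIC_OUT, as the comment proposes.
inline ValueSet hoistable_from_antic_out(const Block &b) {
    return set_minus(antic_out(b), b.avail_out);
}
```

If a block clobbers cur.second and both successors need it, the ANTIC_IN
based check returns an empty set in this model while the ANTIC_OUT based
check returns cur.second, which matches the missed hoisting described above.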


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-05-18 11:54 ` rguenth at gcc dot gnu.org
@ 2023-05-18 13:00 ` hubicka at gcc dot gnu.org
  2023-05-23  9:57 ` cvs-commit at gcc dot gnu.org
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-05-18 13:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks. I tested the patch on jpegxl and it does not help there (I guess
because the redundancy there is partial). But it is cool that we compile at
least the simplified testcase well.


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-05-18 13:00 ` hubicka at gcc dot gnu.org
@ 2023-05-23  9:57 ` cvs-commit at gcc dot gnu.org
  2023-05-23 10:10 ` rguenth at gcc dot gnu.org
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-05-23  9:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:9e2017ae6ac788d3e36999bb0f0d20ea0f62c20e

commit r14-1127-g9e2017ae6ac788d3e36999bb0f0d20ea0f62c20e
Author: Richard Biener <rguenther@suse.de>
Date:   Thu May 18 13:52:29 2023 +0200

    tree-optimization/109849 - missed code hoisting

    The following fixes code hoisting to properly consider ANTIC_OUT instead
    of ANTIC_IN.  That's a bit expensive to re-compute but since we no
    longer iterate we're doing this only once per BB which should be
    acceptable.  This avoids missing hoistings to the end of blocks where
    something in the block clobbers the hoisted value.

            PR tree-optimization/109849
            * tree-ssa-pre.cc (do_hoist_insertion): Compute ANTIC_OUT
            and use that to determine what to hoist.

            * gcc.dg/tree-ssa/ssa-hoist-8.c: New testcase.


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-05-23  9:57 ` cvs-commit at gcc dot gnu.org
@ 2023-05-23 10:10 ` rguenth at gcc dot gnu.org
  2023-05-24 11:19 ` cvs-commit at gcc dot gnu.org
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-23 10:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this fixed the missing code hoisting - partial PRE is done with -O3 only.


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-05-23 10:10 ` rguenth at gcc dot gnu.org
@ 2023-05-24 11:19 ` cvs-commit at gcc dot gnu.org
  2023-06-16 14:20 ` hubicka at gcc dot gnu.org
                   ` (22 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-05-24 11:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:5476de2618ffb77f3a52e59e2c9f10b018329689

commit r14-1161-g5476de2618ffb77f3a52e59e2c9f10b018329689
Author: Richard Biener <rguenther@suse.de>
Date:   Wed May 24 12:36:28 2023 +0200

    tree-optimization/109849 - fix fallout of PRE hoisting change

    The PR109849 fix made us no longer hoist some memory loads because
    of the expression set intersection.  We can still avoid computing
    the union by simply taking the first set's expressions and leaving
    the pruning of expressions with values not suitable for hoisting
    to sorted_array_from_bitmap_set.

            PR tree-optimization/109849
            * tree-ssa-pre.cc (do_hoist_insertion): Do not intersect
            expressions but take the first sets.

            * gcc.dg/tree-ssa/ssa-hoist-9.c: New testcase.


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-05-24 11:19 ` cvs-commit at gcc dot gnu.org
@ 2023-06-16 14:20 ` hubicka at gcc dot gnu.org
  2023-06-18 16:59 ` cvs-commit at gcc dot gnu.org
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-06-16 14:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #14 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
One interesting situation is:
void std::vector<std::pair<unsigned int, unsigned int> >::push_back (struct vector * const this, const struct value_type & __x)
{
  struct __normal_iterator D.27894;
  struct pair * _1;
  struct pair * _2;
  struct pair * _3;

  <bb 2> [local count: 1073741824]:
  _1 = this_6(D)->D.26707._M_impl.D.26014._M_finish;
  _2 = this_6(D)->D.26707._M_impl.D.26014._M_end_of_storage;
  if (_1 != _2)
    goto <bb 3>; [82.57%]
  else
    goto <bb 4>; [17.43%]

  <bb 3> [local count: 886588625]:
  *_1 = MEM[(const struct pair &)__x_7(D)];
  _3 = _1 + 8;
  this_6(D)->D.26707._M_impl.D.26014._M_finish = _3;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 187153200]:
  D.27894._M_current = _1;
  std::vector<std::pair<unsigned int, unsigned int> >::_M_realloc_insert<const std::pair<unsigned int, unsigned int>&> (this_6(D), D.27894, __x_7(D));

  <bb 5> [local count: 1073741824]:
  return;

}
Here we could do partial inlining and offline the call to _M_realloc_insert, but
we fail to split since _1 is already loaded:

Split point at BB 4
  header time: 9.302800 header size: 9
  split time: 2.440200 split size: 5
  bbs: 4
  SSA names to pass: 1, 9, 11
  Refused: need to pass non-param values 

It should be easy to insert code loading the parameter again in the split part.
We would still hit the SRA limitation since this would still be escaping, but it
is another missed optimization on this simple testcase.
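
The split being suggested can be pictured with a hand-written miniature.
This is only a sketch: MiniVec, push_back_slow and the doubling growth policy
are hypothetical and are not libstdc++'s implementation.  The point is that
the cold part receives only the original parameters and reloads the finish
pointer itself, so no non-parameter value has to be passed across the split.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

using Pair = std::pair<std::uint32_t, std::uint32_t>;

// Hypothetical miniature of the three vector pointers.
struct MiniVec {
    Pair *start = nullptr;
    Pair *finish = nullptr;
    Pair *end_of_storage = nullptr;
    MiniVec() = default;
    MiniVec(const MiniVec &) = delete;
    MiniVec &operator=(const MiniVec &) = delete;
    ~MiniVec() { delete[] start; }
};

// Cold part, kept out of line.  It takes only (v, x) and reloads
// v->finish instead of receiving the already-loaded value.
static __attribute__((noinline)) void push_back_slow(MiniVec *v, const Pair &x) {
    Pair *finish = v->finish;                      // reloaded here
    const std::size_t size = static_cast<std::size_t>(finish - v->start);
    const std::size_t cap = size ? 2 * size : 1;   // assumed growth policy
    const Pair value = x;                          // x may alias old storage
    Pair *fresh = new Pair[cap];
    for (std::size_t i = 0; i < size; ++i)
        fresh[i] = v->start[i];
    fresh[size] = value;
    delete[] v->start;
    v->start = fresh;
    v->finish = fresh + size + 1;
    v->end_of_storage = fresh + cap;
}

// Hot part, small enough to inline: one compare, one store of the
// element, one store of the new finish pointer.
inline void push_back(MiniVec *v, const Pair &x) {
    Pair *finish = v->finish;
    if (finish != v->end_of_storage) {
        *finish = x;
        v->finish = finish + 1;
        return;
    }
    push_back_slow(v, x);
}
```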


* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-06-16 14:20 ` hubicka at gcc dot gnu.org
@ 2023-06-18 16:59 ` cvs-commit at gcc dot gnu.org
  2023-06-19 16:28 ` cvs-commit at gcc dot gnu.org
                   ` (20 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-18 16:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:5a1ef1cfac005370d0a5a0f85798724cb2c9cf5e

commit r14-1909-g5a1ef1cfac005370d0a5a0f85798724cb2c9cf5e
Author: Honza <jh@ryzen3.suse.cz>
Date:   Sun Jun 18 18:58:26 2023 +0200

    Analyze SRA candidates in ipa-fnsummary

    this patch extends ipa-fnsummary to anticipate statements that will be removed
    by SRA.  This is done by looking for calls passing addresses of automatic
    variables.  In function body we look for dereferences from pointers of such
    variables and mark them with new not_sra_candidate condition.

    This is just first step which is overly optimistic.  We do not try to prove that
    given automatic variable will not be SRAed even after inlining.  We now also
    optimistically assume that the transformation will always happen.  I will restrict
    this in a followup patch, but I think it is useful to gather some data on how
    much code is affected by this.

    This is motivated by PR109849 where we fail to fully inline push_back.
    The patch alone does not solve the problem even for -O3, but improves
    analysis in this case.

    gcc/ChangeLog:

            PR tree-optimization/109849
            * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Add new
parameter
            ES; handle ipa_predicate::not_sra_candidate.
            (evaluate_properties_for_edge): Pass es to
            evaluate_conditions_for_known_args.
            (ipa_fn_summary_t::duplicate): Handle sra candidates.
            (dump_ipa_call_summary): Dump points_to_possible_sra_candidate.
            (load_or_store_of_ptr_parameter): New function.
            (points_to_possible_sra_candidate_p): New function.
            (analyze_function_body): Initialize
points_to_possible_sra_candidate;
            determine sra predicates.
            (estimate_ipcp_clone_size_and_time): Update call of
            evaluate_conditions_for_known_args.
            (remap_edge_params): Update points_to_possible_sra_candidate.
            (read_ipa_call_summary): Stream points_to_possible_sra_candidate.
            (write_ipa_call_summary): Likewise.
            * ipa-predicate.cc (ipa_predicate::add_clause): Handle
            not_sra_candidate.
            (dump_condition): Dump it.
            * ipa-predicate.h (struct inline_param_summary): Add
            points_to_possible_sra_candidate.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/109849
            * g++.dg/ipa/devirt-45.C: Update template.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-06-18 16:59 ` cvs-commit at gcc dot gnu.org
@ 2023-06-19 16:28 ` cvs-commit at gcc dot gnu.org
  2023-06-26 16:30 ` cvs-commit at gcc dot gnu.org
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-19 16:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:7b34cacc5735385e7e2855d7c0a6fad60ef4a99b

commit r14-1951-g7b34cacc5735385e7e2855d7c0a6fad60ef4a99b
Author: Jan Hubicka <jh@suse.cz>
Date:   Mon Jun 19 18:28:17 2023 +0200

    optimize std::max early

    We currently produce very bad code on loops using std::vector as a stack,
    since we fail to inline push_back, which in turn prevents SRA, and we fail
    to optimize out some store-to-load pairs.

    I looked into why this function is not inlined by GCC while it is inlined
    by clang.  We currently estimate it at 66 instructions, and the inline
    limits are 15 at -O2 and 30 at -O3.  Clang has a similar estimate, but
    still decides to inline at -O2.

    I looked into the reason why the body is so large, and one problem I
    spotted is the way std::max is implemented by taking and returning
    references to the values.

      const T& max( const T& a, const T& b );

    This makes it necessary to store the values to memory and load them later,
    and max is used by the code computing the new size of the vector on resize.

    We optimize this to MAX_EXPR, but only during late optimizations.  I think
    this is a common enough coding pattern and we ought to make this
    transparent to early opts and IPA.  The following is the easiest fix, which
    simply adds a phiprop pass that turns the PHI of address values into a PHI
    of values, so that later FRE can propagate values across memory, phiopt
    can discover the MAX_EXPR pattern and DSE can remove the memory stores.
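
To illustrate the pattern discussed above, here is a small sketch under assumptions: `grown_capacity` is a hypothetical stand-in for the size computation done on reallocation, not the libstdc++ code. The first function goes through the reference-returning std::max, and the second spells out the value-based form the optimizers are expected to reach.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the growth computation done on reallocation.
std::size_t grown_capacity(std::size_t size, std::size_t extra)
{
  // std::max takes and returns const references, so before optimization
  // both operands are addressable and the result is read back through the
  // returned reference.
  const std::size_t &bigger = std::max(size, extra);
  return size + bigger;
}

// The same computation written purely on values, i.e. the form that a
// MAX_EXPR corresponds to.
std::size_t grown_capacity_by_value(std::size_t size, std::size_t extra)
{
  std::size_t bigger = size < extra ? extra : size;
  return size + bigger;
}
```

Both functions return the same result for any input; the difference is only in how much work the compiler must do to see that no memory access is needed.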

    gcc/ChangeLog:

            PR tree-optimization/109811
            PR tree-optimization/109849
            * passes.def: Add phiprop to early optimization passes.
            * tree-ssa-phiprop.cc: Allow cloning.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/109811
            PR tree-optimization/109849
            * gcc.dg/tree-ssa/phiprop-1.c: New test.
            * gcc.dg/tree-ssa/pr21463.c: Adjust template.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2023-06-19 16:28 ` cvs-commit at gcc dot gnu.org
@ 2023-06-26 16:30 ` cvs-commit at gcc dot gnu.org
  2023-06-28  9:47 ` cvs-commit at gcc dot gnu.org
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-26 16:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #17 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:c2ebccc97190a978a44e341516b488f02a78c598

commit r14-2101-gc2ebccc97190a978a44e341516b488f02a78c598
Author: Jan Hubicka <jh@suse.cz>
Date:   Mon Jun 26 18:29:39 2023 +0200

    Fix profile of forwarders produced by cd-dce

    Compiling the testcase from PR109849 (which uses a std::vector based stack
    to drive a loop) with profile feedback leads to profile mismatches
    introduced by tree-ssa-dce.  This is the new code to produce unified
    forwarder blocks for PHIs.

    I am not including the testcase itself since checking it for Invalid sum
    is probably going to be too fragile and this should show in our LNT
    testers.  The patch however fixes the mismatch.

    Bootstrapped/regtested x86_64-linux and plan to commit it shortly.

    gcc/ChangeLog:

            PR tree-optimization/109849
            * tree-ssa-dce.cc (make_forwarders_with_degenerate_phis): Fix
            profile count of newly constructed forwarder block.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2023-06-26 16:30 ` cvs-commit at gcc dot gnu.org
@ 2023-06-28  9:47 ` cvs-commit at gcc dot gnu.org
  2023-06-29 20:46 ` cvs-commit at gcc dot gnu.org
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-28  9:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:45c53768b6fa3d737ae818e31d3c50da62e0ad2b

commit r14-2157-g45c53768b6fa3d737ae818e31d3c50da62e0ad2b
Author: Jan Hubicka <jh@suse.cz>
Date:   Wed Jun 28 11:45:15 2023 +0200

    Add cold attribute to throw wrappers and terminate

            PR middle-end/109849
            * include/bits/c++config (std::__terminate): Mark cold.
            * include/bits/functexcept.h: Mark everything as cold.
            * libsupc++/exception: Mark terminate and unexpected as cold.
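
As a rough illustration of what marking a throw wrapper cold means for user code, the sketch below uses the GCC `cold` and `noreturn` function attributes on a hypothetical wrapper. `throw_length_error` and `checked_add` are made-up names, not libstdc++ functions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical out-of-line throw wrapper.  The cold attribute tells GCC that
// calls to it are unlikely, so paths leading to it are treated as cold.
__attribute__((cold, noreturn)) void throw_length_error(const char *what)
{
  throw std::length_error(what);
}

// Hot function whose only error path goes through the cold wrapper.
std::size_t checked_add(std::size_t a, std::size_t b, std::size_t limit)
{
  assert(a <= limit);
  if (b > limit - a)
    throw_length_error("checked_add: limit exceeded");
  return a + b;
}
```

With the wrapper marked cold, no explicit branch hint is needed at the call site.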

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2023-06-28  9:47 ` cvs-commit at gcc dot gnu.org
@ 2023-06-29 20:46 ` cvs-commit at gcc dot gnu.org
  2023-06-30 14:28 ` cvs-commit at gcc dot gnu.org
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-29 20:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:9dc18fca431626404b0692c689a2e103666e7adb

commit r14-2202-g9dc18fca431626404b0692c689a2e103666e7adb
Author: Jan Hubicka <jh@suse.cz>
Date:   Thu Jun 29 22:45:37 2023 +0200

    Compute ipa-predicates for conditionals involving __builtin_expect_p

    std::vector allocator looks as follows:

    __attribute__((nodiscard))
    struct pair * std::__new_allocator<std::pair<unsigned int, unsigned int> >::allocate (struct __new_allocator * const this, size_type __n, const void * D.27753)
    {
      bool _1;
      long int _2;
      long int _3;
      long unsigned int _5;
      struct pair * _9;

      <bb 2> [local count: 1073741824]:
      _1 = __n_7(D) > 1152921504606846975;
      _2 = (long int) _1;
      _3 = __builtin_expect (_2, 0);
      if (_3 != 0)
        goto <bb 3>; [10.00%]
      else
        goto <bb 6>; [90.00%]

      <bb 3> [local count: 107374184]:
      if (__n_7(D) > 2305843009213693951)
        goto <bb 4>; [50.00%]
      else
        goto <bb 5>; [50.00%]

      <bb 4> [local count: 53687092]:
      std::__throw_bad_array_new_length ();

      <bb 5> [local count: 53687092]:
      std::__throw_bad_alloc ();

      <bb 6> [local count: 966367641]:
      _5 = __n_7(D) * 8;
      _9 = operator new (_5);
      return _9;
    }

    So there is a check for the allocated block size being greater than
    max_size, which is wrapped in __builtin_expect.  This makes ipa-fnsummary
    give up analyzing predicates, and it will miss the fact that the two
    different calls to __throw will be optimized out if __n is already smaller
    than 1152921504606846975, which it is after _M_check_len.

    This patch extends ipa-fnsummary to understand functions that return their
    parameter.
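
A source-level sketch of the shape shown in the dump above, assuming a simplified allocator: `allocate_pairs` and `allocate_small` are hypothetical names, and the element size of 8 matches std::pair<unsigned int, unsigned int>. The point is that `__builtin_expect` returns its first argument, so the condition it wraps is still a plain comparison on the parameter `n`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical, simplified stand-in for the allocator's allocate().
void *allocate_pairs(std::size_t n)
{
  const std::size_t max_n = static_cast<std::size_t>(PTRDIFF_MAX) / 8;
  // __builtin_expect only attaches a hint; its value is the comparison.
  if (__builtin_expect(n > max_n, 0))
    throw std::bad_alloc();
  return ::operator new(n * 8);
}

// Caller that has already bounded n, so the throwing path above is dead
// once the callee's condition is understood in terms of its parameter.
void *allocate_small(std::size_t n)
{
  assert(n <= 1024);
  return allocate_pairs(n);
}
```

In `allocate_small` the bound on `n` makes the guarded branch unreachable, which is the kind of fact a predicate on the parameter can express.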

    gcc/ChangeLog:

            PR tree-optimization/109849
            * ipa-fnsummary.cc (decompose_param_expr): Skip
            functions returning its parameter.
            (set_cond_stmt_execution_predicate): Return early
            if predicate was constructed.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/109849
            * gcc.dg/ipa/pr109849.c: New test.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2023-06-29 20:46 ` cvs-commit at gcc dot gnu.org
@ 2023-06-30 14:28 ` cvs-commit at gcc dot gnu.org
  2023-11-19 15:25 ` hubicka at gcc dot gnu.org
                   ` (15 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-30 14:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #20 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:eab57b825bcc350e9ff44eb2fa739a80199d9bb1

commit r14-2219-geab57b825bcc350e9ff44eb2fa739a80199d9bb1
Author: Jan Hubicka <jh@suse.cz>
Date:   Fri Jun 30 16:27:27 2023 +0200

    Fix handling of __builtin_expect_with_probability and improve first-match heuristics

    While looking into the std::vector _M_realloc_insert codegen I noticed that
    the call of __throw_bad_alloc is predicted with 10% probability.  This is
    because the conditional guarding it has __builtin_expect (cond, 0) on it.
    This incorrectly takes precedence over more reliable heuristics predicting
    that a call to a cold noreturn function is likely not going to happen.

    So I reordered the predictors so __builtin_expect_with_probability comes
    first after predictors that never make a mistake (so the user can use it to
    always specify the outcome by hand).  I also downgraded the malloc
    predictor, since I do think user-defined malloc functions & new operators
    may behave in funny ways, and moved the usual __builtin_expect after the
    noreturn cold predictor.

    This triggered latent bug in expr_expected_value_1 where

              if (*predictor < predictor2)
                *predictor = predictor2;

    should be:

              if (predictor2 < *predictor)
                *predictor = predictor2;

    which eventually triggered an ICE on combining heuristics.  This made me
    notice that we can do slightly better while combining expected values in
    case only one of the parameters (such as in a*b when we expect a==0) can
    determine the overall result.

    Note that the new code may pick weaker heuristics in case both values
    are predicted.  Not sure if this scenario is worth the extra CPU time:
    there is no correct way to combine the probabilities anyway since we do
    not know if the predictions are independent, so I think users should not
    rely on it.

    Fixing this issue uncovered another problem.  In 2018 Martin Liska added
    code predicting that MALLOC returns non-NULL, but instead of that the code
    predicts that it returns true (boolean 1).  This sort of works for a
    testcase testing
     malloc (10) != NULL
    but, for example, we will predict
     malloc (10) == malloc (10)
    as true, which is not right, and such a comparison may happen in real code.

    I think the proper way is to update expr_expected_value_1 to work with
    value ranges, but that needs greater surgery, so I decided to postpone this
    and only add a FIXME and file PR110499.
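
The malloc comparison mentioned above can be made concrete with a short sketch. The helper name is hypothetical. Two allocations that are both live and both successful have distinct addresses, so predicting the comparison as true is the wrong guess.

```cpp
#include <cassert>
#include <cstdlib>

// Returns true only if two successful, simultaneously live allocations
// compare equal, which cannot happen for distinct live blocks.
bool live_allocations_compare_equal()
{
  void *a = std::malloc(10);
  void *b = std::malloc(10);
  bool equal = (a != nullptr) && (b != nullptr) && (a == b);
  std::free(a);
  std::free(b);
  return equal;
}
```

The null checks keep the result well defined even when an allocation fails.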

    gcc/ChangeLog:

            PR middle-end/109849
            * predict.cc (estimate_bb_frequencies): Turn to static function.
            (expr_expected_value_1): Fix handling of binary expressions with
            predicted values.
            * predict.def (PRED_MALLOC_NONNULL): Move later in the priority
            queue.
            (PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost top of the
            priority queue.
            * predict.h (estimate_bb_frequencies): No longer declare it.

    gcc/testsuite/ChangeLog:

            PR middle-end/109849
            * gcc.dg/predict-18.c: Improve testcase.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2023-06-30 14:28 ` cvs-commit at gcc dot gnu.org
@ 2023-11-19 15:25 ` hubicka at gcc dot gnu.org
  2023-11-21 14:17 ` cvs-commit at gcc dot gnu.org
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-11-19 15:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #21 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html
gets us closer to inlining _M_realloc_insert at -O3 (3 insns away)

Patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636935.html
reduces the expense when _M_realloc_insert is not inlined at -O2 (where I think
we should not inline it, unlike for clang)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2023-11-19 15:25 ` hubicka at gcc dot gnu.org
@ 2023-11-21 14:17 ` cvs-commit at gcc dot gnu.org
  2023-11-21 15:12 ` hubicka at gcc dot gnu.org
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-21 14:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:1d82fc2e6824bf83159389729c31a942f7b91b04

commit r14-5679-g1d82fc2e6824bf83159389729c31a942f7b91b04
Author: Jan Hubicka <jh@suse.cz>
Date:   Tue Nov 21 15:17:16 2023 +0100

    optimize std::vector::push_back

    This patch speeds up push_back at -O3 significantly by making the
    reallocation inlined by default.  _M_realloc_insert is a general
    insertion routine that takes an iterator pointing to the location where
    the value should be inserted.  As such it contains code to move other
    entries around, which is quite large.

    Since appending to the end of array is common operation, I think we should
    have specialized code for that.  Sadly it is really hard to work out this
    from IPA passes, since we basically care whether the iterator points to
    the same place as the end pointer, which are both passed by reference.
    This is inter-procedural value numbering that is quite out of reach.

    I also added an extra check making it clear that the new length of the
    vector is non-zero.  This saves extra conditionals.  Again this is quite a
    hard case, since _M_check_len seems to be able to return 0 if its
    parameter is 0.  This never happens here, but we are not able to propagate
    this early nor at the IPA stage.
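
The idea of a dedicated append path can be sketched with a toy container. This is not the libstdc++ implementation: `ToyVec`, `realloc_append` and `push_back` below are hypothetical, the growth policy is an assumption, and allocation failure simply aborts. Because the insertion point is always the end, the slow path has no code for shifting later elements, and the new capacity is visibly non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Toy growable array of unsigned; illustrative only.
struct ToyVec {
  unsigned *begin = nullptr;
  unsigned *end = nullptr;
  unsigned *cap = nullptr;
};

// Append-only slow path: copy the old elements and place x after them.
inline void realloc_append(ToyVec &v, unsigned x)
{
  std::size_t old_size = static_cast<std::size_t>(v.end - v.begin);
  std::size_t new_cap = old_size ? 2 * old_size : 1;  // never zero
  unsigned *mem =
      static_cast<unsigned *>(std::malloc(new_cap * sizeof(unsigned)));
  if (!mem)
    std::abort();
  if (old_size)
    std::memcpy(mem, v.begin, old_size * sizeof(unsigned));
  mem[old_size] = x;
  std::free(v.begin);
  v.begin = mem;
  v.end = mem + old_size + 1;
  v.cap = mem + new_cap;
}

// Fast path stores in place; only a full buffer takes the slow path.
inline void push_back(ToyVec &v, unsigned x)
{
  if (v.end != v.cap)
    *v.end++ = x;
  else
    realloc_append(v, x);
}
```

A general insert-at-iterator routine would additionally need to move the elements after the insertion point, which is the extra code the commit message describes as large.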

    libstdc++-v3/ChangeLog:

            PR libstdc++/110287
            PR middle-end/109811
            PR middle-end/109849
            * include/bits/stl_vector.h (_M_realloc_append): New member
            function.
            (push_back): Use it.
            * include/bits/vector.tcc: (emplace_back): Use it.
            (_M_realloc_insert): Let compiler know that new vector size is
            non-zero.
            (_M_realloc_append): New member function.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2023-11-21 14:17 ` cvs-commit at gcc dot gnu.org
@ 2023-11-21 15:12 ` hubicka at gcc dot gnu.org
  2023-11-24 16:35 ` cvs-commit at gcc dot gnu.org
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-11-21 15:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Bug 109849 depends on bug 110377, which changed state.

Bug 110377 Summary: Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2023-11-21 15:12 ` hubicka at gcc dot gnu.org
@ 2023-11-24 16:35 ` cvs-commit at gcc dot gnu.org
  2023-11-24 17:00 ` cvs-commit at gcc dot gnu.org
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-24 16:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #23 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363

commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
Author: Martin Jambor <mjambor@suse.cz>
Date:   Fri Nov 24 17:32:35 2023 +0100

    sra: SRA of non-escaped aggregates passed by reference to calls

    PR109849 shows that a loop that heavily pushes and pops from a stack
    implemented by a C++ std::vector results in slow code, mainly because the
    vector structure is not split by SRA and so we end up with many loads
    and stores into it.  This is because it is passed by reference
    to (re)allocation methods and so needs to live in memory, even though
    it does not escape from them and so we could SRA it if we
    re-constructed it before the call and then separated it to distinct
    replacements afterwards.

    This patch does exactly that, first relaxing the selection of
    candidates to also include those which are addressable but do not
    escape and then adding code to deal with the calls.  The
    micro-benchmark that is also the (scan-dump) testcase in this patch
    runs twice as fast with it than with current trunk.  Honza measured
    its effect on the libjxl benchmark and it almost closes the
    performance gap between Clang and GCC while not requiring excessive
    inlining and thus code growth.
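
A hand-written picture of the transformation described above, under assumptions: the `Counter` type and the three functions are hypothetical, and `run_scalarized` only mimics by hand what SRA would do internally. The aggregate is passed by reference to an out-of-line call but does not escape, so its fields can live in scalars, with the aggregate re-constructed before the call and the scalars reloaded after it.

```cpp
#include <cassert>

struct Counter { unsigned hits; unsigned misses; };

// Out-of-line helper taking the aggregate by reference.  It modifies the
// object but does not keep its address anywhere.
__attribute__((noinline)) void rare_event(Counter &c)
{
  c.misses += 1;
}

// Original form: `c` is addressable because of the call.
unsigned run(const unsigned *data, unsigned n)
{
  Counter c = {0, 0};
  for (unsigned i = 0; i < n; ++i) {
    if (data[i] != 0)
      c.hits += 1;
    else
      rare_event(c);
  }
  return c.hits * 1000 + c.misses;
}

// Scalarized form written by hand: fields become separate scalars.
unsigned run_scalarized(const unsigned *data, unsigned n)
{
  unsigned hits = 0, misses = 0;
  for (unsigned i = 0; i < n; ++i) {
    if (data[i] != 0) {
      hits += 1;
    } else {
      Counter tmp = {hits, misses};  // re-construct before the call
      rare_event(tmp);
      hits = tmp.hits;               // reload the replacements afterwards
      misses = tmp.misses;
    }
  }
  return hits * 1000 + misses;
}
```

In the scalarized form the hot branch touches only `hits`, which can stay in a register across iterations.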

    The patch disallows creation of replacements for such aggregates which
    are also accessed with a precision smaller than their size because I
    have observed that this led to excessive zero-extending of data
    leading to slow-downs of perlbench (on some CPUs).  Apart from this
    case I have not noticed any regressions, at least not so far.

    Gimple call argument flags can tell if an argument is unused (and then
    we do not need to generate any statements for it) or if it is not
    written to and then we do not need to generate statements loading
    replacements from the original aggregate after the call statement.
    Unfortunately, we cannot symmetrically use flags saying that an aggregate
    is not read to avoid re-constructing the aggregate before the call,
    because flags don't tell which parts of aggregates were not written to,
    so we load all replacements, and so all need to have the correct value
    before the call.

    This version of the patch also takes care to avoid attempts to modify
    abnormal edges, something which was missing in the previous version.

    gcc/ChangeLog:

    2023-11-23  Martin Jambor  <mjambor@suse.cz>

            PR middle-end/109849
            * tree-sra.cc (passed_by_ref_in_call): New.
            (sra_initialize): Allocate passed_by_ref_in_call.
            (sra_deinitialize): Free passed_by_ref_in_call.
            (create_access): Add decl pool candidates only if they are not
            already candidates.
            (build_access_from_expr_1): Bail out on ADDR_EXPRs.
            (build_access_from_call_arg): New function.
            (asm_visit_addr): Rename to scan_visit_addr, change the
            disqualification dump message.
            (scan_function): Check taken addresses for all non-call statements,
            including phi nodes.  Process all call arguments, including the
            static chain, build_access_from_call_arg.
            (maybe_add_sra_candidate): Relax need_to_live_in_memory check to
            allow non-escaped local variables.
            (sort_and_splice_var_accesses): Disallow smaller-than-precision
            replacements for aggregates passed by reference to functions.
            (sra_modify_expr): Use a separate stmt iterator for adding
            statements before the processed statement and after it.
            (enum out_edge_check): New type.
            (abnormal_edge_after_stmt_p): New function.
            (sra_modify_call_arg): New function.
            (sra_modify_assign): Adjust calls to sra_modify_expr.
            (sra_modify_function_body): Likewise, use sra_modify_call_arg to
            process call arguments, including the static chain.

    gcc/testsuite/ChangeLog:

    2023-11-23  Martin Jambor  <mjambor@suse.cz>

            PR middle-end/109849
            * g++.dg/tree-ssa/pr109849.C: New test.
            * g++.dg/tree-ssa/sra-eh-1.C: Likewise.
            * gcc.dg/tree-ssa/pr109849.c: Likewise.
            * gcc.dg/tree-ssa/sra-longjmp-1.c: Likewise.
            * gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2023-11-24 16:35 ` cvs-commit at gcc dot gnu.org
@ 2023-11-24 17:00 ` cvs-commit at gcc dot gnu.org
  2023-11-24 17:00 ` jamborm at gcc dot gnu.org
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-24 17:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #24 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:c2dcfb6ba6e9a84a16e63ae73a822ae2a843170c

commit r14-5832-gc2dcfb6ba6e9a84a16e63ae73a822ae2a843170c
Author: Jan Hubicka <jh@suse.cz>
Date:   Fri Nov 24 17:59:44 2023 +0100

    Use memcpy instead of memmove in __relocate_a_1

    __relocate_a_1 is used to copy data after vector resizing.  This can be
    done by memcpy rather than memmove.
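
A sketch of why memcpy is sufficient in this situation, assuming trivially copyable elements and a freshly allocated destination: the new block cannot overlap the old one, so the overlap handling of memmove is not needed. `relocate_to_new_block` is a hypothetical helper, not the libstdc++ function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Copies `count` elements into a newly allocated block of `new_cap`
// elements.  The caller owns the returned block and releases it with
// std::free.
template <typename T>
T *relocate_to_new_block(const T *first, std::size_t count,
                         std::size_t new_cap)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "byte-wise relocation needs a trivially copyable type");
  assert(new_cap >= count && new_cap > 0);
  T *dest = static_cast<T *>(std::malloc(new_cap * sizeof(T)));
  if (!dest)
    std::abort();
  if (count != 0)
    std::memcpy(dest, first, count * sizeof(T));  // source and dest disjoint
  return dest;
}
```

memcpy requires non-overlapping ranges, which holds here by construction because the destination was just allocated.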

    libstdc++-v3/ChangeLog:

            PR middle-end/109849
            * include/bits/stl_uninitialized.h (__relocate_a_1): Use memcpy
            instead of memmove.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2023-11-24 17:00 ` cvs-commit at gcc dot gnu.org
@ 2023-11-24 17:00 ` jamborm at gcc dot gnu.org
  2023-11-27 14:39 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-11-24 17:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #25 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> There is nothing to sink really, loop header copying introduces a PHI and
> there's not partial redundancies but only partial-partial and those are not
> obvious to CSE because of the introduced PHI.
> 

SRA now decomposes stack.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2023-11-24 17:00 ` jamborm at gcc dot gnu.org
@ 2023-11-27 14:39 ` rguenth at gcc dot gnu.org
  2023-11-28  9:33 ` redi at gcc dot gnu.org
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-27 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Bug 109849 depends on bug 112653, which changed state.

Bug 112653 Summary: PTA should handle correctly escape information of values returned by a function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2023-11-27 14:39 ` rguenth at gcc dot gnu.org
@ 2023-11-28  9:33 ` redi at gcc dot gnu.org
  2023-11-28 10:32 ` jamborm at gcc dot gnu.org
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28  9:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #26 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to GCC Commits from comment #23)
> https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363
> 
> commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
> Author: Martin Jambor <mjambor@suse.cz>
> Date:   Fri Nov 24 17:32:35 2023 +0100
> 
>     sra: SRA of non-escaped aggregates passed by reference to calls
>     

I'm seeing a large number of libstdc++ testsuite failures, bisected to this
patch.

For example:

make check -C x86_64-pc-linux-gnu/libstdc++-v3 RUNTESTFLAGS="conformance.exp=21_strings/basic_string/operators/char/1.cc --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0"

The full list of FAILs is:

FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)
FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-1.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-2.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-3.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-4.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-big.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc  -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++11 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++14 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++17 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++20 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++23 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc  -std=gnu++26 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++11 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++14 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++17 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++20 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++23 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc  -std=gnu++26 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++11 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++14 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++17 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++20 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++23 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc  -std=gnu++26 execution
test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++11 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++14 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++17 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++20 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++23 execution test
FAIL: 22_locale/num_get/get/char/23953.cc  -std=gnu++26 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++11 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++14 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++17 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++20 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++23 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc  -std=gnu++26 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++11 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++14 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++17 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++20 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++23 execution test
FAIL: 22_locale/num_put/put/char/23953.cc  -std=gnu++26 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++11 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++14 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++17 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++20 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++23 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc  -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++11 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++14 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++17 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++20 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++23 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc  -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++11 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++14 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++17 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++20 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++23 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc  -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++11 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++14 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++17 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++20 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++23 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc  -std=gnu++26 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++11 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++14 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++17 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++20 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++23 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc  -std=gnu++26 execution
test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++11 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++14 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++17 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++20 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++23 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc  -std=gnu++26 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc  -std=gnu++17 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc  -std=gnu++20 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc  -std=gnu++23 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc  -std=gnu++26 execution test
FAIL: 27_io/filesystem/operations/canonical.cc  -std=gnu++17 execution test
FAIL: 27_io/filesystem/operations/canonical.cc  -std=gnu++20 execution test
FAIL: 27_io/filesystem/operations/canonical.cc  -std=gnu++23 execution test
FAIL: 27_io/filesystem/operations/canonical.cc  -std=gnu++26 execution test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc  -std=gnu++17 execution
test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc  -std=gnu++20 execution
test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc  -std=gnu++23 execution
test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc  -std=gnu++26 execution
test
FAIL: 27_io/filesystem/path/concat/strings.cc  -std=gnu++17 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc  -std=gnu++20 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc  -std=gnu++23 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc  -std=gnu++26 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++11 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++14 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++17 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++20 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++23 execution test
FAIL: 28_regex/basic_regex/106607.cc  -std=gnu++26 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++11 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++14 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++17 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++20 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++23 execution test
FAIL: 30_threads/thread/id/output.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++11 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++14 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++17 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++20 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++23 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc  -std=gnu++26 execution
test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++11
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++14
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++17
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++20
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++23
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc  -std=gnu++26
execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/iterators/pop.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/canonical.cc  -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/copy.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++11
execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++14
execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++17
execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++20
execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++23
execution test
FAIL: experimental/filesystem/operations/create_directory.cc  -std=gnu++26
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++11
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++14
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++17
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++20
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++23
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc  -std=gnu++26
execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/exists.cc  -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/file_size.cc  -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc  -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++11
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++14
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++17
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++20
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++23
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc  -std=gnu++26
execution test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/permissions.cc  -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++11
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++14
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++17
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++20
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++23
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc  -std=gnu++26
execution test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++11 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++14 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++17 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++20 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++23 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc  -std=gnu++26 execution
test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++14 execution
test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++17 execution
test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++20 execution
test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++23 execution
test
FAIL: experimental/net/internet/address/v6/members.cc  -std=gnu++26 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++14 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++17 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++20 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++23 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc  -std=gnu++26 execution
test
FAIL: ext/pb_ds/regression/hash_map_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/hash_set_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/hash_set_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++14
execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++17
execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++20
execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++23
execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc  -std=gnu++26
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++14
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++17
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++20
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++23
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc  -std=gnu++26
execution test
FAIL: ext/pb_ds/regression/priority_queue_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++11 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++14 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++17 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++20 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++23 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc  -std=gnu++26 execution
test
FAIL: ext/pb_ds/regression/tree_map_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_map_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/tree_set_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_set_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/trie_map_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_map_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc  -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/trie_set_rand.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_set_rand.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc  -std=gnu++26 execution test
FAIL: std/format/functions/format.cc  -std=gnu++20 execution test
FAIL: std/format/functions/format.cc  -std=gnu++23 execution test
FAIL: std/format/functions/format.cc  -std=gnu++26 execution test

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2023-11-28  9:33 ` redi at gcc dot gnu.org
@ 2023-11-28 10:32 ` jamborm at gcc dot gnu.org
  2023-11-28 12:41 ` redi at gcc dot gnu.org
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-11-28 10:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #27 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #26)
> (In reply to GCC Commits from comment #23)
> > https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363
> > 
> > commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
> > Author: Martin Jambor <mjambor@suse.cz>
> > Date:   Fri Nov 24 17:32:35 2023 +0100
> > 
> >     sra: SRA of non-escaped aggregates passed by reference to calls
> >     
> 
> I'm seeing a large number of libstdc++ testsuite failures, bisected to this
> patch.
> 
> For example:
> 
> make check -C x86_64-pc-linux-gnu/libstdc++-v3
> RUNTESTFLAGS="conformance.exp=21_strings/basic_string/operators/char/1.cc
> --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0"
> 

Unfortunately I cannot reproduce this, the above (on pristine master
commit 006e90e1344 on an x86_64-linux) results in:

Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
Running
/home/mjambor/gcc/small/src/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
PASS: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 (test for
excess errors)
PASS: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 execution test

Can you please try if
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
fixes this?

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (29 preceding siblings ...)
  2023-11-28 10:32 ` jamborm at gcc dot gnu.org
@ 2023-11-28 12:41 ` redi at gcc dot gnu.org
  2023-11-28 13:29 ` redi at gcc dot gnu.org
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28 12:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #28 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #27)
> Unfortunately I cannot reproduce this, the above (on pristine master
> commit 006e90e1344 on an x86_64-linux) results in:
> 
> Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
> Running
> /home/mjambor/gcc/small/src/libstdc++-v3/testsuite/libstdc++-dg/conformance.
> exp ...
> PASS: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 (test for
> excess errors)
> PASS: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 execution
> test

Oops, sorry, that particular FAIL needs either
--target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG, which then
makes it fail for all -std modes:

Schedule of variations:
    unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG

Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for
target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/test/src/gcc/libstdc++-v3/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++26 execution test

Or just set GLIBCXX_TESTSUITE_STDS="17,20" in the environment before running
the test:

Schedule of variations:
    unix/-D_GLIBCXX_USE_CXX11_ABI=0

Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for
target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/test/src/gcc/libstdc++-v3/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
FAIL: 21_strings/basic_string/operators/char/1.cc  -std=gnu++20 execution test

It seems I picked a bad example to give, which requires additional options to
FAIL.

Many of the other FAILs do not require _GLIBCXX_DEBUG or -std=gnu++20 to FAIL,
but the -D_GLIBCXX_USE_CXX11_ABI=0 option is necessary, at least for all the
ones I inspected. That option isn't used by default, but I run the full
testsuite with that several times a day, and with
GLIBCXX_TESTSUITE_STDS=98,11,14,17,20,23,26.

> Can you please try if
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
> fixes this?

Testing now ...

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (30 preceding siblings ...)
  2023-11-28 12:41 ` redi at gcc dot gnu.org
@ 2023-11-28 13:29 ` redi at gcc dot gnu.org
  2023-11-28 15:29 ` redi at gcc dot gnu.org
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28 13:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #29 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #28)
> (In reply to Martin Jambor from comment #27)
> > Can you please try if
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
> > fixes this?
> 
> Testing now ...

Yes, this fixes 21_strings/basic_string/operators/char/* and
28_regex/basic_regex/106607.cc

Running the full testsuite now...

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (31 preceding siblings ...)
  2023-11-28 13:29 ` redi at gcc dot gnu.org
@ 2023-11-28 15:29 ` redi at gcc dot gnu.org
  2023-11-28 22:21 ` redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28 15:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #30 from Jonathan Wakely <redi at gcc dot gnu.org> ---
So far the only FAIL I still see is:

FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)

I'm not sure if this is caused by your patch or one of Honza's. The test only
fails with GLIBCXX_TESTSUITE_STDS=98 defined.

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (32 preceding siblings ...)
  2023-11-28 15:29 ` redi at gcc dot gnu.org
@ 2023-11-28 22:21 ` redi at gcc dot gnu.org
  2023-11-29 12:27 ` hubicka at ucw dot cz
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28 22:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #31 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Bisection points to r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 for that
remaining FAIL as well (and it isn't fixed by the new patch).

It introduced a new warning which wasn't present before:

/tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
writing between 2 and 9223372036854775806 bytes into a region of size 0
overflows the destination [-Wstringop-overflow=]

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (33 preceding siblings ...)
  2023-11-28 22:21 ` redi at gcc dot gnu.org
@ 2023-11-29 12:27 ` hubicka at ucw dot cz
  2023-11-29 15:25 ` cvs-commit at gcc dot gnu.org
  2024-01-03 17:41 ` jamborm at gcc dot gnu.org
  36 siblings, 0 replies; 38+ messages in thread
From: hubicka at ucw dot cz @ 2023-11-29 12:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #32 from Jan Hubicka <hubicka at ucw dot cz> ---
> /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
> warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
> writing between 2 and 9223372036854775806 bytes into a region of size 0
> overflows the destination [-Wstringop-overflow=]

It warns on:

  template<bool _IsMove>
    struct __copy_move<_IsMove, true, random_access_iterator_tag>
    {
      template<typename _Tp, typename _Up>
        _GLIBCXX20_CONSTEXPR
        static _Up*
        __copy_m(_Tp* __first, _Tp* __last, _Up* __result)
        {
          const ptrdiff_t _Num = __last - __first;
          if (__builtin_expect(_Num > 1, true))
            __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
          else if (_Num == 1)
            std::__copy_move<_IsMove, false, random_access_iterator_tag>::
              __assign_one(__result, __first);
          return __result + _Num;
        }
    };

It is likely a false positive on a code path that never happens in real
code, but we now optimize it better.

Does it show an inline path?
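
For readers who want to experiment outside libstdc++, the shape of the
quoted code can be reproduced with a minimal sketch. The name copy_trivial
is hypothetical and this is not the library's implementation; it only keeps
the two paths discussed above: the bulk memmove for more than one element
and the plain assignment for exactly one. The diagnostic names memcpy while
the source calls memmove, presumably because the compiler rewrote the call.
If the compiler concludes on some specialized path that the destination has
size 0 while the count is greater than 1, the diagnostic would be attached
to the memmove call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the __copy_m specialization quoted above.
// It is an illustration only, not the libstdc++ code.
template <typename T>
T* copy_trivial(const T* first, const T* last, T* result)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memmove is only valid for trivially copyable types");
    const std::ptrdiff_t num = last - first;
    if (__builtin_expect(num > 1, true))
        // Bulk path: this is the call the warning refers to.
        std::memmove(result, first, sizeof(T) * static_cast<std::size_t>(num));
    else if (num == 1)
        // Single-element path: plain assignment, no library call.
        *result = *first;
    // Zero elements: nothing is written and result is returned unchanged.
    return result + num;
}
```

Calling copy_trivial(src, src + 3, dst) on two int arrays of length 3
copies all three elements and returns dst + 3; an empty range writes
nothing.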

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (34 preceding siblings ...)
  2023-11-29 12:27 ` hubicka at ucw dot cz
@ 2023-11-29 15:25 ` cvs-commit at gcc dot gnu.org
  2024-01-03 17:41 ` jamborm at gcc dot gnu.org
  36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-29 15:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #33 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:302461ad9a04d82fee904bddac69811d13d5bb6a

commit r14-5971-g302461ad9a04d82fee904bddac69811d13d5bb6a
Author: Martin Jambor <mjambor@suse.cz>
Date:   Wed Nov 29 16:24:33 2023 +0100

    tree-sra: Avoid returns of references to SRA candidates

    The enhancement to address PR 109849 contained an important thinko,
    namely that any reference that is passed to a function and does not
    escape must also not happen to be aliased by the return value of the
    function.  This has quickly transpired as bugs PR 112711 and PR
    112721.

    Just as IPA-modref does a good enough job to allow us to rely on the
    escaped set of variables, it seems to be doing well also on updating
    EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
    exactly the situation we need to avoid.  Of course, if a call
    statement ignores any returned value, we also do not need to check the
    flag.

    Hopefully this does not pessimize things too much; I have verified
    that the PR 109849 testcase remains quick, and so should the
    benchmark it is derived from.

    gcc/ChangeLog:

    2023-11-27  Martin Jambor  <mjambor@suse.cz>

            PR tree-optimization/112711
            PR tree-optimization/112721
            * tree-sra.cc (build_access_from_call_arg): New parameter
            CAN_BE_RETURNED, disqualify any candidate passed by reference if it
            is true.  Adjust leading comment.
            (scan_function): Pass appropriate value to CAN_BE_RETURNED of
            build_access_from_call_arg.

    gcc/testsuite/ChangeLog:

    2023-11-29  Martin Jambor  <mjambor@suse.cz>

            PR tree-optimization/112711
            PR tree-optimization/112721
            * g++.dg/tree-ssa/pr112711.C: New test.
            * gcc.dg/tree-ssa/pr112721.c: Likewise.
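
The situation described in the commit message can be sketched as follows.
This is an illustration under assumptions, not the testcase from PR 112711
or PR 112721; the names Pair, second_of and bump_through_return are
hypothetical. The argument is not stored anywhere that outlives the call,
yet the returned reference aliases it, so a transformation that splits the
local aggregate into independent scalars while ignoring that alias would
risk reading a stale value.

```cpp
#include <cassert>

struct Pair { int first; int second; };

// The argument is not stored in any global or heap location,
// but the returned reference aliases part of it.
int& second_of(Pair& p) { return p.second; }

int bump_through_return()
{
    Pair local = {1, 2};
    int& alias = second_of(local);  // alias refers to local.second
    alias = 42;                     // write through the returned reference
    return local.second;            // must observe the write: 42
}
```

If the call's return value were ignored, no such alias could exist, which
matches the commit message's remark that the flag need not be checked in
that case.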

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Bug middle-end/109849] suboptimal code for vector walking loop
  2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
                   ` (35 preceding siblings ...)
  2023-11-29 15:25 ` cvs-commit at gcc dot gnu.org
@ 2024-01-03 17:41 ` jamborm at gcc dot gnu.org
  36 siblings, 0 replies; 38+ messages in thread
From: jamborm at gcc dot gnu.org @ 2024-01-03 17:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #34 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #32)
> > /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
> > warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
> > writing between 2 and 9223372036854775806 bytes into a region of size 0
> > overflows the destination [-Wstringop-overflow=]
> 
> It warns on:
> 
>   template<bool _IsMove>
>     struct __copy_move<_IsMove, true, random_access_iterator_tag>
>     {
>       template<typename _Tp, typename _Up>
>         _GLIBCXX20_CONSTEXPR
>         static _Up*
>         __copy_m(_Tp* __first, _Tp* __last, _Up* __result)
>         {
>           const ptrdiff_t _Num = __last - __first;
>           if (__builtin_expect(_Num > 1, true))
>             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
>           else if (_Num == 1)
>             std::__copy_move<_IsMove, false, random_access_iterator_tag>::
>               __assign_one(__result, __first);
>           return __result + _Num;
>         }
>     };
> 
> It is likely a false positive on a code path that never happens in real
> code, but we now optimize it better.
> 

We end up with:
  <bb 16> [local count: 64736968]:
  __builtin_memcpy (1B, v$_M_impl$D10203$_M_start_448, _354);

IIRC the statement variant is created by jump threading (specifically
thread2).

Moreover, if I understand the comment in compute_objsize_r about the
INTEGER_CST case correctly, small integers are considered a potential
"result of erroneous null pointer addition/subtraction."  So suppressing
the warning for a constant destination of 1 does not seem desirable.
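
As a rough illustration of why a small constant address looks suspicious,
the sketch below models, in integer arithmetic only, the address of a
member reached through a null base. The names Header and member_address are
hypothetical, and this is not a claim about where the 1B in the dump above
comes from; it only shows the general pattern that the quoted comment
alludes to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Header { char tag; char payload[7]; };

// Models the address of base->payload using integers only.
// No pointer is formed or dereferenced, so a zero base is safe here.
std::uintptr_t member_address(std::uintptr_t base)
{
    return base + offsetof(Header, payload);
}
```

With a base of 0 the computed address is 1, the same kind of small constant
that appears as the destination in the memcpy call above.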

^ permalink raw reply	[flat|nested] 38+ messages in thread

