public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/109849] New: suboptimal code for vector walking loop
@ 2023-05-13 22:26 hubicka at gcc dot gnu.org
2023-05-13 22:32 ` [Bug middle-end/109849] " pinskia at gcc dot gnu.org
` (36 more replies)
0 siblings, 37 replies; 38+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-05-13 22:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Bug ID: 109849
Summary: suboptimal code for vector walking loop
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jan@localhost:/tmp> cat t.C
#include <vector>
typedef unsigned int uint32_t;
std::vector<std::pair<uint32_t, uint32_t>> stack;
void
test()
{
  while (!stack.empty()) {
    std::pair<uint32_t, uint32_t> cur = stack.back();
    stack.pop_back();
    if (cur.second)
      break;
  }
}
jan@localhost:/tmp> gcc t.C -O3 -S
yields:
_Z4testv:
.LFB1264:
.cfi_startproc
movq stack(%rip), %rcx
movq stack+8(%rip), %rax
jmp .L5
.p2align 4,,10
.p2align 3
.L6:
movl -4(%rax), %edx
subq $8, %rax
movq %rax, stack+8(%rip)
testl %edx, %edx
jne .L4
.L5:
cmpq %rax, %rcx
jne .L6
.L4:
ret
We really should order the basic blocks so that cmpq comes before .L6, saving a jump.
Moreover clang does
.p2align 4, 0x90
.LBB1_1: # =>This Inner Loop Header: Depth=1
cmpq %rax, %rcx
je .LBB1_3
# %bb.2: # in Loop: Header=BB1_1 Depth=1
cmpl $0, -4(%rcx)
leaq -8(%rcx), %rcx
movq %rcx, stack+8(%rip)
je .LBB1_1
.LBB1_3:
retq
saving an instruction. Why do we not move the stack+8 update out of the loop?
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: pinskia at gcc dot gnu.org @ 2023-05-13 22:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually why didn't we copy the loop header in the first place?
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: pinskia at gcc dot gnu.org @ 2023-05-13 22:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #0)
> saving an instruction. Why we do not move stack+8 updating out of the loop?
Maybe because of a clobber:
cur$second_5 = MEM[(const struct pairD.26349 &)_7 +
18446744073709551608].secondD.27577;
# PT = nonlocal escaped
_4 = _7 + 18446744073709551608;
# .MEM_9 = VDEF <.MEM_1>
stackD.26352.D.27437._M_implD.26667.D.26744._M_finishD.26670 = _4;
# .MEM_10 = VDEF <.MEM_9>
MEM[(struct pairD.26349 *)_7 + -8B] ={v} {CLOBBER};
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: amonakov at gcc dot gnu.org @ 2023-05-14 5:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gcc dot gnu.org
--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Rather, because store-motion out of a loop that might iterate zero times would
create a data race.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: hubicka at ucw dot cz @ 2023-05-14 9:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #4 from Jan Hubicka <hubicka at ucw dot cz> ---
> Rather, because store-motion out of a loop that might iterate zero times would
> create a data race.
Good point. If we copied loop headers all the way to the store, the
problem would go away. Also, I assume we can still add a flag which is
set to true if the loop iterates and then make the store conditional...
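The flag idea can be sketched by hand on raw pointers. This is only an illustration under stated assumptions, not GCC output: Pair, g_begin, g_finish and drain are hypothetical stand-ins for the vector internals of the comment #0 testcase. The end pointer lives in a local during the loop, and the store back to memory happens only if the body ran at least once, so no store is introduced on the zero-iteration path.

```cpp
#include <cassert>

struct Pair { unsigned first, second; };

// Hypothetical stand-ins for the vector's _M_start and _M_finish.
Pair *g_begin = nullptr;
Pair *g_finish = nullptr;

// Pops elements until the stack is empty or a popped element has a
// non-zero 'second'.  g_finish is written at most once, and only if
// the loop body executed.
void drain()
{
  Pair *finish = g_finish;   // keep the end pointer in a local
  bool iterated = false;     // becomes true once the body has run
  while (finish != g_begin) {
    --finish;                // the pop_back
    iterated = true;
    if (finish->second)
      break;
  }
  if (iterated)
    g_finish = finish;       // single conditional store after the loop
}
```

With an empty stack the function performs no store at all, which is the property the data-race argument in comment #3 requires.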
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: hubicka at ucw dot cz @ 2023-05-14 10:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #5 from Jan Hubicka <hubicka at ucw dot cz> ---
> Actually why didn't we copy the loop header in the first place?
Because it is already considered to be a do-while loop (thanks to
the in-loop conditional, do_while_loop_p is happy).
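For readers unfamiliar with loop header copying, here is a hand-written source-level sketch of the transformation being discussed. The function names are hypothetical and the real pass works on GIMPLE, not on C++ source; the sketch only shows the shape of the loop before and after.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Stack = std::vector<std::pair<unsigned, unsigned>>;

// Original shape: the exit test sits at the top of the loop.
void pop_until_while(Stack &s)
{
  while (!s.empty()) {
    std::pair<unsigned, unsigned> cur = s.back();
    s.pop_back();
    if (cur.second)
      break;
  }
}

// Shape after copying the header: one guarding test in front of the
// loop, and the emptiness test repeated at the bottom as the latch.
void pop_until_inverted(Stack &s)
{
  if (s.empty())
    return;
  do {
    std::pair<unsigned, unsigned> cur = s.back();
    s.pop_back();
    if (cur.second)
      break;
  } while (!s.empty());
}
```

Both functions leave the stack in the same state; the second form is the one in which the loop body is known to run at least once after the guard.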
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: rguenth at gcc dot gnu.org @ 2023-05-15 6:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2023-05-15
Keywords| |missed-optimization
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: hubicka at gcc dot gnu.org @ 2023-05-17 14:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |109811
CC| |mjambor at suse dot cz
--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Here is a slightly improved testcase which actually pushes onto the stack and
measures something. Each call to test() runs the loop about 10000 times and
returns. It also makes the stack a local variable, so race conditions are not
a problem.
#include <vector>
typedef unsigned int uint32_t;
std::pair<uint32_t, uint32_t> pair;
void
test()
{
  std::vector<std::pair<uint32_t, uint32_t>> stack;
  stack.push_back (pair);
  while (!stack.empty()) {
    std::pair<uint32_t, uint32_t> cur = stack.back();
    stack.pop_back();
    if (!cur.first)
      {
        cur.second++;
        stack.push_back (cur);
      }
    if (cur.second > 10000)
      break;
  }
}
int
main()
{
  for (int i = 0; i < 10000; i++)
    test();
}
Clang code is more than twice as fast (about 0.44s versus 1.20s elapsed)
jan@localhost:/tmp> clang++ -O2 tt.C -fno-exceptions
jan@localhost:/tmp> g++ -O2 tt.C -fno-exceptions -o a.out-gcc
jan@localhost:/tmp> perf stat ./a.out
Performance counter stats for './a.out':
434.24 msec task-clock:u # 0.997 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
129 page-faults:u # 297.073 /sec
1,003,191,657 cycles:u # 2.310 GHz
68,927 stalled-cycles-frontend:u # 0.01% frontend cycles idle
800,792,619 stalled-cycles-backend:u # 79.82% backend cycles idle
1,904,682,933 instructions:u # 1.90 insn per cycle
# 0.42 stalled cycles per insn
500,912,196 branches:u # 1.154 G/sec
23,144 branch-misses:u # 0.00% of all branches
0.435340389 seconds time elapsed
0.431409000 seconds user
0.003994000 seconds sys
jan@localhost:/tmp> perf stat ./a.out-gcc
Performance counter stats for './a.out-gcc':
1,197.28 msec task-clock:u # 0.999 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
131 page-faults:u # 109.415 /sec
2,903,995,656 cycles:u # 2.425 GHz
86,204 stalled-cycles-frontend:u # 0.00% frontend cycles idle
2,690,907,052 stalled-cycles-backend:u # 92.66% backend cycles idle
2,005,212,311 instructions:u # 0.69 insn per cycle
# 1.34 stalled cycles per insn
401,007,320 branches:u # 334.932 M/sec
23,290 branch-misses:u # 0.01% of all branches
1.198388186 seconds time elapsed
1.198450000 seconds user
0.000000000 seconds sys
The problem seems to be, as in the first example, that we keep updating the
in-memory stack in the main loop.
.L39:
movl 12(%rsp), %ebx
.L30:
movq 16(%rsp), %rax
cmpl $10000, %ebx
ja .L33
.L40:
movq 24(%rsp), %rdi
cmpq %rdi, %rax
je .L28
.L34:
movq -8(%rdi), %rax
leaq -8(%rdi), %rsi
movq %rsi, 24(%rsp)
movq %rax, 8(%rsp)
testl %eax, %eax
jne .L39
While clang does:
.LBB0_1: # in Loop: Header=BB0_4 Depth=1
movq %rax, %r14
.LBB0_2: # in Loop: Header=BB0_4 Depth=1
movq %rbx, %r12
movq %r12, %rbx
cmpl $10001, %r13d # imm = 0x2711
jae .LBB0_27
.LBB0_4: # =>This Loop Header: Depth=1
# Child Loop BB0_16 Depth 2
# Child Loop BB0_21 Depth 2
cmpq %r14, %rbx
je .LBB0_26
# %bb.5: # in Loop: Header=BB0_4 Depth=1
leaq -8(%r14), %rax
movq -8(%r14), %rcx
movq %rcx, %r13
shrq $32, %r13
testl %ecx, %ecx
jne .LBB0_1
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
[Bug 109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: rguenth at gcc dot gnu.org @ 2023-05-17 20:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
There is nothing to sink really. Loop header copying introduces a PHI, and
there are no partial redundancies, only partial-partial ones, and those are
not obvious to CSE because of the introduced PHI.
I believe we have to teach SRA to decompose 'cur' and maybe also 'stack';
no scalar optimization is going to do it, also because we have aggregate
copies involved.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: hubicka at gcc dot gnu.org @ 2023-05-18 9:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We can only SRA if the address is non-escaping. Clang does not seem to need it
to optimize better:
jan@localhost:~> cat t.c
extern void q(int *);
__attribute__ ((noinline))
void
test()
{
  for (int a = 0; a < 1000;a++)
    if (!(a%100))
      q(&a);
}
int
main()
{
  for (int a = 0; a < 1000000;a++)
    test ();
}
jan@localhost:~> cat t2.c
void q(int *a)
{
}
jan@localhost:~> gcc -O2 t.c t2.c ; perf stat ./a.out
Performance counter stats for './a.out':
2,916.73 msec task-clock:u # 0.999 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
52 page-faults:u # 17.828 /sec
8,344,719,833 cycles:u # 2.861 GHz
13,561,375 stalled-cycles-frontend:u # 0.16% frontend cycles idle
5,128,112,757 stalled-cycles-backend:u # 61.45% backend cycles idle
10,050,172,242 instructions:u # 1.20 insn per cycle
# 0.51 stalled cycles per insn
2,034,043,082 branches:u # 697.370 M/sec
11,186,312 branch-misses:u # 0.55% of all branches
2.918344737 seconds time elapsed
2.917844000 seconds user
0.000000000 seconds sys
jan@localhost:~> clang -O2 t.c t2.c ; perf stat ./a.out
Performance counter stats for './a.out':
664.40 msec task-clock:u # 0.999 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
54 page-faults:u # 81.276 /sec
2,318,095,848 cycles:u # 3.489 GHz
10,417,694 stalled-cycles-frontend:u # 0.45% frontend cycles idle
1,057,731,301 stalled-cycles-backend:u # 45.63% backend cycles idle
10,062,172,840 instructions:u # 4.34 insn per cycle
# 0.11 stalled cycles per insn
2,034,042,724 branches:u # 3.061 G/sec
10,003,620 branch-misses:u # 0.49% of all branches
0.665267996 seconds time elapsed
0.665247000 seconds user
0.000000000 seconds sys
We do:
jmp .L3
.p2align 4,,10
.p2align 3
.L2:
movl 12(%rsp), %eax
addl $1, %eax
movl %eax, 12(%rsp)
cmpl $999, %eax
jg .L7
.L3:
imull $-1030792151, %eax, %eax
addl $85899344, %eax
rorl $2, %eax
cmpl $42949672, %eax
ja .L2
leaq 12(%rsp), %rdi
call q
jmp .L2
Which has a stupid store-to-load dependency in the inner loop. Clang keeps the
store but optimizes away the load:
jmp .LBB0_1
.p2align 4, 0x90
.LBB0_3: # in Loop: Header=BB0_1 Depth=1
leal 1(%rax), %ecx
movl %ecx, 12(%rsp)
cmpl $999, %eax # imm = 0x3E7
movl %ecx, %eax
jge .LBB0_4
.LBB0_1: # =>This Inner Loop Header: Depth=1
imull $-1030792151, %eax, %ecx # imm = 0xC28F5C29
addl $85899344, %ecx # imm = 0x51EB850
rorl $2, %ecx
cmpl $42949672, %ecx # imm = 0x28F5C28
ja .LBB0_3
# %bb.2: # in Loop: Header=BB0_1 Depth=1
movq %rbx, %rdi
callq q@PLT
movl 12(%rsp), %eax
jmp .LBB0_3
I wonder what makes clang think it needs @PLT though.
Why do we not consider the load as partially redundant with itself?
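A source-level sketch of what the clang code does with the escaping variable may help here. It is hand-written under assumptions, not compiler output: loop_reference mirrors the loop in t.c, loop_forwarded keeps a second copy of the counter in a local, and both take the callee as a function pointer and return an iteration count purely so that they can be compared.

```cpp
#include <cassert>

// The loop as written in t.c: 'a' escapes through q(&a), so a
// compiler that is not careful reloads it from memory every iteration.
int loop_reference(void (*q)(int *))
{
  int iterations = 0;
  for (int a = 0; a < 1000; a++) {
    if (!(a % 100))
      q(&a);
    ++iterations;
  }
  return iterations;
}

// Hand-forwarded variant: 'slot' is the addressable copy, 'a' the
// register copy.  The store on every iteration is kept, but the load
// happens only after the call, the one place where memory can change.
int loop_forwarded(void (*q)(int *))
{
  int slot = 0;
  int a = 0;
  int iterations = 0;
  while (a < 1000) {
    if (a % 100 == 0) {
      q(&slot);      // callee may modify the memory copy
      a = slot;      // reload only here
    }
    a = a + 1;
    slot = a;        // store stays in the loop
    ++iterations;
  }
  return iterations;
}
```

The compare and the increment in loop_forwarded depend only on the register copy, which removes the store-to-load round trip from the common path.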
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: rguenth at gcc dot gnu.org @ 2023-05-18 11:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 55110
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55110&action=edit
patch for the missed hoisting
For the testcase in comment#6 there is a missing code hoisting from PRE
which is caused by do_hoist_insertion doing
/* A hoistable value must be in ANTIC_IN(block)
but not in AVAIL_OUT(BLOCK). */
bitmap_initialize (&hoistable_set.values, &grand_bitmap_obstack);
bitmap_and_compl (&hoistable_set.values,
&ANTIC_IN (block)->values, &AVAIL_OUT (block)->values);
but in reality we want to check ANTIC_OUT(block), not ANTIC_IN(block).
cur.second is killed by the aggregate assignment to cur at the beginning
of the block we should hoist to and that's reflected in ANTIC_IN.
The attached patch properly re-computes ANTIC_OUT and uses that.
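The situation described above can be sketched at source level. This is an assumed, simplified illustration and not the testcase added with the patch; Pair, before_hoisting and after_hoisting are hypothetical names. The aggregate assignment at the top of the block kills dst->second, so the load is not anticipated on entry to the block, yet both successors need it, so it is anticipated at the block's end.

```cpp
#include <cassert>

struct Pair { unsigned first, second; };

// Both branches load dst->second after the aggregate copy.
unsigned before_hoisting(Pair *dst, const Pair *src, bool flag)
{
  *dst = *src;                 // kills any earlier value of dst->second
  if (flag)
    return dst->second + 1;    // load on the first path
  return dst->second * 2;      // same load on the second path
}

// Hand-hoisted form: a single load at the end of the block, after the
// aggregate copy and before the branch.
unsigned after_hoisting(Pair *dst, const Pair *src, bool flag)
{
  *dst = *src;
  unsigned second = dst->second;
  if (flag)
    return second + 1;
  return second * 2;
}
```

Checking only the set at block entry would reject this hoisting because of the kill, which is the behaviour the comment attributes to the old ANTIC_IN test.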
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: hubicka at gcc dot gnu.org @ 2023-05-18 13:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks. I tested the patch on jpegxl and it does not help there (I guess
because the redundancy there is partial). But it is cool that we compile at
least the simplified testcase well.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-05-23 9:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:9e2017ae6ac788d3e36999bb0f0d20ea0f62c20e
commit r14-1127-g9e2017ae6ac788d3e36999bb0f0d20ea0f62c20e
Author: Richard Biener <rguenther@suse.de>
Date: Thu May 18 13:52:29 2023 +0200
tree-optimization/109849 - missed code hoisting
The following fixes code hoisting to properly consider ANTIC_OUT instead
of ANTIC_IN. That's a bit expensive to re-compute but since we no
longer iterate we're doing this only once per BB which should be
acceptable. This avoids missing hoistings to the end of blocks where
something in the block clobbers the hoisted value.
PR tree-optimization/109849
* tree-ssa-pre.cc (do_hoist_insertion): Compute ANTIC_OUT
and use that to determine what to hoist.
* gcc.dg/tree-ssa/ssa-hoist-8.c: New testcase.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: rguenth at gcc dot gnu.org @ 2023-05-23 10:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this fixed the missing code hoisting - partial PRE is done with -O3 only.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-05-24 11:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:5476de2618ffb77f3a52e59e2c9f10b018329689
commit r14-1161-g5476de2618ffb77f3a52e59e2c9f10b018329689
Author: Richard Biener <rguenther@suse.de>
Date: Wed May 24 12:36:28 2023 +0200
tree-optimization/109849 - fix fallout of PRE hoisting change
The PR109849 fix made us no longer hoist some memory loads because
of the expression set intersection. We can still avoid computing
the union by simply taking the first set's expressions and leaving
the pruning of expressions with values not suitable for hoisting
to sorted_array_from_bitmap_set.
PR tree-optimization/109849
* tree-ssa-pre.cc (do_hoist_insertion): Do not intersect
expressions but take the first sets.
* gcc.dg/tree-ssa/ssa-hoist-9.c: New testcase.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: hubicka at gcc dot gnu.org @ 2023-06-16 14:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #14 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
One interesting situation is:
void std::vector<std::pair<unsigned int, unsigned int> >::push_back (struct
vector * const this, const struct value_type & __x)
{
struct __normal_iterator D.27894;
struct pair * _1;
struct pair * _2;
struct pair * _3;
<bb 2> [local count: 1073741824]:
_1 = this_6(D)->D.26707._M_impl.D.26014._M_finish;
_2 = this_6(D)->D.26707._M_impl.D.26014._M_end_of_storage;
if (_1 != _2)
goto <bb 3>; [82.57%]
else
goto <bb 4>; [17.43%]
<bb 3> [local count: 886588625]:
*_1 = MEM[(const struct pair &)__x_7(D)];
_3 = _1 + 8;
this_6(D)->D.26707._M_impl.D.26014._M_finish = _3;
goto <bb 5>; [100.00%]
<bb 4> [local count: 187153200]:
D.27894._M_current = _1;
std::vector<std::pair<unsigned int, unsigned int> >::_M_realloc_insert<const
std::pair<unsigned int, unsigned int>&> (this_6(D), D.27894, __x_7(D));
<bb 5> [local count: 1073741824]:
return;
}
Here we could do partial inlining and offline the call to _M_realloc_insert, but
we fail to split since _1 is already loaded:
Split point at BB 4
header time: 9.302800 header size: 9
split time: 2.440200 split size: 5
bbs: 4
SSA names to pass: 1, 9, 11
Refused: need to pass non-param values
It should be easy to insert code that reloads the value from the parameter in
the split part. We would still hit the SRA limitation since 'this' would still
be escaping, but it is another missed optimization on this simple testcase.
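The split suggested above can be sketched by hand. TinyVec, push_back_slow and the doubling growth policy are all hypothetical and far simpler than libstdc++; the point is only that the cold part receives parameters alone and reloads the end pointer itself instead of being passed the value the hot part already loaded.

```cpp
#include <cassert>
#include <cstddef>

struct Pair { unsigned first, second; };

// Hypothetical, minimal stand-in for the std::vector representation.
struct TinyVec {
  Pair *start = nullptr;
  Pair *finish = nullptr;
  Pair *end_of_storage = nullptr;
};

// Cold, out-of-line part: takes only the original parameters and
// reloads v->finish rather than receiving it as an extra argument.
[[gnu::noinline]] void push_back_slow(TinyVec *v, const Pair &x)
{
  std::size_t size = v->finish - v->start;   // reloaded from *v
  std::size_t cap = size ? 2 * size : 1;     // assumed growth policy
  Pair *fresh = new Pair[cap];
  for (std::size_t i = 0; i < size; i++)
    fresh[i] = v->start[i];
  fresh[size] = x;                           // copy x before freeing
  delete[] v->start;
  v->start = fresh;
  v->finish = fresh + size + 1;
  v->end_of_storage = fresh + cap;
}

// Hot part, small enough to inline: one compare, one copy, one store.
inline void push_back(TinyVec *v, const Pair &x)
{
  Pair *finish = v->finish;
  if (finish != v->end_of_storage) {
    *finish = x;
    v->finish = finish + 1;
    return;
  }
  push_back_slow(v, x);
}
```

The extra load in the slow path is cheap compared with the reallocation it precedes, which is why reloading is an acceptable price for making the split possible.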
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-06-18 16:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:5a1ef1cfac005370d0a5a0f85798724cb2c9cf5e
commit r14-1909-g5a1ef1cfac005370d0a5a0f85798724cb2c9cf5e
Author: Honza <jh@ryzen3.suse.cz>
Date: Sun Jun 18 18:58:26 2023 +0200
Analyze SRA candidates in ipa-fnsummary
This patch extends ipa-fnsummary to anticipate statements that will be
removed by SRA. This is done by looking for calls passing addresses of
automatic variables. In the function body we look for dereferences of
pointers to such variables and mark them with a new not_sra_candidate
condition.
This is just a first step, and it is overly optimistic. We do not try to
prove that a given automatic variable will not be SRAed even after
inlining. We currently also optimistically assume that the transformation
will always happen. I will restrict this in a followup patch, but I think
it is useful to gather some data on how much code is affected by this.
This is motivated by PR109849, where we fail to fully inline push_back.
The patch alone does not solve the problem even for -O3, but it improves
the analysis in this case.
gcc/ChangeLog:
PR tree-optimization/109849
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Add new
parameter ES; handle ipa_predicate::not_sra_candidate.
(evaluate_properties_for_edge): Pass es to
evaluate_conditions_for_known_args.
(ipa_fn_summary_t::duplicate): Handle sra candidates.
(dump_ipa_call_summary): Dump points_to_possible_sra_candidate.
(load_or_store_of_ptr_parameter): New function.
(points_to_possible_sra_candidate_p): New function.
(analyze_function_body): Initialize points_to_possible_sra_candidate;
determine sra predicates.
(estimate_ipcp_clone_size_and_time): Update call of
evaluate_conditions_for_known_args.
(remap_edge_params): Update points_to_possible_sra_candidate.
(read_ipa_call_summary): Stream points_to_possible_sra_candidate
(write_ipa_call_summary): Likewise.
* ipa-predicate.cc (ipa_predicate::add_clause): Handle
not_sra_candidate.
(dump_condition): Dump it.
* ipa-predicate.h (struct inline_param_summary): Add
points_to_possible_sra_candidate.
gcc/testsuite/ChangeLog:
PR tree-optimization/109849
* g++.dg/ipa/devirt-45.C: Update template.
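The kind of code the commit message above describes can be illustrated with a small hand-written example. The names are hypothetical and the example says nothing about how ipa-fnsummary represents the information internally; it only shows a call passing the address of an automatic variable and a callee whose accesses are dereferences of that pointer parameter.

```cpp
#include <cassert>

struct Pair { unsigned first, second; };

// Callee: every access goes through the pointer parameter.  Once this
// is inlined into a caller that passes the address of a local, the
// load and store could be scalarized away.
static inline void bump_second(Pair *p)
{
  p->second++;
}

// Caller: 'cur' is an automatic variable whose address is passed to
// the call, so it is addressable only until the call is inlined.
unsigned caller(unsigned start)
{
  Pair cur = {0, start};
  bump_second(&cur);
  return cur.second;
}
```

An inliner that counts the memory accesses in bump_second at full cost overestimates the inlined size, which is the estimate the patch tries to make less pessimistic.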
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-06-19 16:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:7b34cacc5735385e7e2855d7c0a6fad60ef4a99b
commit r14-1951-g7b34cacc5735385e7e2855d7c0a6fad60ef4a99b
Author: Jan Hubicka <jh@suse.cz>
Date: Mon Jun 19 18:28:17 2023 +0200
optimize std::max early
We currently produce very bad code on loops using std::vector as a stack,
since we fail to inline push_back, which in turn prevents SRA, and we fail
to optimize out some store-to-load pairs.
I looked into why this function is not inlined by GCC while it is inlined
by clang. We currently estimate it at 66 instructions, and the inline
limits are 15 at -O2 and 30 at -O3. Clang has a similar estimate, but
still decides to inline at -O2.
I looked into the reason why the body is so large, and one problem I
spotted is the way std::max is implemented: it takes and returns
references to the values.
const T& max( const T& a, const T& b );
This makes it necessary to store the values to memory and load them later,
and max is used by the code computing the new size of the vector on resize.
We optimize this to MAX_EXPR, but only during late optimizations. I think
this is a common enough coding pattern and we ought to make this
transparent to early opts and IPA. The following is the easiest fix: it
simply adds a phiprop pass that turns the PHI of address values into a PHI
of values, so that later FRE can propagate values across memory, phiopt
can discover the MAX_EXPR pattern, and DSE can remove the memory stores.
gcc/ChangeLog:
PR tree-optimization/109811
PR tree-optimization/109849
* passes.def: Add phiprop to early optimization passes.
* tree-ssa-phiprop.cc: Allow cloning.
gcc/testsuite/ChangeLog:
PR tree-optimization/109811
PR tree-optimization/109849
* gcc.dg/tree-ssa/phiprop-1.c: New test.
* gcc.dg/tree-ssa/pr21463.c: Adjust template.
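The std::max issue from the commit message above can be seen in a small example. This is a sketch under assumptions: the function names are hypothetical and the growth formula is only written in the style of a vector resize. The first function goes through the reference-taking std::max; the second states the same computation directly on values, which is the form the early phiprop pass is meant to let the optimizers reach.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// std::max takes and returns const references, so before inlining and
// cleanup both operands need an address, i.e. a slot in memory.
std::size_t new_len_by_ref(std::size_t size, std::size_t n)
{
  return size + std::max(size, n);
}

// The same computation on plain values: a select between two values
// instead of a select between two addresses followed by a load.
std::size_t new_len_by_value(std::size_t size, std::size_t n)
{
  std::size_t bigger = size < n ? n : size;
  return size + bigger;
}
```

Both functions compute the same result; the difference is only in how much work the compiler needs before it can see the second form in the first.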
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-06-26 16:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #17 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:c2ebccc97190a978a44e341516b488f02a78c598
commit r14-2101-gc2ebccc97190a978a44e341516b488f02a78c598
Author: Jan Hubicka <jh@suse.cz>
Date: Mon Jun 26 18:29:39 2023 +0200
Fix profile of forwarders produced by cd-dce
Compiling the testcase from PR109849 (which uses a std::vector based stack
to drive a loop) with profile feedback leads to profile mismatches
introduced by tree-ssa-dce. This happens in the new code that produces
unified forwarder blocks for PHIs.
I am not including the testcase itself, since checking it for "Invalid
sum" is probably going to be too fragile, and this should show in our LNT
testers. The patch, however, fixes the mismatch.
Bootstrapped/regtested x86_64-linux and plan to commit it shortly.
gcc/ChangeLog:
PR tree-optimization/109849
* tree-ssa-dce.cc (make_forwarders_with_degenerate_phis): Fix
profile count of newly constructed forwarder block.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-06-28 9:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:45c53768b6fa3d737ae818e31d3c50da62e0ad2b
commit r14-2157-g45c53768b6fa3d737ae818e31d3c50da62e0ad2b
Author: Jan Hubicka <jh@suse.cz>
Date: Wed Jun 28 11:45:15 2023 +0200
Add cold attribute to throw wrappers and terminate
PR middle-end/109849
* include/bits/c++config (std::__terminate): Mark cold.
* include/bits/functexcept.h: Mark everything as cold.
* libsupc++/exception: Mark terminate and unexpected as cold.
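As an illustration of what marking a throw wrapper cold looks like in user code, here is a minimal sketch. throw_too_long and checked_bytes are hypothetical and are not the libstdc++ declarations touched by the commit; the sketch assumes a compiler that accepts the GNU cold attribute, such as GCC or clang.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical throw wrapper.  The cold attribute tells the compiler
// that calls to it are unlikely, so paths leading to it can be treated
// as cold for block layout and inlining heuristics.
[[gnu::cold]] [[noreturn]] void throw_too_long(const char *what)
{
  throw std::length_error(what);
}

// Size check in the style of an allocator: the error path is a call
// to the cold wrapper, the fall-through path is the hot one.
std::size_t checked_bytes(std::size_t n, std::size_t max_n)
{
  if (n > max_n)
    throw_too_long("checked_bytes: n exceeds max_n");
  return n * sizeof(unsigned long);
}
```

The attribute changes no behaviour; it only gives the optimizer a hint about which branch is expected to be taken.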
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-06-29 20:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:9dc18fca431626404b0692c689a2e103666e7adb
commit r14-2202-g9dc18fca431626404b0692c689a2e103666e7adb
Author: Jan Hubicka <jh@suse.cz>
Date: Thu Jun 29 22:45:37 2023 +0200
Compute ipa-predicates for conditionals involving __builtin_expect_p
std::vector allocator looks as follows:
__attribute__((nodiscard))
struct pair * std::__new_allocator<std::pair<unsigned int, unsigned int>
>::allocate (struct __new_allocator * const this, size_type __n, const void *
D.27753)
{
bool _1;
long int _2;
long int _3;
long unsigned int _5;
struct pair * _9;
<bb 2> [local count: 1073741824]:
_1 = __n_7(D) > 1152921504606846975;
_2 = (long int) _1;
_3 = __builtin_expect (_2, 0);
if (_3 != 0)
goto <bb 3>; [10.00%]
else
goto <bb 6>; [90.00%]
<bb 3> [local count: 107374184]:
if (__n_7(D) > 2305843009213693951)
goto <bb 4>; [50.00%]
else
goto <bb 5>; [50.00%]
<bb 4> [local count: 53687092]:
std::__throw_bad_array_new_length ();
<bb 5> [local count: 53687092]:
std::__throw_bad_alloc ();
<bb 6> [local count: 966367641]:
_5 = __n_7(D) * 8;
_9 = operator new (_5);
return _9;
}
So there is a check for the allocated block size being greater than max_size,
which is wrapped in __builtin_expect. This makes ipa-fnsummary give up
analyzing predicates, and it misses the fact that the two different __throw_*
calls will be optimized out if __n is already smaller than
1152921504606846975, which it is after _M_check_len.
This patch extends ipa-fnsummary to understand functions that return their
parameter.
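As a rough illustration of the pattern described above, here is a minimal C++ sketch of a size check wrapped in __builtin_expect (a GCC/Clang builtin that returns its first argument unchanged). The names PairU32, checked_bytes and bytes_for_small are hypothetical and are not the libstdc++ sources; the point is only that a caller which already bounds the count makes both throwing paths dead once the predicate can be seen through the builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical element type standing in for std::pair<unsigned, unsigned>.
struct PairU32 { unsigned first, second; };

// Hypothetical helper mirroring the shape of the allocator check in the dump:
// returns the number of bytes to request, or throws if the count is too large.
inline std::size_t checked_bytes(std::size_t n)
{
  constexpr std::size_t max_elems = PTRDIFF_MAX / sizeof(PairU32);
  // __builtin_expect only attaches a hint; its value is still (n > max_elems).
  if (__builtin_expect(n > max_elems, 0)) {
    if (n > SIZE_MAX / sizeof(PairU32))
      throw std::bad_array_new_length();
    throw std::bad_alloc();
  }
  return n * sizeof(PairU32);
}

// Hypothetical caller that bounds n first, so neither throw can be reached.
inline std::size_t bytes_for_small(std::size_t n)
{
  if (n > 1024)
    n = 1024;
  return checked_bytes(n);
}

// Small usage example.
inline void demo_checked_bytes()
{
  assert(checked_bytes(4) == 4 * sizeof(PairU32));
  assert(bytes_for_small(5000) == 1024 * sizeof(PairU32));
}
```

Whether a given compiler version actually removes the throwing paths in bytes_for_small depends on inlining and on the analysis discussed in this commit; the sketch only shows the source-level shape.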
gcc/ChangeLog:
PR tree-optimization/109849
* ipa-fnsummary.cc (decompose_param_expr): Skip
functions returning its parameter.
(set_cond_stmt_execution_predicate): Return early
if predicate was constructed.
gcc/testsuite/ChangeLog:
PR tree-optimization/109849
* gcc.dg/ipa/pr109849.c: New test.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-06-30 14:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #20 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:eab57b825bcc350e9ff44eb2fa739a80199d9bb1
commit r14-2219-geab57b825bcc350e9ff44eb2fa739a80199d9bb1
Author: Jan Hubicka <jh@suse.cz>
Date: Fri Jun 30 16:27:27 2023 +0200
Fix handling of __builtin_expect_with_probability and improve first-match
heuristics
While looking into the std::vector _M_realloc_insert codegen I noticed that
the call of __throw_bad_alloc is predicted with 10% probability. This is
because the conditional guarding it has __builtin_expect (cond, 0) on it.
This incorrectly takes precedence over the more reliable heuristic predicting
that a call to a cold noreturn function is likely not going to happen.
So I reordered the predictors so that __builtin_expect_with_probability comes
first after the predictors that never make a mistake (so the user can use it
to always specify the outcome by hand). I also downgraded the malloc
predictor, since I think user-defined malloc functions and new operators may
behave in funny ways, and moved the usual __builtin_expect after the noreturn
cold predictor.
This triggered a latent bug in expr_expected_value_1 where
if (*predictor < predictor2)
*predictor = predictor2;
should be:
if (predictor2 < *predictor)
*predictor = predictor2;
which eventually triggered an ICE on combining heuristics. This made me
notice that we can do slightly better while combining expected values in case
only one of the parameters (such as in a*b when we expect a==0) can determine
the overall result.
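To make the difference between the two comparisons concrete, here is a small stand-alone C++ sketch. The enum and its entries are hypothetical and much simpler than GCC's predict.def; the assumption (mine, for illustration) is that predictors are numbered in priority order, so the smaller value is the one a first-match scheme would prefer.

```cpp
#include <cassert>

// Hypothetical, simplified predictor list (not GCC's predict.def).
// Assumption: entries are listed in priority order, smaller value first.
enum Predictor {
  PRED_SURE = 0,
  PRED_USER_PROBABILITY = 1,
  PRED_COLD_NORETURN = 2,
  PRED_PLAIN_EXPECT = 3
};

// Form quoted as wrong in the message: ends up keeping the larger value.
inline void combine_old(Predictor* predictor, Predictor predictor2)
{
  if (*predictor < predictor2)
    *predictor = predictor2;
}

// Corrected form: ends up keeping the smaller value.
inline void combine_new(Predictor* predictor, Predictor predictor2)
{
  if (predictor2 < *predictor)
    *predictor = predictor2;
}

// Small usage example contrasting the two.
inline void demo_combine()
{
  Predictor a = PRED_PLAIN_EXPECT;
  combine_old(&a, PRED_COLD_NORETURN);
  assert(a == PRED_PLAIN_EXPECT);   // larger value survives

  Predictor b = PRED_PLAIN_EXPECT;
  combine_new(&b, PRED_COLD_NORETURN);
  assert(b == PRED_COLD_NORETURN);  // smaller value survives
}
```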
Note that the new code may pick weaker heuristics in case both values are
predicted. I am not sure this scenario is worth the extra CPU time: there is
no correct way to combine the probabilities anyway, since we do not know if
the predictions are independent, so I think users should not rely on it.
Fixing this issue uncovered another problem. In 2018 Martin Liska added
code predicting that MALLOC returns non-NULL, but instead of that it predicts
that it returns true (boolean 1). This sort of works for a testcase testing
malloc (10) != NULL
but, for example, we will predict
malloc (10) == malloc (10)
as true, which is not right, and such a comparison may happen in real code.
I think the proper way is to update expr_expected_value_1 to work with value
ranges, but that needs greater surgery, so I decided to postpone this and
only add a FIXME and file PR110499.
gcc/ChangeLog:
PR middle-end/109849
* predict.cc (estimate_bb_frequencies): Turn to static function.
(expr_expected_value_1): Fix handling of binary expressions with
predicted values.
* predict.def (PRED_MALLOC_NONNULL): Move later in the priority
queue.
(PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost top of the
priority queue.
* predict.h (estimate_bb_frequencies): No longer declare it.
gcc/testsuite/ChangeLog:
PR middle-end/109849
* gcc.dg/predict-18.c: Improve testcase.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: hubicka at gcc dot gnu.org @ 2023-11-19 15:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #21 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html
gets us closer to inlining _M_realloc_insert at -O3 (3 insns away)
Patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636935.html
reduces the expense when _M_realloc_insert is not inlined at -O2 (where I think
we should not inline it, unlike for clang)
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-11-21 14:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:1d82fc2e6824bf83159389729c31a942f7b91b04
commit r14-5679-g1d82fc2e6824bf83159389729c31a942f7b91b04
Author: Jan Hubicka <jh@suse.cz>
Date: Tue Nov 21 15:17:16 2023 +0100
optimize std::vector::push_back
This patch speeds up push_back at -O3 significantly by making the
reallocation inlined by default. _M_realloc_insert is a general
insertion that takes an iterator pointing to the location where the value
should be inserted. As such it contains code to move other entries around,
which is quite large.
Since appending to the end of an array is a common operation, I think we
should have specialized code for that. Sadly it is really hard to work this
out from IPA passes, since we basically care whether the iterator points to
the same place as the end pointer, which are both passed by reference.
This is inter-procedural value numbering that is quite out of reach.
I also added an extra check making it clear that the new length of the vector
is non-zero. This saves extra conditionals. Again it is quite a hard case,
since _M_check_len seems to be able to return 0 if its parameter is 0.
This never happens here, but we are not able to propagate this early nor
at the IPA stage.
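The idea of an append-only reallocation path can be sketched in a few lines of C++. TinyVec and realloc_append below are hypothetical names for a toy vector of unsigned, not the libstdc++ implementation, and the use of __builtin_unreachable (a GCC/Clang builtin) to state that the new capacity is non-zero is my assumption about one way to express such a hint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical toy vector of unsigned; overflow of the capacity is ignored.
class TinyVec {
  unsigned* start_ = nullptr;
  unsigned* finish_ = nullptr;
  unsigned* end_of_storage_ = nullptr;

  // Append-only growth: nothing lives after the insertion point, so no
  // element has to be shifted, unlike in a general insert-at-iterator path.
  void realloc_append(unsigned value)
  {
    assert(finish_ == end_of_storage_);  // only called when storage is full
    const std::size_t old_size = size();
    const std::size_t new_cap = old_size ? 2 * old_size : 1;
    if (new_cap == 0)
      __builtin_unreachable();  // hint: the new capacity is never zero
    unsigned* fresh =
        static_cast<unsigned*>(::operator new(new_cap * sizeof(unsigned)));
    ::new (static_cast<void*>(fresh + old_size)) unsigned(value);
    if (old_size)
      std::memcpy(fresh, start_, old_size * sizeof(unsigned));
    ::operator delete(start_);
    start_ = fresh;
    finish_ = fresh + old_size + 1;
    end_of_storage_ = fresh + new_cap;
  }

public:
  TinyVec() = default;
  TinyVec(const TinyVec&) = delete;
  TinyVec& operator=(const TinyVec&) = delete;
  ~TinyVec() { ::operator delete(start_); }

  std::size_t size() const { return static_cast<std::size_t>(finish_ - start_); }
  unsigned operator[](std::size_t i) const { return start_[i]; }

  void push_back(unsigned value)
  {
    if (finish_ != end_of_storage_) {
      ::new (static_cast<void*>(finish_)) unsigned(value);  // fast path
      ++finish_;
    } else {
      realloc_append(value);  // small enough to be an inlining candidate
    }
  }
};
```

Keeping the slow path free of element-shifting code is what makes it small; whether a compiler then inlines it is a separate heuristic decision.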
libstdc++-v3/ChangeLog:
PR libstdc++/110287
PR middle-end/109811
PR middle-end/109849
* include/bits/stl_vector.h (_M_realloc_append): New member
function.
(push_back): Use it.
* include/bits/vector.tcc: (emplace_back): Use it.
(_M_realloc_insert): Let compiler know that new vector size is
non-zero.
(_M_realloc_append): New member function.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: hubicka at gcc dot gnu.org @ 2023-11-21 15:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Bug 109849 depends on bug 110377, which changed state.
Bug 110377 Summary: Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-11-24 16:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #23 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:
https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363
commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
Author: Martin Jambor <mjambor@suse.cz>
Date: Fri Nov 24 17:32:35 2023 +0100
sra: SRA of non-escaped aggregates passed by reference to calls
PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because the
vector structure is not split by SRA and so we end up with many loads
and stores into it. This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it to distinct
replacements afterwards.
This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls. The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it as with current trunk. Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.
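The transformation described above can be pictured by writing it out by hand in C++. This is only a source-level sketch under my own assumptions: Stack, grow, sum_plain and sum_scalarized are hypothetical names, and the real pass works on GIMPLE rather than on source code. The second function keeps the three pointers in separate scalars, re-constructs the aggregate just before the call, and reloads the scalars afterwards.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical three-pointer stack, similar in shape to a vector's internals.
struct Stack { unsigned* begin; unsigned* end; unsigned* cap; };

// Out-of-line growth routine: takes the aggregate by reference but does not
// let its address escape. noinline is a GCC/Clang attribute.
__attribute__((noinline)) void grow(Stack& s)
{
  assert(s.end == s.cap);  // only called when the storage is full
  const std::size_t n = static_cast<std::size_t>(s.end - s.begin);
  const std::size_t new_cap = n ? 2 * n : 4;
  unsigned* fresh = new unsigned[new_cap];
  for (std::size_t i = 0; i < n; ++i)
    fresh[i] = s.begin[i];
  delete[] s.begin;
  s.begin = fresh;
  s.end = fresh + n;
  s.cap = fresh + new_cap;
}

// As a programmer would write it: every access goes through the aggregate.
unsigned sum_plain(unsigned count)
{
  Stack s{nullptr, nullptr, nullptr};
  for (unsigned i = 0; i < count; ++i) {
    if (s.end == s.cap)
      grow(s);
    *s.end++ = i;
  }
  unsigned sum = 0;
  while (s.end != s.begin)
    sum += *--s.end;
  delete[] s.begin;
  return sum;
}

// Hand-scalarized picture: fields live in separate local variables.
unsigned sum_scalarized(unsigned count)
{
  unsigned *begin = nullptr, *end = nullptr, *cap = nullptr;
  for (unsigned i = 0; i < count; ++i) {
    if (end == cap) {
      Stack tmp{begin, end, cap};  // re-construct the aggregate before the call
      grow(tmp);
      begin = tmp.begin;           // reload the replacements after the call
      end = tmp.end;
      cap = tmp.cap;
    }
    *end++ = i;
  }
  unsigned sum = 0;
  while (end != begin)
    sum += *--end;
  delete[] begin;
  return sum;
}
```

Both functions compute the same result; the second simply makes explicit that the hot loop needs no memory traffic for the three pointers outside the rare call to grow.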
The patch disallows creation of replacements for such aggregates which
are also accessed with a precision smaller than their size because I
have observed that this led to excessive zero-extending of data
leading to slow-downs of perlbench (on some CPUs). Apart from this
case I have not noticed any regressions, at least not so far.
Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to and then we do not need to generate statements loading
replacements from the original aggregate after the call statement.
Unfortunately, we cannot symmetrically use flags saying that an aggregate
is not read to avoid re-constructing the aggregate before the call,
because the flags do not tell which parts of the aggregate were not
written to; we load all replacements after the call, and so all of them
need to have the correct value before the call.
This version of the patch also takes care to avoid attempts to modify
abnormal edges, something which was missing in the previous version.
gcc/ChangeLog:
2023-11-23 Martin Jambor <mjambor@suse.cz>
PR middle-end/109849
* tree-sra.cc (passed_by_ref_in_call): New.
(sra_initialize): Allocate passed_by_ref_in_call.
(sra_deinitialize): Free passed_by_ref_in_call.
(create_access): Add decl pool candidates only if they are not
already candidates.
(build_access_from_expr_1): Bail out on ADDR_EXPRs.
(build_access_from_call_arg): New function.
(asm_visit_addr): Rename to scan_visit_addr, change the
disqualification dump message.
(scan_function): Check taken addresses for all non-call statements,
including phi nodes. Process all call arguments, including the
static chain, build_access_from_call_arg.
(maybe_add_sra_candidate): Relax need_to_live_in_memory check to
allow non-escaped local variables.
(sort_and_splice_var_accesses): Disallow smaller-than-precision
replacements for aggregates passed by reference to functions.
(sra_modify_expr): Use a separate stmt iterator for adding
statements before the processed statement and after it.
(enum out_edge_check): New type.
(abnormal_edge_after_stmt_p): New function.
(sra_modify_call_arg): New function.
(sra_modify_assign): Adjust calls to sra_modify_expr.
(sra_modify_function_body): Likewise, use sra_modify_call_arg to
process call arguments, including the static chain.
gcc/testsuite/ChangeLog:
2023-11-23 Martin Jambor <mjambor@suse.cz>
PR middle-end/109849
* g++.dg/tree-ssa/pr109849.C: New test.
* g++.dg/tree-ssa/sra-eh-1.C: Likewise.
* gcc.dg/tree-ssa/pr109849.c: Likewise.
* gcc.dg/tree-ssa/sra-longjmp-1.c: Likewise.
* gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: cvs-commit at gcc dot gnu.org @ 2023-11-24 17:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #24 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:c2dcfb6ba6e9a84a16e63ae73a822ae2a843170c
commit r14-5832-gc2dcfb6ba6e9a84a16e63ae73a822ae2a843170c
Author: Jan Hubicka <jh@suse.cz>
Date: Fri Nov 24 17:59:44 2023 +0100
Use memcpy instead of memmove in __relocate_a_1
__relocate_a_1 is used to copy data after vector resizing. This can be done
by memcpy rather than memmove.
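A minimal C++ sketch of such a relocation follows. The name relocate_trivial is hypothetical and this is not the libstdc++ code; it assumes that the destination is freshly allocated storage that cannot overlap the source, which is the condition under which std::memcpy may be used in place of std::memmove.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: relocate [first, last) of trivially copyable objects
// into non-overlapping storage at result; returns the end of the new range.
template <typename T>
T* relocate_trivial(const T* first, const T* last, T* result)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "byte-wise relocation needs a trivially copyable type");
  const std::ptrdiff_t count = last - first;
  assert(count >= 0);
  // memcpy requires that the ranges do not overlap (assumed by the caller).
  if (count > 0)
    std::memcpy(result, first, static_cast<std::size_t>(count) * sizeof(T));
  return result + count;
}
```

If the caller cannot guarantee that the two ranges are disjoint, std::memmove remains the safe choice.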
libstdc++-v3/ChangeLog:
PR middle-end/109849
* include/bits/stl_uninitialized.h (__relocate_a_1): Use memcpy
instead of memmove.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: jamborm at gcc dot gnu.org @ 2023-11-24 17:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #25 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> There is nothing to sink really, loop header copying introduces a PHI and
> there's not partial redundancies but only partial-partial and those are not
> obvious to CSE because of the introduced PHI.
>
SRA now decomposes stack.
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: rguenth at gcc dot gnu.org @ 2023-11-27 14:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Bug 109849 depends on bug 112653, which changed state.
Bug 112653 Summary: PTA should handle correctly escape information of values returned by a function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
* [Bug middle-end/109849] suboptimal code for vector walking loop
From: redi at gcc dot gnu.org @ 2023-11-28 9:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #26 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to GCC Commits from comment #23)
> https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363
>
> commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
> Author: Martin Jambor <mjambor@suse.cz>
> Date: Fri Nov 24 17:32:35 2023 +0100
>
> sra: SRA of non-escaped aggregates passed by reference to calls
>
I'm seeing a large number of libstdc++ testsuite failures, bisected to this
patch.
For example:
make check -C x86_64-pc-linux-gnu/libstdc++-v3
RUNTESTFLAGS="conformance.exp=21_strings/basic_string/operators/char/1.cc
--target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0"
The full list of FAILs is:
FAIL: 23_containers/vector/types/1.cc -std=gnu++98 (test for excess errors)
FAIL: 23_containers/vector/types/1.cc -std=gnu++98 (test for excess errors)
FAIL: 19_diagnostics/stacktrace/output.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/output.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/cons-1.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-1.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-2.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-3.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++11 execution test
FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-4.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++14 execution test
FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++17 execution test
FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++20 execution test
FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/system_error/what-big.cc -std=gnu++26 execution test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++11 execution
test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++14 execution
test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++17 execution
test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++20 execution
test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++23 execution
test
FAIL: 21_strings/basic_string/cons/char/moveable2.cc -std=gnu++26 execution
test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++11 execution
test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++14 execution
test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++17 execution
test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++20 execution
test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++23 execution
test
FAIL: 21_strings/basic_string/cons/wchar_t/moveable2.cc -std=gnu++26 execution
test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/3.cc -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/4.cc -std=gnu++26 execution test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++11 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++14 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++17 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++20 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++23 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/1.cc -std=gnu++26 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++11 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++14 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++17 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++20 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++23 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/3.cc -std=gnu++26 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++11 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++14 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++17 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++20 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++23 execution
test
FAIL: 21_strings/basic_string/operators/wchar_t/4.cc -std=gnu++26 execution
test
FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++11 execution test
FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++14 execution test
FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++17 execution test
FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++20 execution test
FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++23 execution test
FAIL: 22_locale/num_get/get/char/23953.cc -std=gnu++26 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++11 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++14 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++17 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++20 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++23 execution test
FAIL: 22_locale/num_get/get/wchar_t/23953.cc -std=gnu++26 execution test
FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++11 execution test
FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++14 execution test
FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++17 execution test
FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++20 execution test
FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++23 execution test
FAIL: 22_locale/num_put/put/char/23953.cc -std=gnu++26 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++11 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++14 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++17 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++20 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++23 execution test
FAIL: 22_locale/num_put/put/wchar_t/23953.cc -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++11 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++14 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++17 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++20 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++23 execution test
FAIL: 22_locale/numpunct/members/char/cache_1.cc -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++11 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++14 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++17 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++20 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++23 execution test
FAIL: 22_locale/numpunct/members/char/cache_2.cc -std=gnu++26 execution test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++11 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++14 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++17 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++20 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++23 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_1.cc -std=gnu++26 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++11 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++14 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++17 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++20 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++23 execution
test
FAIL: 22_locale/numpunct/members/wchar_t/cache_2.cc -std=gnu++26 execution
test
FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++11 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++14 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++17 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++20 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++23 execution test
FAIL: 27_io/basic_ofstream/assign/1.cc -std=gnu++26 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc -std=gnu++17 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc -std=gnu++20 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc -std=gnu++23 execution test
FAIL: 27_io/filesystem/filesystem_error/cons.cc -std=gnu++26 execution test
FAIL: 27_io/filesystem/operations/canonical.cc -std=gnu++17 execution test
FAIL: 27_io/filesystem/operations/canonical.cc -std=gnu++20 execution test
FAIL: 27_io/filesystem/operations/canonical.cc -std=gnu++23 execution test
FAIL: 27_io/filesystem/operations/canonical.cc -std=gnu++26 execution test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc -std=gnu++17 execution
test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc -std=gnu++20 execution
test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc -std=gnu++23 execution
test
FAIL: 27_io/filesystem/operations/copy_file_108178.cc -std=gnu++26 execution
test
FAIL: 27_io/filesystem/path/concat/strings.cc -std=gnu++17 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc -std=gnu++20 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc -std=gnu++23 execution test
FAIL: 27_io/filesystem/path/concat/strings.cc -std=gnu++26 execution test
FAIL: 28_regex/basic_regex/106607.cc -std=gnu++11 execution test
FAIL: 28_regex/basic_regex/106607.cc -std=gnu++14 execution test
FAIL: 28_regex/basic_regex/106607.cc -std=gnu++17 execution test
FAIL: 28_regex/basic_regex/106607.cc -std=gnu++20 execution test
FAIL: 28_regex/basic_regex/106607.cc -std=gnu++23 execution test
FAIL: 28_regex/basic_regex/106607.cc -std=gnu++26 execution test
FAIL: 30_threads/thread/id/output.cc -std=gnu++11 execution test
FAIL: 30_threads/thread/id/output.cc -std=gnu++14 execution test
FAIL: 30_threads/thread/id/output.cc -std=gnu++17 execution test
FAIL: 30_threads/thread/id/output.cc -std=gnu++20 execution test
FAIL: 30_threads/thread/id/output.cc -std=gnu++23 execution test
FAIL: 30_threads/thread/id/output.cc -std=gnu++26 execution test
FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++11 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++14 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++17 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++20 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++23 execution
test
FAIL: experimental/filesystem/filesystem_error/cons.cc -std=gnu++26 execution
test
FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++11
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++14
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++17
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++20
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++23
execution test
FAIL: experimental/filesystem/iterators/error_reporting.cc -std=gnu++26
execution test
FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++11 execution test
FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++14 execution test
FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++17 execution test
FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++20 execution test
FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++23 execution test
FAIL: experimental/filesystem/iterators/pop.cc -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/canonical.cc -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/copy.cc -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/copy.cc -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/copy.cc -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/copy.cc -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/copy.cc -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/copy.cc -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++11
execution test
FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++14
execution test
FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++17
execution test
FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++20
execution test
FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++23
execution test
FAIL: experimental/filesystem/operations/create_directory.cc -std=gnu++26
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++11
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++14
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++17
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++20
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++23
execution test
FAIL: experimental/filesystem/operations/create_symlink.cc -std=gnu++26
execution test
FAIL: experimental/filesystem/operations/exists.cc -std=gnu++11 execution test
FAIL: experimental/filesystem/operations/exists.cc -std=gnu++14 execution test
FAIL: experimental/filesystem/operations/exists.cc -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/exists.cc -std=gnu++20 execution test
FAIL: experimental/filesystem/operations/exists.cc -std=gnu++23 execution test
FAIL: experimental/filesystem/operations/exists.cc -std=gnu++26 execution test
FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/file_size.cc -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/is_empty.cc -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++11
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++14
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++17
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++20
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++23
execution test
FAIL: experimental/filesystem/operations/last_write_time.cc -std=gnu++26
execution test
FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/permissions.cc -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++11 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++14 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++17 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++20 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++23 execution
test
FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++26 execution
test
FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++11
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++14
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++17
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++20
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++23
execution test
FAIL: experimental/filesystem/operations/temp_directory_path.cc -std=gnu++26
execution test
FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++11 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++14 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++17 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++20 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++23 execution
test
FAIL: experimental/filesystem/path/factory/u8path.cc -std=gnu++26 execution
test
FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++14 execution
test
FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++17 execution
test
FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++20 execution
test
FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++23 execution
test
FAIL: experimental/net/internet/address/v6/members.cc -std=gnu++26 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++14 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++17 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++20 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++23 execution
test
FAIL: experimental/net/internet/resolver/ops/lookup.cc -std=gnu++26 execution
test
FAIL: ext/pb_ds/regression/hash_map_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/hash_map_rand_debug.cc -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/hash_set_rand.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/hash_set_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/hash_set_rand_debug.cc -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++14
execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++17
execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++20
execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++23
execution test
FAIL: ext/pb_ds/regression/list_update_map_rand_debug.cc -std=gnu++26
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++14
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++17
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++20
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++23
execution test
FAIL: ext/pb_ds/regression/list_update_set_rand_debug.cc -std=gnu++26
execution test
FAIL: ext/pb_ds/regression/priority_queue_rand.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++11 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++14 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++17 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++20 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++23 execution
test
FAIL: ext/pb_ds/regression/priority_queue_rand_debug.cc -std=gnu++26 execution
test
FAIL: ext/pb_ds/regression/tree_map_rand.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_map_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/tree_map_rand_debug.cc -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/tree_set_rand.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_set_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/tree_set_rand_debug.cc -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/trie_map_rand.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_map_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/trie_map_rand_debug.cc -std=gnu++26 execution test
FAIL: ext/pb_ds/regression/trie_set_rand.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_set_rand.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++11 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++14 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++17 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++20 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++23 execution test
FAIL: ext/pb_ds/regression/trie_set_rand_debug.cc -std=gnu++26 execution test
FAIL: std/format/functions/format.cc -std=gnu++20 execution test
FAIL: std/format/functions/format.cc -std=gnu++23 execution test
FAIL: std/format/functions/format.cc -std=gnu++26 execution test
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/109849] suboptimal code for vector walking loop
2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
` (28 preceding siblings ...)
2023-11-28 9:33 ` redi at gcc dot gnu.org
@ 2023-11-28 10:32 ` jamborm at gcc dot gnu.org
2023-11-28 12:41 ` redi at gcc dot gnu.org
` (6 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-11-28 10:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #27 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #26)
> (In reply to GCC Commits from comment #23)
> > https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363
> >
> > commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
> > Author: Martin Jambor <mjambor@suse.cz>
> > Date: Fri Nov 24 17:32:35 2023 +0100
> >
> > sra: SRA of non-escaped aggregates passed by reference to calls
> >
>
> I'm seeing a large number of libstdc++ testsuite failures, bisected to this
> patch.
>
> For example:
>
> make check -C x86_64-pc-linux-gnu/libstdc++-v3
> RUNTESTFLAGS="conformance.exp=21_strings/basic_string/operators/char/1.cc
> --target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0"
>
Unfortunately I cannot reproduce this, the above (on pristine master
commit 006e90e1344 on an x86_64-linux) results in:
Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
Running
/home/mjambor/gcc/small/src/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
PASS: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 (test for
excess errors)
PASS: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 execution test
Can you please try if
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
fixes this?
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/109849] suboptimal code for vector walking loop
2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
` (29 preceding siblings ...)
2023-11-28 10:32 ` jamborm at gcc dot gnu.org
@ 2023-11-28 12:41 ` redi at gcc dot gnu.org
2023-11-28 13:29 ` redi at gcc dot gnu.org
` (5 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28 12:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #28 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #27)
> Unfortunately I cannot reproduce this, the above (on pristine master
> commit 006e90e1344 on an x86_64-linux) results in:
>
> Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
> Running
> /home/mjambor/gcc/small/src/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
> ...
> PASS: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 (test for
> excess errors)
> PASS: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 execution test
Oops, sorry, that particular FAIL needs either
--target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG which then
makes it fail for all -std modes:
Schedule of variations:
unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG
Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0/-D_GLIBCXX_DEBUG
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for
target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/test/src/gcc/libstdc++-v3/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++11 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++14 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++17 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++20 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++23 execution test
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++26 execution test
Or just set GLIBCXX_TESTSUITE_STDS="17,20" in the env before running the test:
Schedule of variations:
unix/-D_GLIBCXX_USE_CXX11_ABI=0
Running target unix/-D_GLIBCXX_USE_CXX11_ABI=0
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for
target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/test/src/gcc/libstdc++-v3/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running /home/test/src/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
...
FAIL: 21_strings/basic_string/operators/char/1.cc -std=gnu++20 execution test
It seems I picked a bad example to give, which requires additional options to
FAIL.
Many of the other FAILs do not require _GLIBCXX_DEBUG or -std=gnu++20 to FAIL,
but the -D_GLIBCXX_USE_CXX11_ABI=0 option is necessary, at least for all the
ones I inspected. That option isn't used by default, but I run the full
testsuite with that several times a day, and with
GLIBCXX_TESTSUITE_STDS=98,11,14,17,20,23,26.
> Can you please try if
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
> fixes this?
Testing now ...
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/109849] suboptimal code for vector walking loop
2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
` (30 preceding siblings ...)
2023-11-28 12:41 ` redi at gcc dot gnu.org
@ 2023-11-28 13:29 ` redi at gcc dot gnu.org
2023-11-28 15:29 ` redi at gcc dot gnu.org
` (4 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28 13:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #29 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #28)
> (In reply to Martin Jambor from comment #27)
> > Can you please try if
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638318.html
> > fixes this?
>
> Testing now ...
Yes, this fixes 21_strings/basic_string/operators/char/* and
28_regex/basic_regex/106607.cc
Running the full testsuite now...
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/109849] suboptimal code for vector walking loop
2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
` (31 preceding siblings ...)
2023-11-28 13:29 ` redi at gcc dot gnu.org
@ 2023-11-28 15:29 ` redi at gcc dot gnu.org
2023-11-28 22:21 ` redi at gcc dot gnu.org
` (3 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28 15:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #30 from Jonathan Wakely <redi at gcc dot gnu.org> ---
So far the only FAIL I still see is:
FAIL: 23_containers/vector/types/1.cc -std=gnu++98 (test for excess errors)
I'm not sure if this is caused by your patch or one of Honza's. The test only
fails with GLIBCXX_TESTSUITE_STDS=98 defined.
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/109849] suboptimal code for vector walking loop
2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
` (32 preceding siblings ...)
2023-11-28 15:29 ` redi at gcc dot gnu.org
@ 2023-11-28 22:21 ` redi at gcc dot gnu.org
2023-11-29 12:27 ` hubicka at ucw dot cz
` (2 subsequent siblings)
36 siblings, 0 replies; 38+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-28 22:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #31 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Bisection points to r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 for that
remaining FAIL as well (and it isn't fixed by the new patch).
It introduced a new warning which wasn't present before:
/tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
writing between 2 and 9223372036854775806 bytes into a region of size 0
overflows the destination [-Wstringop-overflow=]
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/109849] suboptimal code for vector walking loop
2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
` (33 preceding siblings ...)
2023-11-28 22:21 ` redi at gcc dot gnu.org
@ 2023-11-29 12:27 ` hubicka at ucw dot cz
2023-11-29 15:25 ` cvs-commit at gcc dot gnu.org
2024-01-03 17:41 ` jamborm at gcc dot gnu.org
36 siblings, 0 replies; 38+ messages in thread
From: hubicka at ucw dot cz @ 2023-11-29 12:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #32 from Jan Hubicka <hubicka at ucw dot cz> ---
> /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
> warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
> writing between 2 and 9223372036854775806 bytes into a region of size 0
> overflows the destination [-Wstringop-overflow=]
It warns on:
template<bool _IsMove>
  struct __copy_move<_IsMove, true, random_access_iterator_tag>
  {
    template<typename _Tp, typename _Up>
      _GLIBCXX20_CONSTEXPR
      static _Up*
      __copy_m(_Tp* __first, _Tp* __last, _Up* __result)
      {
        const ptrdiff_t _Num = __last - __first;
        if (__builtin_expect(_Num > 1, true))
          __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
        else if (_Num == 1)
          std::__copy_move<_IsMove, false, random_access_iterator_tag>::
            __assign_one(__result, __first);
        return __result + _Num;
      }
  };
It is likely a false positive on a code path that never happens in real
code, but we now optimize it better.
Does it show an inline path?
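For readers who want to experiment with the shape of the code being warned about, here is a minimal sketch of the same three-way dispatch on the element count. It is a simplified analogue, not the libstdc++ implementation: the names copy_trivial and copy_trivial_demo are hypothetical, it assumes a trivially copyable element type, and it is not claimed to reproduce the -Wstringop-overflow warning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified analogue of the quoted __copy_m.
// Assumes T is trivially copyable, so memmove is a valid way to copy it.
template<typename T>
T* copy_trivial(const T* first, const T* last, T* result)
{
  const std::ptrdiff_t num = last - first;
  if (num > 1)
    std::memmove(result, first, sizeof(T) * num);  // bulk copy of several elements
  else if (num == 1)
    *result = *first;                              // single element: plain assignment
  // num == 0: nothing is copied
  return result + num;                             // one past the last element written
}

// Exercises the bulk path, the single-element path and the empty path.
inline bool copy_trivial_demo()
{
  const int src[3] = {1, 2, 3};
  int dst[3] = {0, 0, 0};
  int* end3 = copy_trivial(src, src + 3, dst);
  int one = 0;
  int* end1 = copy_trivial(src, src + 1, &one);
  int* end0 = copy_trivial(src, src, dst);
  return end3 == dst + 3 && dst[0] == 1 && dst[1] == 2 && dst[2] == 3
         && end1 == &one + 1 && one == 1
         && end0 == dst;
}
```

The memmove branch is only reached when more than one element is copied, which matches the lower bound of 2 in the "between 2 and 9223372036854775806 bytes" wording of the warning when the element size is one byte.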
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/109849] suboptimal code for vector walking loop
2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
` (34 preceding siblings ...)
2023-11-29 12:27 ` hubicka at ucw dot cz
@ 2023-11-29 15:25 ` cvs-commit at gcc dot gnu.org
2024-01-03 17:41 ` jamborm at gcc dot gnu.org
36 siblings, 0 replies; 38+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-29 15:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #33 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:
https://gcc.gnu.org/g:302461ad9a04d82fee904bddac69811d13d5bb6a
commit r14-5971-g302461ad9a04d82fee904bddac69811d13d5bb6a
Author: Martin Jambor <mjambor@suse.cz>
Date: Wed Nov 29 16:24:33 2023 +0100
tree-sra: Avoid returns of references to SRA candidates
The enhancement to address PR 109849 contained an important thinko:
any reference that is passed to a function and does not escape
must also not happen to be aliased by the return value of the
function. This has quickly transpired as bugs PR 112711 and PR
112721.
Just as IPA-modref does a good enough job to allow us to rely on the
escaped set of variables, it seems to be doing well also on updating
the EAF_NOT_RETURNED_DIRECTLY call argument flag, which happens to address
exactly the situation we need to avoid. Of course, if a call
statement ignores any returned value, we also do not need to check the
flag.
Hopefully this does not pessimize things too much. I have verified
that the PR 109849 testcase remains quick, and so should the
benchmark it is derived from.
gcc/ChangeLog:
2023-11-27 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/112711
PR tree-optimization/112721
* tree-sra.cc (build_access_from_call_arg): New parameter
CAN_BE_RETURNED, disqualify any candidate passed by reference if it
is true. Adjust leading comment.
(scan_function): Pass appropriate value to CAN_BE_RETURNED of
build_access_from_call_arg.
gcc/testsuite/ChangeLog:
2023-11-29 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/112711
PR tree-optimization/112721
* g++.dg/tree-ssa/pr112711.C: New test.
* gcc.dg/tree-ssa/pr112721.c: Likewise.
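The situation the commit message describes can be illustrated with a small sketch. This is not the PR 112711 or PR 112721 testcase; the names Pair, pick and demo are hypothetical, and the noinline attribute is only there to keep the call visible as a call. The point is that the argument does not escape to global memory, yet the returned reference aliases it, so the caller's aggregate cannot be treated as if nothing else referred to it.

```cpp
#include <cassert>

struct Pair { int first; int second; };

// Hypothetical callee: p is not stored anywhere, so it does not escape,
// but the returned reference refers to the very same object.
__attribute__((noinline))
Pair& pick(Pair& p)
{
  return p;
}

// The write through the returned reference must be visible in 'local'.
// Splitting 'local' into independent scalars across the call, without
// accounting for the returned alias, would lose this store.
inline int demo()
{
  Pair local{1, 2};
  Pair& alias = pick(local);  // alias and local name the same object
  alias.second = 42;          // store through the returned reference
  return local.second;        // has to read back 42
}
```

If the call's return value were ignored, no such alias could be created through it, which is the case the commit message says does not need the flag check.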
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Bug middle-end/109849] suboptimal code for vector walking loop
2023-05-13 22:26 [Bug middle-end/109849] New: suboptimal code for vector walking loop hubicka at gcc dot gnu.org
` (35 preceding siblings ...)
2023-11-29 15:25 ` cvs-commit at gcc dot gnu.org
@ 2024-01-03 17:41 ` jamborm at gcc dot gnu.org
36 siblings, 0 replies; 38+ messages in thread
From: jamborm at gcc dot gnu.org @ 2024-01-03 17:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #34 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #32)
> > /tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
> > warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
> > writing between 2 and 9223372036854775806 bytes into a region of size 0
> > overflows the destination [-Wstringop-overflow=]
>
> It warns on:
>
> template<bool _IsMove>
>   struct __copy_move<_IsMove, true, random_access_iterator_tag>
>   {
>     template<typename _Tp, typename _Up>
>       _GLIBCXX20_CONSTEXPR
>       static _Up*
>       __copy_m(_Tp* __first, _Tp* __last, _Up* __result)
>       {
>         const ptrdiff_t _Num = __last - __first;
>         if (__builtin_expect(_Num > 1, true))
>           __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
>         else if (_Num == 1)
>           std::__copy_move<_IsMove, false, random_access_iterator_tag>::
>             __assign_one(__result, __first);
>         return __result + _Num;
>       }
>   };
>
> It is likely false positive on a code path that never happens in real
> code, but we now optimize it better.
>
We end up with:
<bb 16> [local count: 64736968]:
__builtin_memcpy (1B, v$_M_impl$D10203$_M_start_448, _354);
IIRC the statement variant is created by jump threading (specifically
thread2).
Moreover, if I understand the comment in compute_objsize_r about the
INTEGER_CST case correctly, small integers are considered potential
"result of erroneous null pointer addition/subtraction." So not
warning on a constant 1 destination does not seem to be desirable.
^ permalink raw reply [flat|nested] 38+ messages in thread