public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization
@ 2020-12-16 22:02 mcolavita at fb dot com
2020-12-17 16:19 ` [Bug rtl-optimization/98335] " mcolavita at fb dot com
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: mcolavita at fb dot com @ 2020-12-16 22:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
Bug ID: 98335
Summary: [9/10/11 Regression] Poor code generation for partial
struct initialization
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: mcolavita at fb dot com
Target Milestone: ---
Consider the following code:
struct Data {
char a;
int b;
};
char c;
Data val(int idx) {
return { c };
}
On x86-64 (with sizeof(char) = 1 and sizeof(int) = 4), val() can be implemented
with a single mov to %rax. With -O3, g++ produces the following inefficient
output:
xorl %eax, %eax
movb $0, -18(%rsp)
movabsq $72057594037927935, %rdx
movw %ax, -20(%rsp)
movl $0, -24(%rsp)
andq -24(%rsp), %rdx
movq %rdx, %rax
salq $8, %rax
movb c(%rip), %al
ret
Similar outputs are seen for any level of optimization above O0 on GCC 9, 10,
and 11. The bug is not present in GCC 8.
Reasonable code is generated if the second field of the struct is explicitly
initialized to a constant, either in the struct definition or the initializer.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug rtl-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
@ 2020-12-17 16:19 ` mcolavita at fb dot com
2021-01-04 15:40 ` [Bug tree-optimization/98335] " rguenth at gcc dot gnu.org
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: mcolavita at fb dot com @ 2020-12-17 16:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
--- Comment #1 from Michael Colavita <mcolavita at fb dot com> ---
A similar problem appears to occur for the following example:
struct Data {
long a;
union {
long u;
struct {
char b;
char pad[3];
};
};
};
Data val(long d, long e) {
return { d, e };
}
Yielding:
movq %rdi, %xmm0
movq %rsi, %xmm1
punpcklqdq %xmm1, %xmm0
movaps %xmm0, -56(%rsp)
movq -56(%rsp), %rax
movq -48(%rsp), %rdx
ret
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
2020-12-17 16:19 ` [Bug rtl-optimization/98335] " mcolavita at fb dot com
@ 2021-01-04 15:40 ` rguenth at gcc dot gnu.org
2021-01-06 11:01 ` ebotcazou at gcc dot gnu.org
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-04 15:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ebotcazou at gcc dot gnu.org,
| |jakub at gcc dot gnu.org
Keywords| |missed-optimization
Priority|P3 |P2
Component|rtl-optimization |tree-optimization
Last reconfirmed| |2021-01-04
Ever confirmed|0 |1
Target Milestone|--- |9.4
Status|UNCONFIRMED |NEW
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We expand the first case from
MEM <char[7]> [(struct Data *)&D.2365 + 1B] = {};
c.0_1 = c;
D.2365.a = c.0_1;
return D.2365;
I guess store-merging could "merge" the stores as
D.2365 = {};
D.2365.a = c.0_1;
thus figure the partial unaligned zeroing is better done aligned
(and redundant). Alternatively it could emit
V_C_E<unsigned> = (unsigned) c.0_1;
The second testcase looks vectorization/ABI related for which we have plenty
of dups.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
2020-12-17 16:19 ` [Bug rtl-optimization/98335] " mcolavita at fb dot com
2021-01-04 15:40 ` [Bug tree-optimization/98335] " rguenth at gcc dot gnu.org
@ 2021-01-06 11:01 ` ebotcazou at gcc dot gnu.org
2021-01-20 9:51 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: ebotcazou at gcc dot gnu.org @ 2021-01-06 11:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
--- Comment #3 from Eric Botcazou <ebotcazou at gcc dot gnu.org> ---
> We expand the first case from
>
> MEM <char[7]> [(struct Data *)&D.2365 + 1B] = {};
> c.0_1 = c;
> D.2365.a = c.0_1;
> return D.2365;
But why generate a 7-byte zeroing instead of a 8-byte one? I gather this is
the cause of the regression.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
` (2 preceding siblings ...)
2021-01-06 11:01 ` ebotcazou at gcc dot gnu.org
@ 2021-01-20 9:51 ` jakub at gcc dot gnu.org
2021-01-21 11:27 ` jakub at gcc dot gnu.org
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-20 9:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |law at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I agree it would be weird to try to undo the
- D.2365 = {};
+ MEM <char[7]> [(struct Data *)&D.2365 + 1B] = {};
transformation by DSE in store_merging instead of adjusting the DSE
optimization to take into account costs and likely ways how the clearing will
be expanded.
On the other side, the user could have written it that way.
Regressed with r9-1663-g99e87c0eef2f6020a3ded2c785389939c07ac04e aka PR86010
fix.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
` (3 preceding siblings ...)
2021-01-20 9:51 ` jakub at gcc dot gnu.org
@ 2021-01-21 11:27 ` jakub at gcc dot gnu.org
2021-06-01 8:19 ` [Bug tree-optimization/98335] [9/10/11/12 " rguenth at gcc dot gnu.org
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-21 11:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So I think we want to improve that
+ /* If more than a word remains, then make sure to keep the
+ starting point at least word aligned. */
+ if (last_live - first_live > UNITS_PER_WORD)
+ *trim_head &= (UNITS_PER_WORD - 1);
Note, last_live is the start of the last live byte (so last_live + 1 is the end
of that).
For the small sizes, I'd say we should consider both alignment and exact
head/tail trim values.
Whole word store is definitely more efficient than 7 bytes store at offset 1,
ditto head trim 2 and 3, storing just second half is ok.
So shall we e.g. call by_pieces_ninsns for the before/after the expected
triming and determine only trim if it doesn't increase number of by pieces
store insns?
It could also iterate on those.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
` (4 preceding siblings ...)
2021-01-21 11:27 ` jakub at gcc dot gnu.org
@ 2021-06-01 8:19 ` rguenth at gcc dot gnu.org
2022-03-07 13:01 ` roger at nextmovesoftware dot com
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-01 8:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|9.4 |9.5
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
` (5 preceding siblings ...)
2021-06-01 8:19 ` [Bug tree-optimization/98335] [9/10/11/12 " rguenth at gcc dot gnu.org
@ 2022-03-07 13:01 ` roger at nextmovesoftware dot com
2022-03-11 17:53 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-03-07 13:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |roger at nextmovesoftware dot com
Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
Status|NEW |ASSIGNED
--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
Patches proposed (one middle-end and one backend).
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591273.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591285.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
` (6 preceding siblings ...)
2022-03-07 13:01 ` roger at nextmovesoftware dot com
@ 2022-03-11 17:53 ` cvs-commit at gcc dot gnu.org
2022-03-11 17:59 ` cvs-commit at gcc dot gnu.org
2022-03-14 18:44 ` roger at nextmovesoftware dot com
9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-11 17:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:c5288df751f9ecd11898dec5f2a7b6b03267f79e
commit r12-7615-gc5288df751f9ecd11898dec5f2a7b6b03267f79e
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Fri Mar 11 17:46:50 2022 +0000
PR tree-optimization/98335: Improvements to DSE's compute_trims.
This patch is the main middle-end piece of a fix for PR tree-opt/98335,
which is a code-quality regression affecting mainline. The issue occurs
in DSE's (dead store elimination's) compute_trims function that determines
where a store to memory can be trimmed. In the testcase given in the
PR, this function notices that the first byte of a DImode store is dead,
and replaces the 8-byte store at (aligned) offset zero, with a 7-byte store
at (unaligned) offset one. Most architectures can store a power-of-two
bytes (up to a maximum) in single instruction, so writing 7 bytes requires
more instructions than writing 8 bytes. This patch follows Jakub Jelinek's
suggestion in comment 5, that compute_trims needs improved heuristics.
On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:
movl $0, -24(%rsp)
movabsq $72057594037927935, %rdx
movl $0, -21(%rsp)
andq -24(%rsp), %rdx
movq %rdx, %rax
salq $8, %rax
movb c(%rip), %al
ret
to
xorl %eax, %eax
movb c(%rip), %al
ret
2022-03-11 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR tree-optimization/98335
* builtins.cc (get_object_alignment_2): Export.
* builtins.h (get_object_alignment_2): Likewise.
* tree-ssa-alias.cc (ao_ref_alignment): New.
* tree-ssa-alias.h (ao_ref_alignment): Declare.
* tree-ssa-dse.cc (compute_trims): Improve logic deciding whether
to align head/tail, writing more bytes but using fewer store insns.
gcc/testsuite/ChangeLog
PR tree-optimization/98335
* g++.dg/pr98335.C: New test case.
* gcc.dg/pr86010.c: New test case.
* gcc.dg/pr86010-2.c: New test case.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
` (7 preceding siblings ...)
2022-03-11 17:53 ` cvs-commit at gcc dot gnu.org
@ 2022-03-11 17:59 ` cvs-commit at gcc dot gnu.org
2022-03-14 18:44 ` roger at nextmovesoftware dot com
9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-11 17:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:251ea6dfbdb4448875e41081682bb3aa451b5729
commit r12-7616-g251ea6dfbdb4448875e41081682bb3aa451b5729
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Fri Mar 11 17:57:12 2022 +0000
PR tree-optimization/98335: New peephole2 xorl;movb -> movzbl
This patch is the backend piece of my proposed fix to PR tree-opt/98335,
to allow C++ partial struct initialization to be as efficient/optimized
as full struct initialization.
With the middle-end patch just posted to gcc-patches, the test case
in the PR compiles on x86_64-pc-linux-gnu with -O2 to:
xorl %eax, %eax
movb c(%rip), %al
ret
with this additional peephole2 (actually four peephole2s):
movzbl c(%rip), %eax
ret
2022-03-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/98335
* config/i386/i386.md (peephole2): Eliminate redundant insv.
Combine movl followed by movb. Transform xorl followed by
a suitable movb or movw into the equivalent movz[bw]l.
gcc/testsuite/ChangeLog
PR tree-optimization/98335
* g++.target/i386/pr98335.C: New test case.
* gcc.target/i386/pr98335.c: New test case.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug tree-optimization/98335] [9/10/11/12 Regression] Poor code generation for partial struct initialization
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
` (8 preceding siblings ...)
2022-03-11 17:59 ` cvs-commit at gcc dot gnu.org
@ 2022-03-14 18:44 ` roger at nextmovesoftware dot com
9 siblings, 0 replies; 11+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-03-14 18:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98335
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|9.5 |12.0
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #10 from Roger Sayle <roger at nextmovesoftware dot com> ---
This should now be fixed on mainline.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-03-14 18:44 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16 22:02 [Bug rtl-optimization/98335] New: [9/10/11 Regression] Poor code generation for partial struct initialization mcolavita at fb dot com
2020-12-17 16:19 ` [Bug rtl-optimization/98335] " mcolavita at fb dot com
2021-01-04 15:40 ` [Bug tree-optimization/98335] " rguenth at gcc dot gnu.org
2021-01-06 11:01 ` ebotcazou at gcc dot gnu.org
2021-01-20 9:51 ` jakub at gcc dot gnu.org
2021-01-21 11:27 ` jakub at gcc dot gnu.org
2021-06-01 8:19 ` [Bug tree-optimization/98335] [9/10/11/12 " rguenth at gcc dot gnu.org
2022-03-07 13:01 ` roger at nextmovesoftware dot com
2022-03-11 17:53 ` cvs-commit at gcc dot gnu.org
2022-03-11 17:59 ` cvs-commit at gcc dot gnu.org
2022-03-14 18:44 ` roger at nextmovesoftware dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).