public inbox for gcc-bugs@sourceware.org
* [Bug target/111797] New: Code generation of -march=znver2 -O3 includes frame pointer
@ 2023-10-13 11:09 paulf at free dot fr
2023-10-13 11:49 ` [Bug target/111797] " rguenth at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: paulf at free dot fr @ 2023-10-13 11:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797
Bug ID: 111797
Summary: Code generation of -march=znver2 -O3 includes frame
pointer
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: paulf at free dot fr
Target Milestone: ---
I was a bit surprised recently when I (unintentionally) ran perf record on the
exe that I work on with an -O3 build without -fno-omit-frame-pointer and I
could see the callstacks.
The function prolog that I see is
0000000000000000 <function>:
0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10
5: 48 83 e4 e0 and $0xffffffffffffffe0,%rsp
9: 41 ff 72 f8 push -0x8(%r10)
d: 55 push %rbp
e: 48 89 e5 mov %rsp,%rbp
11: 41 57 push %r15
13: 41 56 push %r14
15: 41 55 push %r13
17: 41 54 push %r12
19: 41 52 push %r10
1b: 53 push %rbx
1c: 49 89 ce mov %rcx,%r14
1f: 48 81 ec 40 10 00 00 sub $0x1040,%rsp
I asked on SO and got pointed to this post
https://stackoverflow.com/questions/45423338/whats-up-with-gcc-weird-stack-manipulation-when-it-wants-extra-stack-alignment
That problem seems to be fixed
https://godbolt.org/z/qc6fqb5hn
I can't post the source code as it is proprietary, and it doesn't seem to
reproduce with trivial examples (the function that I tried is 23kloc plus it
#includes other stuff).
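For anyone who wants something small to experiment with, here is a minimal sketch of a function with a local that requires 32-byte alignment, which is more than the 16 bytes the x86-64 SysV ABI guarantees at a call boundary. It is not the proprietary code, and it is not claimed to reproduce the exact prologue above; whether a given GCC version realigns the stack and keeps %rbp for it depends on the version and flags. The function name is hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical test function, not taken from the reported code base.
   noinline keeps it as a separate function with its own prologue. */
__attribute__((noinline))
int buffer_is_32_byte_aligned(void)
{
    /* A local that demands 32-byte alignment; volatile stops the
       compiler from optimizing the buffer away. */
    alignas(32) volatile unsigned char buf[64];
    buf[0] = 1;

    /* The compiler must honor alignas, so this returns 1. */
    return ((uintptr_t)buf % 32) == 0;
}
```

Compiling this with and without -march=znver2 and comparing the objdump -d output of the function is one way to see which prologue a particular compiler emits.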
I was able to reproduce the problem with the following steps (Valgrind chosen
because I'm one of the maintainers and I'm in the habit of building it).
git clone https://sourceware.org/git/valgrind.git march_zen2
cd march_zen2
./autogen.sh
./configure CFLAGS=-march=znver2
make -j 16
objdump -d --disassemble=mc_pre_clo_init .in_place/memcheck-amd64-linux | less
That shows
000000005800c220 <mc_pre_clo_init>:
5800c220: 41 55 push %r13
5800c222: bf 8c 65 1d 58 mov $0x581d658c,%edi
5800c227: 4c 8d 6c 24 10 lea 0x10(%rsp),%r13
5800c22c: 48 83 e4 e0 and $0xffffffffffffffe0,%rsp
5800c230: 41 ff 75 f8 push -0x8(%r13)
5800c234: 55 push %rbp
5800c235: 48 89 e5 mov %rsp,%rbp
5800c238: 41 55 push %r13
5800c23a: 48 83 ec 08 sub $0x8,%rsp
which I believe illustrates the same problem.
mc_pre_clo_init looks like this
static void mc_pre_clo_init(void)
{
   VG_(details_name) ("Memcheck");
   VG_(details_version) (NULL);
   VG_(details_description) ("a memory error detector");
   VG_(details_copyright_author)(
      "Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.");
   VG_(details_bug_reports_to) (VG_BUGS_TO);
VG_ is a macro that implements a kind of C namespace. The functions are all
outputting the memcheck startup banner.
I think that I understand that there is a need for a 32byte-aligned stack and
also to shuffle the return address. Is it really necessary to also use the
frame pointer?
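As a small illustration of the alignment step, the `and $0xffffffffffffffe0,%rsp` instruction in both listings rounds the stack pointer down to a multiple of 32, since 0xffffffffffffffe0 is ~31. The sketch below does the same arithmetic in C; the helper name and the sample address are made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of align.
   align must be a power of two (32 in the prologues above). */
uint64_t align_down(uint64_t addr, uint64_t align)
{
    /* For align == 32 the mask is 0xffffffffffffffe0,
       the immediate used by the `and` instruction. */
    return addr & ~(align - 1);
}
```

For example, align_down(0x7ffc1238, 32) gives 0x7ffc1220.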
* [Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer
From: rguenth at gcc dot gnu.org @ 2023-10-13 11:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think it's easiest to use a frame pointer when custom stack alignment is
needed both for the return path and accessing arguments on the stack.
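One possible reading of this comment, offered as an interpretation rather than a statement of how GCC decides: after the stack pointer is rounded down, its distance from the return address and from any stack-passed arguments is no longer a compile-time constant, so some other register has to remember where they are. Under the x86-64 SysV ABI, %rsp + 8 is 16-byte aligned at function entry, so rounding down to 32 bytes moves %rsp by either 8 or 24 bytes depending on the caller. The helper name and the sample addresses below are made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper: how many bytes `and $-32, %rsp` lowers the
   stack pointer for a given value of %rsp at function entry. */
uint64_t realign_displacement(uint64_t entry_rsp)
{
    uint64_t aligned = entry_rsp & ~(uint64_t)31;
    return entry_rsp - aligned;
}
```

With an entry value ending in 0x08 the displacement is 8, and with one ending in 0x18 it is 24, so the offset from the realigned %rsp back to the caller's frame is only known at run time.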
* [Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer
From: paulf at free dot fr @ 2023-10-13 12:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797
--- Comment #2 from Paul Floyd <paulf at free dot fr> ---
(In reply to Richard Biener from comment #1)
> I think it's easiest to use a frame pointer when custom stack alignment is
> needed both for the return path and accessing arguments on the stack.
But is it faster, the same or slower?
* [Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer
From: paulf at free dot fr @ 2023-10-13 18:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797
--- Comment #3 from Paul Floyd <paulf at free dot fr> ---
With clang 17.0.2 (also tried 14.0) I get
0000000000000000 <function>:
0: 55 push %rbp
1: 41 57 push %r15
3: 41 56 push %r14
5: 41 55 push %r13
7: 41 54 push %r12
9: 53 push %rbx
a: 48 81 ec c8 23 00 00 sub $0x23c8,%rsp
11: c5 f9 28 c1 vmovapd %xmm1,%xmm0
15: 4c 89 8c 24 98 21 00 mov %r9,0x2198(%rsp)
With GCC if I add -mno-avx then I get back the base pointer. I presume that
this will turn off all vector extensions from avx onwards.
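One way to check that presumption for a particular compiler is to look at the predefined macros. GCC and Clang define __AVX__, __AVX2__ and __AVX512F__ when the corresponding instruction sets are enabled. The sketch below reports which of them are defined for the flags it was compiled with; the function name and the bit assignments are arbitrary choices for the example.

```c
/* Hypothetical helper: bitmask of AVX-family feature macros that the
   compiler defined for the current command-line flags.
   bit 0 = __AVX__, bit 1 = __AVX2__, bit 2 = __AVX512F__ */
int avx_feature_mask(void)
{
    int mask = 0;
#ifdef __AVX__
    mask |= 1;
#endif
#ifdef __AVX2__
    mask |= 2;
#endif
#ifdef __AVX512F__
    mask |= 4;
#endif
    return mask;
}
```

Building this once with -march=znver2 and once with -march=znver2 -mno-avx and comparing the results shows what the option disables. The same information is available without compiling anything from gcc -march=znver2 -mno-avx -dM -E - </dev/null | grep AVX.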