public inbox for gcc-bugs@sourceware.org
* [Bug target/111797] New: Code generation of -march=znver2 -O3 includes frame pointer
@ 2023-10-13 11:09 paulf at free dot fr
From: paulf at free dot fr @ 2023-10-13 11:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797

            Bug ID: 111797
           Summary: Code generation of -march=znver2 -O3 includes frame
                    pointer
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: paulf at free dot fr
  Target Milestone: ---

I was a bit surprised recently when I (unintentionally) ran perf record on the
exe that I work on, using an -O3 build without -fno-omit-frame-pointer, and
could still see the call stacks.

The function prologue that I see is:

0000000000000000 <function>:
       0:       4c 8d 54 24 08          lea    0x8(%rsp),%r10
       5:       48 83 e4 e0             and    $0xffffffffffffffe0,%rsp
       9:       41 ff 72 f8             push   -0x8(%r10)
       d:       55                      push   %rbp
       e:       48 89 e5                mov    %rsp,%rbp
      11:       41 57                   push   %r15
      13:       41 56                   push   %r14
      15:       41 55                   push   %r13
      17:       41 54                   push   %r12
      19:       41 52                   push   %r10
      1b:       53                      push   %rbx
      1c:       49 89 ce                mov    %rcx,%r14
      1f:       48 81 ec 40 10 00 00    sub    $0x1040,%rsp
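My rough reading of the first five instructions is as follows. r10 keeps the incoming stack pointer. rsp is rounded down to a 32-byte boundary. A copy of the return address is pushed. A conventional push %rbp / mov %rsp,%rbp frame is then built on top of it. The C sketch below replays those steps on a toy stack. TOY_BASE, toy_realign_prologue and the other names are made up for illustration, and this is not GCC code. It shows that afterwards [rbp] holds the caller's rbp and [rbp+8] holds the return address. In other words, the frame has the usual rbp-chained layout, which would plausibly explain why perf could still walk the call stacks.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack: 64 eight-byte slots starting at a fake address TOY_BASE.
   All names are hypothetical; nothing here is taken from GCC. */
#define TOY_BASE  0x1000u
#define TOY_SLOTS 64u
static uint64_t toy_mem[TOY_SLOTS];

static uint64_t toy_load(uint64_t addr)              { return toy_mem[(addr - TOY_BASE) / 8]; }
static void     toy_store(uint64_t addr, uint64_t v) { toy_mem[(addr - TOY_BASE) / 8] = v; }

/* x86-64 push: decrement rsp by 8, then store at the new rsp. */
static void toy_push(uint64_t *rsp, uint64_t v) { *rsp -= 8; toy_store(*rsp, v); }

struct toy_frame { uint64_t rsp, rbp, r10; };

/* Replays the first five prologue instructions quoted above, starting at
   function entry where [entry_rsp] holds the return address. */
static struct toy_frame toy_realign_prologue(uint64_t entry_rsp, uint64_t ret_addr,
                                             uint64_t caller_rbp)
{
    struct toy_frame f;
    uint64_t rsp = entry_rsp;
    toy_store(rsp, ret_addr);             /* left behind by the call instruction */

    f.r10 = rsp + 8;                      /* lea    0x8(%rsp),%r10            */
    rsp &= ~(uint64_t)31;                 /* and    $0xffffffffffffffe0,%rsp  */
    toy_push(&rsp, toy_load(f.r10 - 8));  /* push   -0x8(%r10): copy of RA    */
    toy_push(&rsp, caller_rbp);           /* push   %rbp                      */
    f.rbp = rsp;                          /* mov    %rsp,%rbp                 */
    f.rsp = rsp;
    return f;
}
```

For example, an entry rsp of 0x10e8 is 8 mod 16, as the ABI guarantees at function entry. With that entry rsp, rbp ends up at 0x10d0, and rbp + 16 is 32-byte aligned.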

I asked on Stack Overflow and was pointed to this post:

https://stackoverflow.com/questions/45423338/whats-up-with-gcc-weird-stack-manipulation-when-it-wants-extra-stack-alignment

That problem seems to have been fixed:

https://godbolt.org/z/qc6fqb5hn

I can't post the source code as it is proprietary, and the problem doesn't seem
to reproduce with trivial examples (the function that I tried is 23 kLOC, plus it
#includes other stuff).

I was able to reproduce the problem with the following steps (Valgrind chosen
because I'm one of the maintainers and I'm in the habit of building it).

git clone https://sourceware.org/git/valgrind.git march_zen2
cd march_zen2
./autogen.sh
./configure CFLAGS=-march=znver2
make -j 16
objdump -d --disassemble=mc_pre_clo_init .in_place/memcheck-amd64-linux | less

That shows

000000005800c220 <mc_pre_clo_init>:
    5800c220:   41 55                   push   %r13
    5800c222:   bf 8c 65 1d 58          mov    $0x581d658c,%edi
    5800c227:   4c 8d 6c 24 10          lea    0x10(%rsp),%r13
    5800c22c:   48 83 e4 e0             and    $0xffffffffffffffe0,%rsp
    5800c230:   41 ff 75 f8             push   -0x8(%r13)
    5800c234:   55                      push   %rbp
    5800c235:   48 89 e5                mov    %rsp,%rbp
    5800c238:   41 55                   push   %r13
    5800c23a:   48 83 ec 08             sub    $0x8,%rsp

which I believe illustrates the same problem.

mc_pre_clo_init starts like this:

static void mc_pre_clo_init(void)
{
   VG_(details_name)            ("Memcheck");
   VG_(details_version)         (NULL);
   VG_(details_description)     ("a memory error detector");
   VG_(details_copyright_author)(
      "Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.");
   VG_(details_bug_reports_to)  (VG_BUGS_TO);
   /* ... */
}

VG_ is a macro that implements a kind of namespace in C. These functions all
deal with the Memcheck startup banner.
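As an illustration of the namespace idea, here is a minimal token-pasting sketch. NS_, NS_PASTE and the myns_ prefix are hypothetical. Valgrind's real VG_ definition lives in its own headers and may differ in detail.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical prefix macro in the spirit of VG_(): NS_(foo) becomes myns_foo.
   The extra NS_PASTE level is the usual idiom so that macro arguments are
   expanded before pasting. */
#define NS_PASTE(a, b) a##b
#define NS_(name)      NS_PASTE(myns_, name)

static const char *banner_name;

/* Defines a function whose real symbol name is myns_details_name. */
static void NS_(details_name)(const char *s) { banner_name = s; }
```

Callers write NS_(details_name)("Memcheck"), and the linker only ever sees the prefixed name, which keeps the tool's symbols out of the client program's namespace.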

I think I understand that a 32-byte-aligned stack is needed, and that the
return address therefore has to be shuffled. Is it really necessary to also use
the frame pointer?
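For context, here is a minimal sketch of what can trigger the realignment. The x86-64 SysV ABI only guarantees 16-byte stack alignment at a call. A local that needs 32-byte alignment therefore forces the compiler to realign the stack in the prologue, at least when that local really lives in memory. Spilled 256-bit AVX values, which -march=znver2 makes possible, have this requirement. The function name is hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: a 32-byte-aligned local, the same size and alignment
   as an AVX __m256d. If buf is kept on the stack, the prologue must round rsp
   down to a 32-byte boundary for this to hold. */
static int local_is_32_byte_aligned(double seed)
{
    alignas(32) double buf[4];
    for (int i = 0; i < 4; ++i)
        buf[i] = seed + i;
    uintptr_t addr = (uintptr_t)buf;
    return addr % 32 == 0 && buf[3] == seed + 3;
}
```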


* [Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer
@ 2023-10-13 11:49 ` rguenth at gcc dot gnu.org
From: rguenth at gcc dot gnu.org @ 2023-10-13 11:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think it's easiest to use a frame pointer when custom stack alignment is
needed both for the return path and accessing arguments on the stack.
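A little arithmetic illustrates the "accessing arguments" part. Stack-passed arguments start at entry_rsp + 8, just above the return address. After and $-32,%rsp, however, their distance from the new rsp depends on the runtime value of entry_rsp, so it cannot be a compile-time constant. That is presumably why both listings in the original report keep the pre-realignment stack pointer in a register (r10 or r13) and push it. GCC's sources call this the dynamic realign argument pointer (DRAP). The helper below is a hypothetical sketch of that calculation, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Distance from the realigned stack pointer up to the first stack-passed
   argument, which sits just above the return address at entry_rsp + 8.
   align must be a power of two (32 for the prologues in this report). */
static uint64_t arg_offset_after_realign(uint64_t entry_rsp, uint64_t align)
{
    uint64_t aligned_sp = entry_rsp & ~(align - 1);  /* and $-align,%rsp */
    return (entry_rsp + 8) - aligned_sp;
}
```

Two ABI-valid entry values (both 8 mod 16), 0x10e8 and 0x10f8, give offsets of 16 and 32 respectively. A register saved before the and is therefore needed to reach the arguments.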


* [Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer
@ 2023-10-13 12:44 ` paulf at free dot fr
From: paulf at free dot fr @ 2023-10-13 12:44 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797

--- Comment #2 from Paul Floyd <paulf at free dot fr> ---
(In reply to Richard Biener from comment #1)
> I think it's easiest to use a frame pointer when custom stack alignment is
> needed both for the return path and accessing arguments on the stack.

But is it faster, the same or slower?


* [Bug target/111797] Code generation of -march=znver2 -O3 includes frame pointer
@ 2023-10-13 18:02 ` paulf at free dot fr
From: paulf at free dot fr @ 2023-10-13 18:02 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111797

--- Comment #3 from Paul Floyd <paulf at free dot fr> ---
With clang 17.0.2 (also tried 14.0) I get

0000000000000000 <function>:
       0:       55                      push   %rbp
       1:       41 57                   push   %r15
       3:       41 56                   push   %r14
       5:       41 55                   push   %r13
       7:       41 54                   push   %r12
       9:       53                      push   %rbx
       a:       48 81 ec c8 23 00 00    sub    $0x23c8,%rsp
      11:       c5 f9 28 c1             vmovapd %xmm1,%xmm0
      15:       4c 89 8c 24 98 21 00    mov    %r9,0x2198(%rsp)

With GCC, if I add -mno-avx, I get the base pointer back (rbp is no longer used
as a frame pointer). I presume that this turns off all vector extensions from
AVX onwards.
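One way to check the -mno-avx presumption is to look at GCC's predefined feature macros. __AVX__ and __AVX2__ are defined only when those instruction sets are enabled, so a translation unit built with -march=znver2 -mno-avx should report neither. The function names below are hypothetical.

```c
#include <assert.h>

/* Returns 1 if this translation unit was compiled with AVX enabled. */
static int compiled_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if this translation unit was compiled with AVX2 enabled. */
static int compiled_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}
```

Because AVX2 builds on AVX, compiled_with_avx2() should never be 1 while compiled_with_avx() is 0.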


