public inbox for gcc-bugs@sourceware.org
* [Bug c++/98884] New: Implement empty struct optimisations on ARM
@ 2021-01-29 11:18 david at westcontrol dot com
  2021-01-29 11:43 ` [Bug target/98884] " redi at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: david at westcontrol dot com @ 2021-01-29 11:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

            Bug ID: 98884
           Summary: Implement empty struct optimisations on ARM
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david at westcontrol dot com
  Target Milestone: ---

Empty "tag" structs (or classes) are useful for strong typing, function
options, and so on.  The rules of C++ require these to have a non-zero size (so
that addresses of different instances are valid and distinct), but they contain
no significant data.  Ideally, therefore, the compiler will not generate code
that sets values or copies values when passing around such types. 
Unfortunately, that is not quite the case.
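The size and address rules mentioned above can be checked directly. This is only a minimal sketch: DemoTag and distinct_addresses are hypothetical names used for illustration and are not part of the test case in this report.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag type, named for illustration only.
struct DemoTag {};

// The type has no non-static data members...
static_assert(std::is_empty<DemoTag>::value, "no non-static data members");
// ...yet every complete object of it still occupies at least one byte.
static_assert(sizeof(DemoTag) >= 1, "objects have non-zero size");

// Two separate complete objects of the same empty type have distinct addresses.
inline bool distinct_addresses() {
    DemoTag a;
    DemoTag b;
    return &a != &b;
}
```

The byte exists only to give each object an identity; it carries no value that a caller or callee could meaningfully read.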

Consider these two examples, with foo1 creating a tag type, and foo2 passing it
on:

struct Tag {
    friend Tag make_tag();
private:
    Tag() {}
};

Tag make_tag() {
    return Tag{};
}

void needs_tag(Tag);

void foo1(void) {
    Tag t = make_tag();
    needs_tag(t);
}


struct Tag1 {};
struct Tag2 {};
struct Tag3 {};
struct Tag4 {};
struct Tag5 {};

void needs_tags(int x, Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5);

void foo2(Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5)
{
    needs_tags(12345, t1, t2, t3, t4, t5);
}


(Here is a godbolt link for convenience: <https://godbolt.org/z/o5K78h>)

On x86, since gcc 8, this has been quite efficient (this is all with -O2):

make_tag():
        xor     eax, eax
        ret
foo1():
        jmp     needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        mov     edi, 12345
        jmp     needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)

The contents of the tag instances are basically ignored.  The exception is on
"make_tag", where the return is given the value 0 unnecessarily.

But on ARM it is a different matter.  This is for the Cortex-M4:


make_tag():
        mov     r0, #0
        bx      lr
foo1():
        mov     r0, #0
        b       needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        push    {lr}
        sub     sp, sp, #12
        mov     r2, #0
        mov     r3, r2
        strb    r2, [sp, #4]
        strb    r2, [sp]
        mov     r1, r2
        movw    r0, #12345
        bl      needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)
        add     sp, sp, #12
        ldr     pc, [sp], #4

The needless register and stack allocations, initialisations and copying mean
that this technique has a significant overhead for something that should really
"disappear in the compilation".

The x86 port manages this well.  Is it possible to get such optimisations into
the ARM port too?


Oh, and for comparison, clang with the same options (-std=c++17 -Wall -Wextra
-O2 -mcpu=cortex-m4) gives:

make_tag():
        bx      lr
foo1():
        b       needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        movw    r0, #12345
        b       needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)


* [Bug target/98884] Implement empty struct optimisations on ARM
  2021-01-29 11:18 [Bug c++/98884] New: Implement empty struct optimisations on ARM david at westcontrol dot com
@ 2021-01-29 11:43 ` redi at gcc dot gnu.org
  2021-01-29 11:55 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: redi at gcc dot gnu.org @ 2021-01-29 11:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c++                         |target
             Target|                            |arm

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
IIRC x86_64 had to change how empty structs are passed, to make C++ consistent
with the x86_64 psABI as used by C (and it wasn't a trivial change to get
right).

If the ARM ABI requires stack space for them then that's unavoidable, but
they're just padding bytes so leaving that stack space uninitialized should be
OK.

Reassigning to the arm target.


* [Bug target/98884] Implement empty struct optimisations on ARM
  2021-01-29 11:18 [Bug c++/98884] New: Implement empty struct optimisations on ARM david at westcontrol dot com
  2021-01-29 11:43 ` [Bug target/98884] " redi at gcc dot gnu.org
@ 2021-01-29 11:55 ` rguenth at gcc dot gnu.org
  2021-01-29 12:30 ` david at westcontrol dot com
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-29 11:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |11.0
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement


* [Bug target/98884] Implement empty struct optimisations on ARM
  2021-01-29 11:18 [Bug c++/98884] New: Implement empty struct optimisations on ARM david at westcontrol dot com
  2021-01-29 11:43 ` [Bug target/98884] " redi at gcc dot gnu.org
  2021-01-29 11:55 ` rguenth at gcc dot gnu.org
@ 2021-01-29 12:30 ` david at westcontrol dot com
  2021-01-29 12:35 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: david at westcontrol dot com @ 2021-01-29 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #2 from David Brown <david at westcontrol dot com> ---
Yes, ABI issues were my initial thought too.  If so, then optimising away the
assignments while leaving the stack manipulation (and possibly register
allocations) in place would still be a significant improvement.

However, I note that clang has no problem with generating ideal code here for
the ARM - it is not bothered by the ABI.  There could be several reasons for
that.  Perhaps the clang folk got the ABI wrong and the optimisation is not
valid for the ARM EABI.  Maybe the EABI used on the "arm (none)" target doesn't
specify these details, meaning the optimisation is valid there even if it is
not valid on "arm (linux)" targets.  I don't know the details of the ABIs well
enough to answer.

If it is an ABI issue, then I'd be quite happy with an ARM-specific flag to
enable a variation on the ABI that lets the compiler skip empty types
entirely.  When compiling for the Cortex-M devices, you rarely link much to
pre-compiled code (other than the C library) and it's usually fine to break
standards a bit to get more optimal code (like using "-ffast-math").

It would of course be best to have a general solution that works for all ARM
users (and ideally other targets too) without needing a flag.


* [Bug target/98884] Implement empty struct optimisations on ARM
  2021-01-29 11:18 [Bug c++/98884] New: Implement empty struct optimisations on ARM david at westcontrol dot com
                   ` (2 preceding siblings ...)
  2021-01-29 12:30 ` david at westcontrol dot com
@ 2021-01-29 12:35 ` jakub at gcc dot gnu.org
  2021-01-29 12:48 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-29 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |ktkachov at gcc dot gnu.org,
                   |                            |rearnsha at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
If GCC and Clang are ABI incompatible on this, then one of the two compilers is
buggy.  So someone needs to look at the EABI and find out which case it is.


* [Bug target/98884] Implement empty struct optimisations on ARM
  2021-01-29 11:18 [Bug c++/98884] New: Implement empty struct optimisations on ARM david at westcontrol dot com
                   ` (3 preceding siblings ...)
  2021-01-29 12:35 ` jakub at gcc dot gnu.org
@ 2021-01-29 12:48 ` jakub at gcc dot gnu.org
  2021-02-01 15:18 ` david at westcontrol dot com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-29 12:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, for ABI compatibility or incompatibility it might be better to check what
happens when some argument is passed after the empty structs.  Because at least
in some ABIs one could get away with just pretending the stack slots (or
registers) are there even when they aren't actually allocated on the stack, but
one would need to have guarantees the callee e.g. will never modify those stack
slots (in most ABIs the call argument stack slots are owned by the callee).


* [Bug target/98884] Implement empty struct optimisations on ARM
  2021-01-29 11:18 [Bug c++/98884] New: Implement empty struct optimisations on ARM david at westcontrol dot com
                   ` (4 preceding siblings ...)
  2021-01-29 12:48 ` jakub at gcc dot gnu.org
@ 2021-02-01 15:18 ` david at westcontrol dot com
  2021-02-01 15:46 ` david at westcontrol dot com
  2021-02-01 16:34 ` jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: david at westcontrol dot com @ 2021-02-01 15:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #5 from David Brown <david at westcontrol dot com> ---
(In reply to Jakub Jelinek from comment #4)
> Note, for ABI compatibility or incompatibility it might be better to check
> what happens when some argument is passed after the empty structs.  Because
> at least in some ABIs one could get away with just pretending the stack
> slots (or registers) are there even when they aren't actually allocated on
> the stack, but one would need to have guarantees the callee e.g. will never
> modify those stack slots (in most ABIs the call argument stack slots are
> owned by the callee).

Good point.  I tried with:

void needs_tags(int x, Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5, int y);

and

needs_tags(12345, t1, t2, t3, t4, t5, 200);

gcc (trunk - gcc 11) on x86 puts 12345 in edi and 200 in esi, just as if the
empty tags didn't exist.

So does clang for the ARM (putting them in r0 and r1 respectively).

gcc 9 for the ARM generates code as though t1 .. t5 were type "int" and value
0.
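For anyone wanting to repeat this, the fragments above can be assembled into one file roughly as follows. This is a sketch only: the body of needs_tags and the seen_x / seen_y variables are stand-ins added here so the example links and can be checked; the original test declared needs_tags without defining it, so the call could not be inlined.

```cpp
#include <cassert>

struct Tag1 {};
struct Tag2 {};
struct Tag3 {};
struct Tag4 {};
struct Tag5 {};

// Hypothetical bookkeeping so the call can be checked at run time.
static int seen_x = 0;
static int seen_y = 0;

// Stand-in callee (an assumption of this sketch). To study the calling
// convention in the generated assembly, keep only the declaration here and
// define the function in another translation unit.
void needs_tags(int x, Tag1, Tag2, Tag3, Tag4, Tag5, int y) {
    seen_x = x;
    seen_y = y;
}

// Caller from the report, extended with an argument after the empty structs.
void foo2(Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5) {
    needs_tags(12345, t1, t2, t3, t4, t5, 200);
}
```

Where the trailing 200 ends up (second argument register, or a later slot) is what distinguishes the two behaviours described above.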


* [Bug target/98884] Implement empty struct optimisations on ARM
  2021-01-29 11:18 [Bug c++/98884] New: Implement empty struct optimisations on ARM david at westcontrol dot com
                   ` (5 preceding siblings ...)
  2021-02-01 15:18 ` david at westcontrol dot com
@ 2021-02-01 15:46 ` david at westcontrol dot com
  2021-02-01 16:34 ` jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: david at westcontrol dot com @ 2021-02-01 15:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #6 from David Brown <david at westcontrol dot com> ---
(In reply to Jakub Jelinek from comment #3)
> If GCC and Clang are ABI incompatible on this, then one of the two compilers
> is buggy.  So someone needs to look at the EABI and find out which case it
> is.

I've had a look at the ARM C++ ABI, to the best of my abilities:

<https://developer.arm.com/documentation/ihi0041/latest>

Section 4.1 has this to say:

GC++ABI §2.2 POD Data Types

The GC++ABI defines the way in which empty class types are laid out.  For the
purposes of parameter passing in [AAPCS], a parameter whose type is an empty
class shall be treated as if its type were an aggregate with a single member of
type unsigned byte.

Note: Of course, the single member has undefined content.



(This references <http://itanium-cxx-abi.github.io/cxx-abi/abi.html#pod>)


If my reading is correct, then gcc is correct and clang is wrong here - empty
classes are treated as containing a single unsigned byte, and then expanded to
a 32-bit type before passing.  (There is still no need to put a zero in these
parameters, as the value is unspecified.)
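To make that reading concrete, the quoted rule says an empty class parameter is passed as if it were the one-member aggregate below. This is an illustration of the wording only: Tag1AsPassed and its member name are hypothetical, and nothing here is taken from either compiler's implementation.

```cpp
#include <cassert>
#include <type_traits>

struct Tag1 {};

// Hypothetical stand-in for the "as if" type described in the quoted text:
// an aggregate with a single member of type unsigned byte, whose content
// is undefined.
struct Tag1AsPassed {
    unsigned char unspecified;
};

static_assert(std::is_empty<Tag1>::value, "Tag1 has no data members");
static_assert(!std::is_empty<Tag1AsPassed>::value, "the stand-in has one");

// On common implementations both types have the same size and alignment,
// so the stand-in occupies the same argument slot the empty class would.
constexpr bool same_size_and_alignment() {
    return sizeof(Tag1) == sizeof(Tag1AsPassed) &&
           alignof(Tag1) == alignof(Tag1AsPassed);
}
```

Because the member's content is undefined, a caller following this rule has to reserve the slot but is not obliged to store anything in it.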

It may be that the x86 gcc port is wrong here, but I haven't looked at the
details of x86 calling conventions.


I hope someone can check this out, and perhaps file a bug report for clang so
that they can correct it.  (Alternatively, file a bug report with ARM so that
they can change the ABI!)


However, in this particular case, if clang is wrong then I don't want to be
right.  I can see no benefit, and significant cost, in passing zeros for these
empty tag structs.  I'd be quite happy with an explicitly non-conforming switch
to enable such optimisations (just like "-fshort-enums" or other switches that
mess with caller and callee registers).  Or I'd be even happier to find that
clang is wrong and gcc ARM gets optimised without a flag :-)


* [Bug target/98884] Implement empty struct optimisations on ARM
  2021-01-29 11:18 [Bug c++/98884] New: Implement empty struct optimisations on ARM david at westcontrol dot com
                   ` (6 preceding siblings ...)
  2021-02-01 15:46 ` david at westcontrol dot com
@ 2021-02-01 16:34 ` jakub at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-02-01 16:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
x86-64 is handled correctly according to the psABI, see
https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/low-level-sys-info.tex
The empty classes will end up with NO_CLASS and thus aren't passed at all.
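The difference between the two conventions discussed in this thread can be pictured with a toy model. It is a deliberate simplification, not the logic of either ABI or either compiler: it only counts argument slots in order, and the names Arg, Convention and assign_slots are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum class Arg { Int, EmptyClass };

enum class Convention {
    SkipEmpty,    // empty classes are not passed at all (the NO_CLASS case)
    EmptyAsByte   // empty classes take a slot, as if they held one byte
};

// Returns, for each argument, the index of the slot it occupies, or -1 if
// the argument is not passed. Slots are numbered in argument order; mapping
// slots to registers or stack is left out (under AAPCS the first four
// word-sized slots are r0-r3 and the rest go on the stack).
std::vector<int> assign_slots(const std::vector<Arg>& args, Convention conv) {
    std::vector<int> slots;
    int next = 0;
    for (Arg a : args) {
        if (a == Arg::EmptyClass && conv == Convention::SkipEmpty) {
            slots.push_back(-1);      // takes no slot
        } else {
            slots.push_back(next++);  // takes the next free slot
        }
    }
    return slots;
}
```

For the signature (int, Tag1, ..., Tag5, int), the first convention puts the two ints in slots 0 and 1, which matches the edi/esi and r0/r1 observations in comment #5. The second puts the trailing int in slot 6, which matches the description of the tags being handled as though they were ints.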

