public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/114262] New: Over-inlining when optimizing for size?
@ 2024-03-07  2:00 lh_mouse at 126 dot com
  2024-03-07  2:07 ` [Bug tree-optimization/114262] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2024-03-07  2:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262

            Bug ID: 114262
           Summary: Over-inlining when optimizing for size?
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lh_mouse at 126 dot com
  Target Milestone: ---

(https://gcc.godbolt.org/z/a4ox6oEfT)
```
struct impl;
struct impl* get_impl(int key);
int get_value(struct impl* p);


extern __inline__ __attribute__((__gnu_inline__))
int get_value_by_key(int key)
  {
    struct impl* p = get_impl(key);
    if(!p)
      return -1;
    return get_value(p);
  }

int real_get_value_by_key(int key)
  {
    return get_value_by_key(key);
  }

```

These are actually two functions: one is `gnu_inline` and the other is not
inline. It looks to me that if I mark a function `gnu_inline`, I assert that
'somewhere I shall provide an external definition for you', so when optimizing
for size GCC may generate a call instead of expanding the more complex inline
definition.

The `real_get_value_by_key` function is deliberately written as a sibling call,
so ideally this should be
```
real_get_value_by_key:
        jmp     get_value_by_key
```
and not 
```
real_get_value_by_key:
        push    rsi
        call    get_impl
        test    rax, rax
        je      .L2
        mov     rdi, rax
        pop     rcx
        jmp     get_value
.L2:
        or      eax, -1
        pop     rdx
        ret
```

It still gets inlined with `-finline-limit=0` and can only be disabled by
`-fno-inline`. I have no idea how it is controlled.

---------------------------

# Trivia

These are two `gnu_inline` functions from the same library. Most of the time
they should both be inlined in user code. However, external definitions are
required when optimization is not turned on, or when their addresses are taken,
so they must still exist. As the external definitions are unlikely to be used
anyway, optimizing them for size makes much more sense.
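
For illustration, a minimal sketch of the library pattern described above. It assumes the layout the GCC manual describes for `extern inline` under GNU rules: an inline-only definition (normally in a header) plus one ordinary external definition (normally in a separate library source file). Both are shown in a single file here. The `struct impl` layout, the `table` array and the bodies of `get_impl` and `get_value` are hypothetical stand-ins, since the report only declares them.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the report only declares these. */
struct impl { int value; };
static struct impl table[2] = { { 10 }, { 20 } };

struct impl* get_impl(int key)
  {
    return (key >= 0 && key < 2) ? &table[key] : NULL;
  }

int get_value(struct impl* p)
  {
    return p->value;
  }

/* "Header" part: used only for inlining. With `gnu_inline`, no
   out-of-line copy is emitted from this definition.  */
extern __inline__ __attribute__((__gnu_inline__))
int get_value_by_key(int key)
  {
    struct impl* p = get_impl(key);
    if(!p)
      return -1;
    return get_value(p);
  }

/* A caller. The compiler may expand the inline body here or emit a
   call to the external symbol `get_value_by_key`.  */
int real_get_value_by_key(int key)
  {
    return get_value_by_key(key);
  }

/* "Library" part: the external definition that serves non-inlined
   calls and address-taking. It would normally live in its own file.  */
int get_value_by_key(int key)
  {
    struct impl* p = get_impl(key);
    if(!p)
      return -1;
    return get_value(p);
  }
```

Whichever way the call in `real_get_value_by_key` is compiled, the observable result is the same; only code size and call overhead differ, which is what this report is about.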


* [Bug tree-optimization/114262] Over-inlining when optimizing for size?
  2024-03-07  2:00 [Bug tree-optimization/114262] New: Over-inlining when optimizing for size? lh_mouse at 126 dot com
@ 2024-03-07  2:07 ` pinskia at gcc dot gnu.org
  2024-03-07  2:33 ` [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function lh_mouse at 126 dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-07  2:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I thought it was documented that gnu_inline also causes always_inline if
optimization is turned on but I can't seem to find that ...


* [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function
  2024-03-07  2:00 [Bug tree-optimization/114262] New: Over-inlining when optimizing for size? lh_mouse at 126 dot com
  2024-03-07  2:07 ` [Bug tree-optimization/114262] " pinskia at gcc dot gnu.org
@ 2024-03-07  2:33 ` lh_mouse at 126 dot com
  2024-03-07  2:57 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2024-03-07  2:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262

--- Comment #2 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to Andrew Pinski from comment #1)
> I thought it was documented that gnu_inline also causes always_inline if
> optimization is turned on but I can't seem to find that ...

Is that the case in GCC source? I think I would have to find a workaround for
it.


* [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function
  2024-03-07  2:00 [Bug tree-optimization/114262] New: Over-inlining when optimizing for size? lh_mouse at 126 dot com
  2024-03-07  2:07 ` [Bug tree-optimization/114262] " pinskia at gcc dot gnu.org
  2024-03-07  2:33 ` [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function lh_mouse at 126 dot com
@ 2024-03-07  2:57 ` pinskia at gcc dot gnu.org
  2024-03-07  3:20 ` lh_mouse at 126 dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-07  2:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |documentation

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The C++ front-end does:
  /* Handle gnu_inline attribute.  */
  if (GNU_INLINE_P (decl1))
    {
      DECL_EXTERNAL (decl1) = 1;
      DECL_NOT_REALLY_EXTERN (decl1) = 0;
      DECL_INTERFACE_KNOWN (decl1) = 1;
      DECL_DISREGARD_INLINE_LIMITS (decl1) = 1;
    }

C front-end does:
  /* For GNU C extern inline functions disregard inline limits.  */
  if (DECL_EXTERNAL (fndecl)
      && DECL_DECLARED_INLINE_P (fndecl)
      && (flag_gnu89_inline
          || lookup_attribute ("gnu_inline", DECL_ATTRIBUTES (fndecl))))
    DECL_DISREGARD_INLINE_LIMITS (fndecl) = 1;

This is specifically from r0-82849-gc536a6a77a19a8, but it was done differently
before that (using a language hook).

https://gcc.gnu.org/pipermail/gcc-patches/2007-July/221806.html
https://gcc.gnu.org/pipermail/gcc-patches/2007-August/223406.html


It looks like it has been this way since r0-37737-g4838c5ee553f06 (2001) (or
rather, that is when it was first used by the tree inliner; I don't want to dig
further back to understand the RTL inliner). So it looks like this is just
missing documentation ...


* [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function
  2024-03-07  2:00 [Bug tree-optimization/114262] New: Over-inlining when optimizing for size? lh_mouse at 126 dot com
                   ` (2 preceding siblings ...)
  2024-03-07  2:57 ` pinskia at gcc dot gnu.org
@ 2024-03-07  3:20 ` lh_mouse at 126 dot com
  2024-03-07  3:25 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2024-03-07  3:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262

--- Comment #4 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to Andrew Pinski from comment #3)
> It looks like it has been this way since r0-37737-g4838c5ee553f06 (2001) (or
> rather that is when it was used by the tree inline; I don't want to dig
> further back to understand the RTL inliner). So looks like this is just
> missing documentation ...

It's not just about `gnu_inline`. If we switch to C++ inline we get the same
result:

(https://gcc.godbolt.org/z/ehbjqj5xh)
```
struct impl;
struct impl* get_impl(int key);
int get_value(struct impl* p);


extern inline
int get_value_by_key(int key)
  {
    struct impl* p = get_impl(key);
    if(!p)
      return -1;
    return get_value(p);
  }

int real_get_value_by_key(int key)
  {
    return get_value_by_key(key);
  }
```

GCC outputs:
```
real_get_value_by_key(int):
        push    rsi
        call    get_impl(int)
        test    rax, rax
        je      .L2
        mov     rdi, rax
        pop     rcx
        jmp     get_value(impl*)
.L2:
        or      eax, -1
        pop     rdx
        ret
```


If we switched to C99 `extern inline`, it would produce the desired result:
```
get_value_by_key:
        push    rsi
        call    get_impl
        test    rax, rax
        je      .L2
        mov     rdi, rax
        pop     rcx
        jmp     get_value
.L2:
        or      eax, -1
        pop     rdx
        ret
real_get_value_by_key:
        jmp     get_value_by_key
```

The only difference between the C99 `extern inline` and C++ `extern inline` is
that the C++ external definition is COMDAT.
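
As a rough sketch of the C99 case mentioned above: under C99 inline rules (GCC's default for C, without `-fgnu89-inline`), an `extern inline` definition is also the external definition for the translation unit, so a plain call to it can always be resolved. The `struct impl` layout, the `table` array and the bodies of `get_impl` and `get_value` are hypothetical stand-ins, not part of the report.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the report only declares these. */
struct impl { int value; };
static struct impl table[2] = { { 10 }, { 20 } };

struct impl* get_impl(int key)
  {
    return (key >= 0 && key < 2) ? &table[key] : NULL;
  }

int get_value(struct impl* p)
  {
    return p->value;
  }

/* C99 semantics: `extern inline` makes this translation unit emit the
   out-of-line `get_value_by_key`. Under gnu89 rules (or with the
   `gnu_inline` attribute) the same spelling would be inline-only and
   would need a separate external definition.  */
extern inline
int get_value_by_key(int key)
  {
    struct impl* p = get_impl(key);
    if(!p)
      return -1;
    return get_value(p);
  }

/* With the external definition available in this TU, the compiler is
   free to emit a plain (sibling) call here instead of inlining.  */
int real_get_value_by_key(int key)
  {
    return get_value_by_key(key);
  }
```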


* [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function
  2024-03-07  2:00 [Bug tree-optimization/114262] New: Over-inlining when optimizing for size? lh_mouse at 126 dot com
                   ` (3 preceding siblings ...)
  2024-03-07  3:20 ` lh_mouse at 126 dot com
@ 2024-03-07  3:25 ` pinskia at gcc dot gnu.org
  2024-03-07 15:55   ` Jan Hubicka
  2024-03-07 15:55 ` hubicka at ucw dot cz
  2024-03-07 17:08 ` lh_mouse at 126 dot com
  6 siblings, 1 reply; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-03-07  3:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to LIU Hao from comment #4) 
> The only difference between the C99 `extern inline` and C++ `extern inline`
> is that the C++ external definition is COMDAT.

Well, not really. COMDAT changes the heuristics here, though. The reason is
that C++ inline functions are most likely smaller and should really be inlined.
This is all inlining heuristics: guessing locally, within the TU, whether the
function might be called from another TU or not.

Note that GCC has not retuned its -Os heuristics for a long time because they
have been decent enough for most folks and corner cases like this almost never
come up.


* [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function
  2024-03-07  2:00 [Bug tree-optimization/114262] New: Over-inlining when optimizing for size? lh_mouse at 126 dot com
                   ` (4 preceding siblings ...)
  2024-03-07  3:25 ` pinskia at gcc dot gnu.org
@ 2024-03-07 15:55 ` hubicka at ucw dot cz
  2024-03-07 17:08 ` lh_mouse at 126 dot com
  6 siblings, 0 replies; 9+ messages in thread
From: hubicka at ucw dot cz @ 2024-03-07 15:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262

--- Comment #6 from Jan Hubicka <hubicka at ucw dot cz> ---
> Note GCC has not retuned its -Os heuristics for a long time because it has been
> decent enough for most folks and corner cases like this almost never come
> up.
There were quite a few changes to the -Os heuristics :)
One of the bigger challenges is that we see more and more C++ code built
with -Os which relies on certain functions being inlined and optimized
in context, so we had to get more optimistic, in the hope that inlined code
will optimize well.

COMDAT functions are more likely to be inlined because statistics show that
many of them are not really shared between translation units
(see the --param=comdat-sharing-probability parameter). This was necessary to
get reasonable code for Firefox approximately 15 years ago.


* [Bug ipa/114262] Over-inlining when optimizing for size with gnu_inline function
  2024-03-07  2:00 [Bug tree-optimization/114262] New: Over-inlining when optimizing for size? lh_mouse at 126 dot com
                   ` (5 preceding siblings ...)
  2024-03-07 15:55 ` hubicka at ucw dot cz
@ 2024-03-07 17:08 ` lh_mouse at 126 dot com
  6 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2024-03-07 17:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114262

--- Comment #7 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to Jan Hubicka from comment #6)
> > Note GCC has not retuned its -Os heuristics for a long time because it has been
> > decent enough for most folks and corner cases like this almost never come
> > up.
> There were quite a few changes to the -Os heuristics :)
> One of the bigger challenges is that we see more and more C++ code built
> with -Os which relies on certain functions being inlined and optimized
> in context, so we had to get more optimistic, in the hope that inlined code
> will optimize well.
> 
> COMDAT functions are more likely to be inlined because statistics show that
> many of them are not really shared between translation units
> (see the --param=comdat-sharing-probability parameter). This was necessary to
> get reasonable code for Firefox approximately 15 years ago.

So is there no way to get the C99 `extern inline` behavior? That is, are
sibling calls to `gnu_inline` functions always inlined, even when optimizing
for size?

