public inbox for gcc-bugs@sourceware.org
* [Bug c/67435] New: Large performance drop on apparently unrelated changes (probable cause : strange inlining side-effect)
@ 2015-09-02 13:24 yann.collet.73 at gmail dot com
  2015-09-02 14:01 ` [Bug c/67435] " trippels at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: yann.collet.73 at gmail dot com @ 2015-09-02 13:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435

            Bug ID: 67435
           Summary: Large performance drop on apparently unrelated changes
                    (probable cause : strange inlining side-effect)
           Product: gcc
           Version: 4.8.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yann.collet.73 at gmail dot com
  Target Milestone: ---

Some weird effect with gcc (tested version : 4.8.4).

I have performance-oriented code which runs pretty fast. Its speed depends
to a large extent on inlining many small functions.
The inline keyword is not used anywhere. All functions are either normal or
static. Inlining decisions are left entirely to the compiler, which has worked
fine so far (the functions to inline are very small, typically 1 to 5 lines).

Since inlining across multiple .c files is difficult (-flto is not yet widely
available), I've kept a lot of small functions in a single `.c` file, in
which I'm also developing a codec and its associated decoder. It's
"relatively" large by my standards (about 2000 lines, although a lot of them
are mere comments and blank lines), but breaking it into smaller parts creates
new problems, so I would prefer to avoid that if possible.

Encoder and decoder are related, since they are inverse operations. But from a
programming perspective they are completely separate, sharing nothing except
a few typedefs and very low-level functions (such as reading from an
unaligned memory position).

The strange effect is this one :

I recently added a new function fnew to the encoder side. It's a new "entry
point". It is neither used nor called from anywhere within the .c file.

The simple fact that it exists makes the performance of the decoder function
fdec drop substantially, by more than 20%, which is way too much to be
ignored.

Now, keep in mind that encoding and decoding operations are completely
separate: they share almost nothing, save some minor typedefs (u32, u16 and
such) and associated operations (read/write).

When defining the new encoding function fnew as static, performance of the
decoder fdec increases back to normal. Since fnew isn't called from the .c, I
guess it's the same as if it was not there (dead code elimination).

If static fnew is now called from the encoder side, performance of fdec remains
good.
But as soon as fnew is modified, fdec performance just drops substantially.
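
For illustration, the layout described above can be sketched roughly as follows. This is not the actual codec: the function bodies, the signatures and the read_u32 helper are made-up stand-ins, and only the names fdec, fnew and u32 come from the report. It shows one shared low-level helper, a decoder entry point, and a new encoder entry point that nothing in the file calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;   /* shared typedef, as in the report */

/* Shared low-level helper (hypothetical): read 32 bits from a possibly
   unaligned address. Small enough that the compiler is expected to
   inline it automatically. */
static u32 read_u32(const void *p)
{
    u32 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Decoder side, stand-in for fdec: sums every complete 32-bit word. */
u32 fdec(const unsigned char *src, size_t len)
{
    u32 acc = 0;
    size_t i;
    for (i = 0; i + sizeof(u32) <= len; i += sizeof(u32))
        acc += read_u32(src + i);
    return acc;
}

/* Encoder side, stand-in for fnew: a new entry point that is not called
   from this file. Without `static` it has external linkage, so the
   compiler must emit it; with `static` and no caller it can be
   discarded as dead code. Returns the number of bytes written. */
u32 fnew(unsigned char *dst, size_t cap, u32 value)
{
    if (cap < sizeof value)
        return 0;
    memcpy(dst, &value, sizeof value);
    return (u32)sizeof value;
}
```

In this sketch, adding or removing `static` on fnew is the only change needed to reproduce the two configurations compared above.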

Presuming the fnew modifications crossed a threshold, I increased the following
gcc parameter : --param max-inline-insns-auto=60 (by default, its value is
supposed to be 40.) And it worked : performance of fdec is now back to normal.

But I guess this game will continue forever with each little modification of
fnew or anything else similar, requiring further tweaks to some custom
advanced parameter. So I want to avoid that.
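
One way to take the auto-inlining threshold out of the picture while investigating is to mark the critical helpers explicitly. The sketch below is only an assumption about how that could look, not something the report does (the report states that no inline keyword is used). FORCE_INLINE, read_u16 and sum_u16 are hypothetical names; __attribute__((always_inline)) is the GCC function attribute. Note that the later titles in this thread point at loop alignment rather than inlining, so this may not remove the slowdown by itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: request inlining regardless of size heuristics
   on GCC-compatible compilers, fall back to plain inline elsewhere. */
#if defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline
#endif

/* Tiny helper of the kind the report relies on being inlined. */
FORCE_INLINE uint16_t read_u16(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical hot loop: sums n 16-bit values from an unaligned buffer. */
uint32_t sum_u16(const unsigned char *src, size_t n)
{
    uint32_t acc = 0;
    size_t i;
    for (i = 0; i < n; i++)
        acc += read_u16(src + i * sizeof(uint16_t));
    return acc;
}
```

With the attribute in place, whether read_u16 is inlined no longer depends on max-inline-insns-auto, which makes it easier to tell whether a remaining slowdown comes from inlining at all.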

I tried another variant : I added another completely useless function, just
to play with. Its content is exactly a copy-paste of fnew, but the name of
the function is obviously different, so let's call it wtf.

When wtf exists (on top of fnew), it doesn't matter whether fnew is static or
not, nor what the value of max-inline-insns-auto is : performance of fdec is
just back to normal. Even though wtf is not used nor called from anywhere... :'(


All these effects look plain weird. There is no logical reason for some little
modification in function fnew to have a knock-on effect on the completely
unrelated function fdec, whose only relation to it is being in the same file.

I'm trying to understand what could be going on, in order to develop the codec
more reliably.
For the time being, any modification in function A can have large ripple
effects (positive or negative) on a completely unrelated function B, making
each step a tedious process with a random outcome. A developer's nightmare.



Thread overview: 10+ messages
2015-09-02 13:24 [Bug c/67435] New: Large performance drop on apparently unrelated changes (probable cause : strange inlining side-effect) yann.collet.73 at gmail dot com
2015-09-02 14:01 ` [Bug c/67435] " trippels at gcc dot gnu.org
2015-09-02 14:03 ` pinskia at gcc dot gnu.org
2015-09-02 14:51 ` yann.collet.73 at gmail dot com
2015-09-03  2:44 ` yann.collet.73 at gmail dot com
2015-09-03  7:19 ` [Bug c/67435] Large performance drop on apparently unrelated changes (potential cause : critical loop instruction alignment) trippels at gcc dot gnu.org
2015-09-03 18:47 ` yann.collet.73 at gmail dot com
2015-09-04 10:23 ` [Bug c/67435] Large performance drop on apparently unrelated changes (potential cause : hot loop alignment) trippels at gcc dot gnu.org
2015-09-04 10:34 ` [Bug c/67435] Feature request: Implement align-loops attribute trippels at gcc dot gnu.org
2015-10-20 13:09 ` yann.collet.73 at gmail dot com
