public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Jeff Law <law@redhat.com>
Cc: "cmdLP #CODE" <mccmdlp@gmail.com>, GCC Development <gcc@gcc.gnu.org>
Subject: Re: __attribute__((early_branch))
Date: Thu, 02 May 2019 12:18:00 -0000	[thread overview]
Message-ID: <CAFiYyc2B21pwzDzj-a7_xsOiOrojdrhcBmE9X8O9kKccKmbwFA@mail.gmail.com> (raw)
In-Reply-To: <9c490839-079f-d87b-ace6-85f75a0a3258@redhat.com>

On Tue, Apr 30, 2019 at 9:53 PM Jeff Law <law@redhat.com> wrote:
>
> On 4/30/19 12:34 PM, cmdLP #CODE wrote:
> > Hello GCC-team,
> >
> > I use GCC for all my C and C++ programs. I know how to use GCC, but I am
> > not a contributor to GCC (yet). I often discover some problems C and C++
> > code have in general. There is often the choice between fast or readable
> > code. Some code I have seen performs well but looks ugly (goto, etc.);
> > other code is simple, but has some performance problems. What if we could
> > make the simple code perform well?
> >
> > There is a common problem with conditional branches inside loops. This can
> > decrease the performance of a program. To fix this issue, the conditional
> > branch should be moved outside of the loop. Sometimes this optimization is
> > done by the compiler, but guessing on optimizations done by the compiler is
> > really bad. Often it is not easy to transform the source code to have the
> > conditional branching outside the loop. Instead I propose a new attribute,
> > which forces the compiler to do a conditional branch (based on the
> > annotated parameter) at the beginning of a function. It branches to the
> > corresponding code of the function compiled with the value being constant.
> >
> > Here is a code example, which contains this issue.
> >
> >     enum reduce_op
> >     {
> >         REDUCE_ADD,
> >         REDUCE_MULT,
> >         REDUCE_MIN,
> >         REDUCE_MAX
> >     };
> >
> >     /* performance critical function */
> >     void reduce_data(enum reduce_op reduce,
> >                      unsigned const* data,
> >                      unsigned data_size)
> >     {
> >         unsigned i, result, item;
> >
> >         result = reduce == REDUCE_MULT ?  1u
> >                : reduce == REDUCE_MIN  ? ~0u // ~0u is UINT_MAX
> >                :                          0u;
> >
> >         for(i = 0; i < data_size; ++i)
> >         {
> >             item = data[i];
> >
> >             switch(reduce)
> >             {
> >                 case REDUCE_ADD:
> >                     result += item;
> >                 break;
> >
> >                 case REDUCE_MULT:
> >                     result *= item;
> >                 break;
> >
> >                 case REDUCE_MIN:
> >                     if(item < result) result = item;
> >                     // RIP: result <?= item;
> >                 break;
> >
> >                 case REDUCE_MAX:
> >                     if(item > result) result = item;
> >                     // RIP: result >?= item;
> >                 break;
> >             }
> >         }
> >
> >         return result;
> >     }
> >
> > The value of  reduce  does not change inside the function. For this
> > example, the optimization is trivial. But consider more complex examples.
> > The function should be optimized to:
> [ .... ]
> This is loop unswitching.  It's a standard GCC optimization.  If it's
> not working as well as it should, we're far better off determining why
> and fixing the automatic transformation rather than relying on
> attributes to drive the transformation.

It's currently not implemented for switch () stmts, just for conditionals.
This also hurts SPEC cactusADM.  There might be a missed-optimization
bug about this.  A simple recursive implementation might be possible;
unswitch one case at a time - maybe order by profile probability.  We
already recurse on the unswitched bodies (in case multiple conditions
can be unswitched)

>
> > What if the variable changes?
> >
> > When the variable changes there should be a jump to the corresponding
> > parallel code of the compiled code with new value.
> >
> > Unoptimized
> >
> >     /* removes comments starting with # and ending in a newline */
> >     void remove_comment(char* dst,
> >                         char const* src)
> >     {
> >         // initialization nessecary
> >         int state __attribute__((early_branch(0, 1))) = 0;
> >
> >         char c;
> >
> >         while(*src)
> >         {
> >             c = *src++;
> >
> >             switch(state)
> >             {
> >                 case 0:
> >                     if(c == '#')
> >                         state = 1;
> >                     else
> >                         *dst++ = c;
> >
> >                     break;
> >
> >                 case 1:
> >                     if(c == '\n')
> >                     {
> >                         *dst++ = '\n';
> >                         state = 0;
> >                     }
> >
> >                     break;
> >             }
> >         }
> >         *dst = '\0';
> >     }
> >
> > changed to
> >
> >     void remove_comment(char* dst,
> >                         char const* src)
> >     {
> >         char c;
> >
> >         switch(0)
> >         {
> >             case 0:
> >                 while(*src)
> >                 {
> >                     c = *src++;
> >                     if(c == '#')
> >                         goto branch1;
> >                     else
> >                         *dst++ = c;
> >                     branch0:
> >                 }
> >                 break;
> >
> >             case 1:
> >                 while(*src)
> >                 {
> >                     if(*src++ == '\n')
> >                     {
> >                         *dst++ = '\n';
> >                         goto branch0;
> >                     }
> >                     branch1:
> >                 }
> >                 break;
> >         }
> >         *dst = '\0';
> >     }
> >
> > You see that the state is not tested in each iteration of the loop(s). For
> > complex cases (like parsers), this can improve performance. This attribute
> > is needed to avoid such a goto usage. I have seen such code, which is
> > unmaintainable but performs better than the first solution.
> This is backwards jump threading which was primarily designed to help
> state machines by avoiding evaluation of the switch at each iteration of
> the loop and instead jumping directly to the desired target.
>
> Again, if it's not working for a particular case we're far better off
> figuring out why than relying on attributes to drive the transformation.
>
> jeff
>

  reply	other threads:[~2019-05-02 12:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-30 18:35 __attribute__((early_branch)) cmdLP #CODE
2019-04-30 19:53 ` __attribute__((early_branch)) Jeff Law
2019-05-02 12:18   ` Richard Biener [this message]
2019-05-02 16:16     ` __attribute__((early_branch)) Segher Boessenkool
2019-05-03  7:11       ` __attribute__((early_branch)) Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc2B21pwzDzj-a7_xsOiOrojdrhcBmE9X8O9kKccKmbwFA@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=gcc@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=mccmdlp@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).