public inbox for gcc-patches@gcc.gnu.org
From: Uros Bizjak <ubizjak@gmail.com>
To: Hongyu Wang <wwwhhhyyy333@gmail.com>
Cc: Hongyu Wang <hongyu.wang@intel.com>,
	gcc-patches@gcc.gnu.org, hongtao.liu@intel.com
Subject: Re: [PATCH] i386: Relax inline requirement for functions with different target attrs
Date: Wed, 28 Jun 2023 10:39:25 +0200
Message-ID: <CAFULd4bo_XZtOSkERy20O4q0zL2ps-ox7htPEhpPC8X0nQWQbA@mail.gmail.com>
In-Reply-To: <CA+OydWnZDuHsw6q3f9RiYQPhynwRuc0_viXUEsMbuFdsdd3ngQ@mail.gmail.com>

On Wed, Jun 28, 2023 at 10:20 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> > If the user specified a different arch for callee than the caller,
> > then the compiler will switch on different ISAs (-march is just a
> > shortcut for different ISA packs), and the programmer is aware that
> > inlining isn't intended here (we have -mtune, which is not as strong
> > as -march, but even functions with different -mtune are not inlined
> > without always_inline attribute). This is documented as:
>
> The original issue comes from a case like
>
> float callee (float a, float b, float c, float d,
>             float e, float f, float g, float h)
> {
>     return a * b + c * d + e * f + g + h + a * c + b * c
>             + a * d + b * e + a * f + c * h +
>             b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> }
>
> __attribute__((target_clones("default","arch=icelake-server")))
> void caller (int n, float *a,
>             float c1, float c2, float c3,
>             float c4, float c5, float c6,
>             float c7)
> {
>   for (int i = 0; i < n; i++)
>   {
>     a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
>   }
> }
>
> With current GCC, the .icelake_server clone fails to inline callee
> because of the target-specific option mismatch, while the .default clone
> inlines it and the loop gets vectorized. I don't think it is reasonable
> that the clone for the higher arch cannot produce better code.
> So I think we can at least inline callees that have no arch/tune
> specified; for now they are rejected by the strict arch= and tune=
> checks.

Yes, I think it is reasonable to inline a callee without an arch/tune
specified. We expect a "default" callee to have properties that allow
inlining it into all callers, independent of the caller's arch/tune
target attribute.

Uros.
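
To make the agreed relaxation concrete, here is a minimal sketch in C. It
is not the code from i386.cc: struct target_opts, enum arch_kind and
arch_tune_allow_inline are hypothetical names invented for illustration,
and the real ix86_can_inline_p checks many more options. The assumption
is that "no arch/tune specified" can be modelled as a DEFAULT value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-function target options.  The real
   GCC option structure is different and much larger.  */
enum arch_kind { ARCH_DEFAULT = 0, ARCH_SKYLAKE, ARCH_ICELAKE_SERVER };

struct target_opts
{
  enum arch_kind arch;  /* ARCH_DEFAULT: no arch= given on the function.  */
  enum arch_kind tune;  /* ARCH_DEFAULT: no tune= given on the function.  */
};

/* Return true when arch/tune alone do not block inlining CALLEE into
   CALLER.  A callee that leaves arch/tune at the default is accepted by
   any caller; an explicit arch must match the caller's.  */
static bool
arch_tune_allow_inline (const struct target_opts *caller,
                        const struct target_opts *callee,
                        bool always_inline)
{
  /* An explicitly different arch still rejects inlining.  */
  if (callee->arch != ARCH_DEFAULT && callee->arch != caller->arch)
    return false;

  /* A tune mismatch is tolerated for always_inline callees, as the
     existing check already does, and for callees with default tune.  */
  if (!always_inline
      && callee->tune != ARCH_DEFAULT
      && callee->tune != caller->tune)
    return false;

  return true;
}
```

Under these assumptions, a default callee is accepted by an
arch=icelake-server caller, while an arch=skylake callee is still
rejected, which matches the position taken earlier in this thread.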

>
> Uros Bizjak <ubizjak@gmail.com> wrote on Wed, Jun 28, 2023 at 14:43:
> >
> > On Wed, Jun 28, 2023 at 3:56 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
> > >
> > > > I don't think this is desirable. If we inline something with different
> > > > ISAs, we get some strange mix of ISAs when the function is inlined.
> > > > OTOH - we already inline with mismatched tune flags if the function is
> > > > marked with always_inline.
> > >
> > > Previously ix86_can_inline_p has
> > >
> > > if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> > >      != callee_opts->x_ix86_isa_flags)
> > >     || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> > >         != callee_opts->x_ix86_isa_flags2))
> > >   ret = false;
> > >
> > > It makes sure the caller's ISA is a superset of the callee's, and the
> > > inlined code follows the caller's ISA specification.
> > >
> > > I cannot give a real example where inlining harms the caller's
> > > performance. I added the PVW (prefer-vector-width) check since
> > > some callees might want to limit their vector size while the caller
> > > has a larger preferred vector size. At least with the current change
> > > we get more optimization opportunities for different target_clones.
> > >
> > > But I agree that the tuning setting may be a factor that affects
> > > performance. One possible choice: if the tune for the callee is
> > > unspecified or default, just inline it into the caller with the
> > > specified arch and tune.
> >
> > If the user specified a different arch for callee than the caller,
> > then the compiler will switch on different ISAs (-march is just a
> > shortcut for different ISA packs), and the programmer is aware that
> > inlining isn't intended here (we have -mtune, which is not as strong
> > as -march, but even functions with different -mtune are not inlined
> > without always_inline attribute). This is documented as:
> >
> > --q--
> > On the x86, the inliner does not inline a function that has different
> > target options than the caller, unless the callee has a subset of the
> > target options of the caller. For example a function declared with
> > target("sse3") can inline a function with target("sse2"), since -msse3
> > implies -msse2.
> > --/q--
> >
> > I don't think arch=skylake can be considered as a subset of arch=icelake-server.
> >
> > I agree that the compiler should reject functions with different PVW.
> > This is also in accordance with the documentation.
> >
> > Uros.
> >
> > >
> > > Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Tue, Jun 27, 2023 at 17:16:
> > >
> > >
> > >
> > > >
> > > > On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > For functions with different target attributes, the current logic refuses
> > > > > > to inline the callee when any arch or tune is mismatched. Relax the
> > > > > > condition to honor just prefer_vector_width_type and other flags that
> > > > > > may cause safety issues, so the caller gets more optimization opportunities.
> > > >
> > > > I don't think this is desirable. If we inline something with different
> > > > ISAs, we get some strange mix of ISAs when the function is inlined.
> > > > OTOH - we already inline with mismatched tune flags if the function is
> > > > marked with always_inline.
> > > >
> > > > Uros.
> > > >
> > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> > > > >
> > > > > Ok for trunk?
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> > > > >         tune directly, just check prefer_vector_width_type and make sure
> > > > >         not to inline if they mismatch.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.target/i386/inline-target-attr.c: New test.
> > > > > ---
> > > > >  gcc/config/i386/i386.cc                       | 11 +++++----
> > > > >  .../gcc.target/i386/inline-target-attr.c      | 24 +++++++++++++++++++
> > > > >  2 files changed, 30 insertions(+), 5 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > > index 0761965344b..1d86384ac06 100644
> > > > > --- a/gcc/config/i386/i386.cc
> > > > > +++ b/gcc/config/i386/i386.cc
> > > > > @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
> > > > >                != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> > > > >      ret = false;
> > > > >
> > > > > -  /* See if arch, tune, etc. are the same.  */
> > > > > -  else if (caller_opts->arch != callee_opts->arch)
> > > > > -    ret = false;
> > > > > -
> > > > > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > > > > +  /* Do not inline when the specified prefer-vector-width mismatches
> > > > > > +     between callee and caller.  */
> > > > > +  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> > > > > +          && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> > > > > +          && callee_opts->x_prefer_vector_width_type
> > > > > +             != caller_opts->x_prefer_vector_width_type)
> > > > >      ret = false;
> > > > >
> > > > >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > > > new file mode 100644
> > > > > index 00000000000..995502165f0
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > > > @@ -0,0 +1,24 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2" } */
> > > > > +/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
> > > > > +
> > > > > +__attribute__((target("arch=skylake")))
> > > > > +int callee (int n)
> > > > > +{
> > > > > +  int sum = 0;
> > > > > +  for (int i = 0; i < n; i++)
> > > > > +    {
> > > > > +      if (i % 2 == 0)
> > > > > +       sum +=i;
> > > > > +      else
> > > > > +       sum += (i - 1);
> > > > > +    }
> > > > > +  return sum + n;
> > > > > +}
> > > > > +
> > > > > +__attribute__((target("arch=icelake-server")))
> > > > > +int caller (int n)
> > > > > +{
> > > > > +  return callee (n) + n;
> > > > > +}
> > > > > +
> > > > > --
> > > > > 2.31.1
> > > > >

Thread overview: 6+ messages
2023-06-26  2:34 Hongyu Wang
2023-06-27  9:16 ` Uros Bizjak
2023-06-28  1:49   ` Hongyu Wang
2023-06-28  6:42     ` Uros Bizjak
2023-06-28  8:13       ` Hongyu Wang
2023-06-28  8:39         ` Uros Bizjak [this message]
