public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH V2] i386: Inline function with default arch/tune to caller
@ 2023-07-04  3:12 Hongyu Wang
  2023-07-04  6:18 ` Uros Bizjak
  0 siblings, 1 reply; 5+ messages in thread
From: Hongyu Wang @ 2023-07-04  3:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak, hongtao.liu

Hi,

For function with different target attributes, current logic rejects to
inline the callee when any arch or tune is mismatched. Relax the
condition to allow callee with default arch/tune to be inlined.

Boostrapped/regtested on x86-64-linux-gnu{-m32,}.

Ok for trunk?

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_can_inline_p): If callee has
	default arch=x86-64 and tune=generic, do not block the
	inlining to its caller.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/inline_target_clones.c: New test.
---
 gcc/config/i386/i386.cc                       | 22 +++++++++++------
 .../gcc.target/i386/inline_target_clones.c    | 24 +++++++++++++++++++
 2 files changed, 39 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/inline_target_clones.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8989985700a..4741c9b5364 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
 	       != (callee_opts->x_target_flags & ~always_inline_safe_mask))
     ret = false;
 
-  /* See if arch, tune, etc. are the same.  */
-  else if (caller_opts->arch != callee_opts->arch)
-    ret = false;
-
-  else if (!always_inline && caller_opts->tune != callee_opts->tune)
-    ret = false;
-
   else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
 	   /* If the calle doesn't use FP expressions differences in
 	      ix86_fpmath can be ignored.  We are called from FEs
@@ -622,6 +615,21 @@ ix86_can_inline_p (tree caller, tree callee)
 	       || ipa_fn_summaries->get (callee_node)->fp_expressions))
     ret = false;
 
+  /* At this point we cannot identify whether arch or tune setting
+     comes from target attribute or not. So the most conservative way
+     is to allow the callee that uses default arch and tune string to
+     be inlined.  */
+  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
+	   && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
+    ret = true;
+
+  /* See if arch, tune, etc. are the same.  */
+  else if (caller_opts->arch != callee_opts->arch)
+    ret = false;
+
+  else if (!always_inline && caller_opts->tune != callee_opts->tune)
+    ret = false;
+
   else if (!always_inline
 	   && caller_opts->branch_cost != callee_opts->branch_cost)
     ret = false;
diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
new file mode 100644
index 00000000000..53db1600ce5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O3 -march=x86-64" } */
+/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
+
+float callee (float a, float b, float c, float d,
+	      float e, float f, float g, float h)
+{
+  return a * b + c * d + e * f + g + h + a * c + b * c
+    + a * d + b * e + a * f + c * h + 
+    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
+}
+
+__attribute__((target_clones("default","arch=icelake-server")))
+void caller (int n, float *a,
+	     float c1, float c2, float c3,
+	     float c4, float c5, float c6,
+	     float c7)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
+    }
+}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2] i386: Inline function with default arch/tune to caller
  2023-07-04  3:12 [PATCH V2] i386: Inline function with default arch/tune to caller Hongyu Wang
@ 2023-07-04  6:18 ` Uros Bizjak
  2023-07-04  8:25   ` Hongyu Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Uros Bizjak @ 2023-07-04  6:18 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: gcc-patches, hongtao.liu

On Tue, Jul 4, 2023 at 5:12 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> For function with different target attributes, current logic rejects to
> inline the callee when any arch or tune is mismatched. Relax the
> condition to allow callee with default arch/tune to be inlined.
>
> Boostrapped/regtested on x86-64-linux-gnu{-m32,}.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_can_inline_p): If callee has
>         default arch=x86-64 and tune=generic, do not block the
>         inlining to its caller.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/inline_target_clones.c: New test.

OK.

In a follow-up patch, can you please document inlining rules involving
-march and -mtune to "x86 Function Attributes" section? Currently, the
inlining rules at the end of "target function attribute" section does
not even mention -march and -mtune. Maybe a subsubsection "Inlining
rules" should be added (like AArch64 has) to mention that only default
arch and tune are inlined by default (but inline can be forced with
always_inline for different mtune flags).

Looking at the above, perhaps inlining of different arches can also be
forced with always_inline? This would allow developers some control of
inlining, and would not be surprising.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc                       | 22 +++++++++++------
>  .../gcc.target/i386/inline_target_clones.c    | 24 +++++++++++++++++++
>  2 files changed, 39 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/inline_target_clones.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 8989985700a..4741c9b5364 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
>                != (callee_opts->x_target_flags & ~always_inline_safe_mask))
>      ret = false;
>
> -  /* See if arch, tune, etc. are the same.  */
> -  else if (caller_opts->arch != callee_opts->arch)
> -    ret = false;
> -
> -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> -    ret = false;
> -
>    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
>            /* If the calle doesn't use FP expressions differences in
>               ix86_fpmath can be ignored.  We are called from FEs
> @@ -622,6 +615,21 @@ ix86_can_inline_p (tree caller, tree callee)
>                || ipa_fn_summaries->get (callee_node)->fp_expressions))
>      ret = false;
>
> +  /* At this point we cannot identify whether arch or tune setting
> +     comes from target attribute or not. So the most conservative way
> +     is to allow the callee that uses default arch and tune string to
> +     be inlined.  */
> +  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
> +          && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
> +    ret = true;
> +
> +  /* See if arch, tune, etc. are the same.  */
> +  else if (caller_opts->arch != callee_opts->arch)
> +    ret = false;
> +
> +  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> +    ret = false;
> +
>    else if (!always_inline
>            && caller_opts->branch_cost != callee_opts->branch_cost)
>      ret = false;
> diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> new file mode 100644
> index 00000000000..53db1600ce5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-options "-O3 -march=x86-64" } */
> +/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
> +
> +float callee (float a, float b, float c, float d,
> +             float e, float f, float g, float h)
> +{
> +  return a * b + c * d + e * f + g + h + a * c + b * c
> +    + a * d + b * e + a * f + c * h +
> +    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> +}
> +
> +__attribute__((target_clones("default","arch=icelake-server")))
> +void caller (int n, float *a,
> +            float c1, float c2, float c3,
> +            float c4, float c5, float c6,
> +            float c7)
> +{
> +  for (int i = 0; i < n; i++)
> +    {
> +      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
> +    }
> +}
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2] i386: Inline function with default arch/tune to caller
  2023-07-04  6:18 ` Uros Bizjak
@ 2023-07-04  8:25   ` Hongyu Wang
  2023-07-04  8:57     ` Uros Bizjak
  0 siblings, 1 reply; 5+ messages in thread
From: Hongyu Wang @ 2023-07-04  8:25 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Hongyu Wang, gcc-patches, hongtao.liu

> In a follow-up patch, can you please document inlining rules involving
> -march and -mtune to "x86 Function Attributes" section? Currently, the
> inlining rules at the end of "target function attribute" section does
> not even mention -march and -mtune. Maybe a subsubsection "Inlining
> rules" should be added (like AArch64 has) to mention that only default
> arch and tune are inlined by default (but inline can be forced with
> always_inline for different mtune flags).

The document has below at the end of 'target (OPTIONS)' section

On the x86, the inliner does not inline a function that has
different target options than the caller, unless the callee
has a subset of the target options of the caller.  For example
a function declared with 'target("sse3")' can inline a
function with 'target("sse2")', since '-msse3' implies
'-msse2'.

Do we need to move this part to a new section and combine with -march and
-mtune rule description to the new subsubsection?

> Looking at the above, perhaps inlining of different arches can also be
> forced with always_inline? This would allow developers some control of
> inlining, and would not be surprising.

If so, I'd like to add the always_inline change on arch to current
patch and leave the
document change alone in the next patch.

Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年7月4日周二 14:19写道:
>
> On Tue, Jul 4, 2023 at 5:12 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> >
> > Hi,
> >
> > For function with different target attributes, current logic rejects to
> > inline the callee when any arch or tune is mismatched. Relax the
> > condition to allow callee with default arch/tune to be inlined.
> >
> > Boostrapped/regtested on x86-64-linux-gnu{-m32,}.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.cc (ix86_can_inline_p): If callee has
> >         default arch=x86-64 and tune=generic, do not block the
> >         inlining to its caller.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/inline_target_clones.c: New test.
>
> OK.
>
> In a follow-up patch, can you please document inlining rules involving
> -march and -mtune to "x86 Function Attributes" section? Currently, the
> inlining rules at the end of "target function attribute" section does
> not even mention -march and -mtune. Maybe a subsubsection "Inlining
> rules" should be added (like AArch64 has) to mention that only default
> arch and tune are inlined by default (but inline can be forced with
> always_inline for different mtune flags).
>
> Looking at the above, perhaps inlining of different arches can also be
> forced with always_inline? This would allow developers some control of
> inlining, and would not be surprising.
>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386.cc                       | 22 +++++++++++------
> >  .../gcc.target/i386/inline_target_clones.c    | 24 +++++++++++++++++++
> >  2 files changed, 39 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/inline_target_clones.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 8989985700a..4741c9b5364 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
> >                != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> >      ret = false;
> >
> > -  /* See if arch, tune, etc. are the same.  */
> > -  else if (caller_opts->arch != callee_opts->arch)
> > -    ret = false;
> > -
> > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > -    ret = false;
> > -
> >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> >            /* If the calle doesn't use FP expressions differences in
> >               ix86_fpmath can be ignored.  We are called from FEs
> > @@ -622,6 +615,21 @@ ix86_can_inline_p (tree caller, tree callee)
> >                || ipa_fn_summaries->get (callee_node)->fp_expressions))
> >      ret = false;
> >
> > +  /* At this point we cannot identify whether arch or tune setting
> > +     comes from target attribute or not. So the most conservative way
> > +     is to allow the callee that uses default arch and tune string to
> > +     be inlined.  */
> > +  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
> > +          && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
> > +    ret = true;
> > +
> > +  /* See if arch, tune, etc. are the same.  */
> > +  else if (caller_opts->arch != callee_opts->arch)
> > +    ret = false;
> > +
> > +  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > +    ret = false;
> > +
> >    else if (!always_inline
> >            && caller_opts->branch_cost != callee_opts->branch_cost)
> >      ret = false;
> > diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > new file mode 100644
> > index 00000000000..53db1600ce5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-ifunc "" } */
> > +/* { dg-options "-O3 -march=x86-64" } */
> > +/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
> > +
> > +float callee (float a, float b, float c, float d,
> > +             float e, float f, float g, float h)
> > +{
> > +  return a * b + c * d + e * f + g + h + a * c + b * c
> > +    + a * d + b * e + a * f + c * h +
> > +    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> > +}
> > +
> > +__attribute__((target_clones("default","arch=icelake-server")))
> > +void caller (int n, float *a,
> > +            float c1, float c2, float c3,
> > +            float c4, float c5, float c6,
> > +            float c7)
> > +{
> > +  for (int i = 0; i < n; i++)
> > +    {
> > +      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
> > +    }
> > +}
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2] i386: Inline function with default arch/tune to caller
  2023-07-04  8:25   ` Hongyu Wang
@ 2023-07-04  8:57     ` Uros Bizjak
  2023-07-06  0:37       ` Hongyu Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Uros Bizjak @ 2023-07-04  8:57 UTC (permalink / raw)
  To: Hongyu Wang; +Cc: Hongyu Wang, gcc-patches, hongtao.liu

On Tue, Jul 4, 2023 at 10:32 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> > In a follow-up patch, can you please document inlining rules involving
> > -march and -mtune to "x86 Function Attributes" section? Currently, the
> > inlining rules at the end of "target function attribute" section does
> > not even mention -march and -mtune. Maybe a subsubsection "Inlining
> > rules" should be added (like AArch64 has) to mention that only default
> > arch and tune are inlined by default (but inline can be forced with
> > always_inline for different mtune flags).
>
> The document has below at the end of 'target (OPTIONS)' section
>
> On the x86, the inliner does not inline a function that has
> different target options than the caller, unless the callee
> has a subset of the target options of the caller.  For example
> a function declared with 'target("sse3")' can inline a
> function with 'target("sse2")', since '-msse3' implies
> '-msse2'.
>
> Do we need to move this part to a new section and combine with -march and
> -mtune rule description to the new subsubsection?
>
> > Looking at the above, perhaps inlining of different arches can also be
> > forced with always_inline? This would allow developers some control of
> > inlining, and would not be surprising.
>
> If so, I'd like to add the always_inline change on arch to current
> patch and leave the
> document change alone in the next patch.

Yes, this is OK.

Thanks,
Uros.
>
> Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年7月4日周二 14:19写道:
> >
> > On Tue, Jul 4, 2023 at 5:12 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> > >
> > > Hi,
> > >
> > > For function with different target attributes, current logic rejects to
> > > inline the callee when any arch or tune is mismatched. Relax the
> > > condition to allow callee with default arch/tune to be inlined.
> > >
> > > Boostrapped/regtested on x86-64-linux-gnu{-m32,}.
> > >
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/i386/i386.cc (ix86_can_inline_p): If callee has
> > >         default arch=x86-64 and tune=generic, do not block the
> > >         inlining to its caller.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/i386/inline_target_clones.c: New test.
> >
> > OK.
> >
> > In a follow-up patch, can you please document inlining rules involving
> > -march and -mtune to "x86 Function Attributes" section? Currently, the
> > inlining rules at the end of "target function attribute" section does
> > not even mention -march and -mtune. Maybe a subsubsection "Inlining
> > rules" should be added (like AArch64 has) to mention that only default
> > arch and tune are inlined by default (but inline can be forced with
> > always_inline for different mtune flags).
> >
> > Looking at the above, perhaps inlining of different arches can also be
> > forced with always_inline? This would allow developers some control of
> > inlining, and would not be surprising.
> >
> > Thanks,
> > Uros.
> >
> > > ---
> > >  gcc/config/i386/i386.cc                       | 22 +++++++++++------
> > >  .../gcc.target/i386/inline_target_clones.c    | 24 +++++++++++++++++++
> > >  2 files changed, 39 insertions(+), 7 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 8989985700a..4741c9b5364 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
> > >                != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> > >      ret = false;
> > >
> > > -  /* See if arch, tune, etc. are the same.  */
> > > -  else if (caller_opts->arch != callee_opts->arch)
> > > -    ret = false;
> > > -
> > > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > -    ret = false;
> > > -
> > >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > >            /* If the calle doesn't use FP expressions differences in
> > >               ix86_fpmath can be ignored.  We are called from FEs
> > > @@ -622,6 +615,21 @@ ix86_can_inline_p (tree caller, tree callee)
> > >                || ipa_fn_summaries->get (callee_node)->fp_expressions))
> > >      ret = false;
> > >
> > > +  /* At this point we cannot identify whether arch or tune setting
> > > +     comes from target attribute or not. So the most conservative way
> > > +     is to allow the callee that uses default arch and tune string to
> > > +     be inlined.  */
> > > +  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
> > > +          && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
> > > +    ret = true;
> > > +
> > > +  /* See if arch, tune, etc. are the same.  */
> > > +  else if (caller_opts->arch != callee_opts->arch)
> > > +    ret = false;
> > > +
> > > +  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > +    ret = false;
> > > +
> > >    else if (!always_inline
> > >            && caller_opts->branch_cost != callee_opts->branch_cost)
> > >      ret = false;
> > > diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > > new file mode 100644
> > > index 00000000000..53db1600ce5
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-ifunc "" } */
> > > +/* { dg-options "-O3 -march=x86-64" } */
> > > +/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
> > > +
> > > +float callee (float a, float b, float c, float d,
> > > +             float e, float f, float g, float h)
> > > +{
> > > +  return a * b + c * d + e * f + g + h + a * c + b * c
> > > +    + a * d + b * e + a * f + c * h +
> > > +    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> > > +}
> > > +
> > > +__attribute__((target_clones("default","arch=icelake-server")))
> > > +void caller (int n, float *a,
> > > +            float c1, float c2, float c3,
> > > +            float c4, float c5, float c6,
> > > +            float c7)
> > > +{
> > > +  for (int i = 0; i < n; i++)
> > > +    {
> > > +      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
> > > +    }
> > > +}
> > > --
> > > 2.31.1
> > >

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH V2] i386: Inline function with default arch/tune to caller
  2023-07-04  8:57     ` Uros Bizjak
@ 2023-07-06  0:37       ` Hongyu Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Hongyu Wang @ 2023-07-06  0:37 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Hongyu Wang, gcc-patches, hongtao.liu

[-- Attachment #1: Type: text/plain, Size: 6962 bytes --]

Thanks, this is the updated patch I'm going to check in.

Uros Bizjak <ubizjak@gmail.com> 于2023年7月4日周二 16:57写道:
>
> On Tue, Jul 4, 2023 at 10:32 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
> >
> > > In a follow-up patch, can you please document inlining rules involving
> > > -march and -mtune to "x86 Function Attributes" section? Currently, the
> > > inlining rules at the end of "target function attribute" section does
> > > not even mention -march and -mtune. Maybe a subsubsection "Inlining
> > > rules" should be added (like AArch64 has) to mention that only default
> > > arch and tune are inlined by default (but inline can be forced with
> > > always_inline for different mtune flags).
> >
> > The document has below at the end of 'target (OPTIONS)' section
> >
> > On the x86, the inliner does not inline a function that has
> > different target options than the caller, unless the callee
> > has a subset of the target options of the caller.  For example
> > a function declared with 'target("sse3")' can inline a
> > function with 'target("sse2")', since '-msse3' implies
> > '-msse2'.
> >
> > Do we need to move this part to a new section and combine with -march and
> > -mtune rule description to the new subsubsection?
> >
> > > Looking at the above, perhaps inlining of different arches can also be
> > > forced with always_inline? This would allow developers some control of
> > > inlining, and would not be surprising.
> >
> > If so, I'd like to add the always_inline change on arch to current
> > patch and leave the
> > document change alone in the next patch.
>
> Yes, this is OK.
>
> Thanks,
> Uros.
> >
> > Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年7月4日周二 14:19写道:
> > >
> > > On Tue, Jul 4, 2023 at 5:12 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > For function with different target attributes, current logic rejects to
> > > > inline the callee when any arch or tune is mismatched. Relax the
> > > > condition to allow callee with default arch/tune to be inlined.
> > > >
> > > > Boostrapped/regtested on x86-64-linux-gnu{-m32,}.
> > > >
> > > > Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * config/i386/i386.cc (ix86_can_inline_p): If callee has
> > > >         default arch=x86-64 and tune=generic, do not block the
> > > >         inlining to its caller.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.target/i386/inline_target_clones.c: New test.
> > >
> > > OK.
> > >
> > > In a follow-up patch, can you please document inlining rules involving
> > > -march and -mtune to "x86 Function Attributes" section? Currently, the
> > > inlining rules at the end of "target function attribute" section does
> > > not even mention -march and -mtune. Maybe a subsubsection "Inlining
> > > rules" should be added (like AArch64 has) to mention that only default
> > > arch and tune are inlined by default (but inline can be forced with
> > > always_inline for different mtune flags).
> > >
> > > Looking at the above, perhaps inlining of different arches can also be
> > > forced with always_inline? This would allow developers some control of
> > > inlining, and would not be surprising.
> > >
> > > Thanks,
> > > Uros.
> > >
> > > > ---
> > > >  gcc/config/i386/i386.cc                       | 22 +++++++++++------
> > > >  .../gcc.target/i386/inline_target_clones.c    | 24 +++++++++++++++++++
> > > >  2 files changed, 39 insertions(+), 7 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > > >
> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > index 8989985700a..4741c9b5364 100644
> > > > --- a/gcc/config/i386/i386.cc
> > > > +++ b/gcc/config/i386/i386.cc
> > > > @@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
> > > >                != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> > > >      ret = false;
> > > >
> > > > -  /* See if arch, tune, etc. are the same.  */
> > > > -  else if (caller_opts->arch != callee_opts->arch)
> > > > -    ret = false;
> > > > -
> > > > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > > -    ret = false;
> > > > -
> > > >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > > >            /* If the calle doesn't use FP expressions differences in
> > > >               ix86_fpmath can be ignored.  We are called from FEs
> > > > @@ -622,6 +615,21 @@ ix86_can_inline_p (tree caller, tree callee)
> > > >                || ipa_fn_summaries->get (callee_node)->fp_expressions))
> > > >      ret = false;
> > > >
> > > > +  /* At this point we cannot identify whether arch or tune setting
> > > > +     comes from target attribute or not. So the most conservative way
> > > > +     is to allow the callee that uses default arch and tune string to
> > > > +     be inlined.  */
> > > > +  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
> > > > +          && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
> > > > +    ret = true;
> > > > +
> > > > +  /* See if arch, tune, etc. are the same.  */
> > > > +  else if (caller_opts->arch != callee_opts->arch)
> > > > +    ret = false;
> > > > +
> > > > +  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > > +    ret = false;
> > > > +
> > > >    else if (!always_inline
> > > >            && caller_opts->branch_cost != callee_opts->branch_cost)
> > > >      ret = false;
> > > > diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > > > new file mode 100644
> > > > index 00000000000..53db1600ce5
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-ifunc "" } */
> > > > +/* { dg-options "-O3 -march=x86-64" } */
> > > > +/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
> > > > +
> > > > +float callee (float a, float b, float c, float d,
> > > > +             float e, float f, float g, float h)
> > > > +{
> > > > +  return a * b + c * d + e * f + g + h + a * c + b * c
> > > > +    + a * d + b * e + a * f + c * h +
> > > > +    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> > > > +}
> > > > +
> > > > +__attribute__((target_clones("default","arch=icelake-server")))
> > > > +void caller (int n, float *a,
> > > > +            float c1, float c2, float c3,
> > > > +            float c4, float c5, float c6,
> > > > +            float c7)
> > > > +{
> > > > +  for (int i = 0; i < n; i++)
> > > > +    {
> > > > +      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
> > > > +    }
> > > > +}
> > > > --
> > > > 2.31.1
> > > >

[-- Attachment #2: 0001-i386-Inline-function-with-default-arch-tune-to-calle.patch --]
[-- Type: text/x-patch, Size: 4992 bytes --]

From 2f62b4e7a06f4e1b3488de394bd2c5dac4f19741 Mon Sep 17 00:00:00 2001
From: Hongyu Wang <hongyu.wang@intel.com>
Date: Fri, 30 Jun 2023 09:44:56 +0800
Subject: [PATCH] i386: Inline function with default arch/tune to caller

For function with different target attributes, current logic rejects to
inline the callee when any arch or tune is mismatched. Relax the
condition to allow callee with default arch/tune to be inlined.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_can_inline_p): If callee has
	default arch=x86-64 and tune=generic, do not block the
	inlining to its caller. Also allow callee with different
	arch= to be inlined if it has always_inline attribute and
	it's ISA is subset of caller's.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/inline_attr_arch.c: New test.
	* gcc.target/i386/inline_target_clones.c: Ditto.
---
 gcc/config/i386/i386.cc                       | 24 ++++++++++++------
 .../gcc.target/i386/inline_attr_arch.c        | 25 +++++++++++++++++++
 .../gcc.target/i386/inline_target_clones.c    | 24 ++++++++++++++++++
 3 files changed, 66 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/inline_attr_arch.c
 create mode 100644 gcc/testsuite/gcc.target/i386/inline_target_clones.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8989985700a..7fb2b92b075 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
 	       != (callee_opts->x_target_flags & ~always_inline_safe_mask))
     ret = false;
 
-  /* See if arch, tune, etc. are the same.  */
-  else if (caller_opts->arch != callee_opts->arch)
-    ret = false;
-
-  else if (!always_inline && caller_opts->tune != callee_opts->tune)
-    ret = false;
-
   else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
 	   /* If the calle doesn't use FP expressions differences in
 	      ix86_fpmath can be ignored.  We are called from FEs
@@ -622,6 +615,23 @@ ix86_can_inline_p (tree caller, tree callee)
 	       || ipa_fn_summaries->get (callee_node)->fp_expressions))
     ret = false;
 
+  /* At this point we cannot identify whether arch or tune setting
+     comes from target attribute or not. So the most conservative way
+     is to allow the callee that uses default arch and tune string to
+     be inlined.  */
+  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
+	   && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
+    ret = true;
+
+  /* See if arch, tune, etc. are the same. As previous ISA flags already
+     checks if callee's ISA is subset of caller's, do not block 
+     always_inline attribute for callee even it has different arch. */
+  else if (!always_inline && caller_opts->arch != callee_opts->arch)
+    ret = false;
+
+  else if (!always_inline && caller_opts->tune != callee_opts->tune)
+    ret = false;
+
   else if (!always_inline
 	   && caller_opts->branch_cost != callee_opts->branch_cost)
     ret = false;
diff --git a/gcc/testsuite/gcc.target/i386/inline_attr_arch.c b/gcc/testsuite/gcc.target/i386/inline_attr_arch.c
new file mode 100644
index 00000000000..1fab485922a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/inline_attr_arch.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O3 -march=x86-64" } */
+/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
+
+__attribute__((always_inline,target("arch=haswell")))
+inline float callee (float a, float b, float c, float d,
+	      float e, float f, float g, float h)
+{
+  return a * b + c * d + e * f + g + h + a * c + b * c
+    + a * d + b * e + a * f + c * h + 
+    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
+}
+
+__attribute__((target("arch=icelake-server")))
+void caller (int n, float *a,
+	     float c1, float c2, float c3,
+	     float c4, float c5, float c6,
+	     float c7)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
+    }
+}
diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
new file mode 100644
index 00000000000..53db1600ce5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O3 -march=x86-64" } */
+/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
+
+float callee (float a, float b, float c, float d,
+	      float e, float f, float g, float h)
+{
+  return a * b + c * d + e * f + g + h + a * c + b * c
+    + a * d + b * e + a * f + c * h + 
+    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
+}
+
+__attribute__((target_clones("default","arch=icelake-server")))
+void caller (int n, float *a,
+	     float c1, float c2, float c3,
+	     float c4, float c5, float c6,
+	     float c7)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
+    }
+}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-07-06  0:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-04  3:12 [PATCH V2] i386: Inline function with default arch/tune to caller Hongyu Wang
2023-07-04  6:18 ` Uros Bizjak
2023-07-04  8:25   ` Hongyu Wang
2023-07-04  8:57     ` Uros Bizjak
2023-07-06  0:37       ` Hongyu Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).