public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
@ 2018-10-25 23:27 Kugan Vivekanandarajah
  2018-10-26  5:12 ` Jeff Law
  0 siblings, 1 reply; 9+ messages in thread
From: Kugan Vivekanandarajah @ 2018-10-25 23:27 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1211 bytes --]

Hi,

PR87528 showed a case where libgcc generated popcount is causing
regression for Skylake.
We also have PR86677 where kernel build is failing because the kernel
does not use the libgcc (when backend is not defining popcount
pattern).  While I agree that the kernel should implement its own
functionality when it is not using the libgcc, I am afraid that the
implementation can have the same performance issues reported for
Skylake in PR87528.

Therefore, I would like to propose that we disable popcount detection
when we don't have a pattern for that. The attached patch (based on
previous discussions) does this.

Bootstrapped and regression tested on x86_64-linux-gnu with no new
regressions. We need to disable the popcount* testcases. I will have
to define a effective_target_with_popcount in
gcc/testsuite/lib/target-supports.exp if this patch is OK?

Thanks,
Kugan


gcc/ChangeLog:

2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
    as expensive when backend does not define it.


gcc/testsuite/ChangeLog:

2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * gcc.target/aarch64/popcount4.c: New test.

[-- Attachment #2: 0001-fix-kernel-build.patch --]
[-- Type: application/x-patch, Size: 2458 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
  2018-10-25 23:27 [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it Kugan Vivekanandarajah
@ 2018-10-26  5:12 ` Jeff Law
  2018-10-26  9:18   ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Law @ 2018-10-26  5:12 UTC (permalink / raw)
  To: Kugan Vivekanandarajah, GCC Patches

On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> Hi,
> 
> PR87528 showed a case where libgcc generated popcount is causing
> regression for Skylake.
> We also have PR86677 where kernel build is failing because the kernel
> does not use the libgcc (when backend is not defining popcount
> pattern).  While I agree that the kernel should implement its own
> functionality when it is not using the libgcc, I am afraid that the
> implementation can have the same performance issues reported for
> Skylake in PR87528.
> 
> Therefore, I would like to propose that we disable popcount detection
> when we don't have a pattern for that. The attached patch (based on
> previous discussions) does this.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu with no new
> regressions. We need to disable the popcount* testcases. I will have
> to define a effective_target_with_popcount in
> gcc/testsuite/lib/target-supports.exp if this patch is OK?
> 
> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> 
>     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
>     as expensive when backend does not define it.
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> 
>     * gcc.target/aarch64/popcount4.c: New test.
> 
FWIW, I've been disabling by checking direct_optab_handler elsewhere
(number_of_iterations_popcount) in my tester.  It may in fact be an old
patch from you.

Richi argued that it's the kernel team's responsibility to provide a
popcount since they don't link with libgcc.  And I'm generally in
agreement with that position, though it does tend to generate some
friction with the kernel developers.  We also run the real risk of GCC 9
not being able to build the kernel which, IMHO, would be a disaster from
a PR standpoint.

I'd like to hear from others here.  I fully realize we're beyond the
realm of what is strictly technically correct here from a review standpoint.

Jeff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
  2018-10-26  5:12 ` Jeff Law
@ 2018-10-26  9:18   ` Richard Biener
  2018-10-29  8:40     ` Kugan Vivekanandarajah
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2018-10-26  9:18 UTC (permalink / raw)
  To: Jeff Law, Richard Sandiford; +Cc: Kugan Vivekanandarajah, GCC Patches

On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
>
> On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > Hi,
> >
> > PR87528 showed a case where libgcc generated popcount is causing
> > regression for Skylake.
> > We also have PR86677 where kernel build is failing because the kernel
> > does not use the libgcc (when backend is not defining popcount
> > pattern).  While I agree that the kernel should implement its own
> > functionality when it is not using the libgcc, I am afraid that the
> > implementation can have the same performance issues reported for
> > Skylake in PR87528.
> >
> > Therefore, I would like to propose that we disable popcount detection
> > when we don't have a pattern for that. The attached patch (based on
> > previous discussions) does this.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > regressions. We need to disable the popcount* testcases. I will have
> > to define a effective_target_with_popcount in
> > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > Thanks,
> > Kugan
> >
> >
> > gcc/ChangeLog:
> >
> > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> >
> >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> >     as expensive when backend does not define it.
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> >
> >     * gcc.target/aarch64/popcount4.c: New test.
> >
> FWIW, I've been disabling by checking direct_optab_handler elsewhere
> (number_of_iterations_popcount) in my tester.  It may in fact be an old
> patch from you.
>
> Richi argued that it's the kernel team's responsibility to provide a
> popcount since they don't link with libgcc.  And I'm generally in
> agreement with that position, though it does tend to generate some
> friction with the kernel developers.  We also run the real risk of GCC 9
> not being able to build the kernel which, IMHO, would be a disaster from
> a PR standpoint.
>
> I'd like to hear from others here.  I fully realize we're beyond the
> realm of what is strictly technically correct here from a review standpoint.

As said final value replacement to a library call is probably not wanted
for optimization purpose, so adjusting expression_expensive_p is OK with
me.  It might not fully solve the (non-)issue in case another optimization pass
chooses to materialize niter computation result.

Few comments on the patch:

+      tree fndecl = get_callee_fndecl (expr);
+
+      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+       {
+         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));

  combined_fn cfn = gimple_call_combined_fn (expr);
  switch (cfn)
    {
...

cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
offline but eventually he knows whether there is a better way to query

+           CASE_CFN_POPCOUNT:
+             /* Check if opcode for popcount is available.  */
+             if (optab_handler (popcount_optab,
+                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
(expr, 0))))
+                 == CODE_FOR_nothing)
+               return true;

note that we currently generate builtin calls rather than IFN calls
(when a direct
optab is supported).

Another comment on the patch is that you probably have to adjust existing
popcount testcases to add architecture specific flags enabling suport for
the instructions, otherwise you won't see loop replacement.

Also I think that the expression is only expensive (for final value
replacement!)
if you consider optimizing for speed.  When optimizing for size getting rid of
the loop is probably beneificial unconditionally.  That would leave the
possibility to switch said testcases to -Os.  It would require adding a
bool size_p flag to expression_expensive and passing down
optimize_loop_for_size_p ().

_NOTE_ that expression_expensive_p is also used by IVOPTs and there
replacing sth with an expression based on the niter analysis result doesn't
mean we get rid of the loop (but only of an IV), so maybe that reasoning
doesn't apply there.

Richard.

> Jeff
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
  2018-10-26  9:18   ` Richard Biener
@ 2018-10-29  8:40     ` Kugan Vivekanandarajah
  2018-10-29 14:32       ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Kugan Vivekanandarajah @ 2018-10-29  8:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, rdsandiford, GCC Patches

Hi Richard and Jeff,

Thanks for your comments.

On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
> >
> > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > Hi,
> > >
> > > PR87528 showed a case where libgcc generated popcount is causing
> > > regression for Skylake.
> > > We also have PR86677 where kernel build is failing because the kernel
> > > does not use the libgcc (when backend is not defining popcount
> > > pattern).  While I agree that the kernel should implement its own
> > > functionality when it is not using the libgcc, I am afraid that the
> > > implementation can have the same performance issues reported for
> > > Skylake in PR87528.
> > >
> > > Therefore, I would like to propose that we disable popcount detection
> > > when we don't have a pattern for that. The attached patch (based on
> > > previous discussions) does this.
> > >
> > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > regressions. We need to disable the popcount* testcases. I will have
> > > to define a effective_target_with_popcount in
> > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > Thanks,
> > > Kugan
> > >
> > >
> > > gcc/ChangeLog:
> > >
> > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > >
> > >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> > >     as expensive when backend does not define it.
> > >
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > >
> > >     * gcc.target/aarch64/popcount4.c: New test.
> > >
> > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > patch from you.
> >
> > Richi argued that it's the kernel team's responsibility to provide a
> > popcount since they don't link with libgcc.  And I'm generally in
> > agreement with that position, though it does tend to generate some
> > friction with the kernel developers.  We also run the real risk of GCC 9
> > not being able to build the kernel which, IMHO, would be a disaster from
> > a PR standpoint.
> >
> > I'd like to hear from others here.  I fully realize we're beyond the
> > realm of what is strictly technically correct here from a review standpoint.
>
> As said final value replacement to a library call is probably not wanted
> for optimization purpose, so adjusting expression_expensive_p is OK with
> me.  It might not fully solve the (non-)issue in case another optimization pass
> chooses to materialize niter computation result.
>
> Few comments on the patch:
>
> +      tree fndecl = get_callee_fndecl (expr);
> +
> +      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> +       {
> +         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
>
>   combined_fn cfn = gimple_call_combined_fn (expr);
>   switch (cfn)
>     {

Did you mean:
combined_fn cfn = get_call_combined_fn (expr);

> ...
>
> cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
> offline but eventually he knows whether there is a better way to query
>
> +           CASE_CFN_POPCOUNT:
> +             /* Check if opcode for popcount is available.  */
> +             if (optab_handler (popcount_optab,
> +                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> (expr, 0))))
> +                 == CODE_FOR_nothing)
> +               return true;
>
> note that we currently generate builtin calls rather than IFN calls
> (when a direct
> optab is supported).
>
> Another comment on the patch is that you probably have to adjust existing
> popcount testcases to add architecture specific flags enabling suport for
> the instructions, otherwise you won't see loop replacement.
Indeed.
In lib/target-supports.exp, I will try to add support for
check_effective_target_popcount_long.
When I grep for the popcount pattern in md files, I see it is defined for:

tilegx
tilepro
alpha
aarch64  when TARGET_SIMD
ia64
rs6000
i386  when TARGET_POPCOUNT
popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
s390  when TARGET_Z916 && TARGET_64BIT
sparc when TARGET_POPC
arm when TARGET_NEON
mips when ISA_HAS_POP
spu
avr

I can check these targets with the condition.
Another possibility is to check with a sample code and see if we are
getting a libcall in the asm. Not sure if that is straightforward. Are
there any example for such.

We could also move these test to a primary target that is tested often
tested which also defines popcount pattern. I dont think these tests
change for targets and if we can test in one target that could be
enough,

I am happy to implement what is appropriate.

Thanks,
Kugan



>
> Also I think that the expression is only expensive (for final value
> replacement!)
> if you consider optimizing for speed.  When optimizing for size getting rid of
> the loop is probably beneificial unconditionally.  That would leave the
> possibility to switch said testcases to -Os.  It would require adding a
> bool size_p flag to expression_expensive and passing down
> optimize_loop_for_size_p ().
>
> _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> replacing sth with an expression based on the niter analysis result doesn't
> mean we get rid of the loop (but only of an IV), so maybe that reasoning
> doesn't apply there.
>
> Richard.
>
> > Jeff
> >

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
  2018-10-29  8:40     ` Kugan Vivekanandarajah
@ 2018-10-29 14:32       ` Richard Biener
  2018-11-02  9:02         ` Kugan Vivekanandarajah
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2018-10-29 14:32 UTC (permalink / raw)
  To: Kugan Vivekanandarajah; +Cc: Jeff Law, Richard Sandiford, GCC Patches

On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
>
> Hi Richard and Jeff,
>
> Thanks for your comments.
>
> On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
> > >
> > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > Hi,
> > > >
> > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > regression for Skylake.
> > > > We also have PR86677 where kernel build is failing because the kernel
> > > > does not use the libgcc (when backend is not defining popcount
> > > > pattern).  While I agree that the kernel should implement its own
> > > > functionality when it is not using the libgcc, I am afraid that the
> > > > implementation can have the same performance issues reported for
> > > > Skylake in PR87528.
> > > >
> > > > Therefore, I would like to propose that we disable popcount detection
> > > > when we don't have a pattern for that. The attached patch (based on
> > > > previous discussions) does this.
> > > >
> > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > > regressions. We need to disable the popcount* testcases. I will have
> > > > to define a effective_target_with_popcount in
> > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > Thanks,
> > > > Kugan
> > > >
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > >
> > > >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> > > >     as expensive when backend does not define it.
> > > >
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > >
> > > >     * gcc.target/aarch64/popcount4.c: New test.
> > > >
> > > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > > patch from you.
> > >
> > > Richi argued that it's the kernel team's responsibility to provide a
> > > popcount since they don't link with libgcc.  And I'm generally in
> > > agreement with that position, though it does tend to generate some
> > > friction with the kernel developers.  We also run the real risk of GCC 9
> > > not being able to build the kernel which, IMHO, would be a disaster from
> > > a PR standpoint.
> > >
> > > I'd like to hear from others here.  I fully realize we're beyond the
> > > realm of what is strictly technically correct here from a review standpoint.
> >
> > As said final value replacement to a library call is probably not wanted
> > for optimization purpose, so adjusting expression_expensive_p is OK with
> > me.  It might not fully solve the (non-)issue in case another optimization pass
> > chooses to materialize niter computation result.
> >
> > Few comments on the patch:
> >
> > +      tree fndecl = get_callee_fndecl (expr);
> > +
> > +      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > +       {
> > +         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
> >
> >   combined_fn cfn = gimple_call_combined_fn (expr);
> >   switch (cfn)
> >     {
>
> Did you mean:
> combined_fn cfn = get_call_combined_fn (expr);

Yes.

> > ...
> >
> > cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
> > offline but eventually he knows whether there is a better way to query
> >
> > +           CASE_CFN_POPCOUNT:
> > +             /* Check if opcode for popcount is available.  */
> > +             if (optab_handler (popcount_optab,
> > +                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> > (expr, 0))))
> > +                 == CODE_FOR_nothing)
> > +               return true;
> >
> > note that we currently generate builtin calls rather than IFN calls
> > (when a direct
> > optab is supported).
> >
> > Another comment on the patch is that you probably have to adjust existing
> > popcount testcases to add architecture specific flags enabling suport for
> > the instructions, otherwise you won't see loop replacement.
> Indeed.
> In lib/target-supports.exp, I will try to add support for
> check_effective_target_popcount_long.
> When I grep for the popcount pattern in md files, I see it is defined for:
>
> tilegx
> tilepro
> alpha
> aarch64  when TARGET_SIMD
> ia64
> rs6000
> i386  when TARGET_POPCOUNT
> popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
> s390  when TARGET_Z916 && TARGET_64BIT
> sparc when TARGET_POPC
> arm when TARGET_NEON
> mips when ISA_HAS_POP
> spu
> avr
>
> I can check these targets with the condition.
> Another possibility is to check with a sample code and see if we are
> getting a libcall in the asm. Not sure if that is straightforward. Are
> there any example for such.

You could try linking w/o libgcc ...

> We could also move these test to a primary target that is tested often
> tested which also defines popcount pattern. I dont think these tests
> change for targets and if we can test in one target that could be
> enough,
>
> I am happy to implement what is appropriate.

How about the -Os idea?

Richard.

> Thanks,
> Kugan
>
>
>
> >
> > Also I think that the expression is only expensive (for final value
> > replacement!)
> > if you consider optimizing for speed.  When optimizing for size getting rid of
> > the loop is probably beneificial unconditionally.  That would leave the
> > possibility to switch said testcases to -Os.  It would require adding a
> > bool size_p flag to expression_expensive and passing down
> > optimize_loop_for_size_p ().
> >
> > _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> > replacing sth with an expression based on the niter analysis result doesn't
> > mean we get rid of the loop (but only of an IV), so maybe that reasoning
> > doesn't apply there.
> >
> > Richard.
> >
> > > Jeff
> > >

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
  2018-10-29 14:32       ` Richard Biener
@ 2018-11-02  9:02         ` Kugan Vivekanandarajah
  2018-11-07 13:04           ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Kugan Vivekanandarajah @ 2018-11-02  9:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, rdsandiford, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 7003 bytes --]

Hi Richard,
Thanks for the review.
On Tue, 30 Oct 2018 at 01:25, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> <kugan.vivekanandarajah@linaro.org> wrote:
> >
> > Hi Richard and Jeff,
> >
> > Thanks for your comments.
> >
> > On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
> > > >
> > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > Hi,
> > > > >
> > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > regression for Skylake.
> > > > > We also have PR86677 where kernel build is failing because the kernel
> > > > > does not use the libgcc (when backend is not defining popcount
> > > > > pattern).  While I agree that the kernel should implement its own
> > > > > functionality when it is not using the libgcc, I am afraid that the
> > > > > implementation can have the same performance issues reported for
> > > > > Skylake in PR87528.
> > > > >
> > > > > Therefore, I would like to propose that we disable popcount detection
> > > > > when we don't have a pattern for that. The attached patch (based on
> > > > > previous discussions) does this.
> > > > >
> > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > > > regressions. We need to disable the popcount* testcases. I will have
> > > > > to define a effective_target_with_popcount in
> > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > Thanks,
> > > > > Kugan
> > > > >
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > >
> > > > >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> > > > >     as expensive when backend does not define it.
> > > > >
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > >
> > > > >     * gcc.target/aarch64/popcount4.c: New test.
> > > > >
> > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > > > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > > > patch from you.
> > > >
> > > > Richi argued that it's the kernel team's responsibility to provide a
> > > > popcount since they don't link with libgcc.  And I'm generally in
> > > > agreement with that position, though it does tend to generate some
> > > > friction with the kernel developers.  We also run the real risk of GCC 9
> > > > not being able to build the kernel which, IMHO, would be a disaster from
> > > > a PR standpoint.
> > > >
> > > > I'd like to hear from others here.  I fully realize we're beyond the
> > > > realm of what is strictly technically correct here from a review standpoint.
> > >
> > > As said final value replacement to a library call is probably not wanted
> > > for optimization purpose, so adjusting expression_expensive_p is OK with
> > > me.  It might not fully solve the (non-)issue in case another optimization pass
> > > chooses to materialize niter computation result.
> > >
> > > Few comments on the patch:
> > >
> > > +      tree fndecl = get_callee_fndecl (expr);
> > > +
> > > +      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > > +       {
> > > +         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
> > >
> > >   combined_fn cfn = gimple_call_combined_fn (expr);
> > >   switch (cfn)
> > >     {
> >
> > Did you mean:
> > combined_fn cfn = get_call_combined_fn (expr);
>
> Yes.
>
> > > ...
> > >
> > > cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
> > > offline but eventually he knows whether there is a better way to query
> > >
> > > +           CASE_CFN_POPCOUNT:
> > > +             /* Check if opcode for popcount is available.  */
> > > +             if (optab_handler (popcount_optab,
> > > +                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> > > (expr, 0))))
> > > +                 == CODE_FOR_nothing)
> > > +               return true;
> > >
> > > note that we currently generate builtin calls rather than IFN calls
> > > (when a direct
> > > optab is supported).
> > >
> > > Another comment on the patch is that you probably have to adjust existing
> > > popcount testcases to add architecture specific flags enabling suport for
> > > the instructions, otherwise you won't see loop replacement.
> > Indeed.
> > In lib/target-supports.exp, I will try to add support for
> > check_effective_target_popcount_long.
> > When I grep for the popcount pattern in md files, I see it is defined for:
> >
> > tilegx
> > tilepro
> > alpha
> > aarch64  when TARGET_SIMD
> > ia64
> > rs6000
> > i386  when TARGET_POPCOUNT
> > popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
> > s390  when TARGET_Z916 && TARGET_64BIT
> > sparc when TARGET_POPC
> > arm when TARGET_NEON
> > mips when ISA_HAS_POP
> > spu
> > avr
> >
> > I can check these targets with the condition.
> > Another possibility is to check with a sample code and see if we are
> > getting a libcall in the asm. Not sure if that is straightforward. Are
> > there any example for such.
>
> You could try linking w/o libgcc ...
I realized that there are some examples already and I  have based it
on that. Attached patch
0001-fix-kernel-build-v3.patch does this. Bootstrapped and regression
tested on aarch64-linux-gnu with no new regression. Is this OK?

>
> > We could also move these test to a primary target that is tested often
> > tested which also defines popcount pattern. I dont think these tests
> > change for targets and if we can test in one target that could be
> > enough,
> >
> > I am happy to implement what is appropriate.
>
> How about the -Os idea?
Attached patch 0002-allow-builtin-popcount-for-size_p.patch attempts
this. As you mentioned, this will again break the kernel. While I am
trying to provide the popcount implementation for the newer kernel, it
will still be a problem for existing kernel versions. May I request
that we provide a flag to disable this if we decide to go with the
patch please?

Thanks,
Kugan

>
> Richard.
>
> > Thanks,
> > Kugan
> >
> >
> >
> > >
> > > Also I think that the expression is only expensive (for final value
> > > replacement!)
> > > if you consider optimizing for speed.  When optimizing for size getting rid of
> > > the loop is probably beneificial unconditionally.  That would leave the
> > > possibility to switch said testcases to -Os.  It would require adding a
> > > bool size_p flag to expression_expensive and passing down
> > > optimize_loop_for_size_p ().
> > >
> > > _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> > > replacing sth with an expression based on the niter analysis result doesn't
> > > mean we get rid of the loop (but only of an IV), so maybe that reasoning
> > > doesn't apply there.
> > >
> > > Richard.
> > >
> > > > Jeff
> > > >

[-- Attachment #2: 0001-fix-kernel-build-v3.patch --]
[-- Type: application/x-patch, Size: 4895 bytes --]

[-- Attachment #3: 0002-allow-builtin-popcount-for-size_p.patch --]
[-- Type: application/x-patch, Size: 5201 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
  2018-11-02  9:02         ` Kugan Vivekanandarajah
@ 2018-11-07 13:04           ` Richard Biener
  2018-11-12  5:21             ` Kugan Vivekanandarajah
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2018-11-07 13:04 UTC (permalink / raw)
  To: Kugan Vivekanandarajah; +Cc: Jeff Law, Richard Sandiford, GCC Patches

On Fri, Nov 2, 2018 at 10:02 AM Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
>
> Hi Richard,
> Thanks for the review.
> On Tue, 30 Oct 2018 at 01:25, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> > <kugan.vivekanandarajah@linaro.org> wrote:
> > >
> > > Hi Richard and Jeff,
> > >
> > > Thanks for your comments.
> > >
> > > On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
> > > > >
> > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > > Hi,
> > > > > >
> > > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > > regression for Skylake.
> > > > > > We also have PR86677 where kernel build is failing because the kernel
> > > > > > does not use the libgcc (when backend is not defining popcount
> > > > > > pattern).  While I agree that the kernel should implement its own
> > > > > > functionality when it is not using the libgcc, I am afraid that the
> > > > > > implementation can have the same performance issues reported for
> > > > > > Skylake in PR87528.
> > > > > >
> > > > > > Therefore, I would like to propose that we disable popcount detection
> > > > > > when we don't have a pattern for that. The attached patch (based on
> > > > > > previous discussions) does this.
> > > > > >
> > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > > > > regressions. We need to disable the popcount* testcases. I will have
> > > > > > to define a effective_target_with_popcount in
> > > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > > Thanks,
> > > > > > Kugan
> > > > > >
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > > >
> > > > > >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> > > > > >     as expensive when backend does not define it.
> > > > > >
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > > >
> > > > > >     * gcc.target/aarch64/popcount4.c: New test.
> > > > > >
> > > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > > > > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > > > > patch from you.
> > > > >
> > > > > Richi argued that it's the kernel team's responsibility to provide a
> > > > > popcount since they don't link with libgcc.  And I'm generally in
> > > > > agreement with that position, though it does tend to generate some
> > > > > friction with the kernel developers.  We also run the real risk of GCC 9
> > > > > not being able to build the kernel which, IMHO, would be a disaster from
> > > > > a PR standpoint.
> > > > >
> > > > > I'd like to hear from others here.  I fully realize we're beyond the
> > > > > realm of what is strictly technically correct here from a review standpoint.
> > > >
> > > > As said final value replacement to a library call is probably not wanted
> > > > for optimization purpose, so adjusting expression_expensive_p is OK with
> > > > me.  It might not fully solve the (non-)issue in case another optimization pass
> > > > chooses to materialize niter computation result.
> > > >
> > > > Few comments on the patch:
> > > >
> > > > +      tree fndecl = get_callee_fndecl (expr);
> > > > +
> > > > +      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > > > +       {
> > > > +         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
> > > >
> > > >   combined_fn cfn = gimple_call_combined_fn (expr);
> > > >   switch (cfn)
> > > >     {
> > >
> > > Did you mean:
> > > combined_fn cfn = get_call_combined_fn (expr);
> >
> > Yes.
> >
> > > > ...
> > > >
> > > > cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
> > > > offline but eventually he knows whether there is a better way to query
> > > >
> > > > +           CASE_CFN_POPCOUNT:
> > > > +             /* Check if opcode for popcount is available.  */
> > > > +             if (optab_handler (popcount_optab,
> > > > +                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> > > > (expr, 0))))
> > > > +                 == CODE_FOR_nothing)
> > > > +               return true;
> > > >
> > > > note that we currently generate builtin calls rather than IFN calls
> > > > (when a direct
> > > > optab is supported).
> > > >
> > > > Another comment on the patch is that you probably have to adjust existing
> > > > popcount testcases to add architecture specific flags enabling suport for
> > > > the instructions, otherwise you won't see loop replacement.
> > > Indeed.
> > > In lib/target-supports.exp, I will try to add support for
> > > check_effective_target_popcount_long.
> > > When I grep for the popcount pattern in md files, I see it is defined for:
> > >
> > > tilegx
> > > tilepro
> > > alpha
> > > aarch64  when TARGET_SIMD
> > > ia64
> > > rs6000
> > > i386  when TARGET_POPCOUNT
> > > popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
> > > s390  when TARGET_Z916 && TARGET_64BIT
> > > sparc when TARGET_POPC
> > > arm when TARGET_NEON
> > > mips when ISA_HAS_POP
> > > spu
> > > avr
> > >
> > > I can check these targets with the condition.
> > > Another possibility is to check with a sample code and see if we are
> > > getting a libcall in the asm. Not sure if that is straightforward. Are
> > > there any example for such.
> >
> > You could try linking w/o libgcc ...
> I realized that there are some examples already and I  have based it
> on that. Attached patch
> 0001-fix-kernel-build-v3.patch does this. Bootstrapped and regression
> tested on aarch64-linux-gnu with no new regression. Is this OK?

I suspect using sth like the hard-float test is better, thus

check_no_messages_and_pattern popcountl "!\\(call" rtl-expand

instead of looking for !__popcount in assembly since depending on
assembler syntax this might nor might not match spuriously...

@@ -3501,6 +3504,19 @@ expression_expensive_p (tree expr)
       tree arg;
       call_expr_arg_iterator iter;

+      combined_fn cfn = get_call_combined_fn (expr);
+      switch (cfn)
+       {
+       CASE_CFN_POPCOUNT:
+         /* Check if opcode for popcount is available.  */
+         if (optab_handler (popcount_optab,
+                            TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
+             == CODE_FOR_nothing)
+           return true;
+       default:
+         break;
+       }
+
       if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
        return true;
       FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)

note that we can handle double-word popcount by emitting two single-word
popcount instructions.  So if the mode is of 2 * UNITS_PER_WORD size
you want to check for mode == word_mode.  See expand_unop.

I think you want to add a comment before the code explaining that even
though is_inexpensive_builtin says true we may get a library call which
we consider expensive.

> >
> > > We could also move these test to a primary target that is tested often
> > > tested which also defines popcount pattern. I dont think these tests
> > > change for targets and if we can test in one target that could be
> > > enough,
> > >
> > > I am happy to implement what is appropriate.
> >
> > How about the -Os idea?
> Attached patch 0002-allow-builtin-popcount-for-size_p.patch attempts
> this. As you mentioned, this will again break the kernel. While I am
> trying to provide the popcount implementation for the newer kernel, it
> will still be a problem for existing kernel versions. May I request
> that we provide a flag to disable this if we decide to go with the
> patch please?

Let's consider this only for GCC 10 and get the first patch ready.
expression_expensive_p shouldn't consider division as expensive
for -Os either so there's more churn here and expression_expensive_p
doesn't consider the overall size of the expression anyways.

So let's drop this for now.

Richard.

> Thanks,
> Kugan
>
> >
> > Richard.
> >
> > > Thanks,
> > > Kugan
> > >
> > >
> > >
> > > >
> > > > Also I think that the expression is only expensive (for final value
> > > > replacement!)
> > > > if you consider optimizing for speed.  When optimizing for size getting rid of
> > > > the loop is probably beneificial unconditionally.  That would leave the
> > > > possibility to switch said testcases to -Os.  It would require adding a
> > > > bool size_p flag to expression_expensive and passing down
> > > > optimize_loop_for_size_p ().
> > > >
> > > > _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> > > > replacing sth with an expression based on the niter analysis result doesn't
> > > > mean we get rid of the loop (but only of an IV), so maybe that reasoning
> > > > doesn't apply there.
> > > >
> > > > Richard.
> > > >
> > > > > Jeff
> > > > >

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
  2018-11-07 13:04           ` Richard Biener
@ 2018-11-12  5:21             ` Kugan Vivekanandarajah
  2018-11-12 14:44               ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Kugan Vivekanandarajah @ 2018-11-12  5:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, rdsandiford, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 9773 bytes --]

Hi Richard,

Thanks for the review.
On Thu, 8 Nov 2018 at 00:03, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Nov 2, 2018 at 10:02 AM Kugan Vivekanandarajah
> <kugan.vivekanandarajah@linaro.org> wrote:
> >
> > Hi Richard,
> > Thanks for the review.
> > On Tue, 30 Oct 2018 at 01:25, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> > > <kugan.vivekanandarajah@linaro.org> wrote:
> > > >
> > > > Hi Richard and Jeff,
> > > >
> > > > Thanks for your comments.
> > > >
> > > > On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
> > > > > >
> > > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > > > regression for Skylake.
> > > > > > > We also have PR86677 where kernel build is failing because the kernel
> > > > > > > does not use the libgcc (when backend is not defining popcount
> > > > > > > pattern).  While I agree that the kernel should implement its own
> > > > > > > functionality when it is not using the libgcc, I am afraid that the
> > > > > > > implementation can have the same performance issues reported for
> > > > > > > Skylake in PR87528.
> > > > > > >
> > > > > > > Therefore, I would like to propose that we disable popcount detection
> > > > > > > when we don't have a pattern for that. The attached patch (based on
> > > > > > > previous discussions) does this.
> > > > > > >
> > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > > > > > regressions. We need to disable the popcount* testcases. I will have
> > > > > > > to define a effective_target_with_popcount in
> > > > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > > > Thanks,
> > > > > > > Kugan
> > > > > > >
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > > > >
> > > > > > >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> > > > > > >     as expensive when backend does not define it.
> > > > > > >
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > > > >
> > > > > > >     * gcc.target/aarch64/popcount4.c: New test.
> > > > > > >
> > > > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > > > > > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > > > > > patch from you.
> > > > > >
> > > > > > Richi argued that it's the kernel team's responsibility to provide a
> > > > > > popcount since they don't link with libgcc.  And I'm generally in
> > > > > > agreement with that position, though it does tend to generate some
> > > > > > friction with the kernel developers.  We also run the real risk of GCC 9
> > > > > > not being able to build the kernel which, IMHO, would be a disaster from
> > > > > > a PR standpoint.
> > > > > >
> > > > > > I'd like to hear from others here.  I fully realize we're beyond the
> > > > > > realm of what is strictly technically correct here from a review standpoint.
> > > > >
> > > > > As said final value replacement to a library call is probably not wanted
> > > > > for optimization purpose, so adjusting expression_expensive_p is OK with
> > > > > me.  It might not fully solve the (non-)issue in case another optimization pass
> > > > > chooses to materialize niter computation result.
> > > > >
> > > > > Few comments on the patch:
> > > > >
> > > > > +      tree fndecl = get_callee_fndecl (expr);
> > > > > +
> > > > > +      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > > > > +       {
> > > > > +         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
> > > > >
> > > > >   combined_fn cfn = gimple_call_combined_fn (expr);
> > > > >   switch (cfn)
> > > > >     {
> > > >
> > > > Did you mean:
> > > > combined_fn cfn = get_call_combined_fn (expr);
> > >
> > > Yes.
> > >
> > > > > ...
> > > > >
> > > > > cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
> > > > > offline but eventually he knows whether there is a better way to query
> > > > >
> > > > > +           CASE_CFN_POPCOUNT:
> > > > > +             /* Check if opcode for popcount is available.  */
> > > > > +             if (optab_handler (popcount_optab,
> > > > > +                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> > > > > (expr, 0))))
> > > > > +                 == CODE_FOR_nothing)
> > > > > +               return true;
> > > > >
> > > > > note that we currently generate builtin calls rather than IFN calls
> > > > > (when a direct
> > > > > optab is supported).
> > > > >
> > > > > Another comment on the patch is that you probably have to adjust existing
> > > > > popcount testcases to add architecture specific flags enabling suport for
> > > > > the instructions, otherwise you won't see loop replacement.
> > > > Indeed.
> > > > In lib/target-supports.exp, I will try to add support for
> > > > check_effective_target_popcount_long.
> > > > When I grep for the popcount pattern in md files, I see it is defined for:
> > > >
> > > > tilegx
> > > > tilepro
> > > > alpha
> > > > aarch64  when TARGET_SIMD
> > > > ia64
> > > > rs6000
> > > > i386  when TARGET_POPCOUNT
> > > > popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
> > > > s390  when TARGET_Z916 && TARGET_64BIT
> > > > sparc when TARGET_POPC
> > > > arm when TARGET_NEON
> > > > mips when ISA_HAS_POP
> > > > spu
> > > > avr
> > > >
> > > > I can check these targets with the condition.
> > > > Another possibility is to check with a sample code and see if we are
> > > > getting a libcall in the asm. Not sure if that is straightforward. Are
> > > > there any example for such.
> > >
> > > You could try linking w/o libgcc ...
> > I realized that there are some examples already and I  have based it
> > on that. Attached patch
> > 0001-fix-kernel-build-v3.patch does this. Bootstrapped and regression
> > tested on aarch64-linux-gnu with no new regression. Is this OK?
>
> I suspect using sth like the hard-float test is better, thus
>
> check_no_messages_and_pattern popcountl "!\\(call" rtl-expand
>
> instead of looking for !__popcount in assembly since depending on
> assembler syntax this might nor might not match spuriously...
Ok.
>
> @@ -3501,6 +3504,19 @@ expression_expensive_p (tree expr)
>        tree arg;
>        call_expr_arg_iterator iter;
>
> +      combined_fn cfn = get_call_combined_fn (expr);
> +      switch (cfn)
> +       {
> +       CASE_CFN_POPCOUNT:
> +         /* Check if opcode for popcount is available.  */
> +         if (optab_handler (popcount_optab,
> +                            TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
> +             == CODE_FOR_nothing)
> +           return true;
> +       default:
> +         break;
> +       }
> +
>        if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
>         return true;
>        FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
>
> note that we can handle double-word popcount by emitting two single-word
> popcount instructions.  So if the mode is of 2 * UNITS_PER_WORD size
> you want to check for mode == word_mode.  See expand_unop.
Ok
>
> I think you want to add a comment before the code explaining that even
> though is_inexpensive_builtin says true we may get a library call which
> we consider expensive.

Ok.
>
> > >
> > > > We could also move these test to a primary target that is tested often
> > > > tested which also defines popcount pattern. I dont think these tests
> > > > change for targets and if we can test in one target that could be
> > > > enough,
> > > >
> > > > I am happy to implement what is appropriate.
> > >
> > > How about the -Os idea?
> > Attached patch 0002-allow-builtin-popcount-for-size_p.patch attempts
> > this. As you mentioned, this will again break the kernel. While I am
> > trying to provide the popcount implementation for the newer kernel, it
> > will still be a problem for existing kernel versions. May I request
> > that we provide a flag to disable this if we decide to go with the
> > patch please?
>
> Let's consider this only for GCC 10 and get the first patch ready.
> expression_expensive_p shouldn't consider division as expensive
> for -Os either so there's more churn here and expression_expensive_p
> doesn't consider the overall size of the expression anyways.
>
> So let's drop this for now.
Ok.

I am attaching the modifies patch. Is this OK now?
Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

Thanks,
Kugan
>
> Richard.
>
> > Thanks,
> > Kugan
> >
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Kugan
> > > >
> > > >
> > > >
> > > > >
> > > > > Also I think that the expression is only expensive (for final value
> > > > > replacement!)
> > > > > if you consider optimizing for speed.  When optimizing for size getting rid of
> > > > > the loop is probably beneificial unconditionally.  That would leave the
> > > > > possibility to switch said testcases to -Os.  It would require adding a
> > > > > bool size_p flag to expression_expensive and passing down
> > > > > optimize_loop_for_size_p ().
> > > > >
> > > > > _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> > > > > replacing sth with an expression based on the niter analysis result doesn't
> > > > > mean we get rid of the loop (but only of an IV), so maybe that reasoning
> > > > > doesn't apply there.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Jeff
> > > > > >

[-- Attachment #2: 0001-fix-kernel-build-v3.patch --]
[-- Type: text/x-patch, Size: 5617 bytes --]

From 8dbe11f716fe1d8cbe69e6879027ef6ed4db2334 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Wed, 24 Oct 2018 20:33:50 +1100
Subject: [PATCH] fix kernel build [v3]

Change-Id: I0748f522fa83ffbc627b190e0b6cf3c4389338c5
---
 gcc/testsuite/g++.dg/tree-ssa/pr86544.C      |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c     |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c    |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c    |  1 +
 gcc/testsuite/gcc.target/aarch64/popcount4.c | 14 ++++++++++++
 gcc/testsuite/lib/target-supports.exp        | 11 ++++++++++
 gcc/tree-scalar-evolution.c                  | 33 ++++++++++++++++++++++++++++
 7 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcount4.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
index fd844b4..ef43891 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target popcountl } */
 /* { dg-options "-O2 -fdump-tree-phiopt4 -fdump-tree-optimized" } */
 
 int PopCount (long b) {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
index a5ec3b3..b469410 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target popcountl } */
 /* { dg-options "-O3 -fdump-tree-optimized -fno-tree-ch" } */
 
 extern int foo (int);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
index 9096c6b..ef73e34 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target popcountl } */
 /* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
 
 int
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
index fd844b4..ef43891 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target popcountl } */
 /* { dg-options "-O2 -fdump-tree-phiopt4 -fdump-tree-optimized" } */
 
 int PopCount (long b) {
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount4.c b/gcc/testsuite/gcc.target/aarch64/popcount4.c
new file mode 100644
index 0000000..ee55b2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcount4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -mgeneral-regs-only" } */
+
+int PopCount (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 0 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index e0c5801..8e16efc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6522,6 +6522,17 @@ proc check_effective_target_sync_long_long { } {
     }
 }
 
+# Return 1 if the target supports popcount on long.
+
+proc check_effective_target_popcountl { } {
+    return [check_no_messages_and_pattern popcountl "!\\(call" rtl-expand {
+	int foo (long b)
+	  {
+	    return __builtin_popcountl (b);
+	  }
+    } "" ]
+}
+
 # Return 1 if the target supports atomic operations on "long long"
 # and can execute them.
 #
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 02174b1..964712c 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -257,7 +257,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
 #include "rtl.h"
+#include "optabs-query.h"
 #include "tree.h"
 #include "gimple.h"
 #include "ssa.h"
@@ -282,6 +284,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "tree-into-ssa.h"
 #include "builtins.h"
+#include "case-cfn-macros.h"
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
@@ -3500,6 +3503,36 @@ expression_expensive_p (tree expr)
     {
       tree arg;
       call_expr_arg_iterator iter;
+      /* Even though is_inexpensive_builtin might say true, we will get a
+	 library call for popcount when backend does not have an instruction
+	 to do so.  We consider this to be expenseive and generate
+	 __builtin_popcount only when backend defines it.  */
+      combined_fn cfn = get_call_combined_fn (expr);
+      switch (cfn)
+	{
+	CASE_CFN_POPCOUNT:
+	  /* Check if opcode for popcount is available in the mode required.  */
+	  if (optab_handler (popcount_optab,
+			     TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
+	      == CODE_FOR_nothing)
+	    {
+	      machine_mode mode;
+	      mode = TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0)));
+	      scalar_int_mode int_mode;
+
+	      /* If the mode is of 2 * UNITS_PER_WORD size, we can handle
+		 double-word popcount by emitting two single-word popcount
+		 instructions.  */
+	      if (is_a <scalar_int_mode> (mode, &int_mode)
+		  && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD
+		  && (optab_handler (popcount_optab, word_mode)
+		      != CODE_FOR_nothing))
+		  break;
+	      return true;
+	    }
+	default:
+	  break;
+	}
 
       if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
 	return true;
-- 
2.7.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
  2018-11-12  5:21             ` Kugan Vivekanandarajah
@ 2018-11-12 14:44               ` Richard Biener
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2018-11-12 14:44 UTC (permalink / raw)
  To: Kugan Vivekanandarajah; +Cc: Jeff Law, Richard Sandiford, GCC Patches

On Mon, Nov 12, 2018 at 6:21 AM Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
>
> Hi Richard,
>
> Thanks for the review.
> On Thu, 8 Nov 2018 at 00:03, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Fri, Nov 2, 2018 at 10:02 AM Kugan Vivekanandarajah
> > <kugan.vivekanandarajah@linaro.org> wrote:
> > >
> > > Hi Richard,
> > > Thanks for the review.
> > > On Tue, 30 Oct 2018 at 01:25, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> > > > <kugan.vivekanandarajah@linaro.org> wrote:
> > > > >
> > > > > Hi Richard and Jeff,
> > > > >
> > > > > Thanks for your comments.
> > > > >
> > > > > On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
> > > > > > >
> > > > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > > > > regression for Skylake.
> > > > > > > > We also have PR86677 where kernel build is failing because the kernel
> > > > > > > > does not use the libgcc (when backend is not defining popcount
> > > > > > > > pattern).  While I agree that the kernel should implement its own
> > > > > > > > functionality when it is not using the libgcc, I am afraid that the
> > > > > > > > implementation can have the same performance issues reported for
> > > > > > > > Skylake in PR87528.
> > > > > > > >
> > > > > > > > Therefore, I would like to propose that we disable popcount detection
> > > > > > > > when we don't have a pattern for that. The attached patch (based on
> > > > > > > > previous discussions) does this.
> > > > > > > >
> > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > > > > > > regressions. We need to disable the popcount* testcases. I will have
> > > > > > > > to define a effective_target_with_popcount in
> > > > > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > > > > Thanks,
> > > > > > > > Kugan
> > > > > > > >
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > > > > >
> > > > > > > >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> > > > > > > >     as expensive when backend does not define it.
> > > > > > > >
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > > 2018-10-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
> > > > > > > >
> > > > > > > >     * gcc.target/aarch64/popcount4.c: New test.
> > > > > > > >
> > > > > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > > > > > > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > > > > > > patch from you.
> > > > > > >
> > > > > > > Richi argued that it's the kernel team's responsibility to provide a
> > > > > > > popcount since they don't link with libgcc.  And I'm generally in
> > > > > > > agreement with that position, though it does tend to generate some
> > > > > > > friction with the kernel developers.  We also run the real risk of GCC 9
> > > > > > > not being able to build the kernel which, IMHO, would be a disaster from
> > > > > > > a PR standpoint.
> > > > > > >
> > > > > > > I'd like to hear from others here.  I fully realize we're beyond the
> > > > > > > realm of what is strictly technically correct here from a review standpoint.
> > > > > >
> > > > > > As said final value replacement to a library call is probably not wanted
> > > > > > for optimization purpose, so adjusting expression_expensive_p is OK with
> > > > > > me.  It might not fully solve the (non-)issue in case another optimization pass
> > > > > > chooses to materialize niter computation result.
> > > > > >
> > > > > > Few comments on the patch:
> > > > > >
> > > > > > +      tree fndecl = get_callee_fndecl (expr);
> > > > > > +
> > > > > > +      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > > > > > +       {
> > > > > > +         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
> > > > > >
> > > > > >   combined_fn cfn = gimple_call_combined_fn (expr);
> > > > > >   switch (cfn)
> > > > > >     {
> > > > >
> > > > > Did you mean:
> > > > > combined_fn cfn = get_call_combined_fn (expr);
> > > >
> > > > Yes.
> > > >
> > > > > > ...
> > > > > >
> > > > > > cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
> > > > > > offline but eventually he knows whether there is a better way to query
> > > > > >
> > > > > > +           CASE_CFN_POPCOUNT:
> > > > > > +             /* Check if opcode for popcount is available.  */
> > > > > > +             if (optab_handler (popcount_optab,
> > > > > > +                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> > > > > > (expr, 0))))
> > > > > > +                 == CODE_FOR_nothing)
> > > > > > +               return true;
> > > > > >
> > > > > > note that we currently generate builtin calls rather than IFN calls
> > > > > > (when a direct
> > > > > > optab is supported).
> > > > > >
> > > > > > Another comment on the patch is that you probably have to adjust existing
> > > > > > popcount testcases to add architecture specific flags enabling suport for
> > > > > > the instructions, otherwise you won't see loop replacement.
> > > > > Indeed.
> > > > > In lib/target-supports.exp, I will try to add support for
> > > > > check_effective_target_popcount_long.
> > > > > When I grep for the popcount pattern in md files, I see it is defined for:
> > > > >
> > > > > tilegx
> > > > > tilepro
> > > > > alpha
> > > > > aarch64  when TARGET_SIMD
> > > > > ia64
> > > > > rs6000
> > > > > i386  when TARGET_POPCOUNT
> > > > > popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
> > > > > s390  when TARGET_Z916 && TARGET_64BIT
> > > > > sparc when TARGET_POPC
> > > > > arm when TARGET_NEON
> > > > > mips when ISA_HAS_POP
> > > > > spu
> > > > > avr
> > > > >
> > > > > I can check these targets with the condition.
> > > > > Another possibility is to check with a sample code and see if we are
> > > > > getting a libcall in the asm. Not sure if that is straightforward. Are
> > > > > there any example for such.
> > > >
> > > > You could try linking w/o libgcc ...
> > > I realized that there are some examples already and I  have based it
> > > on that. Attached patch
> > > 0001-fix-kernel-build-v3.patch does this. Bootstrapped and regression
> > > tested on aarch64-linux-gnu with no new regression. Is this OK?
> >
> > I suspect using sth like the hard-float test is better, thus
> >
> > check_no_messages_and_pattern popcountl "!\\(call" rtl-expand
> >
> > instead of looking for !__popcount in assembly since depending on
> > assembler syntax this might nor might not match spuriously...
> Ok.
> >
> > @@ -3501,6 +3504,19 @@ expression_expensive_p (tree expr)
> >        tree arg;
> >        call_expr_arg_iterator iter;
> >
> > +      combined_fn cfn = get_call_combined_fn (expr);
> > +      switch (cfn)
> > +       {
> > +       CASE_CFN_POPCOUNT:
> > +         /* Check if opcode for popcount is available.  */
> > +         if (optab_handler (popcount_optab,
> > +                            TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
> > +             == CODE_FOR_nothing)
> > +           return true;
> > +       default:
> > +         break;
> > +       }
> > +
> >        if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
> >         return true;
> >        FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
> >
> > note that we can handle double-word popcount by emitting two single-word
> > popcount instructions.  So if the mode is of 2 * UNITS_PER_WORD size
> > you want to check for mode == word_mode.  See expand_unop.
> Ok
> >
> > I think you want to add a comment before the code explaining that even
> > though is_inexpensive_builtin says true we may get a library call which
> > we consider expensive.
>
> Ok.
> >
> > > >
> > > > > We could also move these test to a primary target that is tested often
> > > > > tested which also defines popcount pattern. I dont think these tests
> > > > > change for targets and if we can test in one target that could be
> > > > > enough,
> > > > >
> > > > > I am happy to implement what is appropriate.
> > > >
> > > > How about the -Os idea?
> > > Attached patch 0002-allow-builtin-popcount-for-size_p.patch attempts
> > > this. As you mentioned, this will again break the kernel. While I am
> > > trying to provide the popcount implementation for the newer kernel, it
> > > will still be a problem for existing kernel versions. May I request
> > > that we provide a flag to disable this if we decide to go with the
> > > patch please?
> >
> > Let's consider this only for GCC 10 and get the first patch ready.
> > expression_expensive_p shouldn't consider division as expensive
> > for -Os either so there's more churn here and expression_expensive_p
> > doesn't consider the overall size of the expression anyways.
> >
> > So let's drop this for now.
> Ok.
>
> I am attaching the modifies patch. Is this OK now?

OK.

Thanks,
Richard.

> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.
>
> Thanks,
> Kugan
> >
> > Richard.
> >
> > > Thanks,
> > > Kugan
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Kugan
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Also I think that the expression is only expensive (for final value
> > > > > > replacement!)
> > > > > > if you consider optimizing for speed.  When optimizing for size getting rid of
> > > > > > the loop is probably beneificial unconditionally.  That would leave the
> > > > > > possibility to switch said testcases to -Os.  It would require adding a
> > > > > > bool size_p flag to expression_expensive and passing down
> > > > > > optimize_loop_for_size_p ().
> > > > > >
> > > > > > _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> > > > > > replacing sth with an expression based on the niter analysis result doesn't
> > > > > > mean we get rid of the loop (but only of an IV), so maybe that reasoning
> > > > > > doesn't apply there.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Jeff
> > > > > > >

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-11-12 14:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-25 23:27 [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it Kugan Vivekanandarajah
2018-10-26  5:12 ` Jeff Law
2018-10-26  9:18   ` Richard Biener
2018-10-29  8:40     ` Kugan Vivekanandarajah
2018-10-29 14:32       ` Richard Biener
2018-11-02  9:02         ` Kugan Vivekanandarajah
2018-11-07 13:04           ` Richard Biener
2018-11-12  5:21             ` Kugan Vivekanandarajah
2018-11-12 14:44               ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).