public inbox for gcc@gcc.gnu.org
* Add ops_num to targetm.sched.reassociation_width hook
From: Aaron Sawdey @ 2021-08-04  0:07 UTC
  To: gcc; +Cc: Richard Biener, Segher Boessenkool, Bill Schmidt

Richard,

So, I’m noticing that in get_reassociation_width() we know how many ops (ops_num) are in the expression being considered for parallel reassociation, but this is not passed to the target hook. In my testing this seems like it might be useful to have. If you determine the maximum width that gives additional speedup for a large number of terms, and then use that as the width from the target hook, get_reassociation_width() is more aggressive than you would like for small expressions with maybe 4-16 terms and produces code that is slower than optimal. For example in many cases you want to continue using a width of 1 until you get to 16 terms or so. My testing shows this to be the case for power8, power9, and power10 processors. 

So, I’m wondering how it might be received if I posted a patch that adds this to the reassociation_width target hook (and of course fixes all uses of that target hook)?
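A minimal sketch of the shape such a hook could take, written as standalone C rather than against GCC's real headers. Everything named here is hypothetical: `hook_width_today`, `hook_width_with_ops`, `TUNED_MAX_WIDTH` and `SMALL_EXPR_OPS` are illustration-only names, `opc` and `mode` are plain-integer stand-ins for the hook's existing arguments, and the value 4 is a placeholder rather than measured tuning data. The threshold of 16 comes from the "16 terms or so" remark above.

```c
/* Hypothetical tuned maximum width for large expressions (placeholder). */
#define TUNED_MAX_WIDTH 4
/* Hypothetical cutoff, taken from the "16 terms or so" remark. */
#define SMALL_EXPR_OPS 16

/* Stand-in for the hook as it is today: it sees the operation and the
   mode, but not how many operands the expression has.  */
static int hook_width_today(unsigned int opc, int mode)
{
    (void)opc;
    (void)mode;
    return TUNED_MAX_WIDTH;
}

/* Stand-in for the proposed hook: the same query plus ops_num, so the
   target can stay at width 1 for small expressions.  */
static int hook_width_with_ops(unsigned int opc, int mode, int ops_num)
{
    (void)opc;
    (void)mode;
    if (ops_num < SMALL_EXPR_OPS)
        return 1;               /* keep the serial chain for small cases */
    return TUNED_MAX_WIDTH;     /* go parallel once there are enough terms */
}
```

With these placeholder values, `hook_width_with_ops(0, 0, 8)` stays at width 1, while `hook_width_today(0, 0)` has no way to distinguish an 8-term expression from a 64-term one.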

Thanks!
   Aaron


Aaron Sawdey, Ph.D. sawdey@linux.ibm.com
IBM Linux on POWER Toolchain

* Re: Add ops_num to targetm.sched.reassociation_width hook
From: Richard Biener @ 2021-08-04  9:54 UTC
  To: Aaron Sawdey; +Cc: gcc, Segher Boessenkool, Bill Schmidt

On Wed, Aug 4, 2021 at 2:07 AM Aaron Sawdey <acsawdey@linux.ibm.com> wrote:
>
> Richard,
>
> So, I’m noticing that in get_reassociation_width() we know how many ops (ops_num) are in the expression being considered for parallel reassociation, but this is not passed to the target hook. In my testing this seems like it might be useful to have. If you determine the maximum width that gives additional speedup for a large number of terms, and then use that as the width from the target hook, get_reassociation_width() is more aggressive than you would like for small expressions with maybe 4-16 terms and produces code that is slower than optimal. For example in many cases you want to continue using a width of 1 until you get to 16 terms or so. My testing shows this to be the case for power8, power9, and power10 processors.
>
> So, I’m wondering how it might be received if I posted a patch that adds this to the reassociation_width target hook (and of course fixes all uses of that target hook)?

You probably saw that get_reassociation_width already tries to optimize
things.  So what exactly would you change, and why is it slower for 4-16
terms but not for 17+ ones?  I suppose "is slower" is --param mining on
some benchmarks on your side and eventually you manage to pick the best
threshold to not run into register pressure issues (by luck) for those
benchmarks?
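
To illustrate the kind of optimization meant here, the following is a simplified standalone model, not GCC's actual code. It assumes the width is narrowed to the smallest value that still reaches the best estimated cycle count, which saves registers. The function names `required_cycles` and `narrowed_width` are hypothetical, the cycle estimate is an assumed approximation, and a plain linear search is used for clarity.

```c
/* floor(log2(x)) for x >= 1. */
static int floor_log2_u(unsigned int x)
{
    int r = 0;
    while (x >>= 1)
        r++;
    return r;
}

/* Assumed cycle estimate for combining ops_num (>= 1) operands when up
   to `width` independent operations can issue per cycle.  */
static int required_cycles(int ops_num, int width)
{
    /* While more than 2 * width operands remain, each cycle removes
       `width` of them.  */
    int res = ops_num / (2 * width);
    unsigned int rest = (unsigned int)(ops_num - res * width);

    /* The remainder is halved each cycle: add ceil(log2(rest)).  */
    int fl = floor_log2_u(rest);
    res += ((rest & (rest - 1)) == 0) ? fl : fl + 1;
    return res;
}

/* Smallest width that still achieves the best cycle count available
   at max_width.  */
static int narrowed_width(int ops_num, int max_width)
{
    int best = required_cycles(ops_num, max_width);
    int width = max_width;
    while (width > 1 && required_cycles(ops_num, width - 1) == best)
        width--;
    return width;
}
```

In this model, `narrowed_width(2, 4)` returns 1 and `narrowed_width(4, 4)` returns 2, so narrowing on cycle count alone still picks a parallel width for a four-term expression.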

That said, I question whether you can explain why it is slower, right?

Richard.

> Thanks!
>    Aaron
>
>
> Aaron Sawdey, Ph.D. sawdey@linux.ibm.com
> IBM Linux on POWER Toolchain
>
>
