public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/96053] New: Miss optimization:Finding SLP sequences from reductions sometimes is better than finding from reduction chains
From: zhoukaipeng3 at huawei dot com @ 2020-07-04  1:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96053

            Bug ID: 96053
           Summary: Miss optimization:Finding SLP sequences from
                    reductions sometimes is better than finding from
                    reduction chains
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhoukaipeng3 at huawei dot com
  Target Milestone: ---

command:
gcc -S -O2 -ftree-vectorize test.c -funsafe-math-optimizations \
    -fno-tree-reassoc -march=armv8.2-a+sve -msve-vector-bits=128

gcc version 11.0.0 20200629

In vectorization, finding SLP sequences from reduction chains takes priority
over finding them from reductions.  But sometimes, starting from reductions
gives better vectorization than starting from reduction chains.

testcase:
double f(double *a, double *b)
{
  double res1 = 0;
  double res0 = 0;
  for (int i = 0 ; i < 1000; i+=4) {
    res0 += a[i] * b[i];
    res1 += a[i+1] * b[i*1];
    res0 += a[i+2] * b[i+2];
    res1 += a[i+3] * b[i+3];
  }
  return res0 + res1;
}
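
To make the two strategies concrete, here is a rough scalar model in C. It is
only a sketch of how the statements get grouped, not GCC's actual
transformation. The function names dot_slp_reduction and dot_reduction_chain
are made up for illustration, and two lanes are assumed because a 128-bit
vector holds two doubles.

```c
#include <assert.h>

/* Model of "SLP from reductions": res0 and res1 are the two lanes of ONE
   accumulator, and each statement pair is one 2-lane multiply-add. */
double dot_slp_reduction(const double *a, const double *b, int n)
{
  assert(n % 4 == 0);
  double acc[2] = {0.0, 0.0};          /* lane 0 = res0, lane 1 = res1 */
  for (int i = 0; i < n; i += 4) {
    acc[0] += a[i]     * b[i];         /* statements 1 and 2 form pair one */
    acc[1] += a[i + 1] * b[i * 1];     /* index kept exactly as in the testcase */
    acc[0] += a[i + 2] * b[i + 2];     /* statements 3 and 4 form pair two */
    acc[1] += a[i + 3] * b[i + 3];
  }
  return acc[0] + acc[1];              /* epilogue: combine the two lanes */
}

/* Model of "SLP from reduction chains": each accumulator's chain is its own
   group, so the lanes are the links of one chain. */
double dot_reduction_chain(const double *a, const double *b, int n)
{
  assert(n % 4 == 0);
  double chain0[2] = {0.0, 0.0};       /* partial sums for res0 */
  double chain1[2] = {0.0, 0.0};       /* partial sums for res1 */
  for (int i = 0; i < n; i += 4) {
    chain0[0] += a[i]     * b[i];      /* res0 reads a[i] and a[i+2]: */
    chain0[1] += a[i + 2] * b[i + 2];  /* elements that are not adjacent */
    chain1[0] += a[i + 1] * b[i * 1];  /* index kept exactly as in the testcase */
    chain1[1] += a[i + 3] * b[i + 3];
  }
  /* epilogue: every chain needs its own final reduction */
  double res0 = chain0[0] + chain0[1];
  double res1 = chain1[0] + chain1[1];
  return res0 + res1;
}
```

In this model the first variant pairs neighbouring elements of a, while the
second pairs a[i] with a[i+2] and needs one extra reduction per chain.  Both
reassociate the additions, which is why -funsafe-math-optimizations is on the
command line.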

I have two imperfect solutions: one is to add a control option, and the other
is to use the cost model to evaluate which is better.  The first is hard for
users to use, and the second is hard to implement.

Does anyone have a better suggestion?


* [Bug tree-optimization/96053] Miss optimization:Finding SLP sequences from reductions sometimes is better than finding from reduction chains
From: rguenth at gcc dot gnu.org @ 2020-07-06  7:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96053

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-07-06
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
                 CC|                            |avieira at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
In the end it is indeed a costing issue (also, finding SLP sequences from
reductions is quite ad hoc: either all reductions form an SLP sequence or
none do).  There's the epilogue cost, which for SLP reductions is usually
cheaper than for reduction chains, and then there's the cost of the
participating loads and required permutations, which depends very much on
the actual case ...

For the immediate benefit, I think giving the user more control sometimes
makes sense, and if we do that I'd go a route like

#pragma GCC vect [no-]reduc-chain

and document those as hints.

But as you say, basing the decision on costing would be way better.

Note ILP for the reduction chain is probably higher since both reductions
can execute in parallel, so for the simple testcase I'd expect the reduction
chain variant to be faster.

Note that for some reason your testcase vectorizes as an SLP reduction and
not as reduction chains for me on x86_64; the association seems to be off
from what the vectorizer expects.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/96053] Miss optimization:Finding SLP sequences from reductions sometimes is better than finding from reduction chains
From: zhoukaipeng3 at huawei dot com @ 2020-07-06  8:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96053

--- Comment #2 from Kaipeng Zhou <zhoukaipeng3 at huawei dot com> ---
For now, I will try to make a patch that gives the user more control, with an
option like

#pragma GCC vect [no-]reduc-chain

I will update here if there is any progress or any problem.


* [Bug tree-optimization/96053] Miss optimization:Finding SLP sequences from reductions sometimes is better than finding from reduction chains
From: zhoukaipeng3 at huawei dot com @ 2020-07-20  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96053

--- Comment #3 from Kaipeng Zhou <zhoukaipeng3 at huawei dot com> ---
Created attachment 48896
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48896&action=edit
Patch to add #pragma GCC no_reduc_chain


* [Bug tree-optimization/96053] Miss optimization:Finding SLP sequences from reductions sometimes is better than finding from reduction chains
From: zhoukaipeng3 at huawei dot com @ 2020-07-20  9:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96053

--- Comment #4 from Kaipeng Zhou <zhoukaipeng3 at huawei dot com> ---
This patch adds #pragma GCC no_reduc_chain and so far implements only the C
front end.

For the testcase, the pragma successfully skips building SLP from reduction
chains.  Without #pragma GCC no_reduc_chain, the loop fails to vectorize.
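
As a sketch of how the testcase might look with the pragma applied: the
spelling is the one from the attached patch, but its placement directly
before the loop is an assumption here, and f_no_chain is a name made up for
illustration.  A GCC without the patch does not know this pragma and should
just ignore it, normally with a -Wunknown-pragmas warning, so the code still
compiles and gives the same result.

```c
/* The testcase from the report with the proposed pragma added. */
double f_no_chain(double *a, double *b)
{
  double res1 = 0;
  double res0 = 0;
  /* Assumed placement: immediately before the loop it should affect. */
#pragma GCC no_reduc_chain
  for (int i = 0; i < 1000; i += 4) {
    res0 += a[i] * b[i];
    res1 += a[i+1] * b[i*1];   /* index kept exactly as in the testcase */
    res0 += a[i+2] * b[i+2];
    res1 += a[i+3] * b[i+3];
  }
  return res0 + res1;
}
```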

Please check whether there are any problems.  If not, I will go on to
implement the front ends for the remaining languages.
