public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/95839] New: Failure to optimize addition of vector elements to vector addition
From: gabravier at gmail dot com @ 2020-06-23 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

            Bug ID: 95839
           Summary: Failure to optimize addition of vector elements to
                    vector addition
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

typedef float __attribute__((vector_size(8))) v2f32;

v2f32 f(v2f32 a, v2f32 b)
{
    return (v2f32){a[0] + b[0], a[1] + b[1]};
}

This can be optimized to `return a + b;`. This transformation is done by LLVM,
but not by GCC.
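
For reference, a minimal sketch of the two forms side by side. The names add_lanes, add_whole and forms_agree are made up for illustration and are not part of the report; the code assumes a compiler with the GCC/Clang vector_size extension.

```c
/* Requires the GCC/Clang vector_size extension. */
typedef float __attribute__((vector_size(8))) v2f32;

/* Element-wise form, as written in the report. */
v2f32 add_lanes(v2f32 a, v2f32 b)
{
    return (v2f32){a[0] + b[0], a[1] + b[1]};
}

/* Whole-vector form that the report says this can be optimized to. */
v2f32 add_whole(v2f32 a, v2f32 b)
{
    return a + b;
}

/* Returns 1 when both forms give the same value in each lane. */
int forms_agree(v2f32 a, v2f32 b)
{
    v2f32 x = add_lanes(a, b);
    v2f32 y = add_whole(a, b);
    return x[0] == y[0] && x[1] == y[1];
}
```

Both functions compute the same two sums, so comparing their generated assembly is one way to see whether the compiler recognizes the pattern.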


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-23 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
             Blocks|                            |53947
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-06-23

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC does not yet vectorize stmts without loads (and explicitly rejects
vector types somewhere).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: ubizjak at gmail dot com @ 2020-06-24 15:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
What I find interesting is the similar case with division instead of
addition. Clang compiles it to:

        divps   %xmm1, %xmm0
        retq

Considering that we have [a0, a1, 0, 0] / [b0, b1, 0, 0], this will surely raise
an invalid operation exception (0/0 in the two upper lanes). I have explicitly
avoided generating division using 4-element DIVPS for v2sf operands exactly due
to this issue.
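
A rough sketch of the concern, assuming the two upper lanes of both operands are zero as described above. The v4f32 type and the function names are hypothetical. The code only checks the result of the division (0/0 gives NaN under IEEE 754, which is the default result of an invalid operation); it does not inspect the exception flags themselves.

```c
/* Hypothetical 4-lane view of a 2-lane operand held in a 16-byte register. */
typedef float __attribute__((vector_size(16))) v4f32;

/* Divide all four lanes, the way a full-width DIVPS would. */
v4f32 div_all_lanes(v4f32 a, v4f32 b)
{
    return a / b;
}

/* Returns 1 if both padding lanes of the quotient are NaN.
   A NaN is the only floating-point value that compares unequal to itself. */
int padding_lanes_are_nan(float a0, float a1, float b0, float b1)
{
    v4f32 a = {a0, a1, 0.0f, 0.0f};   /* upper lanes assumed zero */
    v4f32 b = {b0, b1, 0.0f, 0.0f};   /* upper lanes assumed zero */
    v4f32 q = div_all_lanes(a, b);
    return q[2] != q[2] && q[3] != q[3];
}
```

The two lanes the program actually asked for are fine; the problem comes only from the lanes the source never mentioned, which is why a 2-lane division cannot simply be widened.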


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-24 16:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> GCC does not yet vectorize stmts without loads (and explicitly rejects
> vector types somewhere).

But this particular case might be easy since we already vectorize from CTORs;
we likely just disregard the BB because it doesn't contain any datarefs:

  _1 = BIT_FIELD_REF <a_7(D), 32, 0>;
  _2 = BIT_FIELD_REF <b_8(D), 32, 0>;
  _3 = _1 + _2;
  _4 = BIT_FIELD_REF <a_7(D), 32, 32>;
  _5 = BIT_FIELD_REF <b_8(D), 32, 32>;
  _6 = _4 + _5;
  _9 = {_3, _6};

vectorization might turn this into

 _10 = {_1, _4 };
 _11 = {_2, _5 };
 _9 = _10 + _11;

and then forwprop CTOR "folding" will get rid of the
_10 and _11 CTORs (until the vectorizer handles BIT_FIELD_REFs
of existing vectors).

So kind-of "easy hack" - Martin, your branch might already do this
(not give up on <= 1 datarefs).
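
The two GIMPLE shapes above can be written out in C roughly as follows. This is only an illustration of the data flow, not of what the vectorizer emits; the function names are made up, and the temporaries are named after the SSA names in the dumps.

```c
/* Requires the GCC/Clang vector_size extension. */
typedef float __attribute__((vector_size(8))) v2f32;

/* Before: two scalar adds feeding one vector constructor. */
v2f32 scalar_adds_then_ctor(v2f32 a, v2f32 b)
{
    float t1 = a[0], t2 = b[0];   /* _1, _2: lane 0 extracts */
    float t3 = t1 + t2;           /* _3 */
    float t4 = a[1], t5 = b[1];   /* _4, _5: lane 1 extracts */
    float t6 = t4 + t5;           /* _6 */
    return (v2f32){t3, t6};       /* _9 = {_3, _6} */
}

/* After: two constructors feeding one vector add. */
v2f32 ctors_then_vector_add(v2f32 a, v2f32 b)
{
    v2f32 t10 = {a[0], a[1]};     /* _10 = {_1, _4} */
    v2f32 t11 = {b[0], b[1]};     /* _11 = {_2, _5} */
    return t10 + t11;             /* _9 = _10 + _11 */
}
```

In the second form t10 and t11 just rebuild a and b lane by lane, which is why folding the constructors away leaves a plain a + b.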


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-24 16:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, with some pending patch applied and 

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index ca6bedc9cc8..3d5de39383c 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3130,7 +3130,7 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal)
       return false;
     }

-  if (BB_VINFO_DATAREFS (bb_vinfo).length () < 2)
+  if (0 && BB_VINFO_DATAREFS (bb_vinfo).length () < 2)
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

this only fails to vectorize due to cost considerations (assuming the code
generation I described in comment #3):

0x53b15a0 _1 + _2 1 times vector_stmt costs 12 in body
0x53b15a0 <unknown> 1 times vec_construct costs 8 in prologue
0x53b15a0 <unknown> 1 times vec_construct costs 8 in prologue
0x5412d10 _1 + _2 1 times scalar_stmt costs 12 in body
0x5412d10 _4 + _5 1 times scalar_stmt costs 12 in body

but with -fno-vect-cost-model it "works" as I guessed.  For division
we correctly do not vectorize.
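
Adding up the dump above, as a small sketch. The three cost values are copied from the dump; the comparison rule (vector total must be lower than scalar total) is a simplifying assumption about the cost model, and the names are made up.

```c
/* Per-statement costs copied from the dump in comment #4. */
enum {
    VECTOR_STMT_COST   = 12,  /* one vector add in the body */
    VEC_CONSTRUCT_COST = 8,   /* each vec_construct in the prologue */
    SCALAR_STMT_COST   = 12   /* each scalar add in the body */
};

/* One vector add plus two vector constructors. */
int vector_cost(void)
{
    return VECTOR_STMT_COST + 2 * VEC_CONSTRUCT_COST;
}

/* Two scalar adds. */
int scalar_cost(void)
{
    return 2 * SCALAR_STMT_COST;
}

/* Assumed rule: vectorize only if the vector total is strictly lower. */
int vectorization_profitable(void)
{
    return vector_cost() < scalar_cost();
}
```

With these numbers the vector side totals 28 against 24 for the scalar side, so the two vec_construct operations are what tip the decision.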


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-25 10:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I am testing a patch to remove the premature check.  For the original testcase
you'd still have to disable cost modeling, since handling of BIT_FIELD_REFs of
vectors is still missing (but I think we have duplicate PRs about that).


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: cvs-commit at gcc dot gnu.org @ 2020-06-25 13:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:628b78f9794a2eefcbc578011806bfa8e09b9ef7

commit r11-1653-g628b78f9794a2eefcbc578011806bfa8e09b9ef7
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Jun 25 12:47:20 2020 +0200

    tree-optimization/95839 - allow CTOR vectorization without loads

    This removes a premature check for enough datarefs in a basic-block
    before we consider vectorizing it which leaves basic-blocks with
    just vectorizable vector constructors unvectorized.  The check
    is effectively done by the following check for store groups
    which then also include constructors.

    2020-06-25  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/95839
            * tree-vect-slp.c (vect_slp_analyze_bb_1): Remove premature
            check on the number of datarefs.

            * gcc.dg/vect/bb-slp-pr95839.c: New testcase.


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-25 14:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, I have a patch doing BIT_FIELD_REFs.


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: cvs-commit at gcc dot gnu.org @ 2020-07-01 11:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:7b3adfa7bb47e4ebde91634caa5a7e13175558f1

commit r11-1757-g7b3adfa7bb47e4ebde91634caa5a7e13175558f1
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jun 26 11:18:19 2020 +0200

    tree-optimization/95839 - teach SLP vectorization about vector inputs

    This teaches SLP analysis about vector typed externals that are
    fed into the SLP operations via lane extracting BIT_FIELD_REFs.
    It shows that there's currently no good representation for
    vector code on the SLP side so I went a half way and represent
    such vector externals uses always using a SLP permutation node
    with a single external SLP child which has a non-standard
    representation of no scalar defs but only a vector def.  That
    works best for shielding the rest of the vectorizer from it.

    2020-06-26  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/95839
            * tree-vect-slp.c (vect_slp_tree_uniform_p): Pre-existing
            vectors are not uniform.
            (vect_build_slp_tree_1): Handle BIT_FIELD_REFs of
            vector registers.
            (vect_build_slp_tree_2): For groups of lane extracts
            from a vector register generate a permute node
            with a special child representing the pre-existing vector.
            (vect_prologue_cost_for_slp): Pre-existing vectors cost nothing.
            (vect_slp_analyze_node_operations): Use SLP_TREE_LANES.
            (vectorizable_slp_permutation): Do not generate or cost identity
            permutes.
            (vect_schedule_slp_instance): Handle pre-existing vector
            that are function arguments.

            * gcc.dg/vect/bb-slp-pr95839-2.c: New testcase.


* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-07-01 11:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.

