* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
2020-06-23 12:26 [Bug tree-optimization/95839] New: Failure to optimize addition of vector elements to vector addition gabravier at gmail dot com
@ 2020-06-23 14:26 ` rguenth at gcc dot gnu.org
2020-06-24 15:00 ` ubizjak at gmail dot com
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-06-23 14:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |avieira at gcc dot gnu.org,
| |rguenth at gcc dot gnu.org
Status|UNCONFIRMED |NEW
Blocks| |53947
Ever confirmed|0 |1
Last reconfirmed| |2020-06-23
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC does not yet vectorize stmts without loads (and explicitly rejects
vector types somewhere).
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: ubizjak at gmail dot com @ 2020-06-24 15:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
What I find interesting is a similar case with division instead of
addition. Clang compiles it to:
divps %xmm1, %xmm0
retq
Considering that we have [a0, a1, 0, 0] / [b0, b1, 0, 0], this will surely raise
an invalid operation exception (0/0 in the two padding lanes). I have explicitly
avoided generating division using 4-element DIVPS for v2sf operands exactly due
to this issue.
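To illustrate the concern, here is a minimal sketch (not code from GCC or Clang) that emulates the four lanes of a packed division in plain C. The helper name count_nan_lanes and the LANES constant are made up for this example. It assumes default IEEE 754 behaviour, where 0/0 is an invalid operation that yields NaN; the same event sets the sticky flag one would query with fetestexcept(FE_INVALID) from <fenv.h>.

```c
#include <assert.h>

enum { LANES = 4 };

/* Emulate a 4-lane packed float division lane by lane and count the
   lanes whose quotient is NaN.  With zero-filled padding lanes, each
   padding lane computes 0/0, which is an IEEE invalid operation.
   The name count_nan_lanes is hypothetical, chosen for this sketch. */
int count_nan_lanes(const float num[LANES], const float den[LANES])
{
    int nan_lanes = 0;

    assert(num != 0 && den != 0);
    for (int i = 0; i < LANES; i++) {
        /* volatile keeps the division from being folded away */
        volatile float q = num[i] / den[i];
        if (q != q)            /* only NaN compares unequal to itself */
            nan_lanes++;
    }
    return nan_lanes;
}
```

Calling it with {a0, a1, 0, 0} and {b0, b1, 0, 0} for non-zero b0 and b1 reports two NaN lanes, one per padding lane; a two-lane division of only the live elements would produce none.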
* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-24 16:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |marxin at gcc dot gnu.org
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> GCC does not yet vectorize stmts without loads (and explicitly rejects
> vector types somewhere).
But this particular case might be easy since we already vectorize from CTORs,
we likely just disregard the BB because it doesn't contain any datarefs:
_1 = BIT_FIELD_REF <a_7(D), 32, 0>;
_2 = BIT_FIELD_REF <b_8(D), 32, 0>;
_3 = _1 + _2;
_4 = BIT_FIELD_REF <a_7(D), 32, 32>;
_5 = BIT_FIELD_REF <b_8(D), 32, 32>;
_6 = _4 + _5;
_9 = {_3, _6};
vectorization might turn this into
_10 = {_1, _4 };
_11 = {_2, _5 };
_9 = _10 + _11;
and then forwprop CTOR "folding" will get rid of the
_10 and _11 CTORs (until the vectorizer handles BIT_FIELD_REFs
of existing vectors).
So kind-of "easy hack" - Martin, your branch might already do this
(not give up on <= 1 datarefs).
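For readers without the original report at hand, the following is a guess at the kind of source that produces the GIMPLE above; the actual testcase is not quoted in this thread. It assumes the GCC vector extension (vector_size attribute, subscripting of vector values), and the names v2sf, add_by_lane and add_whole are chosen for this sketch only.

```c
#include <assert.h>

/* Two-lane float vector, using the GCC vector extension. */
typedef float v2sf __attribute__((vector_size(8)));

/* Element-wise form: each a[i] and b[i] becomes a lane-extracting
   BIT_FIELD_REF, and the compound literal becomes the vector CTOR. */
v2sf add_by_lane(v2sf a, v2sf b)
{
    return (v2sf){ a[0] + b[0], a[1] + b[1] };
}

/* Whole-vector form: the single vector addition the optimizer is
   expected to arrive at. */
v2sf add_whole(v2sf a, v2sf b)
{
    return a + b;
}
```

Both functions compute the same value; the bug is about getting the first to compile to the same code as the second.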
* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-24 16:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, with some pending patch applied and
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index ca6bedc9cc8..3d5de39383c 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3130,7 +3130,7 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal)
       return false;
     }
-  if (BB_VINFO_DATAREFS (bb_vinfo).length () < 2)
+  if (0 && BB_VINFO_DATAREFS (bb_vinfo).length () < 2)
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
this only fails to vectorize due to cost considerations (costing the code
generation I described above):
0x53b15a0 _1 + _2 1 times vector_stmt costs 12 in body
0x53b15a0 <unknown> 1 times vec_construct costs 8 in prologue
0x53b15a0 <unknown> 1 times vec_construct costs 8 in prologue
0x5412d10 _1 + _2 1 times scalar_stmt costs 12 in body
0x5412d10 _4 + _5 1 times scalar_stmt costs 12 in body
but with -fno-vect-cost-model it "works" as I guessed. For division
we correctly do not vectorize.
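The arithmetic behind that dump can be spelled out with a small sketch. The cost numbers are taken from the dump above; the helper names and the strict "vector must be cheaper" comparison are assumptions made for illustration, not a copy of the vectorizer's profitability code.

```c
#include <assert.h>

/* Sum a list of per-statement costs as printed in the dump. */
int total_cost(const int *costs, int n)
{
    int sum = 0;

    assert(costs != 0 && n >= 0);
    for (int i = 0; i < n; i++)
        sum += costs[i];
    return sum;
}

/* Assumed rule for this sketch: vectorize only when the vector
   version (body plus prologue) is strictly cheaper than scalar. */
int vector_is_profitable(int vector_cost, int scalar_cost)
{
    return vector_cost < scalar_cost;
}
```

With the values above, the vector side costs 12 + 8 + 8 = 28 (one vector_stmt plus two vec_construct) and the scalar side costs 12 + 12 = 24, so the comparison rejects vectorization. The two vec_construct entries are what tip the balance.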
* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-25 10:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
Status|NEW |ASSIGNED
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I am testing a patch to remove the premature check. For the original testcase
you'd still have to disable cost modeling, because handling of BIT_FIELD_REFs of
vectors is still missing (but I think we have duplicate PRs about that).
* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: cvs-commit at gcc dot gnu.org @ 2020-06-25 13:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:628b78f9794a2eefcbc578011806bfa8e09b9ef7
commit r11-1653-g628b78f9794a2eefcbc578011806bfa8e09b9ef7
Author: Richard Biener <rguenther@suse.de>
Date: Thu Jun 25 12:47:20 2020 +0200
tree-optimization/95839 - allow CTOR vectorization without loads
This removes a premature check for enough datarefs in a basic-block
before we consider vectorizing it which leaves basic-blocks with
just vectorizable vector constructors unvectorized. The check
is effectively done by the following check for store groups
which then also include constructors.
2020-06-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/95839
* tree-vect-slp.c (vect_slp_analyze_bb_1): Remove premature
check on the number of datarefs.
* gcc.dg/vect/bb-slp-pr95839.c: New testcase.
* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-06-25 14:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, I have a patch doing BIT_FIELD_REFs.
* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: cvs-commit at gcc dot gnu.org @ 2020-07-01 11:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:7b3adfa7bb47e4ebde91634caa5a7e13175558f1
commit r11-1757-g7b3adfa7bb47e4ebde91634caa5a7e13175558f1
Author: Richard Biener <rguenther@suse.de>
Date: Fri Jun 26 11:18:19 2020 +0200
tree-optimization/95839 - teach SLP vectorization about vector inputs
This teaches SLP analysis about vector typed externals that are
fed into the SLP operations via lane extracting BIT_FIELD_REFs.
It shows that there's currently no good representation for
vector code on the SLP side so I went a half way and represent
such vector externals uses always using a SLP permutation node
with a single external SLP child which has a non-standard
representation of no scalar defs but only a vector def. That
works best for shielding the rest of the vectorizer from it.
2020-06-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/95839
* tree-vect-slp.c (vect_slp_tree_uniform_p): Pre-existing
vectors are not uniform.
(vect_build_slp_tree_1): Handle BIT_FIELD_REFs of
vector registers.
(vect_build_slp_tree_2): For groups of lane extracts
from a vector register generate a permute node
with a special child representing the pre-existing vector.
(vect_prologue_cost_for_slp): Pre-existing vectors cost nothing.
(vect_slp_analyze_node_operations): Use SLP_TREE_LANES.
(vectorizable_slp_permutation): Do not generate or cost identity
permutes.
(vect_schedule_slp_instance): Handle pre-existing vector
that are function arguments.
* gcc.dg/vect/bb-slp-pr95839-2.c: New testcase.
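As a rough illustration of the idea in this commit, and not of the code in tree-vect-slp.c, the sketch below maps a lane-extracting BIT_FIELD_REF <vec, size, offset> to a lane index and checks whether a group of such lanes forms an identity permutation of the source vector. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Lane selected by BIT_FIELD_REF <vec, size_bits, offset_bits> for a
   vector whose elements are lane_bits wide, or -1 when the reference
   does not extract exactly one whole lane.  Hypothetical helper. */
int lane_of_bit_field_ref(unsigned size_bits, unsigned offset_bits,
                          unsigned lane_bits)
{
    if (lane_bits == 0 || size_bits != lane_bits
        || offset_bits % lane_bits != 0)
        return -1;
    return (int)(offset_bits / lane_bits);
}

/* A group of extracted lanes is an identity permutation when lane i
   of the result is lane i of the source vector.  Hypothetical helper. */
int is_identity_permute(const int *lanes, size_t n)
{
    assert(lanes != 0);
    for (size_t i = 0; i < n; i++)
        if (lanes[i] != (int)i)
            return 0;
    return 1;
}
```

For the testcase in this bug the references <a, 32, 0> and <a, 32, 32> select lanes 0 and 1, which is the identity case the commit says is neither generated nor costed, so the pre-existing vector can be used directly.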
* [Bug tree-optimization/95839] Failure to optimize addition of vector elements to vector addition
From: rguenth at gcc dot gnu.org @ 2020-07-01 11:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95839
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.