public inbox for gcc-bugs@sourceware.org
* [Bug target/54400] New: recognize haddpd
@ 2012-08-29 6:10 glisse at gcc dot gnu.org
2012-09-01 9:40 ` [Bug target/54400] " glisse at gcc dot gnu.org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-08-29 6:10 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
Bug #: 54400
Summary: recognize haddpd
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: glisse@gcc.gnu.org
Target: x86_64-linux-gnu
Hello,
for this program:
#include <x86intrin.h>
double f(__m128d v){return v[1]+v[0];}
gcc -O3 -msse4 (same with -Os) generates:
movapd %xmm0, %xmm2
unpckhpd %xmm2, %xmm2
movapd %xmm2, %xmm1
addsd %xmm0, %xmm1
movapd %xmm1, %xmm0
(yes, the number of mov instructions is a bit high...)
Looking at the x86 backend, it can expand reduc_splus_v2df and
__builtin_ia32_haddpd, but it doesn't provide any pattern that could be
recognized. hsubpd is even less present.
It seems to me that, considering only the low part of the result of haddpd, the
pattern should be small enough to be matched: (plus (vec_select (match_operand
1) const_a) (vec_select (match_dup 1) const_b)) where a and b are 0 and 1 in
any order.
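For reference, the equivalence the report relies on can be sketched in plain C. This is a scalar model, not the real instruction: hadd_model is a hypothetical helper that mimics the haddpd result { a[0]+a[1], b[0]+b[1] }, and the v2df typedef (also an assumption of this sketch) uses the GCC/Clang vector extension instead of x86intrin.h so that nothing here depends on SSE3 being enabled.

```c
#include <assert.h>

/* GCC/Clang vector extension: two doubles in one 16-byte vector.
   (Stand-in for __m128d; an assumption of this sketch.) */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical scalar model of haddpd:
   result = { a[0] + a[1], b[0] + b[1] }. */
static v2df hadd_model(v2df a, v2df b)
{
    v2df r = { a[0] + a[1], b[0] + b[1] };
    return r;
}

/* The function from the report. */
static double f(v2df v)
{
    return v[1] + v[0];
}

/* The low lane of hadd_model(v, v) is the sum that f computes. */
static void demo(void)
{
    v2df v = { 1.5, 2.25 };
    v2df h = hadd_model(v, v);
    assert(h[0] == f(v));
}
```

Only the low lane matters for f, which is why a pattern covering just the low part of the result is enough for this case.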
* [Bug target/54400] recognize haddpd
From: glisse at gcc dot gnu.org @ 2012-09-01 9:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-09-01 09:40:14 UTC ---
The code below seems to optimize v[0]-v[1] and v[1]+v[0]. It doesn't recognize
v[0]+v[1], but that would not be too hard to add I guess. Compared to the true
hadd insn, I removed the set_attr "type" "sseadd" because it crashed the
compiler (in cost computation maybe). Apart from the things left in here that
may not make sense, I don't know if a peephole would be more relevant. Maybe
the insn helps more if I want to recognize dot products (dppd) later on? At
least thanks to it {v[0]-v[1],w[0]-w[1]} is now recognized as a hsub (although
it doesn't work if v==w because vec_duplicate doesn't match vec_concat).
(define_insn "*sse3_h<plusminus_insn>v2df3_low_MARC"
  [(set (match_operand:DF 0 "register_operand" "=x,x")
        (plusminus:DF
          (vec_select:DF
            (match_operand:V2DF 1 "register_operand" "0,x")
            (parallel [(const_int 0)]))
          (vec_select:DF
            (match_dup 1)
            (parallel [(const_int 1)]))))]
  "TARGET_SSE3"
  "@
   h<plusminus_mnemonic>pd\t{%0, %0|%0, %0}
   vh<plusminus_mnemonic>pd\t{%1, %1, %0|%0, %1, %1}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "V2DF")])
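The source forms this comment talks about can be written out as follows. This is only a sketch of the inputs: the function names and the v2df typedef are made up for illustration, and whether a particular compiler version and flag set turns any of them into hsubpd is not claimed here.

```c
#include <assert.h>

/* GCC/Clang vector extension, standing in for __m128d (assumption). */
typedef double v2df __attribute__((vector_size(16)));

/* Low-part subtraction: the minus case of the pattern above. */
static double sub_low(v2df v)
{
    return v[0] - v[1];
}

/* Both lanes at once: the shape { v[0]-v[1], w[0]-w[1] } that the
   comment reports as being recognized as an hsub. */
static v2df hsub_pair(v2df v, v2df w)
{
    v2df r = { v[0] - v[1], w[0] - w[1] };
    return r;
}

/* The v == w case, which the comment says is not matched because
   vec_duplicate does not match vec_concat. */
static v2df hsub_same(v2df v)
{
    v2df r = { v[0] - v[1], v[0] - v[1] };
    return r;
}

static void demo(void)
{
    v2df v = { 5.0, 2.0 };
    v2df w = { 1.0, 4.0 };
    v2df p = hsub_pair(v, w);
    v2df s = hsub_same(v);
    assert(sub_low(v) == 3.0);
    assert(p[0] == 3.0 && p[1] == -3.0);
    assert(s[0] == 3.0 && s[1] == 3.0);
}
```

Unlike the addition case, the operand order matters here, so v[1]-v[0] is a different computation from the one hsubpd performs.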
* [Bug middle-end/54400] recognize haddpd
From: rguenth at gcc dot gnu.org @ 2012-09-03 10:13 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-09-03
Component|target |middle-end
Blocks| |53947
Ever Confirmed|0 |1
--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-03 10:12:36 UTC ---
The basic-block vectorizer does not handle reductions and it would be another
natural place to perform this optimization.
* [Bug middle-end/54400] recognize haddpd
From: glisse at gcc dot gnu.org @ 2012-09-03 10:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2012-09-03 10:21:48 UTC ---
(In reply to comment #2)
> The basic-block vectorizer does not handle reductions and it would be another
> natural place to perform this optimization.
I thought about turning a PLUS_EXPR of BIT_FIELD_REF into a REDUC_PLUS_EXPR (in
forwprop), but that wouldn't handle the MINUS_EXPR case (can still be worth
doing though, especially if the code is common to the other reductions).
* [Bug middle-end/54400] recognize haddpd
From: glisse at gcc dot gnu.org @ 2012-10-08 20:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-08 20:46:04 UTC ---
Author: glisse
Date: Mon Oct 8 20:45:56 2012
New Revision: 192223
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192223
Log:
2012-10-08 Marc Glisse <marc.glisse@inria.fr>
gcc/
PR target/54400
* config/i386/i386.md (type attribute): Add sseadd1.
(unit attribute): Add support for sseadd1.
(memory attribute): Likewise.
* config/i386/athlon.md: Likewise.
* config/i386/core2.md: Likewise.
* config/i386/atom.md: Likewise.
* config/i386/ppro.md: Likewise.
* config/i386/bdver1.md: Likewise.
* config/i386/sse.md (sse3_h<plusminus_insn>v2df3): split into...
(sse3_haddv2df3): ... expander.
(*sse3_haddv2df3): ... define_insn. Accept permuted operands.
(sse3_hsubv2df3): ... define_insn.
(*sse3_haddv2df3_low): New define_insn.
(*sse3_hsubv2df3_low): New define_insn.
gcc/testsuite/
PR target/54400
* gcc.target/i386/pr54400.c: New testcase.
Added:
trunk/gcc/testsuite/gcc.target/i386/pr54400.c (with props)
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/athlon.md
trunk/gcc/config/i386/atom.md
trunk/gcc/config/i386/bdver1.md
trunk/gcc/config/i386/core2.md
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/ppro.md
trunk/gcc/config/i386/sse.md
trunk/gcc/testsuite/ChangeLog
Propchange: trunk/gcc/testsuite/gcc.target/i386/pr54400.c
('svn:eol-style' added)
Propchange: trunk/gcc/testsuite/gcc.target/i386/pr54400.c
('svn:keywords' added)
* [Bug middle-end/54400] recognize vector reductions
From: glisse at gcc dot gnu.org @ 2012-10-08 21:03 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
Marc Glisse <glisse at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|recognize haddpd |recognize vector reductions
--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-08 21:02:56 UTC ---
Renaming since the specific x86 case is done, and this is now a middle-end PR.
* [Bug middle-end/54400] recognize vector reductions
From: vincenzo.innocente at cern dot ch @ 2012-10-25 8:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
vincenzo Innocente <vincenzo.innocente at cern dot ch> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vincenzo.innocente at cern
| |dot ch
--- Comment #6 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-10-25 08:40:27 UTC ---
*** Bug 55071 has been marked as a duplicate of this bug. ***
* [Bug middle-end/54400] recognize vector reductions
From: rguenth at gcc dot gnu.org @ 2021-06-08 13:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, with vectorizer support we get
<bb 2> [local count: 1073741824]:
_7 = .REDUC_PLUS (v_3(D)); [tail call]
return _7;
but
f:
.LFB5669:
.cfi_startproc
movapd %xmm0, %xmm1
unpckhpd %xmm0, %xmm0
addpd %xmm1, %xmm0
ret
(note avoiding hadd in the reduc pattern was intended). Not sure if
two element reduction vectorization is worthwhile - my current patch
bails w/o -fassociative-math (which is supposedly unnecessary for
two elements).
But mine anyway.
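The parenthetical about -fassociative-math can be illustrated with a small C sketch. It assumes IEEE 754 double arithmetic with round-to-nearest and no fast-math flags; add2, sum_left and sum_right are hypothetical helpers. With two elements the only freedom is operand order, and floating-point addition is commutative, so nothing is reassociated. With three elements the grouping changes the result.

```c
#include <assert.h>

/* Store the result in a volatile double so it is rounded to double
   precision even on targets that compute in wider registers. */
static double add2(double x, double y)
{
    volatile double r = x + y;
    return r;
}

/* (a + b) + c */
static double sum_left(double a, double b, double c)
{
    return add2(add2(a, b), c);
}

/* a + (b + c) */
static double sum_right(double a, double b, double c)
{
    return add2(a, add2(b, c));
}

static void demo(void)
{
    /* Two lanes: swapping the operands cannot change the sum. */
    assert(add2(0.1, 0.2) == add2(0.2, 0.1));

    /* Three lanes: the grouping matters. 1e16 + -1e16 is exactly 0,
       while -1e16 + 1.0 rounds back to -1e16. */
    assert(sum_left(1e16, -1e16, 1.0) == 1.0);
    assert(sum_right(1e16, -1e16, 1.0) == 0.0);
}
```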
* [Bug middle-end/54400] recognize vector reductions
From: glisse at gcc dot gnu.org @ 2021-06-08 22:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
--- Comment #8 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> (note avoiding hadd in the reduc pattern was intended).
Indeed. Except with -Os, or if a processor with a fast hadd appears,
vectorising this doesn't bring anything. It doesn't hurt either though.
* [Bug middle-end/54400] recognize vector reductions
From: cvs-commit at gcc dot gnu.org @ 2021-06-17 7:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:3dfa4fe9f1a089b2b3906c83e22a1b39c49d937c
commit r12-1551-g3dfa4fe9f1a089b2b3906c83e22a1b39c49d937c
Author: Richard Biener <rguenther@suse.de>
Date: Tue Jun 8 15:10:45 2021 +0200
Vectorization of BB reductions
This adds a simple reduction vectorization capability to the
non-loop vectorizer. Simple meaning it lacks any of the fancy
ways to generate the reduction epilogue but only supports
those we can handle via a direct internal function reducing
a vector to a scalar. One of the main reasons is to avoid
massive refactoring at this point but also that more complex
epilogue operations are hardly profitable.
Mixed sign reductions are for now fended off and I have not yet
settled on whether we want an explicit SLP node for the
reduction epilogue operation. Handling mixed signs could be
done by multiplying with a { 1, -1, .. } vector. Also fended off
are reductions with non-internal operands (constants
or register parameters for example).
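The sign-vector idea mentioned for mixed signs can be sketched in scalar C. This is not the vectorizer's code; LANES, reduc_plus and reduc_mixed are hypothetical names, and plain arrays stand in for vector registers. The point is only that a0 - a1 + a2 - a3 equals a plus-reduction of a multiplied lane-wise by { 1, -1, 1, -1 }.

```c
#include <assert.h>

#define LANES 4  /* illustrative lane count, not taken from the commit */

/* Plain plus-reduction over all lanes. */
static double reduc_plus(const double v[LANES])
{
    double s = 0.0;
    for (int i = 0; i < LANES; i++)
        s += v[i];
    return s;
}

/* a[0] - a[1] + a[2] - a[3], rewritten as a lane-wise multiply by a
   sign vector followed by an ordinary plus-reduction. */
static double reduc_mixed(const double a[LANES])
{
    static const double sign[LANES] = { 1.0, -1.0, 1.0, -1.0 };
    double t[LANES];
    for (int i = 0; i < LANES; i++)
        t[i] = a[i] * sign[i];
    return reduc_plus(t);
}

static void demo(void)
{
    const double a[LANES] = { 5.0, 1.0, 7.0, 2.0 };
    assert(reduc_mixed(a) == 5.0 - 1.0 + 7.0 - 2.0);
}
```

Multiplying by 1.0 or -1.0 is exact in IEEE arithmetic, so the rewrite costs one extra vector multiply but does not by itself change any lane's value.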
Costing is done by accounting the original scalar participating
stmts for the scalar cost and log2 permutes and operations for
the vectorized epilogue.
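To make the "log2 permutes and operations" cost concrete, here is a scalar sketch of a halving reduction epilogue. It is an illustration under assumptions, not the code GCC generates: LANES and reduc_log2 are hypothetical, and each loop iteration stands for one permute (bringing the upper half down) plus one vector add.

```c
#include <assert.h>

#define LANES 8  /* illustrative vector width; must be a power of two */

/* Reduce LANES values by repeatedly adding the upper half onto the
   lower half. Each iteration of the outer loop models one permute and
   one vector add, so the step count is log2(LANES). */
static double reduc_log2(const double in[LANES], int *steps)
{
    double v[LANES];
    for (int i = 0; i < LANES; i++)
        v[i] = in[i];

    *steps = 0;
    for (int width = LANES / 2; width >= 1; width /= 2) {
        for (int i = 0; i < width; i++)
            v[i] += v[i + width];
        ++*steps;
    }
    return v[0];
}

static void demo(void)
{
    const double in[LANES] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int steps = 0;
    assert(reduc_log2(in, &steps) == 36.0);
    assert(steps == 3);  /* log2(8) */
}
```

Note that this order of additions differs from a left-to-right scalar sum, which is why such a reduction of more than two floating-point lanes depends on reassociation being permitted.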
--
SPEC CPU 2017 FP with rate workload measurements show (picked
fastest runs of three) regressions for 507.cactuBSSN_r (1.5%),
508.namd_r (2.5%), 511.povray_r (2.5%), 526.blender_r (0.5%) and
527.cam4_r (2.5%) and improvements for 510.parest_r (5%) and
538.imagick_r (1.5%). This is with -Ofast -march=znver2 on a Zen2.
Statistics on CPU 2017 show that the overwhelming number of seeds
we find are reductions of two lanes (well - that's basically every
associative operation). That means we put a quite high pressure
on the SLP discovery process this way.
In total we find 583218 seeds we put to SLP discovery out of which
66205 pass that and only 6185 of those make it through
code generation checks. 796 of those are discarded because the reduction
is part of a larger SLP instance. 4195 of the remaining
are deemed not profitable to vectorize and 1194 are finally
vectorized. That's a poor 0.2% rate.
Of the 583218 seeds 486826 (83%) have two lanes, 60912 have three (10%),
28181 four (5%), 4808 five, 909 six and there are instances up to 120
lanes.
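The counts quoted above are mutually consistent, which a few lines of C can confirm. The constants are copied from the text; the variable names and the percent helper are made up for this check.

```c
#include <assert.h>

/* Figures copied from the statistics above. */
static const long seeds        = 583218; /* seeds put to SLP discovery */
static const long after_checks = 6185;   /* pass code generation checks */
static const long in_larger    = 796;    /* part of a larger SLP instance */
static const long unprofitable = 4195;   /* deemed not profitable */
static const long vectorized   = 1194;   /* finally vectorized */
static const long two_lanes    = 486826; /* seeds with two lanes */

/* Share of part in whole, in percent. */
static double percent(long part, long whole)
{
    return 100.0 * (double)part / (double)whole;
}

static void demo(void)
{
    /* 6185 - 796 - 4195 = 1194 */
    assert(after_checks - in_larger - unprofitable == vectorized);

    /* 1194 / 583218 is roughly 0.2% */
    assert(percent(vectorized, seeds) > 0.20);
    assert(percent(vectorized, seeds) < 0.21);

    /* 486826 / 583218 is roughly 83% */
    assert(percent(two_lanes, seeds) > 83.0);
    assert(percent(two_lanes, seeds) < 84.0);
}
```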
There's a set of 54086 candidate seeds we reject because
they contain a constant or invariant (not implemented yet) but still
have two or more lanes that could be put to SLP discovery.
2021-06-16 Richard Biener <rguenther@suse.de>
PR tree-optimization/54400
* tree-vectorizer.h (enum slp_instance_kind): Add
slp_inst_kind_bb_reduc.
(reduction_fn_for_scalar_code): Declare.
* tree-vect-data-refs.c (vect_slp_analyze_instance_dependence):
Check SLP_INSTANCE_KIND instead of looking at the
representative.
(vect_slp_analyze_instance_alignment): Likewise.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Export.
* tree-vect-slp.c (vect_slp_linearize_chain): Split out
chain linearization from vect_build_slp_tree_2 and generalize
for the use of BB reduction vectorization.
(vect_build_slp_tree_2): Adjust accordingly.
(vect_optimize_slp): Elide permutes at the root of BB reduction
instances.
(vectorizable_bb_reduc_epilogue): New function.
(vect_slp_prune_covered_roots): Likewise.
(vect_slp_analyze_operations): Use them.
(vect_slp_check_for_constructors): Recognize associatable
chains for BB reduction vectorization.
(vectorize_slp_instance_root_stmt): Generate code for the
BB reduction epilogue.
* gcc.dg/vect/bb-slp-pr54400.c: New testcase.
* [Bug middle-end/54400] recognize vector reductions
From: rguenth at gcc dot gnu.org @ 2021-06-17 8:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Declaring fixed.
* [Bug middle-end/54400] recognize vector reductions
From: pinskia at gcc dot gnu.org @ 2021-09-11 14:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |12.0
end of thread, other threads:[~2021-09-11 14:19 UTC | newest]
Thread overview: 12+ messages
2012-08-29 6:10 [Bug target/54400] New: recognize haddpd glisse at gcc dot gnu.org
2012-09-01 9:40 ` [Bug target/54400] " glisse at gcc dot gnu.org
2012-09-03 10:13 ` [Bug middle-end/54400] " rguenth at gcc dot gnu.org
2012-09-03 10:22 ` glisse at gcc dot gnu.org
2012-10-08 20:46 ` glisse at gcc dot gnu.org
2012-10-08 21:03 ` [Bug middle-end/54400] recognize vector reductions glisse at gcc dot gnu.org
2012-10-25 8:40 ` vincenzo.innocente at cern dot ch
2021-06-08 13:36 ` rguenth at gcc dot gnu.org
2021-06-08 22:04 ` glisse at gcc dot gnu.org
2021-06-17 7:52 ` cvs-commit at gcc dot gnu.org
2021-06-17 8:00 ` rguenth at gcc dot gnu.org
2021-09-11 14:19 ` pinskia at gcc dot gnu.org