public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/56829] New: Feature request: "generic" builtin for "movemask"
@ 2013-04-03 11:51 vincenzo.innocente at cern dot ch
2014-07-07 12:23 ` [Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*") vincenzo.innocente at cern dot ch
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-04-03 11:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829
Bug #: 56829
Summary: Feature request: "generic" builtin for "movemask"
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: vincenzo.innocente@cern.ch
I would like to ask whether it is possible to add a builtin for "movemask" instructions
supporting vectors of any size (along the same lines as __builtin_shuffle).
One could call it __builtin_ballot, following CUDA syntax.
Implementing any, all, and popcnt is then rather trivial using the existing builtins.
The rationale is described in PR55645.
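A minimal sketch of what is being asked for, written with GCC's existing vector extensions. The names v4si, movemask_v4si, any_v4si, all_v4si and count_v4si are hypothetical and exist only for this illustration; the requested builtin would accept vectors of any size, whereas this stand-in is fixed to four 32-bit lanes and uses a plain loop that the compiler may or may not lower to a single movemask instruction.

```c
#include <stdint.h>

/* Four 32-bit lanes, declared with GCC's generic vector extension. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the requested builtin: bit i of the result
   is set when lane i of the mask is "true". Vector comparisons produce
   -1 for true and 0 for false, so testing the sign is sufficient. */
static inline unsigned movemask_v4si(v4si m)
{
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i)
        bits |= (unsigned)(m[i] < 0) << i;
    return bits;
}

/* any / all / popcnt follow directly from the integer mask. */
static inline int any_v4si(v4si m)   { return movemask_v4si(m) != 0; }
static inline int all_v4si(v4si m)   { return movemask_v4si(m) == 0xFu; }
static inline int count_v4si(v4si m) { return __builtin_popcount(movemask_v4si(m)); }
```

With a = {1, 5, 3, 7} and b = {2, 4, 6, 0}, the comparison a < b is true in lanes 0 and 2, so the mask has bits 0 and 2 set.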
* [Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*")
From: vincenzo.innocente at cern dot ch @ 2014-07-07 12:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829
vincenzo Innocente <vincenzo.innocente at cern dot ch> changed:
What    |Removed                    |Added
----------------------------------------------------------------------------
Summary |Feature request: "generic" |Feature request: "generic"
        |builtin for "movemask"     |builtin to support control
        |                           |flow in vectorized code
        |                           |("movemask",
        |                           |"vec_any/all_*")
--- Comment #1 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
As gcc 4.9 is now out, I would like to come back to this request.
In further support of it,
I have found this interesting talk
http://llvm.org/devmtg/2012-04-12/Slides/Ralf_Karrenberg.pdf
that from slide 17 addresses the issue of "divergent control flow" and its
implementation on CPUs (in the context of OpenCL; still, the argument is fully
valid for other types of implementation), including a plea for "a way to
express predication in IR" on slide 25.
For a general discussion and implementation see also
http://www.mcs.anl.gov/publication/introducing-control-flow-vectorized-code
and references therein.
My preference is still for a builtin that converts a mask into an integer
(movemask behavior). One can then use
__builtin_popcount, __builtin_ctz, etc. to "cast" it to a bool.
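To illustrate how an integer mask enables control flow in vectorized code, here is a hedged sketch of a search loop that exits as soon as any lane matches. The helpers mask_to_bits and find_first are hypothetical names introduced for this example, and mask_to_bits is only a portable loop standing in for the requested builtin.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the requested mask->int builtin. */
static inline unsigned mask_to_bits(v4si m)
{
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i)
        bits |= (unsigned)(m[i] < 0) << i;
    return bits;
}

/* Return the index of the first element equal to key, or n if none. */
static size_t find_first(const int32_t *data, size_t n, int32_t key)
{
    const v4si keys = {key, key, key, key};
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4si chunk;
        memcpy(&chunk, data + i, sizeof chunk);  /* alignment-safe load */
        unsigned bits = mask_to_bits(chunk == keys);
        if (bits)  /* "any lane matched": branch on the integer mask */
            return i + (size_t)__builtin_ctz(bits);  /* lowest set bit = first lane */
    }
    for (; i < n; ++i)  /* scalar tail for the remaining elements */
        if (data[i] == key)
            return i;
    return n;
}
```

The branch on bits plays the role of "any", and __builtin_ctz recovers which lane matched first; __builtin_ctz is only called when bits is nonzero, since its result is undefined for zero.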
For AltiVec, gcc implements the vec_any_"cmp" and vec_all_"cmp" sets of functions
that combine the comparison and the mask->int conversion.
This is a possible alternative syntax.
My understanding is that NEON does not support any form of predication in its
instruction set.
(see
http://stackoverflow.com/questions/11870910/sse-mm-movemask-epi8-equivalent-method-for-arm-neon
for instance).
This is an even more compelling reason for the compiler to provide a "generic"
builtin!
* [Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*")
From: peter at cordes dot ca @ 2015-07-07 18:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829
Peter Cordes <peter at cordes dot ca> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |peter at cordes dot ca
--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
x86 has packed-compare and movemask instructions, but it also has a PTEST
instruction that sets flags directly from the result of a vector op. In some
cases it's more efficient than movemsk + test/jcc (esp. if you can make use of
the AND / ANDN ops it does, instead of just testing a vector against itself).
I recently wrote an answer on Stack Overflow comparing PTEST vs. PCMPEQB /
PMOVMSKB for comparing two vectors for equality. PTEST had lower latency, but
only an equal or lower uop count, even in this case, which was ideal for it.
http://stackoverflow.com/a/31198132/224132
Just something to keep in mind when designing gcc's arch-agnostic vector
support, that at least x86 can branch on vector PTEST, without needing any
compare / movemask. Requiring things to be written in terms of a movemask
wouldn't be horrible for x86, though.
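A rough, target-independent sketch of the distinction, using GCC vector extensions rather than x86 intrinsics. The names v2du, testz and vec_equal are hypothetical; testz only models the zero-flag result of a PTEST-style operation ("is a AND b all zero?"), and whether a compiler lowers it to PTEST depends on the target and options.

```c
#include <stdint.h>

typedef uint64_t v2du __attribute__((vector_size(16)));

/* Models the zero-flag output of a PTEST-style test:
   returns 1 when (a & b) has no bit set in any lane. */
static inline int testz(v2du a, v2du b)
{
    v2du t = a & b;
    return (t[0] | t[1]) == 0;
}

/* Two vectors are equal exactly when their XOR is all zero, so equality
   can be tested without a compare and without forming a lane mask. */
static inline int vec_equal(v2du a, v2du b)
{
    v2du d = a ^ b;
    return testz(d, d);
}
```

An interface phrased as "any"/"all" would leave a backend free to choose between this form and compare plus movemask, while an interface phrased only as movemask asks for the integer mask to be produced first.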
* [Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*")
From: pinskia at gcc dot gnu.org @ 2024-03-10 6:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that GCC 14 adds the ability to auto-vectorize early-exit loops. I am not sure
whether this helps with this issue, though.
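For reference, a loop of the following shape is what "early exit" means here. This is only an illustrative scalar function (first_greater is a hypothetical name); whether a given GCC actually vectorizes it depends on the version, target, options, and on whether the compiler can prove that reading past the exit point is safe.

```c
#include <stddef.h>

/* Scalar loop with an early exit: returns the index of the first element
   greater than limit, or n when there is none. */
size_t first_greater(const int *a, size_t n, int limit)
{
    for (size_t i = 0; i < n; ++i)
        if (a[i] > limit)
            return i;  /* the early exit a vectorizer has to preserve */
    return n;
}
```

When such a loop is vectorized automatically, the compiler performs the "any lane true" test itself, so no explicit mask->int builtin appears in the source.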