[Bug tree-optimization/56829] New: Feature request: "generic" builtin for "movemask"


* [Bug tree-optimization/56829] New: Feature request: "generic" builtin for "movemask"
@ 2013-04-03 11:51 vincenzo.innocente at cern dot ch
  2014-07-07 12:23 ` [Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*") vincenzo.innocente at cern dot ch
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-04-03 11:51 UTC
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829

             Bug #: 56829
           Summary: Feature request: "generic" builtin for "movemask"
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: vincenzo.innocente@cern.ch


I would like to ask whether it is possible to add a builtin for "movemask"
instructions supporting vectors of any size (along the same lines as
__builtin_shuffle).
One could call it __builtin_ballot, following CUDA syntax.
Implementing any, all and popcnt is then rather trivial using the existing
builtins.

The rationale is described in PR55645.
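
For illustration only, here is a minimal sketch of what such a mask-to-integer
operation could look like when emulated with GCC's generic vector extension.
The names movemask_v4si, any_v4si, all_v4si and count_v4si are hypothetical,
not an existing or proposed GCC interface, and the scalar loop merely stands in
for whatever instruction a target would provide.

```c
#include <stdint.h>

/* Four 32-bit lanes, using GCC's generic vector extension. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the requested builtin: pack one bit per lane.
   A lane counts as "true" when it is non-zero; vector comparisons
   produce -1 for true and 0 for false. */
static unsigned movemask_v4si(v4si m)
{
    unsigned bits = 0;
    for (int i = 0; i < 4; i++)
        bits |= (unsigned)(m[i] != 0) << i;
    return bits;
}

/* any / all / popcnt expressed on top of the integer mask. */
static int any_v4si(v4si m)   { return movemask_v4si(m) != 0; }
static int all_v4si(v4si m)   { return movemask_v4si(m) == 0xFu; }
static int count_v4si(v4si m) { return __builtin_popcount(movemask_v4si(m)); }
```

With a = {1, 5, 3, 7} and b = {4, 4, 4, 4}, the comparison a > b is true in
lanes 1 and 3, so any_v4si(a > b) holds, all_v4si(a > b) does not, and
count_v4si(a > b) is 2.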



* [Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*")
From: vincenzo.innocente at cern dot ch @ 2014-07-07 12:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829

vincenzo Innocente <vincenzo.innocente at cern dot ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Feature request: "generic"  |Feature request: "generic"
                   |builtin for "movemask"      |builtin to support control
                   |                            |flow in vectorized code
                   |                            |("movemask",
                   |                            |"vec_any/all_*")

--- Comment #1 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
As gcc 4.9 is now out, I would like to come back to this request.
In further support of it, I have found this interesting talk
http://llvm.org/devmtg/2012-04-12/Slides/Ralf_Karrenberg.pdf
which, from slide 17 onwards, addresses the issue of "divergent control flow"
and its implementation on CPUs (in the context of OpenCL, though the argument
is fully valid for other types of implementation), including praise for "a way
to express predication in IR" on slide 25.
For a general discussion and implementation see also
http://www.mcs.anl.gov/publication/introducing-control-flow-vectorized-code
and the references therein.

My preference is still for a builtin that converts a mask into an integer
(movemask behavior). One can then use
__builtin_popcount, __builtin_ctz etc. to "cast" it to a bool.
For AltiVec, gcc implements the vec_any_"cmp" and vec_all_"cmp" sets of
functions that combine the comparison and the mask->int conversion.
This is a possible alternative syntax.
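
As a rough sketch of the control flow this would enable, the search below
branches on an integer mask and uses __builtin_ctz to locate the first matching
lane. The helper mask_to_int is a hypothetical scalar emulation of the
requested builtin, and find_first_above is an illustrative name; neither is an
existing GCC interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the requested mask-to-integer builtin. */
static unsigned mask_to_int(v4si m)
{
    unsigned bits = 0;
    for (int i = 0; i < 4; i++)
        bits |= (unsigned)(m[i] != 0) << i;
    return bits;
}

/* Return the index of the first element greater than limit, or n if none. */
static size_t find_first_above(const int32_t *p, size_t n, int32_t limit)
{
    const v4si vlimit = {limit, limit, limit, limit};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4si v;
        memcpy(&v, p + i, sizeof v);        /* load without alignment assumptions */
        unsigned bits = mask_to_int(v > vlimit);
        if (bits)                           /* branch on the mask */
            return i + (size_t)__builtin_ctz(bits);
    }
    for (; i < n; i++)                      /* scalar tail */
        if (p[i] > limit)
            return i;
    return n;
}
```

The only place the vector code touches scalar control flow is the test of
bits, which is where a target-specific movemask, vec_any_* or similar would be
used.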

My understanding is that NEON does not support any form of predication in its
instruction set.
(see
http://stackoverflow.com/questions/11870910/sse-mm-movemask-epi8-equivalent-method-for-arm-neon
for instance).
This is an even more compelling reason for the compiler to provide a "generic"
builtin!



* [Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*")
From: peter at cordes dot ca @ 2015-07-07 18:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
x86 has packed-compare and movemask instructions, but it also has a PTEST
instruction that sets flags directly from the result of a vector op.  In some
cases it's more efficient than movemask + test/jcc (esp. if you can make use of
the AND / ANDN ops it does, instead of just testing a vector against itself).

I recently wrote an answer on Stack Overflow comparing PTEST vs. PCMPEQB /
PMOVMSKB for comparing two vectors for equality.  PTEST had lower latency, but
only an equal or lower uop count, and that was in a case that was ideal for
PTEST.

http://stackoverflow.com/a/31198132/224132

Just something to keep in mind when designing gcc's arch-agnostic vector
support, that at least x86 can branch on vector PTEST, without needing any
compare / movemask.  Requiring things to be written in terms of a movemask
wouldn't be horrible for x86, though.



* [Bug tree-optimization/56829] Feature request: "generic" builtin to support control flow in vectorized code ("movemask", "vec_any/all_*")
From: pinskia at gcc dot gnu.org @ 2024-03-10  6:01 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56829

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that GCC 14 adds the ability to auto-vectorize early-exit loops. I am not
sure whether this helps with this issue, though.
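
For reference, an early-exit loop has the shape sketched below. The array
values, its size N and the function first_above are illustrative names only.
Whether a particular GCC version and option set actually vectorizes this loop
is an assumption to check, for example by compiling with -O3 -fopt-info-vec.

```c
#define N 1024

/* Fixed-size array, so the trip count is known at compile time. */
int values[N];

/* Return the index of the first element greater than limit, or -1. */
int first_above(int limit)
{
    for (int i = 0; i < N; i++)
        if (values[i] > limit)
            return i;               /* early exit from the loop */
    return -1;
}
```

This covers the case where the exit condition is written in scalar code; it
does not by itself give hand-written vector code a way to branch on a mask.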
