public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/114164] New: simdclone vectorization creates unsupported IL
From: rguenth at gcc dot gnu.org @ 2024-02-29 11:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

            Bug ID: 114164
           Summary: simdclone vectorization creates unsupported IL
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

For g++.dg/vect/pr68762-1.cc the simdclone vectorization with -mavx creates

   mask__18.266_50 = vect__8.265_48 == { 0, 0, 0, 0 };
   _52 = VEC_COND_EXPR <mask__18.266_50, { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 }, { 0.0, 0.0, 0.0, 0.0 }>;
   _53 = _Z3bazd.simdclone.5 (_51, _52);

but this isn't supported with AVX, since integer vectors are involved here
and AVX only has FP support for 256-bit vectors.

This causes the later vector lowering pass to lower this to

-  _35 = BIT_FIELD_REF <vect__8.265_48, 32, 0>;
-  _40 = _35 == 0;
-  _43 = _40 ? 1.0e+0 : 0.0;
-  _59 = BIT_FIELD_REF <vect__8.265_48, 32, 32>;
-  _62 = _59 == 0;
-  _41 = _62 ? 1.0e+0 : 0.0;
-  _60 = BIT_FIELD_REF <vect__8.265_48, 32, 64>;
-  _64 = _60 == 0;
-  _65 = _64 ? 1.0e+0 : 0.0;
-  _30 = BIT_FIELD_REF <vect__8.265_48, 32, 96>;
-  _31 = _30 == 0;
-  _27 = _31 ? 1.0e+0 : 0.0;
-  _52 = {_43, _41, _65, _27};

which is quite inefficient.
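
As a reference point, the following C++ sketch models lane by lane what the
two vector statements compute, which is the work the lowered sequence above
spreads over four scalar extracts, compares and selects. It is illustrative
only: `lowered_vec_cond` is a hypothetical name, not a GCC function, and the
SSA names in the comments are the lane-0 ones from the dump.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lane-by-lane model of
//   mask__18.266_50 = vect__8.265_48 == { 0, 0, 0, 0 };
//   _52 = VEC_COND_EXPR <mask__18.266_50, { 1.0e+0, ... }, { 0.0, ... }>;
// i.e. what vector lowering spells out with one BIT_FIELD_REF,
// one compare and one scalar select per lane.
std::array<double, 4> lowered_vec_cond(const std::array<int32_t, 4>& v) {
  std::array<double, 4> r{};
  for (std::size_t lane = 0; lane < 4; ++lane) {
    const bool is_zero = v[lane] == 0;  // e.g. _40 = _35 == 0;
    r[lane] = is_zero ? 1.0 : 0.0;      // e.g. _43 = _40 ? 1.0e+0 : 0.0;
  }
  return r;                             // _52 = {_43, _41, _65, _27};
}
```

For example, an input of {0, 5, 0, -1} yields {1.0, 0.0, 1.0, 0.0}.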

The vectorizer fails to verify the VEC_COND_EXPRs it creates are actually
supported by the target.

As for the x86 target: with -mavx it might actually support creating mask
arguments for in-branch OMP simd clones, or if not, it should probably not
present those clones as usable.

* [Bug tree-optimization/114164] simdclone vectorization creates unsupported IL
From: rguenth at gcc dot gnu.org @ 2024-02-29 11:30 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
                 CC|                            |crazylht at gmail dot com,
                   |                            |jakub at gcc dot gnu.org
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm not sure who is responsible for rejecting this: whether the vectorizer can
expect there's a way to create the mask arguments when the simdclone is marked
usable by the target, or whether it has to verify that itself.

This becomes an ICE if we move vector lowering before vectorization.

* [Bug tree-optimization/114164] simdclone vectorization creates unsupported IL
From: jakub at gcc dot gnu.org @ 2024-02-29 17:42 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> I'm not sure who's responsible to reject this, whether the vectorizer can
> expect there's a way to create the mask arguments when the simdclone is
> marked usable by the target or whether it has to verify that itself.
> 
> This becomes an ICE if we move vector lowering before vectorization.

Wasn't this valid when VEC_COND_EXPR allowed the comparison directly in the
operand?
Or maybe I misremember.  Certainly I believe -mavx -mno-avx2 should be able to
do 256-bit conditional moves of float/double elements.
When the comparison is in a separate statement, there is always a risk that
something CSEs it or moves it away from its single user etc., so that
expansion can no longer consider the two together.
I don't see any VEC_COND_EXPRs anywhere in the GCC 7 pr68762-*.cc.* dumps,
though.

* [Bug tree-optimization/114164] simdclone vectorization creates unsupported IL
From: liuhongt at gcc dot gnu.org @ 2024-03-01 02:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #3 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Richard Biener from comment #1)
> > I'm not sure who's responsible to reject this, whether the vectorizer can
> > expect there's a way to create the mask arguments when the simdclone is
> > marked usable by the target or whether it has to verify that itself.
> > 
> > This becomes an ICE if we move vector lowering before vectorization.
> 
> Wasn't this valid when VEC_COND_EXPR allowed the comparison directly in the
> operand?
> Or maybe I misremember.  Certainly I believe -mavx -mno-avx2 should be able
> to do
> 256-bit conditional moves of float/double elements.

Here the mask is v4si, which is 128-bit, and the vector is v4df, which is
256-bit.  Without AVX-512, the x86 backend only supports vcond/vcond_mask with
mask and data of the same size (vcond{,_mask}v4sfv4si or
vcond{,_mask}v4dfv4di), but not vcond{,_mask}v4dfv4si.
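
A toy C++ model of that constraint, purely illustrative: `VecType` and
`vcond_mask_pattern_exists` are hypothetical names, not GCC internals. The
rule encoded is only the AVX/AVX2 one stated above (same total size); also
requiring the same lane count is an assumption that matches both supported
examples.

```cpp
#include <cassert>
#include <string>

// Toy description of a vector type: element count and element width in bits.
struct VecType {
  std::string name;
  unsigned lanes;
  unsigned lane_bits;
  unsigned size_bits() const { return lanes * lane_bits; }
};

// Hypothetical predicate for x86 without AVX-512: a vcond/vcond_mask
// pattern is assumed to exist only when the data vector and the mask
// vector have the same number of lanes and the same total size.
bool vcond_mask_pattern_exists(const VecType& data, const VecType& mask) {
  return data.lanes == mask.lanes && data.size_bits() == mask.size_bits();
}

const VecType v4sf{"v4sf", 4, 32};
const VecType v4df{"v4df", 4, 64};
const VecType v4si{"v4si", 4, 32};
const VecType v4di{"v4di", 4, 64};
```

Under this model the testcase's v4df/v4si pair (256-bit data, 128-bit mask)
is rejected, while v4sf/v4si and v4df/v4di are accepted.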

BTW, under AVX we may get a v4di mask from the v4si mask by

        vshufps xmm1, xmm0, xmm0, 80            # xmm1 = xmm0[0,0,1,1]
        vshufps xmm0, xmm0, xmm0, 250           # xmm0 = xmm0[2,2,3,3]
        vinsertf128     ymm0, ymm1, xmm0, 1

while under AVX2 we can just use pmovsxdq.
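
To double-check the shuffle immediates, here is a scalar C++ model of that
sequence. It is a sketch, not intrinsic code: `shufps`, `vinsertf128_hi` and
`widen_mask_avx` are hypothetical helpers that only mimic the lane selection
of the instructions on 32-bit lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4SI = std::array<int32_t, 4>;  // an xmm register as 4 x 32-bit lanes
using V8SI = std::array<int32_t, 8>;  // a ymm register as 8 x 32-bit lanes

// Model of 128-bit (V)SHUFPS: result lanes 0-1 are picked from a and
// lanes 2-3 from b, each by a 2-bit field of the immediate.
V4SI shufps(const V4SI& a, const V4SI& b, unsigned imm) {
  return {a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]};
}

// Model of VINSERTF128 dst, lo_src, hi_src, 1: low 128 bits from lo_src,
// high 128 bits from hi_src.
V8SI vinsertf128_hi(const V4SI& lo, const V4SI& hi) {
  return {lo[0], lo[1], lo[2], lo[3], hi[0], hi[1], hi[2], hi[3]};
}

// Widen a v4si mask to 4 x 64-bit lanes using the AVX sequence above.
std::array<int64_t, 4> widen_mask_avx(const V4SI& m) {
  const V4SI lo = shufps(m, m, 80);   // m[0,0,1,1]
  const V4SI hi = shufps(m, m, 250);  // m[2,2,3,3]
  const V8SI y = vinsertf128_hi(lo, hi);
  std::array<int64_t, 4> r{};
  for (std::size_t i = 0; i < 4; ++i) {
    // Little-endian: 32-bit lane 2*i is the low half of 64-bit lane i.
    const uint64_t bits = (uint64_t(uint32_t(y[2 * i + 1])) << 32) |
                          uint32_t(y[2 * i]);
    r[i] = static_cast<int64_t>(bits);
  }
  return r;
}
```

Immediate 80 selects lanes [0,0,1,1] and 250 selects [2,2,3,3], so each
64-bit lane of the result holds its 32-bit mask lane twice. That equals sign
extension (what pmovsxdq computes) only because mask lanes are all zeros or
all ones; it would not widen arbitrary integers correctly.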

* [Bug tree-optimization/114164] simdclone vectorization creates unsupported IL
From: rguenth at gcc dot gnu.org @ 2024-03-01 09:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-03-01

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, yes: we don't do anything special if the mask and the argument mask types
have different sizes.  We can handle a different number of lanes, but we don't
try any widening/shortening tricks to make the sizes match.

That said, it looks like we definitely should verify in the vectorizer that
we can handle the VEC_COND_EXPR.

As for the missing trick, we could either do v4si ? v4sf : v4sf and then widen
the result to v4df, or widen v4si to v4di.  Possibly the target knows which is
more efficient here.
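
The two options can be compared with a lane-wise C++ model. This is a hedged
sketch rather than vectorizer code: `select_then_widen` and
`widen_mask_then_select` are hypothetical names, and the model says nothing
about which variant is cheaper on real hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4SI = std::array<int32_t, 4>;  // comparison result: 0 or -1 per lane
using V4DF = std::array<double, 4>;

// Variant 1: select in single precision with the 128-bit mask
// (v4si ? v4sf : v4sf), then widen the selected floats to double.
V4DF select_then_widen(const V4SI& mask, float if_true, float if_false) {
  std::array<float, 4> narrow{};
  for (std::size_t i = 0; i < 4; ++i) narrow[i] = mask[i] ? if_true : if_false;
  V4DF r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = static_cast<double>(narrow[i]);
  return r;
}

// Variant 2: sign-extend the mask to 64-bit lanes (v4si -> v4di),
// then select directly in double precision (v4di ? v4df : v4df).
V4DF widen_mask_then_select(const V4SI& mask, double if_true, double if_false) {
  std::array<int64_t, 4> wide{};
  for (std::size_t i = 0; i < 4; ++i) wide[i] = static_cast<int64_t>(mask[i]);
  V4DF r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = wide[i] ? if_true : if_false;
  return r;
}
```

The first variant is only exact because the selected constants here (1.0 and
0.0) are representable in single precision; arbitrary double operands would
lose precision if selected as v4sf.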

Mine for adding the missing support check.

* [Bug tree-optimization/114164] simdclone vectorization creates unsupported IL
From: rguenth at gcc dot gnu.org @ 2024-03-01 10:12 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
The following fixes this.  It also shows that even with -mavx2 we don't support
this (as expected after the analysis).  Note that since we emit
mask ? {true,...} : {false,...}, we only support in-branch clones when the
target has corresponding vcond_mask expanders.  For vcondeq we'd need to
emit a redundant mask != mask_false_cst compare.

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index be0e1a9c69d..14a3ffb5f02 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4210,6 +4210,16 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
                                     " supported for mismatched vector sizes.\n");
                  return false;
                }
+             if (!expand_vec_cond_expr_p (clone_arg_vectype,
+                                          arginfo[i].vectype, ERROR_MARK))
+               {
+                 if (dump_enabled_p ())
+                   dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+                                    vect_location,
+                                    "cannot compute mask argument for"
+                                    " in-branch vector clones.\n");
+                 return false;
+               }
            }
          else if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode))
            {

* [Bug tree-optimization/114164] simdclone vectorization creates unsupported IL
From: cvs-commit at gcc dot gnu.org @ 2024-03-04 12:41 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a19ab1c42aba47fbfb122a6160f504565aef0943

commit r14-9295-ga19ab1c42aba47fbfb122a6160f504565aef0943
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Mar 1 11:07:21 2024 +0100

    tree-optimization/114164 - unsupported SIMD clone call, unsupported VEC_COND

    The following avoids creating unsupported VEC_COND_EXPRs as part of
    SIMD clone call mask argument setup during vectorization which results
    in inefficient decomposing of the operation during vector lowering.

            PR tree-optimization/114164
            * tree-vect-stmts.cc (vectorizable_simd_clone_call): Fail if
            the code generated for mask argument setup is not supported.

* [Bug tree-optimization/114164] simdclone vectorization creates unsupported IL
From: rguenth at gcc dot gnu.org @ 2024-03-04 12:43 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
The instance I spotted is fixed now.

* [Bug tree-optimization/114164] simdclone vectorization creates unsupported IL
From: pinskia at gcc dot gnu.org @ 2024-03-31 09:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114164

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
