[Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
@ 2024-01-24 10:05 liuhongt at gcc dot gnu.org
  2024-01-24 10:07 ` [Bug tree-optimization/113576] " liuhongt at gcc dot gnu.org
                   ` (56 more replies)
  0 siblings, 57 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-24 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

            Bug ID: 113576
           Summary: [14 regression] 502.gcc_r hangs
                    r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57206
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57206&action=edit
reduced testcase attached.

reduced testcase attached. compiled with gcc -O3 -march=skylake-avx512.

Probably related to PR113539

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
@ 2024-01-24 10:07 ` liuhongt at gcc dot gnu.org
  2024-01-24 12:27 ` rguenth at gcc dot gnu.org
                   ` (55 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-24 10:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #1 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
int
__attribute__((noinline))
sbitmap_first_set_bit (const_sbitmap bmap)
{
  unsigned int n = 0;
  sbitmap_iterator sbi;

  EXECUTE_IF_SET_IN_SBITMAP (bmap, 0, n, sbi)
    return n;
  return -1;
}

hangs on this function, it's an vect early break case.

 78.L10:
 79        shrq    %rdx
 80        movl    %ecx, %eax
 81        incl    %ecx
 82        testb   $1, %dl
 83        je      .L10
 84        ret

hangs on this scalar part.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
  2024-01-24 10:07 ` [Bug tree-optimization/113576] " liuhongt at gcc dot gnu.org
@ 2024-01-24 12:27 ` rguenth at gcc dot gnu.org
  2024-01-24 13:37 ` rguenth at gcc dot gnu.org
                   ` (54 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-24 12:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-24
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |wrong-code

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  Let me have a look.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
  2024-01-24 10:07 ` [Bug tree-optimization/113576] " liuhongt at gcc dot gnu.org
  2024-01-24 12:27 ` rguenth at gcc dot gnu.org
@ 2024-01-24 13:37 ` rguenth at gcc dot gnu.org
  2024-01-24 13:47 ` rguenth at gcc dot gnu.org
                   ` (53 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-24 13:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the change enables early exit vectorization since may_be_zero is _10 == 0
here, resulting in an overall

number_of_iterationsm1 == _10 != 0 ? _10 + 4294967295 : 0

and

number_of_iterations = MAX_EXPR <_10, 1>

We're vectorizing the induction to re-start the iteration after exit which
I think is all OK (but it must be broken...).  But we're stuck in a
not vectorized loop,

94        /* Skip bits that are zero.  */
95        for (; (i->word & 1) == 0; i->word >>= 1)
96          i->bit_num++;

   0x0000000000400890 <+528>:   shr    %rdx
   0x0000000000400893 <+531>:   mov    %ecx,%eax
   0x0000000000400895 <+533>:   inc    %ecx
=> 0x0000000000400897 <+535>:   test   $0x1,%dl
   0x000000000040089a <+538>:   je     0x400890 <sbitmap_first_set_bit+528>

and %rdx is zero.

I think the vector IL is sound.

Disabling cunroll allows the testcase to pass, so it might be an error on
the upper bound of its iterations (but I think that's OK, too).

What looks a bit odd is the condition for skipping the epilogue which
we should never do for LOOP_VINFO_EARLY_BREAKS_VECT_PEELED, we're using

      /* If we have a peeled vector iteration we will never skip the epilog
loop
         and we can simplify the cfg a lot by not doing the edge split.  */
      if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
        {
          guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
                                    niters, niters_vector_mult_vf);

here, but I think niters_vector_mult_vf is wrong (but it doesn't matter
here, still it looks bogus).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-01-24 13:37 ` rguenth at gcc dot gnu.org
@ 2024-01-24 13:47 ` rguenth at gcc dot gnu.org
  2024-01-24 13:50 ` rguenth at gcc dot gnu.org
                   ` (52 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-24 13:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |tnfchris at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So with the niter analysis part what's different is how we deal with the
case of 'may_be_zero', for a loop with a non-empty latch a zero means
we do not execute the latch but the traditional 'niter' means the number
of latch executions.

That means the vectorizer simply implementing may_be_zero as
niter = may_be_zero ? 0 : niter is flawed (if it were that easy we
wouldn't have this extra field).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-01-24 13:47 ` rguenth at gcc dot gnu.org
@ 2024-01-24 13:50 ` rguenth at gcc dot gnu.org
  2024-01-25  1:03 ` liuhongt at gcc dot gnu.org
                   ` (51 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-24 13:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fe631252dc2..28ad03e0b8a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -991,8 +991,12 @@ vec_init_loop_exit_info (class loop *loop)
        {
          tree may_be_zero = niter_desc.may_be_zero;
          if ((integer_zerop (may_be_zero)
-              || integer_nonzerop (may_be_zero)
-              || COMPARISON_CLASS_P (may_be_zero))
+              /* As we are handling may_be_zero that's not false by
+                 rewriting niter to may_be_zero ? 0 : niter we require
+                 an empty latch.  */
+              || (exit->src == single_pred (loop->latch)
+                  && (integer_nonzerop (may_be_zero)
+                      || COMPARISON_CLASS_P (may_be_zero))))
              && (!candidate
                  || dominated_by_p (CDI_DOMINATORS, exit->src,
                                     candidate->src)))

fixes it, I'm testing this.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-01-24 13:50 ` rguenth at gcc dot gnu.org
@ 2024-01-25  1:03 ` liuhongt at gcc dot gnu.org
  2024-01-25  6:28 ` liuhongt at gcc dot gnu.org
                   ` (50 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25  1:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
Another potential buggy place is 

240  vexit_reduc_67 = mask_patt_43.28_62 & mask_patt_43.28_63;
241  if (vexit_reduc_67 == { -1, -1, -1, -1 })
242    goto <bb 6>; [94.50%]
243  else


is expanded to 

 319(insn 69 68 70 8 (set (reg:CCZ 17 flags)
 320        (compare:CCZ (reg:QI 189 [ vexit_reduc_67 ])
 321            (const_int -1 [0xffffffffffffffff]))) "test.c":83:18 discrim 1
9 {*cmpqi_1}

But it should only test the lower 4 bits, the higher part is zeroed by avx512
comparison instructions.

 293(insn 65 64 66 8 (set (reg:QI 186 [ mask_patt_43.28_62 ])
 294        (unspec:QI [
 295                (reg:V4DI 124 [ vect__29.26 ])
 296                (reg:V4DI 185)
 297                (const_int 0 [0])
 298            ] UNSPEC_PCMP)) "test.c":83:18 discrim 1 2811
{avx512vl_cmpv4di3}
 299     (nil))
 300(insn 66 65 67 8 (set (reg:V4DI 187)
 301        (const_vector:V4DI [
 302                (const_int 0 [0]) repeated x4
 303            ])) "test.c":83:18 discrim 1 2021 {movv4di_internal}
 304     (nil))
 305(insn 67 66 68 8 (set (reg:QI 188 [ mask_patt_43.28_63 ])
 306        (unspec:QI [
 307                (reg:V4DI 125 [ vect__29.27 ])
 308                (reg:V4DI 187)
 309                (const_int 0 [0])
 310            ] UNSPEC_PCMP)) "test.c":83:18 discrim 1 2811
{avx512vl_cmpv4di3}
 311     (nil))
 312(insn 68 67 69 8 (parallel [
 313            (set (reg:QI 189 [ vexit_reduc_67 ])
 314                (and:QI (reg:QI 186 [ mask_patt_43.28_62 ])
 315                    (reg:QI 188 [ mask_patt_43.28_63 ])))
 316            (clobber (reg:CC 17 flags))
 317        ]) "test.c":83:18 discrim 1 618 {*andqi_1}
 318     (nil))

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-01-25  1:03 ` liuhongt at gcc dot gnu.org
@ 2024-01-25  6:28 ` liuhongt at gcc dot gnu.org
  2024-01-25  7:02 ` liuhongt at gcc dot gnu.org
                   ` (49 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25  6:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #7 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 1fd957288d4..33a8d539b4d 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8032,7 +8032,7 @@ native_encode_vector_part (const_tree expr, unsigned char
*ptr, int len,

       unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits;
       unsigned int first_elt = off * elts_per_byte;
-      unsigned int extract_elts = extract_bytes * elts_per_byte;
+      unsigned int extract_elts = count;
       for (unsigned int i = 0; i < extract_elts; ++i)
        {
          tree elt = VECTOR_CST_ELT (expr, first_elt + i);

Shouldn't we use count here?(it also fixed the hanged issue).

Also even vector_boolean_type has only 4 elements, VECTOR_CST_ELT (expr, 5)
still return -1, not sure if it's reasonable.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-01-25  6:28 ` liuhongt at gcc dot gnu.org
@ 2024-01-25  7:02 ` liuhongt at gcc dot gnu.org
  2024-01-25  7:39 ` cvs-commit at gcc dot gnu.org
                   ` (48 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25  7:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

maybe 

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 1fd957288d4..6d321f9baef 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8035,6 +8035,9 @@ native_encode_vector_part (const_tree expr, unsigned char
*ptr, int len,
       unsigned int extract_elts = extract_bytes * elts_per_byte;
       for (unsigned int i = 0; i < extract_elts; ++i)
        {
+         /* Don't encode any bit beyond the range of the vector.  */
+         if (first_elt + i > count)
+           break;

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-01-25  7:02 ` liuhongt at gcc dot gnu.org
@ 2024-01-25  7:39 ` cvs-commit at gcc dot gnu.org
  2024-01-25  8:34 ` liuhongt at gcc dot gnu.org
                   ` (47 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-25  7:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:578c7b91f418ebbef1bf169117815409e06f5197

commit r14-8413-g578c7b91f418ebbef1bf169117815409e06f5197
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jan 24 14:55:49 2024 +0100

    tree-optimization/113576 - non-empty latch and may_be_zero vectorization

    We can't support niters with may_be_zero when we end up with a
    non-empty latch due to early exit peeling.  At least not in
    the simplistic way the vectorizer handles this now.  Disallow
    it again for exits that are not the last one.

            PR tree-optimization/113576
            * tree-vect-loop.cc (vec_init_loop_exit_info): Only allow
            exits with may_be_zero niters when its the last one.

            * gcc.dg/vect/pr113576.c: New testcase.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-01-25  7:39 ` cvs-commit at gcc dot gnu.org
@ 2024-01-25  8:34 ` liuhongt at gcc dot gnu.org
  2024-01-25  8:52 ` rguenth at gcc dot gnu.org
                   ` (46 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25  8:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-01-25  8:34 ` liuhongt at gcc dot gnu.org
@ 2024-01-25  8:52 ` rguenth at gcc dot gnu.org
  2024-01-25  9:15 ` rguenther at suse dot de
                   ` (45 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-25  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #8)
> maybe 
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 1fd957288d4..6d321f9baef 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8035,6 +8035,9 @@ native_encode_vector_part (const_tree expr, unsigned
> char *ptr, int len,
>        unsigned int extract_elts = extract_bytes * elts_per_byte;
>        for (unsigned int i = 0; i < extract_elts; ++i)
>         {
> +         /* Don't encode any bit beyond the range of the vector.  */
> +         if (first_elt + i > count)
> +           break;

Hmm.  I think that VECTOR_CST_ELT should have ICEd for out-of-bound
element queries but it seems to make up elements for us here.  Richard?

But yes, we do

      unsigned int extract_elts = extract_bytes * elts_per_byte;

and since native_encode_* and native_interpret_* operate on bytes we have
difficulties dealing with bit-precision entities with padding.

There's either the possibility to fail encoding when that happens or
do something else.  Note that RTL expansion will do

    case VECTOR_CST:
      {
        tree tmp = NULL_TREE; 
        if (VECTOR_MODE_P (mode))
          return const_vector_from_tree (exp);
        scalar_int_mode int_mode;
        if (is_int_mode (mode, &int_mode))
          {
            tree type_for_mode = lang_hooks.types.type_for_mode (int_mode, 1);
            if (type_for_mode)
              tmp = fold_unary_loc (loc, VIEW_CONVERT_EXPR,
                                    type_for_mode, exp);

which I think should always succeed (otherwise it falls back to expanding
a CTOR).  That means failing to encode/interpret might get into
store_constructor which I think will zero a register destination and thus
fill padding with zeros.

So yeah, something like this looks OK, but I think instead of only
testing against 'count' we should also test against TYPE_VECTOR_SUBPARTS
(that might be variable, so with known_gt).

Would be interesting to see whether this fixes the issue without the
now installed patch.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2024-01-25  8:52 ` rguenth at gcc dot gnu.org
@ 2024-01-25  9:15 ` rguenther at suse dot de
  2024-01-25  9:26 ` rsandifo at gcc dot gnu.org
                   ` (44 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenther at suse dot de @ 2024-01-25  9:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 25 Jan 2024, liuhongt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576
> 
> --- Comment #7 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 1fd957288d4..33a8d539b4d 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8032,7 +8032,7 @@ native_encode_vector_part (const_tree expr, unsigned char
> *ptr, int len,
> 
>        unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits;
>        unsigned int first_elt = off * elts_per_byte;
> -      unsigned int extract_elts = extract_bytes * elts_per_byte;
> +      unsigned int extract_elts = count;
>        for (unsigned int i = 0; i < extract_elts; ++i)
>         {
>           tree elt = VECTOR_CST_ELT (expr, first_elt + i);
> 
> Shouldn't we use count here?(it also fixed the hanged issue).

extract_bytes is capped by the buffer 'len':

      int total_bytes = CEIL (elt_bits * count, BITS_PER_UNIT);
..
      int extract_bytes = MIN (len, total_bytes - off);

we'd still need to effectively do that.  But yeah, using CEIL
makes extract_elts off.  Maybe we should simply calculate
extract_bits instead (but then use uint64 for that)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2024-01-25  9:15 ` rguenther at suse dot de
@ 2024-01-25  9:26 ` rsandifo at gcc dot gnu.org
  2024-01-25 10:09 ` liuhongt at gcc dot gnu.org
                   ` (43 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-01-25  9:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #13 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
I don't think there's any principle that upper bits must be zero.
How do we end up with a pattern that depends on that being the case?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2024-01-25  9:26 ` rsandifo at gcc dot gnu.org
@ 2024-01-25 10:09 ` liuhongt at gcc dot gnu.org
  2024-01-25 10:13 ` rguenth at gcc dot gnu.org
                   ` (42 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25 10:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |---
             Status|RESOLVED                    |REOPENED

--- Comment #14 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
The testcase attached is ok now, but 502.gcc_r is miscompiled after the
installed patch.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2024-01-25 10:09 ` liuhongt at gcc dot gnu.org
@ 2024-01-25 10:13 ` rguenth at gcc dot gnu.org
  2024-01-25 12:24 ` rsandifo at gcc dot gnu.org
                   ` (41 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-25 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #13)
> I don't think there's any principle that upper bits must be zero.
> How do we end up with a pattern that depends on that being the case?

I think the problem is the cbranch pattern which looks at all of the
QImode mask - but of course it doesn't know it's really V4BImode it's
working on ...

If there's no principle that the upper bits should be zero I think we
need a way for the target to say so.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2024-01-25 10:13 ` rguenth at gcc dot gnu.org
@ 2024-01-25 12:24 ` rsandifo at gcc dot gnu.org
  2024-01-25 13:03 ` tnfchris at gcc dot gnu.org
                   ` (40 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-01-25 12:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #16 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #15)
> I think the problem is the cbranch pattern which looks at all of the
> QImode mask - but of course it doesn't know it's really V4BImode it's
> working on ...
Yeah.  Currently building an x86_64 toolchain to have a look, but I think
whatever code uses a cbranch with a higher precision than the inputs should
mask off the significant bits beforehand.  cbranch can also be used for
comparing two variable masks too.

I suppose we could add a hook to say that padding bits of an integer mask must
always be zero (and so it's expand's job to ensure that that holds for any mask
operation).  But it feels dangerously close to TRULY_NOOP_TRUNCATION for
integers.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2024-01-25 12:24 ` rsandifo at gcc dot gnu.org
@ 2024-01-25 13:03 ` tnfchris at gcc dot gnu.org
  2024-01-25 13:28 ` rsandifo at gcc dot gnu.org
                   ` (39 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-01-25 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #17 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Well the mid-end has generated the right precision. The type it generates is  
vector(4) <signed-boolean:1> vexit_reduc_67;
so it does say it's a single bit boolean.

Isn't this just an expand problem?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2024-01-25 13:03 ` tnfchris at gcc dot gnu.org
@ 2024-01-25 13:28 ` rsandifo at gcc dot gnu.org
  2024-01-25 14:12 ` rguenther at suse dot de
                   ` (38 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-01-25 13:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #18 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #17)
> Well the mid-end has generated the right precision. The type it generates is
> vector(4) <signed-boolean:1> vexit_reduc_67;
> so it does say it's a single bit boolean.
> 
> Isn't this just an expand problem?
That's what I meant.  expand is using a QImode comparison to compare things
with 4-bit precision, so I think the masking should happen at that point.

How about doing the masking in do_compare_and_jump?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2024-01-25 13:28 ` rsandifo at gcc dot gnu.org
@ 2024-01-25 14:12 ` rguenther at suse dot de
  2024-01-26  2:24 ` liuhongt at gcc dot gnu.org
                   ` (37 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenther at suse dot de @ 2024-01-25 14:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #19 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 25 Jan 2024, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576
> 
> --- Comment #18 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
> (In reply to Tamar Christina from comment #17)
> > Well the mid-end has generated the right precision. The type it generates is
> > vector(4) <signed-boolean:1> vexit_reduc_67;
> > so it does say it's a single bit boolean.
> > 
> > Isn't this just an expand problem?
> That's what I meant.  expand is using a QImode comparison to compare things
> with 4-bit precision, so I think the masking should happen at that point.
> 
> How about doing the masking in do_compare_and_jump?

That sounds sensible.

Note that I wonder how to eliminate redundant maskings?  I suppose
eventually combine tracking nonzero bits where obvious would do
that?  For example for cmp:V4SI we know the bits will be zero but
I wonder if the RTL IL is obvious enough to derive this (or whether
there's a target hook for extra nonzero bit discovery, say for
unspecs).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2024-01-25 14:12 ` rguenther at suse dot de
@ 2024-01-26  2:24 ` liuhongt at gcc dot gnu.org
  2024-01-26  3:12 ` liuhongt at gcc dot gnu.org
                   ` (36 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-26  2:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #20 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> Note that I wonder how to eliminate redundant maskings?  I suppose
> eventually combine tracking nonzero bits where obvious would do
> that?  For example for cmp:V4SI we know the bits will be zero but
> I wonder if the RTL IL is obvious enough to derive this (or whether
> there's a target hook for extra nonzero bit discovery, say for
> unspecs).

I guess we need extra patterns to make combine know, we already have those for
zero_extend.

3970;; Since vpcmpd implicitly clear the upper bits of dest, transform
 3971;; vpcmpd + zero_extend to vpcmpd since the instruction
 3972(define_insn_and_split
"*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>"
 3973  [(set (match_operand:SWI248x 0 "register_operand")
 3974        (zero_extend:SWI248x
 3975          (unspec:<V48H_AVX512VL:avx512fmaskmode>
 3976            [(match_operand:V48H_AVX512VL 1 "nonimmediate_operand")
 3977             (match_operand:V48H_AVX512VL 2 "nonimmediate_operand")
 3978             (match_operand:SI 3 "const_0_to_7_operand")]
 3979            UNSPEC_PCMP)))]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2024-01-26  2:24 ` liuhongt at gcc dot gnu.org
@ 2024-01-26  3:12 ` liuhongt at gcc dot gnu.org
  2024-01-26  5:43 ` liuhongt at gcc dot gnu.org
                   ` (35 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-26  3:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #21 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
typedef unsigned long mp_limb_t;
typedef long mp_size_t;
typedef unsigned long mp_bitcnt_t;

typedef mp_limb_t *mp_ptr;
typedef const mp_limb_t *mp_srcptr;

#define GMP_LIMB_BITS (sizeof(mp_limb_t) * 8)

#define GMP_LIMB_MAX (~ (mp_limb_t) 0)

mp_bitcnt_t
mpn_common_scan (mp_limb_t limb, mp_size_t i, mp_srcptr up, mp_size_t un,
                 mp_limb_t ux)
{
  unsigned cnt;

  while (limb == 0)
    {
      i++;
      if (i == un)
        return (ux == 0 ? ~(mp_bitcnt_t) 0 : un * GMP_LIMB_BITS);
      limb = ux ^ up[i];
    }
  return limb;
}

This one is miscompiled in 502.gcc_r

123  <bb 8> [local count: 862990464]:
124  _34 = ivtmp.20_20 * 32;
125  vect__5.15_59 = MEM <const vector(4) long unsigned int> [(const mp_limb_t
*)vectp.14_53 + _34 * 1];
126  mask_patt_9.16_61 = vect__5.15_59 == vect_cst__60;
127  ivtmp.20_32 = ivtmp.20_20 + 1;
128  if (mask_patt_9.16_61 == { -1, -1, -1, -1 })
129    goto <bb 5>; [94.50%]
130  else
131    goto <bb 9>; [5.50%]


is expanded to

 30.L18:
 31        movq    %rdi, %rdx
 32        incq    %rdi
 33        salq    $5, %rdx
 34        vpcmpeqq        (%rax,%rdx), %ymm3, %k0
 35        kmovb   %k0, %edx
 36        cmpb    $-1, %dl
 37        jne     .L21

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2024-01-26  3:12 ` liuhongt at gcc dot gnu.org
@ 2024-01-26  5:43 ` liuhongt at gcc dot gnu.org
  2024-01-29 17:28 ` tnfchris at gcc dot gnu.org
                   ` (34 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-26  5:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #22 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
typedef unsigned long mp_limb_t;
typedef long mp_size_t;
typedef unsigned long mp_bitcnt_t;

typedef mp_limb_t *mp_ptr;
typedef const mp_limb_t *mp_srcptr;

#define GMP_LIMB_BITS (sizeof(mp_limb_t) * 8)

#define GMP_LIMB_MAX (~ (mp_limb_t) 0)

mp_bitcnt_t
__attribute__((noipa))
mpn_common_scan (mp_limb_t limb, mp_size_t i, mp_srcptr up, mp_size_t un,
                 mp_limb_t ux)
{
  unsigned cnt;

  while (limb == 0)
    {
      i++;
      if (i == un)
        return (ux == 0 ? ~(mp_bitcnt_t) 0 : un * GMP_LIMB_BITS);
      limb = ux ^ up[i];
    }
  return limb;
}

int main ()
{
  mp_limb_t up[10000];
  for (int i = 0; i != 10000; i++)
    up[i] = 1 << 8;
  up[2000] = 1;
  mp_bitcnt_t res = mpn_common_scan (0, 0, up, 10000, 1 << 8);
  if (res != 257)
    __builtin_abort ();
  return 1;
}


aborted with -O3 -march=skylake-avx512.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2024-01-26  5:43 ` liuhongt at gcc dot gnu.org
@ 2024-01-29 17:28 ` tnfchris at gcc dot gnu.org
  2024-01-29 18:31 ` tnfchris at gcc dot gnu.org
                   ` (33 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-01-29 17:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |acoplan at gcc dot gnu.org

--- Comment #23 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
*** Bug 113661 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2024-01-29 17:28 ` tnfchris at gcc dot gnu.org
@ 2024-01-29 18:31 ` tnfchris at gcc dot gnu.org
  2024-01-30  3:29 ` liuhongt at gcc dot gnu.org
                   ` (32 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-01-29 18:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #24 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Just to avoid confusion, are you still working on this one Richi?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2024-01-29 18:31 ` tnfchris at gcc dot gnu.org
@ 2024-01-30  3:29 ` liuhongt at gcc dot gnu.org
  2024-01-30  7:46 ` tnfchris at gcc dot gnu.org
                   ` (31 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-30  3:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #25 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #24)
> Just to avoid confusion, are you still working on this one Richi?

I'm working on a patch to add a target hook as #c18 mentioned.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2024-01-30  3:29 ` liuhongt at gcc dot gnu.org
@ 2024-01-30  7:46 ` tnfchris at gcc dot gnu.org
  2024-01-30  8:26 ` rguenth at gcc dot gnu.org
                   ` (30 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-01-30  7:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #26 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Ah great, just checking it wasn't left unattended :)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2024-01-30  7:46 ` tnfchris at gcc dot gnu.org
@ 2024-01-30  8:26 ` rguenth at gcc dot gnu.org
  2024-01-30  8:34 ` liuhongt at gcc dot gnu.org
                   ` (29 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-30  8:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #25)
> (In reply to Tamar Christina from comment #24)
> > Just to avoid confusion, are you still working on this one Richi?
> 
> I'm working on a patch to add a target hook as #c18 mentioned.

Not sure a target hook was suggested - I think it was suggested that
do_compare_and_jump always masks excess bits for integer mode vector masks?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2024-01-30  8:26 ` rguenth at gcc dot gnu.org
@ 2024-01-30  8:34 ` liuhongt at gcc dot gnu.org
  2024-01-30  8:56 ` rguenth at gcc dot gnu.org
                   ` (28 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-30  8:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #28 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
I saw we already maskoff integral modes for vector mask in store_constructor

        /* Use sign-extension for uniform boolean vectors with
           integer modes and single-bit mask entries.
           Effectively "vec_duplicate" for bitmasks.  */
        if (elt_size == 1
            && !TREE_SIDE_EFFECTS (exp)
            && VECTOR_BOOLEAN_TYPE_P (type)
            && SCALAR_INT_MODE_P (TYPE_MODE (type))
            && (elt = uniform_vector_p (exp))
            && !VECTOR_TYPE_P (TREE_TYPE (elt)))
          {
            rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
                                 expand_normal (elt));
            rtx tmp = gen_reg_rtx (mode);
            convert_move (tmp, op0, 0);

            /* Ensure no excess bits are set.
               GCN needs this for nunits < 64.
               x86 needs this for nunits < 8.  */
            auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
            if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
              tmp = expand_binop (mode, and_optab, tmp,
                                  GEN_INT ((1 << nunits) - 1), target,
                                  true, OPTAB_WIDEN);
            if (tmp != target)
              emit_move_insn (target, tmp);
            break;
          }

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2024-01-30  8:34 ` liuhongt at gcc dot gnu.org
@ 2024-01-30  8:56 ` rguenth at gcc dot gnu.org
  2024-01-30 10:12 ` rsandifo at gcc dot gnu.org
                   ` (27 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-30  8:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #29 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #28)
> I saw we already maskoff integral modes for vector mask in store_constructor
> 
> 	/* Use sign-extension for uniform boolean vectors with
> 	   integer modes and single-bit mask entries.
> 	   Effectively "vec_duplicate" for bitmasks.  */
> 	if (elt_size == 1
> 	    && !TREE_SIDE_EFFECTS (exp)
> 	    && VECTOR_BOOLEAN_TYPE_P (type)
> 	    && SCALAR_INT_MODE_P (TYPE_MODE (type))
> 	    && (elt = uniform_vector_p (exp))
> 	    && !VECTOR_TYPE_P (TREE_TYPE (elt)))
> 	  {
> 	    rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
> 				 expand_normal (elt));
> 	    rtx tmp = gen_reg_rtx (mode);
> 	    convert_move (tmp, op0, 0);
> 
> 	    /* Ensure no excess bits are set.
> 	       GCN needs this for nunits < 64.
> 	       x86 needs this for nunits < 8.  */
> 	    auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> 	    if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
> 	      tmp = expand_binop (mode, and_optab, tmp,
> 				  GEN_INT ((1 << nunits) - 1), target,
> 				  true, OPTAB_WIDEN);
> 	    if (tmp != target)
> 	      emit_move_insn (target, tmp);
> 	    break;
> 	  }

But that's just for CONSTRUCTORs, we got the VIEW_CONVERT_EXPR path for
VECTOR_CSTs.  But yeah, that _might_ argue we should perform the same
masking for VECTOR_CST expansion as well, instead of trying to fixup
in do_compare_and_jump?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2024-01-30  8:56 ` rguenth at gcc dot gnu.org
@ 2024-01-30 10:12 ` rsandifo at gcc dot gnu.org
  2024-01-30 10:39 ` rguenther at suse dot de
                   ` (26 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-01-30 10:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #30 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #29)
> But that's just for CONSTRUCTORs, we got the VIEW_CONVERT_EXPR path for
> VECTOR_CSTs.  But yeah, that _might_ argue we should perform the same
> masking for VECTOR_CST expansion as well, instead of trying to fixup
> in do_compare_and_jump?
But then how would ~ be implemented for things like 4-bit masks?
If we use notqi2 then I assume the upper bits could be 1 rather than 0.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (29 preceding siblings ...)
  2024-01-30 10:12 ` rsandifo at gcc dot gnu.org
@ 2024-01-30 10:39 ` rguenther at suse dot de
  2024-02-01  8:59 ` rguenth at gcc dot gnu.org
                   ` (25 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenther at suse dot de @ 2024-01-30 10:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #31 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 30 Jan 2024, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576
> 
> --- Comment #30 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #29)
> > But that's just for CONSTRUCTORs, we got the VIEW_CONVERT_EXPR path for
> > VECTOR_CSTs.  But yeah, that _might_ argue we should perform the same
> > masking for VECTOR_CST expansion as well, instead of trying to fixup
> > in do_compare_and_jump?
> But then how would ~ be implemented for things like 4-bit masks?
> If we use notqi2 then I assume the upper bits could be 1 rather than 0.

Yeah, I guess it's similar to expand_expr_real_1 'reduce_bit_field'
handling - we'd need to insert fixup code in strathegic places
(or for ~ use xor with the proper mask).

The difficulty is that we can't make the backend do this unless
there are insn operands that allows it to infer the real precision
of the mode.  And for most insns the excess bits are irrelevant
anyway.

Still the CTOR case showed wrong-code issues with GCN, which possibly
means it has the same issue with VECTOR_CSTs as well.  IIRC that
was that all vectors are 1024bits, and its "fake" V4SImode insns
rely on accurate masked out upper bits.  That might hint that
compares are not enough here (but for non-compares the backend
might have a chance to fixup by infering the max. number of
active elements).

If we think that compares (but that would also be compares without
jump, aka a == b | c == d) are the only problematical case we can
also fixup at the uses rather than at the defs as 'reduce_bit_field'
tries to do.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (30 preceding siblings ...)
  2024-01-30 10:39 ` rguenther at suse dot de
@ 2024-02-01  8:59 ` rguenth at gcc dot gnu.org
  2024-02-01  9:06 ` jakub at gcc dot gnu.org
                   ` (24 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-01  8:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #32 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, AVX512 knotb will invert all 8 bits and there's no knot just affecting
the lowest 4 or 2 bits.

It all feels like desaster waiting to happen ;)

For example BIT_NOT_EXPR is RTL expanded like

    case BIT_NOT_EXPR:
      op0 = expand_expr (treeop0, subtarget,
                         VOIDmode, EXPAND_NORMAL);
      if (modifier == EXPAND_STACK_PARM)
        target = 0;
      /* In case we have to reduce the result to bitfield precision
         for unsigned bitfield expand this as XOR with a proper constant
         instead.  */
      if (reduce_bit_field && TYPE_UNSIGNED (type))
        {
          int_mode = SCALAR_INT_TYPE_MODE (type);
          wide_int mask = wi::mask (TYPE_PRECISION (type),
                                    false, GET_MODE_PRECISION (int_mode));

          temp = expand_binop (int_mode, xor_optab, op0,
                               immed_wide_int_const (mask, int_mode),
                               target, 1, OPTAB_LIB_WIDEN);

so we could, for VECTOR_BOOLEAN_TYPE_P with integer mode and
effective bit-precision set reduce_bit_field and fixup the fallout
(not sure why the above is only for TYPE_UNSIGNED).

At least it feels similar and doing things the opposite for vectors
(fixing up at uses) would be odd?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (31 preceding siblings ...)
  2024-02-01  8:59 ` rguenth at gcc dot gnu.org
@ 2024-02-01  9:06 ` jakub at gcc dot gnu.org
  2024-02-01  9:18 ` rsandifo at gcc dot gnu.org
                   ` (23 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-01  9:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I guess inverting just 2 or 4 bits can be done with kxorb or knotb + kandb, but
we need to have the mask forced in some mask register.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (32 preceding siblings ...)
  2024-02-01  9:06 ` jakub at gcc dot gnu.org
@ 2024-02-01  9:18 ` rsandifo at gcc dot gnu.org
  2024-02-01 10:04 ` rguenth at gcc dot gnu.org
                   ` (22 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2024-02-01  9:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #34 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #32)
> Btw, AVX512 knotb will invert all 8 bits and there's no knot just affecting
> the lowest 4 or 2 bits.
> 
> It all feels like desaster waiting to happen ;)
Yes :)

> For example BIT_NOT_EXPR is RTL expanded like
> 
>     case BIT_NOT_EXPR:
>       op0 = expand_expr (treeop0, subtarget,
>                          VOIDmode, EXPAND_NORMAL);
>       if (modifier == EXPAND_STACK_PARM)
>         target = 0;
>       /* In case we have to reduce the result to bitfield precision
>          for unsigned bitfield expand this as XOR with a proper constant
>          instead.  */
>       if (reduce_bit_field && TYPE_UNSIGNED (type))
>         {
>           int_mode = SCALAR_INT_TYPE_MODE (type);
>           wide_int mask = wi::mask (TYPE_PRECISION (type),
>                                     false, GET_MODE_PRECISION (int_mode));
>   
>           temp = expand_binop (int_mode, xor_optab, op0,
>                                immed_wide_int_const (mask, int_mode),
>                                target, 1, OPTAB_LIB_WIDEN);
> 
> so we could, for VECTOR_BOOLEAN_TYPE_P with integer mode and
> effective bit-precision set reduce_bit_field and fixup the fallout
> (not sure why the above is only for TYPE_UNSIGNED).
>
> At least it feels similar and doing things the opposite for vectors
> (fixing up at uses) would be odd?
Do you know why we take this approach for integers?  Is it for
correctness?  Or is it supposed to be more optimal?

I can imagine that, for arithmetic types, there are going to many
more instances where upper bits matter (division, right shifts,
MIN/MAX, etc.).  So perhaps reducing every result is a good
trade-off there.

But there's an argument that it should be rare for the padding
bits in a vector to matter, since very few things would look at the
padding bits anyway.  So perhaps the cost should be borne by the
operations that need canonical integers.

Not a strong opinion though, more just devil's advocate.

There again, if e.g. the x86 API guarantees memcmp equality between
two masks whose significant bits are equal, then we probably have
no choice.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (33 preceding siblings ...)
  2024-02-01  9:18 ` rsandifo at gcc dot gnu.org
@ 2024-02-01 10:04 ` rguenth at gcc dot gnu.org
  2024-02-07 15:08 ` rguenth at gcc dot gnu.org
                   ` (21 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-01 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ams at gcc dot gnu.org

--- Comment #35 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #34)
> (In reply to Richard Biener from comment #32)
> > Btw, AVX512 knotb will invert all 8 bits and there's no knot just affecting
> > the lowest 4 or 2 bits.
> > 
> > It all feels like desaster waiting to happen ;)
> Yes :)
> 
> > For example BIT_NOT_EXPR is RTL expanded like
> > 
> >     case BIT_NOT_EXPR:
> >       op0 = expand_expr (treeop0, subtarget,
> >                          VOIDmode, EXPAND_NORMAL);
> >       if (modifier == EXPAND_STACK_PARM)
> >         target = 0;
> >       /* In case we have to reduce the result to bitfield precision
> >          for unsigned bitfield expand this as XOR with a proper constant
> >          instead.  */
> >       if (reduce_bit_field && TYPE_UNSIGNED (type))
> >         {
> >           int_mode = SCALAR_INT_TYPE_MODE (type);
> >           wide_int mask = wi::mask (TYPE_PRECISION (type),
> >                                     false, GET_MODE_PRECISION (int_mode));
> >   
> >           temp = expand_binop (int_mode, xor_optab, op0,
> >                                immed_wide_int_const (mask, int_mode),
> >                                target, 1, OPTAB_LIB_WIDEN);
> > 
> > so we could, for VECTOR_BOOLEAN_TYPE_P with integer mode and
> > effective bit-precision set reduce_bit_field and fixup the fallout
> > (not sure why the above is only for TYPE_UNSIGNED).
> >
> > At least it feels similar and doing things the opposite for vectors
> > (fixing up at uses) would be odd?
> Do you know why we take this approach for integers?  Is it for
> correctness?  Or is it supposed to be more optimal?

It's done for correctness.  The main thing was bitfields > int which
end up as bit-precision types in GIMPLE.

> I can imagine that, for arithmetic types, there are going to many
> more instances where upper bits matter (division, right shifts,
> MIN/MAX, etc.).  So perhaps reducing every result is a good
> trade-off there.

I think it was the easiest place to fix up and to make sure later
RTL opts and backends do not interfere.

> But there's an argument that it should be rare for the padding
> bits in a vector to matter, since very few things would look at the
> padding bits anyway.  So perhaps the cost should be borne by the
> operations that need canonical integers.

But what happens when an operation not needing canonical intergers
is transformed, say by combine or simplify-rtx to one that needs?
I think the reduce_bitfield code was trying to be safe here.

Actual define_insns not needing canonical padding could do like
we now require scalar shifts - they could match a variant with
the mask canonicalization op and hope for combine eliminating that.
Of course that will explode in case it's the majority of cases ...

> Not a strong opinion though, more just devil's advocate.

Yeah.  The important thing seems to be that the info this is a
bit-precision QImode isn't lost ... which points to that using
plain integer modes was a very bad choice ...

> There again, if e.g. the x86 API guarantees memcmp equality between
> two masks whose significant bits are equal, then we probably have
> no choice.

That's a good question, possibly most relevant for the OpenMP SIMD ABI.

The AVX512 APIs use integer types everywhere, there's no intrinsic
for ktest itself, but _mm512_kortestz and kortestc and also _mm512_knot
(oddly only for HImode).  So it at least seems to - from a quick look -
be "broken" in the intrinsic API as well.  At least we have
_mm_cmp_sd_mask producing a QImode mask, _mm512_knot inverting too
many bits.  So the user must be aware of the "padding".

One could argue it's the vectorizers job to fixup then, but of course
it doesn't get to see the correct vector types here.

Given we have the fixup in CTOR expansion the issue at hand could be
fixed by mirroring that in VECTOR_CST expansion for now.

Andrew - I suppose GCN also has cbranch, can you try to check what happens
for < 8 lane vector modes there?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (34 preceding siblings ...)
  2024-02-01 10:04 ` rguenth at gcc dot gnu.org
@ 2024-02-07 15:08 ` rguenth at gcc dot gnu.org
  2024-02-08  1:18 ` liuhongt at gcc dot gnu.org
                   ` (20 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-07 15:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #36 from Richard Biener <rguenth at gcc dot gnu.org> ---
For example with AVX512VL and the following, using -O -fgimple -mavx512vl
we get simply

        notl    %esi
        orl     %esi, %edi
        cmpb    $15, %dil
        je      .L6

typedef long v4si __attribute__((vector_size(4*sizeof(long))));
typedef v4si v4sib __attribute__((vector_mask));
typedef _Bool sbool1 __attribute__((signed_bool_precision(1)));

void __GIMPLE (ssa) foo (v4sib v1, v4sib v2)
{
  v4sib tem;

__BB(2):
  tem_5 = ~v2_2(D);
  tem_3 = v1_1(D) | tem_5;
  tem_4 = _Literal (v4sib) { _Literal (sbool1) -1, _Literal (sbool1) -1,
_Literal (sbool1) -1, _Literal (sbool1) -1 };
  if (tem_3 == tem_4)
    goto __BB3;
  else
    goto __BB4;

__BB(3):
  __builtin_abort ();

__BB(4):
  return;
}


the question is whether that matches the semantics of GIMPLE (the padding
is inverted, too), whether it invokes undefined behavior (don't do it - it
seems for people using intrinsics that's what it is?) or whether we
should avoid affecting padding.

Note after the patch I proposed on the mailing list the constant mask is
now expanded with zero padding.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (35 preceding siblings ...)
  2024-02-07 15:08 ` rguenth at gcc dot gnu.org
@ 2024-02-08  1:18 ` liuhongt at gcc dot gnu.org
  2024-02-08  1:58 ` liuhongt at gcc dot gnu.org
                   ` (19 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-02-08  1:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #37 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #36)
> For example with AVX512VL and the following, using -O -fgimple -mavx512vl
> we get simply
> 
>         notl    %esi
>         orl     %esi, %edi
>         cmpb    $15, %dil
>         je      .L6
> 
> typedef long v4si __attribute__((vector_size(4*sizeof(long))));
> typedef v4si v4sib __attribute__((vector_mask));
> typedef _Bool sbool1 __attribute__((signed_bool_precision(1)));
> 
> void __GIMPLE (ssa) foo (v4sib v1, v4sib v2)
> {
>   v4sib tem;
> 
> __BB(2):
>   tem_5 = ~v2_2(D);
>   tem_3 = v1_1(D) | tem_5;
>   tem_4 = _Literal (v4sib) { _Literal (sbool1) -1, _Literal (sbool1) -1,
> _Literal (sbool1) -1, _Literal (sbool1) -1 };
>   if (tem_3 == tem_4)
>     goto __BB3;
>   else
>     goto __BB4;
> 
> __BB(3):
>   __builtin_abort ();
> 
> __BB(4):
>   return;
> }
> 
> 
> the question is whether that matches the semantics of GIMPLE (the padding
> is inverted, too), whether it invokes undefined behavior (don't do it - it
> seems for people using intrinsics that's what it is?) or whether we
> should avoid affecting padding.
> 
> Note after the patch I proposed on the mailing list the constant mask is
> now expanded with zero padding.

I think we should also mask off the upper bits of variable mask?

        notl    %esi
        orl     %esi, %edi
        notl    %edi
        andl    $15, %edi
        je      .L3

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (36 preceding siblings ...)
  2024-02-08  1:18 ` liuhongt at gcc dot gnu.org
@ 2024-02-08  1:58 ` liuhongt at gcc dot gnu.org
  2024-02-08  4:22 ` liuhongt at gcc dot gnu.org
                   ` (18 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-02-08  1:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #38 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> I think we should also mask off the upper bits of variable mask?
> 
>         notl    %esi
>         orl     %esi, %edi
>         notl    %edi
>         andl    $15, %edi
>         je      .L3

with -mbmi, it's 

        andn    %esi, %edi, %edi
        andl    $15, %edi
        je      .L3

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (37 preceding siblings ...)
  2024-02-08  1:58 ` liuhongt at gcc dot gnu.org
@ 2024-02-08  4:22 ` liuhongt at gcc dot gnu.org
  2024-02-08  7:36 ` jakub at gcc dot gnu.org
                   ` (17 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-02-08  4:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #39 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > the question is whether that matches the semantics of GIMPLE (the padding
> > is inverted, too), whether it invokes undefined behavior (don't do it - it
> > seems for people using intrinsics that's what it is?)
For the intrinisc, the instructions only care about lower bits, so it's not big
issue? And it sounds like similar issue as _BitInt(4)/_BitInt(2), I assume
there're garbage in the upper bits.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (38 preceding siblings ...)
  2024-02-08  4:22 ` liuhongt at gcc dot gnu.org
@ 2024-02-08  7:36 ` jakub at gcc dot gnu.org
  2024-02-08 10:35 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-08  7:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #40 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
For unsigned _BitInt(4) or unsigned _BitInt(2) we mask it whenever loading from
memory or function argument or whatever other ABI specific spot (and also when
storing because that is how RTL expects it; because of that we don't mask it
when using it from say automatic variables where we know we've initialized it
ourselves).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (39 preceding siblings ...)
  2024-02-08  7:36 ` jakub at gcc dot gnu.org
@ 2024-02-08 10:35 ` rguenth at gcc dot gnu.org
  2024-02-08 10:48 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-08 10:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #41 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #38)
> > I think we should also mask off the upper bits of variable mask?
> > 
> >         notl    %esi
> >         orl     %esi, %edi
> >         notl    %edi
> >         andl    $15, %edi
> >         je      .L3
> 
> with -mbmi, it's 
> 
>         andn    %esi, %edi, %edi
>         andl    $15, %edi
>         je      .L3

Well, yes, the discussion in this bug was whether to do this at consumers
(that's sth new) or with all mask operations (that's how we handle
bit-precision integer operations, so it might be relatively easy to
do that - specifically spot the places eventually needing adjustment).

There's do_store_flag to fixup for uses not in branches and
do_compare_and_jump for conditional jumps.

Note the AND is removed by combine if I add it:

Successfully matched this instruction:
(set (reg:CCZ 17 flags)
    (compare:CCZ (and:HI (not:HI (subreg:HI (reg:QI 102 [ tem_3 ]) 0))
            (const_int 15 [0xf]))
        (const_int 0 [0])))

(*testhi_not)

-    9: {r103:QI=r102:QI&0xf;clobber flags:CC;}
+      REG_DEAD r99:QI
+    9: NOTE_INSN_DELETED
+   12: flags:CCZ=cmp(~r102:QI#0&0xf,0)
       REG_DEAD r102:QI
-      REG_UNUSED flags:CC
-   12: flags:CCZ=cmp(r103:QI,0xf)
-      REG_DEAD r103:QI

and we get

foo:
.LFB0:
        .cfi_startproc
        notl    %esi
        orl     %esi, %edi
        notl    %edi
        testb   $15, %dil
        je      .L6
        ret

which I'm not sure is OK?

diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index e2d2b3cb111..784707c1e55 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -1266,6 +1266,7 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum
rtx_code signed_code,
   machine_mode mode;
   int unsignedp;
   enum rtx_code code;
+  unsigned HOST_WIDE_INT nunits;

   /* Don't crash if the comparison was erroneous.  */
   op0 = expand_normal (treeop0);
@@ -1308,6 +1309,18 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum
rtx_code signed_code,
       emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1));
       op1 = new_op1;
     }
+  else if (VECTOR_BOOLEAN_TYPE_P (type)
+          && mode == QImode
+          && TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits)
+          && nunits < BITS_PER_UNIT)
+    {
+      op0 = expand_binop (mode, and_optab, op0,
+                         GEN_INT ((1 << nunits) - 1), NULL_RTX,
+                         true, OPTAB_WIDEN);
+      op1 = expand_binop (mode, and_optab, op1,
+                         GEN_INT ((1 << nunits) - 1), NULL_RTX,
+                         true, OPTAB_WIDEN);
+    }

   do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
                           ((mode == BLKmode)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (40 preceding siblings ...)
  2024-02-08 10:35 ` rguenth at gcc dot gnu.org
@ 2024-02-08 10:48 ` rguenth at gcc dot gnu.org
  2024-02-08 14:58 ` liuhongt at gcc dot gnu.org
                   ` (14 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-08 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #42 from Richard Biener <rguenth at gcc dot gnu.org> ---
And the do_store_flag part:

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fc5e998e329..44d64274071 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13693,6 +13693,19 @@ do_store_flag (sepops ops, rtx target, machine_mode
mode)
     subtarget = 0;

   expand_operands (arg0, arg1, subtarget, &op0, &op1, EXPAND_NORMAL);
+  unsigned HOST_WIDE_INT nunits;
+  if (VECTOR_BOOLEAN_TYPE_P (type)
+      && operand_mode == QImode
+      && TYPE_VECTOR_SUBPARTS (type).is_constant (&nunits)
+      && nunits < BITS_PER_UNIT)
+    {
+      op0 = expand_binop (mode, and_optab, op0,
+                         GEN_INT ((1 << nunits) - 1), NULL_RTX,
+                         true, OPTAB_WIDEN);
+      op1 = expand_binop (mode, and_optab, op1,
+                         GEN_INT ((1 << nunits) - 1), NULL_RTX,
+                         true, OPTAB_WIDEN);
+    }

   if (target == 0)
     target = gen_reg_rtx (mode);


for the testcase

typedef long v4si __attribute__((vector_size(4*sizeof(long))));
typedef v4si v4sib __attribute__((vector_mask));
typedef _Bool sbool1 __attribute__((signed_bool_precision(1)));
_Bool x;
void __GIMPLE (ssa) foo (v4sib v1, v4sib v2)
{
  v4sib tem;
  _Bool _7;

__BB(2):
  tem_5 = ~v2_2(D);
  tem_3 = v1_1(D) | tem_5;
  tem_4 = _Literal (v4sib) { _Literal (sbool1) -1, _Literal (sbool1) -1,
_Literal (sbool1) -1, _Literal (sbool1) -1 };
  _7 = tem_3 == tem_4;
  x = _7;
  return;
}

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (41 preceding siblings ...)
  2024-02-08 10:48 ` rguenth at gcc dot gnu.org
@ 2024-02-08 14:58 ` liuhongt at gcc dot gnu.org
  2024-02-08 15:07 ` liuhongt at gcc dot gnu.org
                   ` (13 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-02-08 14:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #43 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> Well, yes, the discussion in this bug was whether to do this at consumers
> (that's sth new) or with all mask operations (that's how we handle
> bit-precision integer operations, so it might be relatively easy to
> do that - specifically spot the places eventually needing adjustment).
> 
> There's do_store_flag to fixup for uses not in branches and
> do_compare_and_jump for conditional jumps.

reasonable enough for me.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (42 preceding siblings ...)
  2024-02-08 14:58 ` liuhongt at gcc dot gnu.org
@ 2024-02-08 15:07 ` liuhongt at gcc dot gnu.org
  2024-02-08 15:19 ` liuhongt at gcc dot gnu.org
                   ` (12 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-02-08 15:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #44 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> 
> Note the AND is removed by combine if I add it:
> 
> Successfully matched this instruction:
> (set (reg:CCZ 17 flags)
>     (compare:CCZ (and:HI (not:HI (subreg:HI (reg:QI 102 [ tem_3 ]) 0))
>             (const_int 15 [0xf]))
>         (const_int 0 [0])))
> 
> (*testhi_not)
> 
> -    9: {r103:QI=r102:QI&0xf;clobber flags:CC;}
> +      REG_DEAD r99:QI
> +    9: NOTE_INSN_DELETED
> +   12: flags:CCZ=cmp(~r102:QI#0&0xf,0)
>        REG_DEAD r102:QI
> -      REG_UNUSED flags:CC
> -   12: flags:CCZ=cmp(r103:QI,0xf)
> -      REG_DEAD r103:QI
> 
> and we get
> 
> foo:
> .LFB0:
>         .cfi_startproc
>         notl    %esi
>         orl     %esi, %edi
>         notl    %edi
>         testb   $15, %dil
>         je      .L6
>         ret
> 
> which I'm not sure is OK?
> 

Yes, I think it's on purpose

11508;; Split and;cmp (as optimized by combine) into not;test
11509;; Except when TARGET_BMI provides andn (*andn_<mode>_ccno).
11510(define_insn_and_split "*test<mode>_not"
11511  [(set (reg:CCZ FLAGS_REG)
11512        (compare:CCZ
11513          (and:SWI
11514            (not:SWI (match_operand:SWI 0 "register_operand"))
11515            (match_operand:SWI 1 "<nonmemory_szext_operand>"))
11516          (const_int 0)))]
11517  "ix86_pre_reload_split ()
11518   && (!TARGET_BMI || !REG_P (operands[1]))"
11519  "#"
11520  "&& 1"
11521  [(set (match_dup 2) (not:SWI (match_dup 0)))
11522   (set (reg:CCZ FLAGS_REG)
11523        (compare:CCZ (and:SWI (match_dup 2) (match_dup 1))
11524                     (const_int 0)))]
11525  "operands[2] = gen_reg_rtx (<MODE>mode);")
11526
11527;; Split and;cmp (as optimized by combine) into andn;cmp $0
11528(define_insn_and_split "*test<mode>_not_doubleword"

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (43 preceding siblings ...)
  2024-02-08 15:07 ` liuhongt at gcc dot gnu.org
@ 2024-02-08 15:19 ` liuhongt at gcc dot gnu.org
  2024-02-14 12:07 ` cvs-commit at gcc dot gnu.org
                   ` (11 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-02-08 15:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #45 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> > There's do_store_flag to fixup for uses not in branches and
> > do_compare_and_jump for conditional jumps.
> 
> reasonable enough for me.
I mean we only handle it at consumers where upper bits matters.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (44 preceding siblings ...)
  2024-02-08 15:19 ` liuhongt at gcc dot gnu.org
@ 2024-02-14 12:07 ` cvs-commit at gcc dot gnu.org
  2024-02-14 12:38 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-02-14 12:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #46 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:5352ede92483b949e811cbdcdfaec5378f3e06d6

commit r14-8975-g5352ede92483b949e811cbdcdfaec5378f3e06d6
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 9 08:15:44 2024 +0100

    middle-end/113576 - zero padding of vector bools when expanding compares

    The following zeros paddings of vector bools when expanding compares
    and the mode used for the compare is an integer mode.  In that case
    targets cannot distinguish between a 4 element and 8 element vector
    compare (both get to the QImode compare optab) so we have to do the
    job in the middle-end.

            PR middle-end/113576
            * expr.cc (do_store_flag): For vector bool compares of vectors
            with padding zero that.
            * dojump.cc (do_compare_and_jump): Likewise.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (45 preceding siblings ...)
  2024-02-14 12:07 ` cvs-commit at gcc dot gnu.org
@ 2024-02-14 12:38 ` rguenth at gcc dot gnu.org
  2024-02-14 13:48 ` ubizjak at gmail dot com
                   ` (9 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-14 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #47 from Richard Biener <rguenth at gcc dot gnu.org> ---
This should now be fixed, but I think the issue might be latent on branches for
GCN or for AVX512 via intrinsics (fully masked loops for AVX512 is also only
available in GCC 14).

Let's close this P1, we have to consider backporting when we manage to create a
miscompiled testcase for older releases.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (46 preceding siblings ...)
  2024-02-14 12:38 ` rguenth at gcc dot gnu.org
@ 2024-02-14 13:48 ` ubizjak at gmail dot com
  2024-02-14 13:54 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: ubizjak at gmail dot com @ 2024-02-14 13:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #48 from Uroš Bizjak <ubizjak at gmail dot com> ---
The runtime testcase fails on non-AVX512F x86 targets due to:

/* { dg-do run } */
/* { dg-options "-O3" } */
/* { dg-additional-options "-march=skylake-avx512" { target { x86_64-*-*
i?86-*-* } } } */

but check_vect() only checks runtime support up to AVX2.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (47 preceding siblings ...)
  2024-02-14 13:48 ` ubizjak at gmail dot com
@ 2024-02-14 13:54 ` rguenth at gcc dot gnu.org
  2024-02-14 13:59 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-14 13:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #49 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #48)
> The runtime testcase fails on non-AVX512F x86 targets due to:
> 
> /* { dg-do run } */
> /* { dg-options "-O3" } */
> /* { dg-additional-options "-march=skylake-avx512" { target { x86_64-*-*
> i?86-*-* } } } */
> 
> but check_vect() only checks runtime support up to AVX2.

Hmm, can we fix that?  We could change the above to { target avx512f_runtime }
but that really only checks for AVX512F, not say AVX512VL ...

I do remember using -mavx512vl wasn't enough to trigger the miscompile
nor did it trigger with -march=znver4 ... so I stuck to skylake-avx512 :/

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (48 preceding siblings ...)
  2024-02-14 13:54 ` rguenth at gcc dot gnu.org
@ 2024-02-14 13:59 ` jakub at gcc dot gnu.org
  2024-02-14 14:01 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-14 13:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #50 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #49)
> (In reply to Uroš Bizjak from comment #48)
> > The runtime testcase fails on non-AVX512F x86 targets due to:
> > 
> > /* { dg-do run } */
> > /* { dg-options "-O3" } */
> > /* { dg-additional-options "-march=skylake-avx512" { target { x86_64-*-*
> > i?86-*-* } } } */
> > 
> > but check_vect() only checks runtime support up to AVX2.
> 
> Hmm, can we fix that?  We could change the above to { target avx512f_runtime
> }
> but that really only checks for AVX512F, not say AVX512VL ...
> 
> I do remember using -mavx512vl wasn't enough to trigger the miscompile
> nor did it trigger with -march=znver4 ... so I stuck to skylake-avx512 :/

It is certainly preferable to add -mavx512{bw,dq,vl} or whatever the testcase
actually needs and then one can
#define AVX512BW
#define AVX512DQ
#define AVX512VL
#include "avx512-check.h"
and get checks for all those.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (49 preceding siblings ...)
  2024-02-14 13:59 ` jakub at gcc dot gnu.org
@ 2024-02-14 14:01 ` jakub at gcc dot gnu.org
  2024-02-14 14:31 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-14 14:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #51 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
From the -mavx* options I think -march=skylake-avx512 implies
-mavx512{f,cd,vl,bw,dq} but -mavx512f is implied by any of the latter 4.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (50 preceding siblings ...)
  2024-02-14 14:01 ` jakub at gcc dot gnu.org
@ 2024-02-14 14:31 ` ubizjak at gmail dot com
  2024-02-14 14:36 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: ubizjak at gmail dot com @ 2024-02-14 14:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #52 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 57424
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57424&action=edit
Proposed testsuite patch

This patch fixes the failure for me (+ some other dg.exp/vect inconsistencies).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (51 preceding siblings ...)
  2024-02-14 14:31 ` ubizjak at gmail dot com
@ 2024-02-14 14:36 ` jakub at gcc dot gnu.org
  2024-02-14 15:07 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-14 14:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #53 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Comment on attachment 57424
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57424
Proposed testsuite patch

As skylake-avx512 is -mavx512{f,cd,bw,dq,vl}, requiring just avx512f effective
target and testing it at runtime IMHO isn't enough.
For dg-do run testcases I really think we should avoid those -march= options,
because it means a lot of other stuff, BMI, LZCNT, ...

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (52 preceding siblings ...)
  2024-02-14 14:36 ` jakub at gcc dot gnu.org
@ 2024-02-14 15:07 ` rguenth at gcc dot gnu.org
  2024-02-14 15:26 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  56 siblings, 0 replies; 58+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-14 15:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #54 from Richard Biener <rguenth at gcc dot gnu.org> ---
Please also verify the bug reproduced with the altered set of options.

What's the reason to have avx512-check.h in addition to tree-vect.h?
At least for the vectorizer testsuite the latter is the canonical one,
can we please merge AVX512 support therein?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (53 preceding siblings ...)
  2024-02-14 15:07 ` rguenth at gcc dot gnu.org
@ 2024-02-14 15:26 ` ubizjak at gmail dot com
  2024-02-14 20:18 ` ubizjak at gmail dot com
  2024-02-18  2:10 ` liuhongt at gcc dot gnu.org
  56 siblings, 0 replies; 58+ messages in thread
From: ubizjak at gmail dot com @ 2024-02-14 15:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #55 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #53)
> Comment on attachment 57424 [details]
> Proposed testsuite patch
> 
> As skylake-avx512 is -mavx512{f,cd,bw,dq,vl}, requiring just avx512f
> effective target and testing it at runtime IMHO isn't enough.
> For dg-do run testcases I really think we should avoid those -march=
> options, because it means a lot of other stuff, BMI, LZCNT, ...

I think that addition of

+# if defined(__AVX512VL__)
+    want_level = 7, want_b = bit_AVX512VL;
+# elif defined(__AVX512F__)
+    want_level = 7, want_b = bit_AVX512F;
+# elif defined(__AVX2__)

to check_vect solves all current uses in gcc.dg/vect

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (54 preceding siblings ...)
  2024-02-14 15:26 ` ubizjak at gmail dot com
@ 2024-02-14 20:18 ` ubizjak at gmail dot com
  2024-02-18  2:10 ` liuhongt at gcc dot gnu.org
  56 siblings, 0 replies; 58+ messages in thread
From: ubizjak at gmail dot com @ 2024-02-14 20:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #56 from Uroš Bizjak <ubizjak at gmail dot com> ---
The testcase is fixed with g:430c772be3382134886db33133ed466c02efc71c

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs  r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
  2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
                   ` (55 preceding siblings ...)
  2024-02-14 20:18 ` ubizjak at gmail dot com
@ 2024-02-18  2:10 ` liuhongt at gcc dot gnu.org
  56 siblings, 0 replies; 58+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-02-18  2:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #57 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> For dg-do run testcases I really think we should avoid those -march=
> options, because it means a lot of other stuff, BMI, LZCNT, ...

Make sense.

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2024-02-18  2:11 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-24 10:05 [Bug tree-optimization/113576] New: [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c liuhongt at gcc dot gnu.org
2024-01-24 10:07 ` [Bug tree-optimization/113576] " liuhongt at gcc dot gnu.org
2024-01-24 12:27 ` rguenth at gcc dot gnu.org
2024-01-24 13:37 ` rguenth at gcc dot gnu.org
2024-01-24 13:47 ` rguenth at gcc dot gnu.org
2024-01-24 13:50 ` rguenth at gcc dot gnu.org
2024-01-25  1:03 ` liuhongt at gcc dot gnu.org
2024-01-25  6:28 ` liuhongt at gcc dot gnu.org
2024-01-25  7:02 ` liuhongt at gcc dot gnu.org
2024-01-25  7:39 ` cvs-commit at gcc dot gnu.org
2024-01-25  8:34 ` liuhongt at gcc dot gnu.org
2024-01-25  8:52 ` rguenth at gcc dot gnu.org
2024-01-25  9:15 ` rguenther at suse dot de
2024-01-25  9:26 ` rsandifo at gcc dot gnu.org
2024-01-25 10:09 ` liuhongt at gcc dot gnu.org
2024-01-25 10:13 ` rguenth at gcc dot gnu.org
2024-01-25 12:24 ` rsandifo at gcc dot gnu.org
2024-01-25 13:03 ` tnfchris at gcc dot gnu.org
2024-01-25 13:28 ` rsandifo at gcc dot gnu.org
2024-01-25 14:12 ` rguenther at suse dot de
2024-01-26  2:24 ` liuhongt at gcc dot gnu.org
2024-01-26  3:12 ` liuhongt at gcc dot gnu.org
2024-01-26  5:43 ` liuhongt at gcc dot gnu.org
2024-01-29 17:28 ` tnfchris at gcc dot gnu.org
2024-01-29 18:31 ` tnfchris at gcc dot gnu.org
2024-01-30  3:29 ` liuhongt at gcc dot gnu.org
2024-01-30  7:46 ` tnfchris at gcc dot gnu.org
2024-01-30  8:26 ` rguenth at gcc dot gnu.org
2024-01-30  8:34 ` liuhongt at gcc dot gnu.org
2024-01-30  8:56 ` rguenth at gcc dot gnu.org
2024-01-30 10:12 ` rsandifo at gcc dot gnu.org
2024-01-30 10:39 ` rguenther at suse dot de
2024-02-01  8:59 ` rguenth at gcc dot gnu.org
2024-02-01  9:06 ` jakub at gcc dot gnu.org
2024-02-01  9:18 ` rsandifo at gcc dot gnu.org
2024-02-01 10:04 ` rguenth at gcc dot gnu.org
2024-02-07 15:08 ` rguenth at gcc dot gnu.org
2024-02-08  1:18 ` liuhongt at gcc dot gnu.org
2024-02-08  1:58 ` liuhongt at gcc dot gnu.org
2024-02-08  4:22 ` liuhongt at gcc dot gnu.org
2024-02-08  7:36 ` jakub at gcc dot gnu.org
2024-02-08 10:35 ` rguenth at gcc dot gnu.org
2024-02-08 10:48 ` rguenth at gcc dot gnu.org
2024-02-08 14:58 ` liuhongt at gcc dot gnu.org
2024-02-08 15:07 ` liuhongt at gcc dot gnu.org
2024-02-08 15:19 ` liuhongt at gcc dot gnu.org
2024-02-14 12:07 ` cvs-commit at gcc dot gnu.org
2024-02-14 12:38 ` rguenth at gcc dot gnu.org
2024-02-14 13:48 ` ubizjak at gmail dot com
2024-02-14 13:54 ` rguenth at gcc dot gnu.org
2024-02-14 13:59 ` jakub at gcc dot gnu.org
2024-02-14 14:01 ` jakub at gcc dot gnu.org
2024-02-14 14:31 ` ubizjak at gmail dot com
2024-02-14 14:36 ` jakub at gcc dot gnu.org
2024-02-14 15:07 ` rguenth at gcc dot gnu.org
2024-02-14 15:26 ` ubizjak at gmail dot com
2024-02-14 20:18 ` ubizjak at gmail dot com
2024-02-18  2:10 ` liuhongt at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).