public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
@ 2023-04-05  9:16 Jakub Jelinek
  2023-04-05 13:14 ` Jeff Law
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-05  9:16 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, Richard Sandiford, Eric Botcazou; +Cc: gcc-patches

Hi!

The following testcase is miscompiled on riscv since the addition
of *mvconst_internal define_insn_and_split.
I believe the bug is in DSE.  We have:
(insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
                (const_int -64 [0xffffffffffffffc0])) [2  S4 A128])
        (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 166)
        (nil)))
(insn 39 36 40 2 (set (reg:SI 171)
        (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
                    (const_int -64 [0xffffffffffffffc0])) [0  S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
     (nil))
and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
even different modes like in the above case, and so it optimizes it into:
(insn 47 35 39 2 (set (reg:HI 175)
        (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
     (expr_list:REG_DEAD (reg:SI 166)
        (nil)))
(insn 39 47 40 2 (set (reg:SI 171)
        (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
     (expr_list:REG_DEAD (reg:HI 175)
        (nil)))
Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
Combine attempts to combine the AND with the insn 47 above created by DSE,
and turns it because of WORD_REGISTER_OPERATIONS and its assumption that all
the subword operations are actually done on word mode into:
(set (subreg:SI (reg:HI 175) 0)
    (and:SI (reg:SI 167 [ m ])
        (reg:SI 168)))
and later on the ZERO_EXTEND is thrown away.

The following patch changes DSE to instead emit
(insn 47 35 39 2 (set (reg:SI 175)
        (reg:SI 166)) "pr109040.c":10:11 180 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 166)
        (nil)))
(insn 39 47 40 2 (set (reg:SI 171)
        (zero_extend:SI (subreg:HI (reg:SI 175) 0))) "pr109040.c":10:11 111 {*zero_extendhisi2}
     (expr_list:REG_DEAD (reg:SI 175)
        (nil)))
i.e. in the new insn copy whole reg rather than subword part of it where
we don't really know anything about the upper bits.
With this change, combine manages to do the right thing, optimies the
(unsigned) (unsigned short) (reg & 0x8084c) into
reg & 0x84c.

Bootstrapped/regtested on x86_64-linux and i686-linux (admittedly not
WORD_REGISTER_OPERATIONS targets) and tested using a cross to riscv
on the testcase.  I have unfortunately no way to bootstrap this on
risc*-linux, could somebody do that?  Ok for trunk if that passes/

2023-04-05  Jakub Jelinek  <jakub@redhat.com>

	PR target/109040
	* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
	REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
	a new REG rather than the SUBREG.

	* gcc.c-torture/execute/pr109040.c: New test.

--- gcc/dse.cc.jj	2023-01-02 09:32:50.369880943 +0100
+++ gcc/dse.cc	2023-04-04 22:17:22.906347794 +0200
@@ -2012,7 +2012,19 @@ replace_read (store_info *store_info, in
     }
   /* Force the value into a new register so that it won't be clobbered
      between the store and the load.  */
-  read_reg = copy_to_mode_reg (read_mode, read_reg);
+  if (WORD_REGISTER_OPERATIONS
+      && GET_CODE (read_reg) == SUBREG
+      && REG_P (SUBREG_REG (read_reg))
+      && GET_MODE (SUBREG_REG (read_reg)) == word_mode)
+    {
+      /* For WORD_REGISTER_OPERATIONS with subreg of word_mode register
+	 force SUBREG_REG into a new register rather than the SUBREG.  */
+      rtx r = copy_to_mode_reg (word_mode, SUBREG_REG (read_reg));
+      read_reg = shallow_copy_rtx (read_reg);
+      SUBREG_REG (read_reg) = r;
+    }
+  else
+    read_reg = copy_to_mode_reg (read_mode, read_reg);
   insns = get_insns ();
   end_sequence ();
 
--- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj	2023-04-04 16:57:31.739658564 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109040.c	2023-04-04 16:57:15.238900638 +0200
@@ -0,0 +1,23 @@
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
+  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+    __builtin_abort ();
+  return 0;
+}

	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-05  9:16 [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040] Jakub Jelinek
@ 2023-04-05 13:14 ` Jeff Law
  2023-04-05 14:51   ` Jakub Jelinek
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Law @ 2023-04-05 13:14 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Richard Sandiford, Eric Botcazou
  Cc: gcc-patches



On 4/5/23 03:16, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase is miscompiled on riscv since the addition
> of *mvconst_internal define_insn_and_split.
> I believe the bug is in DSE.  We have:
> (insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
>                  (const_int -64 [0xffffffffffffffc0])) [2  S4 A128])
>          (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
>       (expr_list:REG_DEAD (reg:SI 166)
>          (nil)))
> (insn 39 36 40 2 (set (reg:SI 171)
>          (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
>                      (const_int -64 [0xffffffffffffffc0])) [0  S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
>       (nil))
> and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
> even different modes like in the above case, and so it optimizes it into:
> (insn 47 35 39 2 (set (reg:HI 175)
>          (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
>       (expr_list:REG_DEAD (reg:SI 166)
>          (nil)))
> (insn 39 47 40 2 (set (reg:SI 171)
>          (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
>       (expr_list:REG_DEAD (reg:HI 175)
>          (nil)))
> Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
Right.  But do we agree that the two above are equivalent?  If they are 
then changing DSE just papers over the combine issue downstream.





> Combine attempts to combine the AND with the insn 47 above created by DSE,
> and turns it because of WORD_REGISTER_OPERATIONS and its assumption that all
> the subword operations are actually done on word mode into:
> (set (subreg:SI (reg:HI 175) 0)
>      (and:SI (reg:SI 167 [ m ])
>          (reg:SI 168)))
> and later on the ZERO_EXTEND is thrown away.
And isn't that where the bug really is.  There's a patch from pan2.li in 
this exact space.  That patch still doesn't look right to me, but it 
seems to be attacking this problem closer to the right place.


> 
> The following patch changes DSE to instead emit
> (insn 47 35 39 2 (set (reg:SI 175)
>          (reg:SI 166)) "pr109040.c":10:11 180 {*movsi_internal}
>       (expr_list:REG_DEAD (reg:SI 166)
>          (nil)))
> (insn 39 47 40 2 (set (reg:SI 171)
>          (zero_extend:SI (subreg:HI (reg:SI 175) 0))) "pr109040.c":10:11 111 {*zero_extendhisi2}
>       (expr_list:REG_DEAD (reg:SI 175)
>          (nil)))
> i.e. in the new insn copy whole reg rather than subword part of it where
> we don't really know anything about the upper bits.
> With this change, combine manages to do the right thing, optimies the
> (unsigned) (unsigned short) (reg & 0x8084c) into
> reg & 0x84c.
This may still be desirable but it feels like papering over the problem 
to me.


> 
> Bootstrapped/regtested on x86_64-linux and i686-linux (admittedly not
> WORD_REGISTER_OPERATIONS targets) and tested using a cross to riscv
> on the testcase.  I have unfortunately no way to bootstrap this on
> risc*-linux, could somebody do that?  Ok for trunk if that passes/
> 
> 2023-04-05  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/109040
> 	* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
> 	REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
> 	a new REG rather than the SUBREG.
> 
> 	* gcc.c-torture/execute/pr109040.c: New test.
No problem with the patch itself.  I just want to make sure we're fixing 
the problem in the right place.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-05 13:14 ` Jeff Law
@ 2023-04-05 14:51   ` Jakub Jelinek
  2023-04-05 16:17     ` Jeff Law
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-05 14:51 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, Richard Sandiford, Eric Botcazou, gcc-patches

On Wed, Apr 05, 2023 at 07:14:23AM -0600, Jeff Law wrote:
> > The following testcase is miscompiled on riscv since the addition
> > of *mvconst_internal define_insn_and_split.
> > I believe the bug is in DSE.  We have:
> > (insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
> >                  (const_int -64 [0xffffffffffffffc0])) [2  S4 A128])
> >          (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
> >       (expr_list:REG_DEAD (reg:SI 166)
> >          (nil)))
> > (insn 39 36 40 2 (set (reg:SI 171)
> >          (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
> >                      (const_int -64 [0xffffffffffffffc0])) [0  S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
> >       (nil))
> > and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
> > even different modes like in the above case, and so it optimizes it into:
> > (insn 47 35 39 2 (set (reg:HI 175)
> >          (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
> >       (expr_list:REG_DEAD (reg:SI 166)
> >          (nil)))
> > (insn 39 47 40 2 (set (reg:SI 171)
> >          (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
> >       (expr_list:REG_DEAD (reg:HI 175)
> >          (nil)))
> > Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
> Right.  But do we agree that the two above are equivalent?  If they are then
> changing DSE just papers over the combine issue downstream.

It is true that an instruction like
(insn 8 7 9 2 (set (reg:HI 141)
        (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
     (nil))
can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
upper bits shouldn't be random garbage in that case, it should be zero
extended or sign extended.

What happens in combine is we enter combine.cc (simplify_set) with
(set (reg:HI 175)
    (subreg:HI (and:SI (reg:SI 167 [ m ])
            (reg:SI 168)) 0))
and there trigger the
  /* If we have (set x (subreg:m1 (op:m2 ...) 0)) with OP being some operation,
     and X being a REG or (subreg (reg)), we may be able to convert this to
     (set (subreg:m2 x) (op)).

     We can always do this if M1 is narrower than M2 because that means that
     we only care about the low bits of the result.

     However, on machines without WORD_REGISTER_OPERATIONS defined, we cannot
     perform a narrower operation than requested since the high-order bits will
     be undefined.  On machine where it is defined, this transformation is safe
     as long as M1 and M2 have the same number of words.  */
transformation into:
(set (subreg:SI (reg:HI 175) 0)
    (and:SI (reg:SI 167 [ m ])
        (reg:SI 168)))
Though, it is !paradoxical_subreg_p (src) in that case, so it is done
regardless of WORD_REGISTER_OPERATIONS I think.

Then after that try_combine we do:
13325		record_value_for_reg (dest, record_dead_insn,
13326				      WORD_REGISTER_OPERATIONS
13327				      && word_register_operation_p (SET_SRC (setter))
13328				      && paradoxical_subreg_p (SET_DEST (setter))
13329				      ? SET_SRC (setter)
13330				      : gen_lowpart (GET_MODE (dest),
13331						     SET_SRC (setter)));
and the 3 conditions are true here and so record value of the whole setter.
That then records among other things nonzero_bits as 0x8084c.

Next when trying to combine
(insn 39 47 40 2 (set (reg:SI 171)
        (zero_extend:SI (reg:HI 175))) "pr109040.c":10:11 111 {*zero_extendhisi2}
     (expr_list:REG_DEAD (reg:HI 175)
        (nil)))
into
(insn 40 39 43 2 (set (reg:SI 172)
        (leu:SI (reg:SI 171)
            (const_int 5 [0x5]))) "pr109040.c":10:11 291 {*sleu_sisi}
     (expr_list:REG_DEAD (reg:SI 171)
        (nil)))
we i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);
and that correctly simplifies it into
(and:SI (subreg:SI (reg:HI 175) 0)
    (const_int 2124 [0x84c]))
We substitute that
(leu:SI (and:SI (subreg:SI (reg:HI 175) 0)
        (const_int 2124 [0x84c]))
    (const_int 5 [0x5]))
but then trigger the WORD_REGISTER_OPERATIONS block in simplify_comparison:
          /* If this is (and:M1 (subreg:M1 X:M2 0) (const_int C1)) where C1
             fits in both M1 and M2 and the SUBREG is either paradoxical
             or represents the low part, permute the SUBREG and the AND
             and try again.  */
          if (GET_CODE (XEXP (op0, 0)) == SUBREG
              && CONST_INT_P (XEXP (op0, 1)))
            {
              unsigned HOST_WIDE_INT c1 = INTVAL (XEXP (op0, 1));
              /* Require an integral mode, to avoid creating something like
                 (AND:SF ...).  */
              if ((is_a <scalar_int_mode>
                   (GET_MODE (SUBREG_REG (XEXP (op0, 0))), &tmode))
                  /* It is unsafe to commute the AND into the SUBREG if the
                     SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is
                     not defined.  As originally written the upper bits
                     have a defined value due to the AND operation.
                     However, if we commute the AND inside the SUBREG then
                     they no longer have defined values and the meaning of
                     the code has been changed.
                     Also C1 should not change value in the smaller mode,
                     see PR67028 (a positive C1 can become negative in the
                     smaller mode, so that the AND does no longer mask the
                     upper bits).  */
                  && ((WORD_REGISTER_OPERATIONS
                       && mode_width > GET_MODE_PRECISION (tmode)
                       && mode_width <= BITS_PER_WORD
                       && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1)
                      || (mode_width <= GET_MODE_PRECISION (tmode)
                          && subreg_lowpart_p (XEXP (op0, 0))))
                  && mode_width <= HOST_BITS_PER_WIDE_INT
                  && HWI_COMPUTABLE_MODE_P (tmode)
                  && (c1 & ~mask) == 0
                  && (c1 & ~GET_MODE_MASK (tmode)) == 0
                  && c1 != mask
                  && c1 != GET_MODE_MASK (tmode))
                {
                  op0 = simplify_gen_binary (AND, tmode,
                                             SUBREG_REG (XEXP (op0, 0)),
                                             gen_int_mode (c1, tmode));
                  op0 = gen_lowpart (mode, op0);
                  continue;
                }
            }
c1 is 0x84c.  I believe this is the exact spot where things go wrong,
and is because for WORD_REGISTER_OPERATIONS we assume something that the
DSE added instruction didn't guarantee.

	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-05 14:51   ` Jakub Jelinek
@ 2023-04-05 16:17     ` Jeff Law
  2023-04-05 16:48       ` Jakub Jelinek
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Law @ 2023-04-05 16:17 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Richard Sandiford, Eric Botcazou, gcc-patches



On 4/5/23 08:51, Jakub Jelinek wrote:
> On Wed, Apr 05, 2023 at 07:14:23AM -0600, Jeff Law wrote:
>>> The following testcase is miscompiled on riscv since the addition
>>> of *mvconst_internal define_insn_and_split.
>>> I believe the bug is in DSE.  We have:
>>> (insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
>>>                   (const_int -64 [0xffffffffffffffc0])) [2  S4 A128])
>>>           (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
>>>        (expr_list:REG_DEAD (reg:SI 166)
>>>           (nil)))
>>> (insn 39 36 40 2 (set (reg:SI 171)
>>>           (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
>>>                       (const_int -64 [0xffffffffffffffc0])) [0  S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
>>>        (nil))
>>> and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
>>> even different modes like in the above case, and so it optimizes it into:
>>> (insn 47 35 39 2 (set (reg:HI 175)
>>>           (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
>>>        (expr_list:REG_DEAD (reg:SI 166)
>>>           (nil)))
>>> (insn 39 47 40 2 (set (reg:SI 171)
>>>           (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
>>>        (expr_list:REG_DEAD (reg:HI 175)
>>>           (nil)))
>>> Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
>> Right.  But do we agree that the two above are equivalent?  If they are then
>> changing DSE just papers over the combine issue downstream.
> 
> It is true that an instruction like
> (insn 8 7 9 2 (set (reg:HI 141)
>          (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>       (nil))
> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
> upper bits shouldn't be random garbage in that case, it should be zero
> extended or sign extended.
Well, that's one of the core questions here.  What are the state of the 
upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't 
100% clear as we're not really doing any operation.

So again, I think we need to decide if the DSE transformation is correct 
or not.  I *think* we can aggree that insn 39 is OK.  It's really the 
semantics of insn 47 that I think we need to agree on.  What is the 
state of the upper 16 bits of (reg:HI 175) after insn 47?




> 
> What happens in combine is we enter combine.cc (simplify_set) with
> (set (reg:HI 175)
>      (subreg:HI (and:SI (reg:SI 167 [ m ])
>              (reg:SI 168)) 0))
> and there trigger the
>    /* If we have (set x (subreg:m1 (op:m2 ...) 0)) with OP being some operation,
>       and X being a REG or (subreg (reg)), we may be able to convert this to
>       (set (subreg:m2 x) (op)).
> 
>       We can always do this if M1 is narrower than M2 because that means that
>       we only care about the low bits of the result.
> 
>       However, on machines without WORD_REGISTER_OPERATIONS defined, we cannot
>       perform a narrower operation than requested since the high-order bits will
>       be undefined.  On machine where it is defined, this transformation is safe
>       as long as M1 and M2 have the same number of words.  */
> transformation into:
> (set (subreg:SI (reg:HI 175) 0)
>      (and:SI (reg:SI 167 [ m ])
>          (reg:SI 168)))
I think we're OK a this point.


> Though, it is !paradoxical_subreg_p (src) in that case, so it is done
> regardless of WORD_REGISTER_OPERATIONS I think.
> 
> Then after that try_combine we do:
> 13325		record_value_for_reg (dest, record_dead_insn,
> 13326				      WORD_REGISTER_OPERATIONS
> 13327				      && word_register_operation_p (SET_SRC (setter))
> 13328				      && paradoxical_subreg_p (SET_DEST (setter))
> 13329				      ? SET_SRC (setter)
> 13330				      : gen_lowpart (GET_MODE (dest),
> 13331						     SET_SRC (setter)));
> and the 3 conditions are true here and so record value of the whole setter.
> That then records among other things nonzero_bits as 0x8084c.
> 
> Next when trying to combine
> (insn 39 47 40 2 (set (reg:SI 171)
>          (zero_extend:SI (reg:HI 175))) "pr109040.c":10:11 111 {*zero_extendhisi2}
>       (expr_list:REG_DEAD (reg:HI 175)
>          (nil)))
> into
> (insn 40 39 43 2 (set (reg:SI 172)
>          (leu:SI (reg:SI 171)
>              (const_int 5 [0x5]))) "pr109040.c":10:11 291 {*sleu_sisi}
>       (expr_list:REG_DEAD (reg:SI 171)
>          (nil)))
> we i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);
> and that correctly simplifies it into
> (and:SI (subreg:SI (reg:HI 175) 0)
>      (const_int 2124 [0x84c]))
Right.  Still seems sane at this point.


> We substitute that
> (leu:SI (and:SI (subreg:SI (reg:HI 175) 0)
>          (const_int 2124 [0x84c]))
>      (const_int 5 [0x5]))
> but then trigger the WORD_REGISTER_OPERATIONS block in simplify_comparison:
>            /* If this is (and:M1 (subreg:M1 X:M2 0) (const_int C1)) where C1
>               fits in both M1 and M2 and the SUBREG is either paradoxical
>               or represents the low part, permute the SUBREG and the AND
>               and try again.  */
>            if (GET_CODE (XEXP (op0, 0)) == SUBREG
>                && CONST_INT_P (XEXP (op0, 1)))
>              {
>                unsigned HOST_WIDE_INT c1 = INTVAL (XEXP (op0, 1));
>                /* Require an integral mode, to avoid creating something like
>                   (AND:SF ...).  */
>                if ((is_a <scalar_int_mode>
>                     (GET_MODE (SUBREG_REG (XEXP (op0, 0))), &tmode))
>                    /* It is unsafe to commute the AND into the SUBREG if the
>                       SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is
>                       not defined.  As originally written the upper bits
>                       have a defined value due to the AND operation.
>                       However, if we commute the AND inside the SUBREG then
>                       they no longer have defined values and the meaning of
>                       the code has been changed.
>                       Also C1 should not change value in the smaller mode,
>                       see PR67028 (a positive C1 can become negative in the
>                       smaller mode, so that the AND does no longer mask the
>                       upper bits).  */
>                    && ((WORD_REGISTER_OPERATIONS
>                         && mode_width > GET_MODE_PRECISION (tmode)
>                         && mode_width <= BITS_PER_WORD
>                         && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1)
>                        || (mode_width <= GET_MODE_PRECISION (tmode)
>                            && subreg_lowpart_p (XEXP (op0, 0))))
>                    && mode_width <= HOST_BITS_PER_WIDE_INT
>                    && HWI_COMPUTABLE_MODE_P (tmode)
>                    && (c1 & ~mask) == 0
>                    && (c1 & ~GET_MODE_MASK (tmode)) == 0
>                    && c1 != mask
>                    && c1 != GET_MODE_MASK (tmode))
>                  {
>                    op0 = simplify_gen_binary (AND, tmode,
>                                               SUBREG_REG (XEXP (op0, 0)),
>                                               gen_int_mode (c1, tmode));
>                    op0 = gen_lowpart (mode, op0);
>                    continue;
>                  }
>              }
> c1 is 0x84c.  I believe this is the exact spot where things go wrong,
> and is because for WORD_REGISTER_OPERATIONS we assume something that the
> DSE added instruction didn't guarantee.
pan2.li zero'd in on the same block of code.

The leap I'm strugging with is the assertion that this combine code 
assumes something that the DSE added instruction does not guarantee. 
That's why I asked if we agreed that the before/after from DSE was 
correct or not.  I think that's still the outstanding question.

Jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-05 16:17     ` Jeff Law
@ 2023-04-05 16:48       ` Jakub Jelinek
  2023-04-05 17:31         ` Jeff Law
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-05 16:48 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, Richard Sandiford, Eric Botcazou, gcc-patches

On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
> > It is true that an instruction like
> > (insn 8 7 9 2 (set (reg:HI 141)
> >          (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
> >       (nil))
> > can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
> > upper bits shouldn't be random garbage in that case, it should be zero
> > extended or sign extended.
> Well, that's one of the core questions here.  What are the state of the
> upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
> 100% clear as we're not really doing any operation.
> 
> So again, I think we need to decide if the DSE transformation is correct or
> not.  I *think* we can aggree that insn 39 is OK.  It's really the semantics
> of insn 47 that I think we need to agree on.  What is the state of the upper
> 16 bits of (reg:HI 175) after insn 47?

I'm afraid I don't know the answers here, I think Eric is
WORD_REGISTER_OPERATIONS expert here I think these days (most of the major
targets are !WORD_REGISTER_OPERATIONS).
Intuitively, WORD_REGISTER_OPERATIONS from the description would be
that
(insn 47 35 39 2 (set (reg:HI 175)                                                                                                                                                    
        (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}                                                                                                           
     (expr_list:REG_DEAD (reg:SI 166)                                                                                                                                                 
        (nil)))                                                                                                                                                                       
would copy not just the low 16-bits of pseudo 166 to 175, but also the upper
16-bits.  But if that is so, then something is broken in the code below.
Some archeology shows that we were doing that on all arches initially
and then
Wed May 13 17:38:35 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>

	* combine.c (simplify_comparison, case AND): Don't commute AND
	with SUBREG if constant is whole mode and don't do if lowpart
	and not WORD_REGISTER_OPERATIONS.
restricted that to WORD_REGISTER_OPERATIONS only.
The whole optimization is then likely
Wed Mar 18 05:54:25 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>

	* combine.c (gen_binary): Don't make AND that does nothing.
	(simplify_comparison, case AND): Commute AND and SUBREG.
	* i386.h (CONST_CONSTS, case CONST_INT): One-byte integers are cost 0.
but the code has been tweaked further many times since then.

> > We substitute that
> > (leu:SI (and:SI (subreg:SI (reg:HI 175) 0)
> >          (const_int 2124 [0x84c]))
> >      (const_int 5 [0x5]))
> > but then trigger the WORD_REGISTER_OPERATIONS block in simplify_comparison:
> >            /* If this is (and:M1 (subreg:M1 X:M2 0) (const_int C1)) where C1
> >               fits in both M1 and M2 and the SUBREG is either paradoxical
> >               or represents the low part, permute the SUBREG and the AND
> >               and try again.  */
> >            if (GET_CODE (XEXP (op0, 0)) == SUBREG
> >                && CONST_INT_P (XEXP (op0, 1)))
> >              {
> >                unsigned HOST_WIDE_INT c1 = INTVAL (XEXP (op0, 1));
> >                /* Require an integral mode, to avoid creating something like
> >                   (AND:SF ...).  */
> >                if ((is_a <scalar_int_mode>
> >                     (GET_MODE (SUBREG_REG (XEXP (op0, 0))), &tmode))
> >                    /* It is unsafe to commute the AND into the SUBREG if the
> >                       SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is
> >                       not defined.  As originally written the upper bits
> >                       have a defined value due to the AND operation.
> >                       However, if we commute the AND inside the SUBREG then
> >                       they no longer have defined values and the meaning of
> >                       the code has been changed.
> >                       Also C1 should not change value in the smaller mode,
> >                       see PR67028 (a positive C1 can become negative in the
> >                       smaller mode, so that the AND does no longer mask the
> >                       upper bits).  */
> >                    && ((WORD_REGISTER_OPERATIONS
> >                         && mode_width > GET_MODE_PRECISION (tmode)
> >                         && mode_width <= BITS_PER_WORD
> >                         && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1)
> >                        || (mode_width <= GET_MODE_PRECISION (tmode)
> >                            && subreg_lowpart_p (XEXP (op0, 0))))
> >                    && mode_width <= HOST_BITS_PER_WIDE_INT
> >                    && HWI_COMPUTABLE_MODE_P (tmode)
> >                    && (c1 & ~mask) == 0
> >                    && (c1 & ~GET_MODE_MASK (tmode)) == 0
> >                    && c1 != mask
> >                    && c1 != GET_MODE_MASK (tmode))
> >                  {
> >                    op0 = simplify_gen_binary (AND, tmode,
> >                                               SUBREG_REG (XEXP (op0, 0)),
> >                                               gen_int_mode (c1, tmode));
> >                    op0 = gen_lowpart (mode, op0);
> >                    continue;
> >                  }
> >              }
> > c1 is 0x84c.  I believe this is the exact spot where things go wrong,
> > and is because for WORD_REGISTER_OPERATIONS we assume something that the
> > DSE added instruction didn't guarantee.
> pan2.li zero'd in on the same block of code.
> 
> The leap I'm strugging with is the assertion that this combine code assumes
> something that the DSE added instruction does not guarantee. That's why I
> asked if we agreed that the before/after from DSE was correct or not.  I
> think that's still the outstanding question.

	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-05 16:48       ` Jakub Jelinek
@ 2023-04-05 17:31         ` Jeff Law
  2023-04-06  9:31           ` Richard Sandiford
  2023-04-06 10:15           ` Eric Botcazou
  0 siblings, 2 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-05 17:31 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Richard Sandiford, Eric Botcazou, gcc-patches



On 4/5/23 10:48, Jakub Jelinek wrote:
> On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
>>> It is true that an instruction like
>>> (insn 8 7 9 2 (set (reg:HI 141)
>>>           (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>>>        (nil))
>>> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
>>> upper bits shouldn't be random garbage in that case, it should be zero
>>> extended or sign extended.
>> Well, that's one of the core questions here.  What are the state of the
>> upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
>> 100% clear as we're not really doing any operation.
>>
>> So again, I think we need to decide if the DSE transformation is correct or
>> not.  I *think* we can aggree that insn 39 is OK.  It's really the semantics
>> of insn 47 that I think we need to agree on.  What is the state of the upper
>> 16 bits of (reg:HI 175) after insn 47?
> 
> I'm afraid I don't know the answers here, I think Eric is
> WORD_REGISTER_OPERATIONS expert here I think these days (most of the major
> targets are !WORD_REGISTER_OPERATIONS).
Hopefully he'll chime in.

> Intuitively, WORD_REGISTER_OPERATIONS from the description would be
> that
> (insn 47 35 39 2 (set (reg:HI 175)
>          (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
>       (expr_list:REG_DEAD (reg:SI 166)
>          (nil)))
> would copy not just the low 16-bits of pseudo 166 to 175, but also the upper
> 16-bits.  
I've gone back and forth repeatedly on this.

Originally I didn't really see this as an operation.  But the more and 
more I ponder it feels like it's an operation and thus should be subject 
to WORD_REGISTER_OPERATIONS.

While it's not really binding on RTL semantics, if we look at how some 
architectures implement reg->reg copies, they're actually implemented 
with an ADD or IOR -- so a real operation under the hood.

If we accept a subreg copy as an operation and thus subject to 
WORD_REGISTER_OPERATIONS then that would seem to imply the combine is 
the real problem here.  Otherwise dse is the culprit.







But if that is so, then something is broken in the code below.
> Some archeology shows that we were doing that on all arches initially
> and then
> Wed May 13 17:38:35 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>
> 
> 	* combine.c (simplify_comparison, case AND): Don't commute AND
> 	with SUBREG if constant is whole mode and don't do if lowpart
> 	and not WORD_REGISTER_OPERATIONS.
> restricted that to WORD_REGISTER_OPERATIONS only.
> The whole optimization is then likely
> Wed Mar 18 05:54:25 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>
> 
> 	* combine.c (gen_binary): Don't make AND that does nothing.
> 	(simplify_comparison, case AND): Commute AND and SUBREG.
> 	* i386.h (CONST_CONSTS, case CONST_INT): One-byte integers are cost 0.
> but the code has been tweaked further many times since then.
Sigh.  1998 and probably directly from Kenner's tree at the time.  So no 
public discussion, likely no testcases either.  Off to my private 
archives.  Yup, no discussion of Kenner's patch or testcase.

Interestingly enough I did stumble across Andreas Schwab reporting a bug 
in Kenner's change, though on the m68k and unrelated to 
WORD_REGISTER_OPERATIONS


jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-05 17:31         ` Jeff Law
@ 2023-04-06  9:31           ` Richard Sandiford
  2023-04-06  9:37             ` Li, Pan2
  2023-04-06 14:45             ` Jeff Law
  2023-04-06 10:15           ` Eric Botcazou
  1 sibling, 2 replies; 37+ messages in thread
From: Richard Sandiford @ 2023-04-06  9:31 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jakub Jelinek, Richard Biener, Eric Botcazou, gcc-patches

Jeff Law <jeffreyalaw@gmail.com> writes:
> On 4/5/23 10:48, Jakub Jelinek wrote:
>> On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
>>>> It is true that an instruction like
>>>> (insn 8 7 9 2 (set (reg:HI 141)
>>>>           (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>>>>        (nil))
>>>> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
>>>> upper bits shouldn't be random garbage in that case, it should be zero
>>>> extended or sign extended.
>>> Well, that's one of the core questions here.  What are the state of the
>>> upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
>>> 100% clear as we're not really doing any operation.
>>>
>>> So again, I think we need to decide if the DSE transformation is correct or
>>> not.  I *think* we can aggree that insn 39 is OK.  It's really the semantics
>>> of insn 47 that I think we need to agree on.  What is the state of the upper
>>> 16 bits of (reg:HI 175) after insn 47?
>> 
>> I'm afraid I don't know the answers here, I think Eric is
>> WORD_REGISTER_OPERATIONS expert here I think these days (most of the major
>> targets are !WORD_REGISTER_OPERATIONS).
> Hopefully he'll chime in.

Just curious: have you experimented with making RISC-V
!WORD_REGISTER_OPERATIONS too?  Realise it's not the right way
to fix the bug, just curious in general.

Not defining it seems to have worked well for AArch64.  And IMO
the semantics are much easier to follow when there is no special
treatment of upper bits.  Subregs are hard enough to reason about
as it is...

Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06  9:31           ` Richard Sandiford
@ 2023-04-06  9:37             ` Li, Pan2
  2023-04-06 14:49               ` Jeff Law
  2023-04-06 14:45             ` Jeff Law
  1 sibling, 1 reply; 37+ messages in thread
From: Li, Pan2 @ 2023-04-06  9:37 UTC (permalink / raw)
  To: Richard Sandiford, Jeff Law
  Cc: Jakub Jelinek, Richard Biener, Eric Botcazou, gcc-patches,
	kito.cheng, juzhe.zhong

Yes, RISC-V riscv.h defined the WORD_REGISTER_OPERATIONS to be 1, while aarch64.h defined it as 0, with below comments. No idea this can fit RISC-V or not.

/* WORD_REGISTER_OPERATIONS does not hold for AArch64.
   The assigned word_mode is DImode but operations narrower than SImode
   behave as 32-bit operations if using the W-form of the registers rather
   than as word_mode (64-bit) operations as WORD_REGISTER_OPERATIONS
   expects.  */
#define WORD_REGISTER_OPERATIONS 0

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Richard Sandiford via Gcc-patches
Sent: Thursday, April 6, 2023 5:31 PM
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>; Richard Biener <rguenther@suse.de>; Eric Botcazou <botcazou@adacore.com>; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]

Jeff Law <jeffreyalaw@gmail.com> writes:
> On 4/5/23 10:48, Jakub Jelinek wrote:
>> On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
>>>> It is true that an instruction like (insn 8 7 9 2 (set (reg:HI 141)
>>>>           (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>>>>        (nil))
>>>> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I 
>>>> think the upper bits shouldn't be random garbage in that case, it 
>>>> should be zero extended or sign extended.
>>> Well, that's one of the core questions here.  What are the state of 
>>> the upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS 
>>> docs aren't 100% clear as we're not really doing any operation.
>>>
>>> So again, I think we need to decide if the DSE transformation is 
>>> correct or not.  I *think* we can aggree that insn 39 is OK.  It's 
>>> really the semantics of insn 47 that I think we need to agree on.  
>>> What is the state of the upper
>>> 16 bits of (reg:HI 175) after insn 47?
>> 
>> I'm afraid I don't know the answers here, I think Eric is 
>> WORD_REGISTER_OPERATIONS expert here I think these days (most of the 
>> major targets are !WORD_REGISTER_OPERATIONS).
> Hopefully he'll chime in.

Just curious: have you experimented with making RISC-V !WORD_REGISTER_OPERATIONS too?  Realise it's not the right way to fix the bug, just curious in general.

Not defining it seems to have worked well for AArch64.  And IMO the semantics are much easier to follow when there is no special treatment of upper bits.  Subregs are hard enough to reason about as it is...

Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-05 17:31         ` Jeff Law
  2023-04-06  9:31           ` Richard Sandiford
@ 2023-04-06 10:15           ` Eric Botcazou
  2023-04-06 10:31             ` [PATCH] combine: Fix simplify_comparison AND handling " Jakub Jelinek
  2023-04-06 14:53             ` [PATCH] dse: Handle SUBREGs of word REGs differently " Jeff Law
  1 sibling, 2 replies; 37+ messages in thread
From: Eric Botcazou @ 2023-04-06 10:15 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jakub Jelinek, Richard Biener, Richard Sandiford, gcc-patches

> Originally I didn't really see this as an operation.  But the more and
> more I ponder it feels like it's an operation and thus should be subject
> to WORD_REGISTER_OPERATIONS.
> 
> While it's not really binding on RTL semantics, if we look at how some
> architectures implement reg->reg copies, they're actually implemented
> with an ADD or IOR -- so a real operation under the hood.
> 
> If we accept a subreg copy as an operation and thus subject to
> WORD_REGISTER_OPERATIONS then that would seem to imply the combine is
> the real problem here.  Otherwise dse is the culprit.

Yes, I agree that there is an ambiguity for subreg copy operations.  At some 
point I tried to define what register operations are and added a predicate to 
that effect (word_register_operation_p ); while it returns true for SUBREG, 
it's an opt-out predicate so this does not mean much.

I don't think that DSE does anything wrong: as I wrote in the PR, defining 
WORD_REGISTER_OPERATIONS should not prevent any particular form of RTL.

I therefore think that the problem is in the combiner and probably in the 
intermediate step shown by Jakub:

"Then after that try_combine we do:
13325           record_value_for_reg (dest, record_dead_insn,
13326                                 WORD_REGISTER_OPERATIONS
13327                                 && word_register_operation_p (SET_SRC 
(setter))
13328                                 && paradoxical_subreg_p (SET_DEST 
(setter))
13329                                 ? SET_SRC (setter)
13330                                 : gen_lowpart (GET_MODE (dest),
13331                                                SET_SRC (setter)));
and the 3 conditions are true here and so record value of the whole setter.
That then records among other things nonzero_bits as 0x8084c."

That's a recent addition of mine (ae20d760b1ed69f631c3bf9351bf7e5005d52297) 
and I think that it probably abuses WORD_REGISTER_OPERATIONS and should either 
be reverted or restricted to the load case documented in its comment.  I can 
provide testing on SPARC if need be.

-- 
Eric Botcazou



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 10:15           ` Eric Botcazou
@ 2023-04-06 10:31             ` Jakub Jelinek
  2023-04-06 10:51               ` Eric Botcazou
                                 ` (2 more replies)
  2023-04-06 14:53             ` [PATCH] dse: Handle SUBREGs of word REGs differently " Jeff Law
  1 sibling, 3 replies; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-06 10:31 UTC (permalink / raw)
  To: Eric Botcazou, Jeff Law; +Cc: Richard Biener, Richard Sandiford, gcc-patches

On Thu, Apr 06, 2023 at 12:15:51PM +0200, Eric Botcazou wrote:
> > Originally I didn't really see this as an operation.  But the more and
> > more I ponder it feels like it's an operation and thus should be subject
> > to WORD_REGISTER_OPERATIONS.
> > 
> > While it's not really binding on RTL semantics, if we look at how some
> > architectures implement reg->reg copies, they're actually implemented
> > with an ADD or IOR -- so a real operation under the hood.
> > 
> > If we accept a subreg copy as an operation and thus subject to
> > WORD_REGISTER_OPERATIONS then that would seem to imply the combine is
> > the real problem here.  Otherwise dse is the culprit.
> 
> Yes, I agree that there is an ambiguity for subreg copy operations.  At some 
> point I tried to define what register operations are and added a predicate to 
> that effect (word_register_operation_p ); while it returns true for SUBREG, 
> it's an opt-out predicate so this does not mean much.
> 
> I don't think that DSE does anything wrong: as I wrote in the PR, defining 
> WORD_REGISTER_OPERATIONS should not prevent any particular form of RTL.
> 
> I therefore think that the problem is in the combiner and probably in the 
> intermediate step shown by Jakub:
> 
> "Then after that try_combine we do:
> 13325           record_value_for_reg (dest, record_dead_insn,
> 13326                                 WORD_REGISTER_OPERATIONS
> 13327                                 && word_register_operation_p (SET_SRC 
> (setter))
> 13328                                 && paradoxical_subreg_p (SET_DEST 
> (setter))
> 13329                                 ? SET_SRC (setter)
> 13330                                 : gen_lowpart (GET_MODE (dest),
> 13331                                                SET_SRC (setter)));
> and the 3 conditions are true here and so record value of the whole setter.
> That then records among other things nonzero_bits as 0x8084c."
> 
> That's a recent addition of mine (ae20d760b1ed69f631c3bf9351bf7e5005d52297) 
> and I think that it probably abuses WORD_REGISTER_OPERATIONS and should either 
> be reverted or restricted to the load case documented in its comment.  I can 
> provide testing on SPARC if need be.

If we want to fix it in the combiner, I think the fix would be following.
The optimization is about
(and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
and IMHO we can only optimize it into
(subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
if we know that the upper bits of the REG are zeros.  The optimization
is only when the constant second AND operand is the same in both modes.
Otherwise because of nonzero_bits ((reg:SI xxx), SImode) == 0x8084c
we optimize into (subreg:SI (reg:HI xxx) 0).
The original form has the upper 16 bits guaranteed to be all zeros,
while the new form has them whatever value comes from the original
operation.  For !WORD_REGISTER_OPERATIONS, we don't optimize this way
paradoxical SUBREGs at all, then by definition the upper bits are unspecified.

Now, this patch fixes the PR, but certainly generates worse (but correct)
code than the dse.cc patch.  So perhaps we want both of them?

As before, I unfortunately can't test it on riscv-linux (could perhaps try
that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
target, but last my bootstrap attempt there failed miserably because of the
Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
for that once I test it).

2023-04-06  Jakub Jelinek  <jakub@redhat.com>

	PR target/109040
	* combine.cc (simplify_comparison) <case AND>: For swapping AND and
	SUBREG with paradoxical subreg on WORD_REGISTER_OPERATIONS target,
	ensure the upper bits are known to be 0.

--- gcc/combine.cc.jj	2023-02-28 11:28:56.423182357 +0100
+++ gcc/combine.cc	2023-04-06 12:02:31.694747361 +0200
@@ -12671,7 +12671,11 @@ simplify_comparison (enum rtx_code code,
 		  && ((WORD_REGISTER_OPERATIONS
 		       && mode_width > GET_MODE_PRECISION (tmode)
 		       && mode_width <= BITS_PER_WORD
-		       && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1)
+		       && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1
+		       /* Make sure the bits above tmode in the SUBREG_REG
+			  are all known to be 0, see PR109040.  */
+		       && (nonzero_bits (SUBREG_REG (XEXP (op0, 0)), mode)
+			   & ~GET_MODE_MASK (tmode)) == 0)
 		      || (mode_width <= GET_MODE_PRECISION (tmode)
 			  && subreg_lowpart_p (XEXP (op0, 0))))
 		  && mode_width <= HOST_BITS_PER_WIDE_INT


	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 10:31             ` [PATCH] combine: Fix simplify_comparison AND handling " Jakub Jelinek
@ 2023-04-06 10:51               ` Eric Botcazou
  2023-04-06 11:37                 ` Jakub Jelinek
  2023-04-06 14:35               ` Jeff Law
  2023-04-06 15:06               ` Jeff Law
  2 siblings, 1 reply; 37+ messages in thread
From: Eric Botcazou @ 2023-04-06 10:51 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, Richard Biener, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1264 bytes --]

> If we want to fix it in the combiner, I think the fix would be following.
> The optimization is about
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> and IMHO we can only optimize it into
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> if we know that the upper bits of the REG are zeros.

The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation 
is done on the full word register, in other words that it's in effect:

(subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)

that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.

> Now, this patch fixes the PR, but certainly generates worse (but correct)
> code than the dse.cc patch.  So perhaps we want both of them?

What happens if you disable the step I mentioned (patchlet attached)?

> As before, I unfortunately can't test it on riscv-linux (could perhaps try
> that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
> target, but last my bootstrap attempt there failed miserably because of the
> Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
> for that once I test it).

I think that we have one in the Ada compiler too; glad to know we are not 
alone in this boat. ;-)

-- 
Eric Botcazou

[-- Attachment #2: p.diff --]
[-- Type: text/x-patch, Size: 1027 bytes --]

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..5453ce85156 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -13321,14 +13321,12 @@ record_dead_and_set_regs_1 (rtx dest, const_rtx setter, void *data)
 	       && SUBREG_REG (SET_DEST (setter)) == dest
 	       && known_le (GET_MODE_PRECISION (GET_MODE (dest)),
 			    BITS_PER_WORD)
-	       && subreg_lowpart_p (SET_DEST (setter)))
+	       && subreg_lowpart_p (SET_DEST (setter))
+	       && !(WORD_REGISTER_OPERATIONS
+		    && paradoxical_subreg_p (SET_DEST (setter))
+		    && contains_mem_rtx_p (SET_SRC (setter))))
 	record_value_for_reg (dest, record_dead_insn,
-			      WORD_REGISTER_OPERATIONS
-			      && word_register_operation_p (SET_SRC (setter))
-			      && paradoxical_subreg_p (SET_DEST (setter))
-			      ? SET_SRC (setter)
-			      : gen_lowpart (GET_MODE (dest),
-					     SET_SRC (setter)));
+			      gen_lowpart (GET_MODE (dest), SET_SRC (setter)));
       else
 	record_value_for_reg (dest, record_dead_insn, NULL_RTX);
     }

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 10:51               ` Eric Botcazou
@ 2023-04-06 11:37                 ` Jakub Jelinek
  2023-04-06 14:21                   ` Eric Botcazou
  2023-04-09  1:15                   ` Jeff Law
  0 siblings, 2 replies; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-06 11:37 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Jeff Law, Richard Biener, Richard Sandiford, gcc-patches

On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> > If we want to fix it in the combiner, I think the fix would be following.
> > The optimization is about
> > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > and IMHO we can only optimize it into
> > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > if we know that the upper bits of the REG are zeros.
> 
> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation 
> is done on the full word register, in other words that it's in effect:
> 
> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
> 
> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.

If the
(and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
to
(subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
transformation is kosher for WORD_REGISTER_OPERATIONS, then I guess the
invalid operation is then in
simplify_context::simplify_binary_operation_1
    case AND:
...
      if (HWI_COMPUTABLE_MODE_P (mode))
        {
          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
          HOST_WIDE_INT nzop1;
          if (CONST_INT_P (trueop1))
            {
              HOST_WIDE_INT val1 = INTVAL (trueop1);
              /* If we are turning off bits already known off in OP0, we need
                 not do an AND.  */
              if ((nzop0 & ~val1) == 0)
                return op0;
            }
We have there op0==trueop0 (reg:HI 175) and op1==trueop1 (const_int 2124
[0x84c]).
We then for integral? modes smaller than word_mode would then need to
actually check nonzero_bits in the word_mode (on paradoxical subreg of
trueop0?).  If INTVAL (trueop1) is >= 0, then I think just doing
nonzero_bits in the wider mode would be all we need (although the
subsequent (nzop1 & nzop0) == 0 case probably wants to have the current
nonzero_bits calls), not really sure what for WORD_REGISTER_OPERATIONS
means AND with a constant which has the most significant bit set for the
upper bits.

So, perhaps just in the return op0; case add further code for
WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
again for the word mode and decide if it is still safe.

> > Now, this patch fixes the PR, but certainly generates worse (but correct)
> > code than the dse.cc patch.  So perhaps we want both of them?
> 
> What happens if you disable the step I mentioned (patchlet attached)?

That patch doesn't change anything at all on the testcase, it is still
miscompiled.

	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 11:37                 ` Jakub Jelinek
@ 2023-04-06 14:21                   ` Eric Botcazou
  2023-04-09  0:25                     ` Jeff Law
  2023-04-09  1:15                   ` Jeff Law
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Botcazou @ 2023-04-06 14:21 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: gcc-patches, Jeff Law, Richard Biener, Richard Sandiford, gcc-patches

> If the
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> to
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> transformation is kosher for WORD_REGISTER_OPERATIONS, then I guess the
> invalid operation is then in
> simplify_context::simplify_binary_operation_1
>     case AND:
> ...
>       if (HWI_COMPUTABLE_MODE_P (mode))
>         {
>           HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
>           HOST_WIDE_INT nzop1;
>           if (CONST_INT_P (trueop1))
>             {
>               HOST_WIDE_INT val1 = INTVAL (trueop1);
>               /* If we are turning off bits already known off in OP0, we
> need not do an AND.  */
>               if ((nzop0 & ~val1) == 0)
>                 return op0;
>             }
> We have there op0==trueop0 (reg:HI 175) and op1==trueop1 (const_int 2124
> [0x84c]).
> We then for integral? modes smaller than word_mode would then need to
> actually check nonzero_bits in the word_mode (on paradoxical subreg of
> trueop0?).  If INTVAL (trueop1) is >= 0, then I think just doing
> nonzero_bits in the wider mode would be all we need (although the
> subsequent (nzop1 & nzop0) == 0 case probably wants to have the current
> nonzero_bits calls), not really sure what for WORD_REGISTER_OPERATIONS
> means AND with a constant which has the most significant bit set for the
> upper bits.

Yes, I agree that there is a tension between this AND case and the swapping 
done in the combiner for WORD_REGISTER_OPERATIONS.  I also agree that it would 
make sense do call nonzero_bits on word_mode instead of mode here in this case 
because AND is a word_register_operation_p.

> So, perhaps just in the return op0; case add further code for
> WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
> again for the word mode and decide if it is still safe.

Does it work to just replace mode by word_mode in the calls to nonzero_bits?

> That patch doesn't change anything at all on the testcase, it is still
> miscompiled.

OK, too bad, thanks for trying it!

-- 
Eric Botcazou



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 10:31             ` [PATCH] combine: Fix simplify_comparison AND handling " Jakub Jelinek
  2023-04-06 10:51               ` Eric Botcazou
@ 2023-04-06 14:35               ` Jeff Law
  2023-04-06 15:06               ` Jeff Law
  2 siblings, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-06 14:35 UTC (permalink / raw)
  To: Jakub Jelinek, Eric Botcazou
  Cc: Richard Biener, Richard Sandiford, gcc-patches



On 4/6/23 04:31, Jakub Jelinek wrote:

> 
> As before, I unfortunately can't test it on riscv-linux (could perhaps try
> that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
> target, but last my bootstrap attempt there failed miserably because of the
> Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
> for that once I test it).
I can do that test.  It'll take most of a day once it starts, so not 
eager to fire it up until we've settled on a patch.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06  9:31           ` Richard Sandiford
  2023-04-06  9:37             ` Li, Pan2
@ 2023-04-06 14:45             ` Jeff Law
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-06 14:45 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Eric Botcazou, gcc-patches,
	richard.sandiford



On 4/6/23 03:31, Richard Sandiford wrote:
> Jeff Law <jeffreyalaw@gmail.com> writes:
>> On 4/5/23 10:48, Jakub Jelinek wrote:
>>> On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
>>>>> It is true that an instruction like
>>>>> (insn 8 7 9 2 (set (reg:HI 141)
>>>>>            (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>>>>>         (nil))
>>>>> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
>>>>> upper bits shouldn't be random garbage in that case, it should be zero
>>>>> extended or sign extended.
>>>> Well, that's one of the core questions here.  What are the state of the
>>>> upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
>>>> 100% clear as we're not really doing any operation.
>>>>
>>>> So again, I think we need to decide if the DSE transformation is correct or
>>>> not.  I *think* we can aggree that insn 39 is OK.  It's really the semantics
>>>> of insn 47 that I think we need to agree on.  What is the state of the upper
>>>> 16 bits of (reg:HI 175) after insn 47?
>>>
>>> I'm afraid I don't know the answers here, I think Eric is
>>> WORD_REGISTER_OPERATIONS expert here I think these days (most of the major
>>> targets are !WORD_REGISTER_OPERATIONS).
>> Hopefully he'll chime in.
> 
> Just curious: have you experimented with making RISC-V
> !WORD_REGISTER_OPERATIONS too?  Realise it's not the right way
> to fix the bug, just curious in general.
We haven't experimented with it AFAIK.  We don't have a full set of 
SImode operations, but we may have enough of a subset to try. 
Alternately, we could potentially see what happens if we ignore the 
32bit ops that we do support.  Both general directions are probably 
worth exploring, but not right now and probably not even for gcc-14 
(where we're going to be busy as hell on the vector side).

> 
> Not defining it seems to have worked well for AArch64.  And IMO
> the semantics are much easier to follow when there is no special
> treatment of upper bits.  Subregs are hard enough to reason about
> as it is...
Amen to that.  My sense is that the risc-v port relies far too heavily 
on SUBREGs and the WORD_REGISTER_OPERATIONS just makes reasoning about 
correctness harder.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06  9:37             ` Li, Pan2
@ 2023-04-06 14:49               ` Jeff Law
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-06 14:49 UTC (permalink / raw)
  To: Li, Pan2, Richard Sandiford
  Cc: Jakub Jelinek, Richard Biener, Eric Botcazou, gcc-patches,
	kito.cheng, juzhe.zhong




On 4/6/23 03:37, Li, Pan2 wrote:
> Yes, RISC-V riscv.h defined the WORD_REGISTER_OPERATIONS to be 1, while aarch64.h defined it as 0, with below comments. No idea this can fit RISC-V or not.
I don't see any fundamental reason why it won't work.  Most of the 
expansion code already has code to widen types as necessary.  And given 
that we have a subset of 32bit ops, even in 64bit modes makes a 
WORD_REGISTER_OPERATIONS 0 a more sensible choice.

Jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 10:15           ` Eric Botcazou
  2023-04-06 10:31             ` [PATCH] combine: Fix simplify_comparison AND handling " Jakub Jelinek
@ 2023-04-06 14:53             ` Jeff Law
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-06 14:53 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: Jakub Jelinek, Richard Biener, Richard Sandiford, gcc-patches



On 4/6/23 04:15, Eric Botcazou wrote:
>> Originally I didn't really see this as an operation.  But the more and
>> more I ponder it feels like it's an operation and thus should be subject
>> to WORD_REGISTER_OPERATIONS.
>>
>> While it's not really binding on RTL semantics, if we look at how some
>> architectures implement reg->reg copies, they're actually implemented
>> with an ADD or IOR -- so a real operation under the hood.
>>
>> If we accept a subreg copy as an operation and thus subject to
>> WORD_REGISTER_OPERATIONS then that would seem to imply the combine is
>> the real problem here.  Otherwise dse is the culprit.
> 
> Yes, I agree that there is an ambiguity for subreg copy operations.  At some
> point I tried to define what register operations are and added a predicate to
> that effect (word_register_operation_p ); while it returns true for SUBREG,
> it's an opt-out predicate so this does not mean much.
Yea, I saw word_register_operation_p.  I was hesitant to treat it as a 
canonical definition of what ops are and are not subject to 
WORD_REGISTER_OPERATIONS.

> 
> I don't think that DSE does anything wrong: as I wrote in the PR, defining
> WORD_REGISTER_OPERATIONS should not prevent any particular form of RTL.
That was the conclusion I'd come to, predicated on treating SUBREGs as 
affected by WORD_REGISTER_OPERATIONS.

> 
> I therefore think that the problem is in the combiner and probably in the
> intermediate step shown by Jakub:
> 
> "Then after that try_combine we do:
> 13325           record_value_for_reg (dest, record_dead_insn,
> 13326                                 WORD_REGISTER_OPERATIONS
> 13327                                 && word_register_operation_p (SET_SRC
> (setter))
> 13328                                 && paradoxical_subreg_p (SET_DEST
> (setter))
> 13329                                 ? SET_SRC (setter)
> 13330                                 : gen_lowpart (GET_MODE (dest),
> 13331                                                SET_SRC (setter)));
> and the 3 conditions are true here and so record value of the whole setter.
> That then records among other things nonzero_bits as 0x8084c."
> 
> That's a recent addition of mine (ae20d760b1ed69f631c3bf9351bf7e5005d52297)
> and I think that it probably abuses WORD_REGISTER_OPERATIONS and should either
> be reverted or restricted to the load case documented in its comment.  I can
> provide testing on SPARC if need be.
I think that's the job for today.  Pan2, Jakub and myself have all 
zero'd in on this code in combine.

jeff


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 10:31             ` [PATCH] combine: Fix simplify_comparison AND handling " Jakub Jelinek
  2023-04-06 10:51               ` Eric Botcazou
  2023-04-06 14:35               ` Jeff Law
@ 2023-04-06 15:06               ` Jeff Law
  2 siblings, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-06 15:06 UTC (permalink / raw)
  To: Jakub Jelinek, Eric Botcazou
  Cc: Richard Biener, Richard Sandiford, gcc-patches



On 4/6/23 04:31, Jakub Jelinek wrote:

> 
> If we want to fix it in the combiner, I think the fix would be following.
> The optimization is about
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> and IMHO we can only optimize it into
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> if we know that the upper bits of the REG are zeros.  
But in WORD_REGISTER_OPERATIONS, that inner AND variant operates on a 
full word.  So I think they're equivalent.  But maybe I'm getting myself 
confused again.



> 
> Now, this patch fixes the PR, but certainly generates worse (but correct)
> code than the dse.cc patch.  So perhaps we want both of them?
I think the dse patch has value independently of this discussion, though 
I think it's more of a gcc-14 thing.

> 
> As before, I unfortunately can't test it on riscv-linux (could perhaps try
> that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
> target, but last my bootstrap attempt there failed miserably because of the
> Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
> for that once I test it).
I can spin it here when the time comes.

jeff



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 14:21                   ` Eric Botcazou
@ 2023-04-09  0:25                     ` Jeff Law
  2023-04-10  7:10                       ` Jakub Jelinek
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Law @ 2023-04-09  0:25 UTC (permalink / raw)
  To: Eric Botcazou, Jakub Jelinek
  Cc: gcc-patches, Richard Biener, Richard Sandiford



On 4/6/23 08:21, Eric Botcazou wrote:

>> So, perhaps just in the return op0; case add further code for
>> WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
>> again for the word mode and decide if it is still safe.
> 
> Does it work to just replace mode by word_mode in the calls to nonzero_bits?
It helps marginally -- basically we defer mucking up the code a bit.  We 
then hit this in simplify_and_const_int_1:


   /* See what bits may be nonzero in VAROP.  Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
      MODE.  */

   nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);

That just seems wrong for WORD_REGISTER_OPERATIONS targets.


Hacking both locations in a similar manner fixes the test.
jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-06 11:37                 ` Jakub Jelinek
  2023-04-06 14:21                   ` Eric Botcazou
@ 2023-04-09  1:15                   ` Jeff Law
  2023-04-10  5:13                     ` Hongtao Liu
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Law @ 2023-04-09  1:15 UTC (permalink / raw)
  To: Jakub Jelinek, Eric Botcazou
  Cc: Richard Biener, Richard Sandiford, gcc-patches



On 4/6/23 05:37, Jakub Jelinek wrote:
> On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
>>> If we want to fix it in the combiner, I think the fix would be following.
>>> The optimization is about
>>> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
>>> and IMHO we can only optimize it into
>>> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
>>> if we know that the upper bits of the REG are zeros.
>>
>> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation
>> is done on the full word register, in other words that it's in effect:
>>
>> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
>>
>> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.
> 
> If the
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> to
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
I think it is. In both cases the AND wipes the upper 16 bits.


> not really sure what for WORD_REGISTER_OPERATIONS
> means AND with a constant which has the most significant bit set for the
> upper bits.
That's a very good question.  I'm not sure either.  Obviously in the 
non-constant case all the bits up to word_mode get used.  The same thing 
is going to happen in the constant case.

THe fact that constants are sign extended from the mode bit is a gcc-ism 
though and not necessarily indicative of what hardware is going to to.



>>
>> What happens if you disable the step I mentioned (patchlet attached)?
> 
> That patch doesn't change anything at all on the testcase, it is still
> miscompiled.
That may be an artifact of later code in combine coming along and 
mucking things up in a manner similar.  That what I saw after twiddling 
simplify_binary_operation_1.  See simplify_and_const_int_1 and its calls 
to nonzero_bits

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-09  1:15                   ` Jeff Law
@ 2023-04-10  5:13                     ` Hongtao Liu
  2023-04-10  5:15                       ` Hongtao Liu
  0 siblings, 1 reply; 37+ messages in thread
From: Hongtao Liu @ 2023-04-10  5:13 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jakub Jelinek, Eric Botcazou, Richard Biener, Richard Sandiford,
	gcc-patches

On Sun, Apr 9, 2023 at 9:15 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 4/6/23 05:37, Jakub Jelinek wrote:
> > On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> >>> If we want to fix it in the combiner, I think the fix would be following.
> >>> The optimization is about
> >>> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> >>> and IMHO we can only optimize it into
> >>> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> >>> if we know that the upper bits of the REG are zeros.
> >>
> >> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation
> >> is done on the full word register, in other words that it's in effect:
> >>
> >> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
> >>
> >> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.
> >
> > If the
> > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > to
> > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> I think it is. In both cases the AND wipes the upper 16 bits.
>
>
> > not really sure what for WORD_REGISTER_OPERATIONS
> > means AND with a constant which has the most significant bit set for the
> > upper bits.
> That's a very good question.  I'm not sure either.  Obviously in the
> non-constant case all the bits up to word_mode get used.  The same thing
> is going to happen in the constant case.
>
> THe fact that constants are sign extended from the mode bit is a gcc-ism
> though and not necessarily indicative of what hardware is going to to.
>
>
>
> >>
> >> What happens if you disable the step I mentioned (patchlet attached)?
> >
> > That patch doesn't change anything at all on the testcase, it is still
> > miscompiled.
> That may be an artifact of later code in combine coming along and
> mucking things up in a manner similar.  That what I saw after twiddling
> simplify_binary_operation_1.  See simplify_and_const_int_1 and its calls
> to nonzero_bits
Li pan and I tried to set SUBREG_PROMOTED_UNSIGNED_P for the
(subreg:SI (reg:HI xxx)) after the AND was optimized off.
But it looks RA doesn't handle it as expected,not sure it I understand
the semantics of SUBREG_PROMOTED_UNSIGNED_P correctly.
>
> jeff



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-10  5:13                     ` Hongtao Liu
@ 2023-04-10  5:15                       ` Hongtao Liu
  0 siblings, 0 replies; 37+ messages in thread
From: Hongtao Liu @ 2023-04-10  5:15 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jakub Jelinek, Eric Botcazou, Richard Biener, Richard Sandiford,
	gcc-patches

On Mon, Apr 10, 2023 at 1:13 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Sun, Apr 9, 2023 at 9:15 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 4/6/23 05:37, Jakub Jelinek wrote:
> > > On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> > >>> If we want to fix it in the combiner, I think the fix would be following.
> > >>> The optimization is about
> > >>> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > >>> and IMHO we can only optimize it into
> > >>> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > >>> if we know that the upper bits of the REG are zeros.
> > >>
> > >> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation
> > >> is done on the full word register, in other words that it's in effect:
> > >>
> > >> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
> > >>
> > >> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.
> > >
> > > If the
> > > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > > to
> > > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > I think it is. In both cases the AND wipes the upper 16 bits.
> >
> >
> > > not really sure what for WORD_REGISTER_OPERATIONS
> > > means AND with a constant which has the most significant bit set for the
> > > upper bits.
> > That's a very good question.  I'm not sure either.  Obviously in the
> > non-constant case all the bits up to word_mode get used.  The same thing
> > is going to happen in the constant case.
> >
> > THe fact that constants are sign extended from the mode bit is a gcc-ism
> > though and not necessarily indicative of what hardware is going to to.
> >
> >
> >
> > >>
> > >> What happens if you disable the step I mentioned (patchlet attached)?
> > >
> > > That patch doesn't change anything at all on the testcase, it is still
> > > miscompiled.
> > That may be an artifact of later code in combine coming along and
> > mucking things up in a manner similar.  That what I saw after twiddling
> > simplify_binary_operation_1.  See simplify_and_const_int_1 and its calls
> > to nonzero_bits
> Li pan and I tried to set SUBREG_PROMOTED_UNSIGNED_P for the
I mean SUBREG_PROMOTED_UNSIGNED_SET.
> (subreg:SI (reg:HI xxx)) after the AND was optimized off.
> But it looks RA doesn't handle it as expected,not sure it I understand
> the semantics of SUBREG_PROMOTED_UNSIGNED_P correctly.
> >
> > jeff
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-09  0:25                     ` Jeff Law
@ 2023-04-10  7:10                       ` Jakub Jelinek
  2023-04-12  1:26                         ` Jeff Law
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-10  7:10 UTC (permalink / raw)
  To: Jeff Law; +Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford

On Sat, Apr 08, 2023 at 06:25:32PM -0600, Jeff Law wrote:
> 
> 
> On 4/6/23 08:21, Eric Botcazou wrote:
> 
> > > So, perhaps just in the return op0; case add further code for
> > > WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
> > > again for the word mode and decide if it is still safe.
> > 
> > Does it work to just replace mode by word_mode in the calls to nonzero_bits?
> It helps marginally -- basically we defer mucking up the code a bit.  We
> then hit this in simplify_and_const_int_1:
> 
> 
>   /* See what bits may be nonzero in VAROP.  Unlike the general case of
>      a call to nonzero_bits, here we don't care about bits outside
>      MODE.  */
> 
>   nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
> 
> That just seems wrong for WORD_REGISTER_OPERATIONS targets.
> 
> 
> Hacking both locations in a similar manner fixes the test.

If so, can you post that in patch form and can we go with that version
plus the testcase (e.g. from the first patch I've posted where I've changed
dse)?

	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-10  7:10                       ` Jakub Jelinek
@ 2023-04-12  1:26                         ` Jeff Law
  2023-04-12  6:21                           ` Jakub Jelinek
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Law @ 2023-04-12  1:26 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1400 bytes --]



On 4/10/23 01:10, Jakub Jelinek wrote:
> On Sat, Apr 08, 2023 at 06:25:32PM -0600, Jeff Law wrote:
>>
>>
>> On 4/6/23 08:21, Eric Botcazou wrote:
>>
>>>> So, perhaps just in the return op0; case add further code for
>>>> WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
>>>> again for the word mode and decide if it is still safe.
>>>
>>> Does it work to just replace mode by word_mode in the calls to nonzero_bits?
>> It helps marginally -- basically we defer mucking up the code a bit.  We
>> then hit this in simplify_and_const_int_1:
>>
>>
>>    /* See what bits may be nonzero in VAROP.  Unlike the general case of
>>       a call to nonzero_bits, here we don't care about bits outside
>>       MODE.  */
>>
>>    nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
>>
>> That just seems wrong for WORD_REGISTER_OPERATIONS targets.
>>
>>
>> Hacking both locations in a similar manner fixes the test.
> 
> If so, can you post that in patch form and can we go with that version
> plus the testcase (e.g. from the first patch I've posted where I've changed
> dse)?
So as I mentioned in IRC, I haven't really looked closely at 
simplify_and_const_int_1.  I don't have a high degree of confidence this 
patch is complete, even though it does fix the test for 108947 and 109040.

I did bootstrap on riscv, but not a regression test, that's spinning 
right now.

Jeff

[-- Attachment #2: P --]
[-- Type: text/plain, Size: 1419 bytes --]

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 22bf8e1ec89..c41d8a09b3b 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
-     MODE.  */
+     MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  enum machine_mode tmode = WORD_REGISTER_OPERATIONS ? word_mode : mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
      Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 3b33afa2461..5f6f70491d8 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3752,7 +3752,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 	return op0;
       if (HWI_COMPUTABLE_MODE_P (mode))
 	{
-	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+	  /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+	     nonzero bits in WORD_MODE rather than MODE.  */
+	  HOST_WIDE_INT nzop0
+	    = nonzero_bits (trueop0, WORD_REGISTER_OPERATIONS ? word_mode : mode);
 	  HOST_WIDE_INT nzop1;
 	  if (CONST_INT_P (trueop1))
 	    {

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-12  1:26                         ` Jeff Law
@ 2023-04-12  6:21                           ` Jakub Jelinek
  2023-04-12 10:02                             ` [PATCH] combine, v3: Fix " Jakub Jelinek
  2023-04-12 13:29                             ` [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040] Jeff Law
  0 siblings, 2 replies; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-12  6:21 UTC (permalink / raw)
  To: Jeff Law; +Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford

On Tue, Apr 11, 2023 at 07:26:07PM -0600, Jeff Law wrote:
> I did bootstrap on riscv, but not a regression test, that's spinning right
> now.
> 
> Jeff

> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 22bf8e1ec89..c41d8a09b3b 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
>  
>    /* See what bits may be nonzero in VAROP.  Unlike the general case of
>       a call to nonzero_bits, here we don't care about bits outside
> -     MODE.  */
> +     MODE unless WORD_REGISTER_OPERATIONS is true.  */

I would have expected something like
WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
as the condition to use word_mode, rather than just
WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
used as is, not a narrower word_mode instead of them.

> -  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
> +  enum machine_mode tmode = WORD_REGISTER_OPERATIONS ? word_mode : mode;
> +  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
>  
>    /* Turn off all bits in the constant that are known to already be zero.
>       Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 3b33afa2461..5f6f70491d8 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3752,7 +3752,10 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>  	return op0;
>        if (HWI_COMPUTABLE_MODE_P (mode))
>  	{
> -	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
> +	  /* When WORD_REGISTER_OPERATIONS is true, we need to know the
> +	     nonzero bits in WORD_MODE rather than MODE.  */
> +	  HOST_WIDE_INT nzop0
> +	    = nonzero_bits (trueop0, WORD_REGISTER_OPERATIONS ? word_mode : mode);
>  	  HOST_WIDE_INT nzop1;
>  	  if (CONST_INT_P (trueop1))
>  	    {

Regarding my earlier comments for this spot, the later code does
          nzop1 = nonzero_bits (trueop1, mode);
          /* If we are clearing all the nonzero bits, the result is zero.  */
          if ((nzop1 & nzop0) == 0
              && !side_effects_p (op0) && !side_effects_p (op1))
            return CONST0_RTX (mode);
and because nonzero_bits in word_mode if it is wider might have just more
bits set above mode, but nzop1 will not have those bits set, I think it is
fine the way you wrote it (except for the precision check).

	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-12  6:21                           ` Jakub Jelinek
@ 2023-04-12 10:02                             ` Jakub Jelinek
  2023-04-12 14:17                               ` Jeff Law
                                                 ` (2 more replies)
  2023-04-12 13:29                             ` [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040] Jeff Law
  1 sibling, 3 replies; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-12 10:02 UTC (permalink / raw)
  To: Segher Boessenkool, Jeff Law, Eric Botcazou, gcc-patches,
	Richard Biener, Richard Sandiford

Hi!

On Wed, Apr 12, 2023 at 08:21:26AM +0200, Jakub Jelinek via Gcc-patches wrote:
> I would have expected something like
> WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
> as the condition to use word_mode, rather than just
> WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
> used as is, not a narrower word_mode instead of them.

In patch form that would be following (given that the combine.cc change
had scalar_int_mode mode we can as well just use normal comparison, and
simplify-rtx.cc has it guarded on HWI_COMPUTABLE_MODE_P, which is also only
true for scalar int modes).

I've tried the pr108947.c testcase, but I see no differences in the assembly
before/after the patch (but dunno if I'm using the right options).
The pr109040.c testcase from the patch I don't see the expected zero
extension without the patch and do see it with it.

As before, I can only test this easily on non-WORD_REGISTER_OPERATIONS
targets.

2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
	    Jakub Jelinek  <jakub@redhat.com>

	PR target/109040
	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
	smaller than word_mode.
	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
	<case AND>: Likewise.

	* gcc.c-torture/execute/pr109040.c: New test.

--- gcc/combine.cc.jj	2023-04-07 16:02:06.668051629 +0200
+++ gcc/combine.cc	2023-04-12 11:24:18.458240028 +0200
@@ -10055,9 +10055,12 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
-     MODE.  */
+     MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  scalar_int_mode tmode = mode;
+  if (WORD_REGISTER_OPERATIONS && GET_MODE_BITSIZE (mode) < BITS_PER_WORD)
+    tmode = word_mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
      Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
@@ -10071,7 +10074,7 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* If VAROP is a NEG of something known to be zero or 1 and CONSTOP is
      a power of two, we can replace this with an ASHIFT.  */
-  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), mode) == 1
+  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), tmode) == 1
       && (i = exact_log2 (constop)) >= 0)
     return simplify_shift_const (NULL_RTX, ASHIFT, mode, XEXP (varop, 0), i);
 
--- gcc/simplify-rtx.cc.jj	2023-03-02 19:09:45.459594212 +0100
+++ gcc/simplify-rtx.cc	2023-04-12 11:26:26.027400305 +0200
@@ -3752,7 +3752,13 @@ simplify_context::simplify_binary_operat
 	return op0;
       if (HWI_COMPUTABLE_MODE_P (mode))
 	{
-	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+	  /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+	     nonzero bits in WORD_MODE rather than MODE.  */
+          scalar_int_mode tmode = as_a <scalar_int_mode> (mode);
+          if (WORD_REGISTER_OPERATIONS
+	      && GET_MODE_BITSIZE (tmode) < BITS_PER_WORD)
+	    tmode = word_mode;
+	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, tmode);
 	  HOST_WIDE_INT nzop1;
 	  if (CONST_INT_P (trueop1))
 	    {
--- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj	2023-04-12 11:11:56.728938344 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109040.c	2023-04-12 11:11:56.728938344 +0200
@@ -0,0 +1,23 @@
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
+  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+    __builtin_abort ();
+  return 0;
+}

	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-12  6:21                           ` Jakub Jelinek
  2023-04-12 10:02                             ` [PATCH] combine, v3: Fix " Jakub Jelinek
@ 2023-04-12 13:29                             ` Jeff Law
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-12 13:29 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford



On 4/12/23 00:21, Jakub Jelinek wrote:
> On Tue, Apr 11, 2023 at 07:26:07PM -0600, Jeff Law wrote:
>> I did bootstrap on riscv, but not a regression test, that's spinning right
>> now.
>>
>> Jeff
> 
>> diff --git a/gcc/combine.cc b/gcc/combine.cc
>> index 22bf8e1ec89..c41d8a09b3b 100644
>> --- a/gcc/combine.cc
>> +++ b/gcc/combine.cc
>> @@ -10055,9 +10055,10 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop,
>>   
>>     /* See what bits may be nonzero in VAROP.  Unlike the general case of
>>        a call to nonzero_bits, here we don't care about bits outside
>> -     MODE.  */
>> +     MODE unless WORD_REGISTER_OPERATIONS is true.  */
> 
> I would have expected something like
> WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
> as the condition to use word_mode, rather than just
> WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
> used as is, not a narrower word_mode instead of them.
Agreed.

Jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-12 10:02                             ` [PATCH] combine, v3: Fix " Jakub Jelinek
@ 2023-04-12 14:17                               ` Jeff Law
  2023-04-12 14:30                                 ` Jakub Jelinek
  2023-04-12 15:24                               ` Segher Boessenkool
  2023-04-12 16:58                               ` [PATCH] combine, v4: " Jakub Jelinek
  2 siblings, 1 reply; 37+ messages in thread
From: Jeff Law @ 2023-04-12 14:17 UTC (permalink / raw)
  To: Jakub Jelinek, Segher Boessenkool, Eric Botcazou, gcc-patches,
	Richard Biener, Richard Sandiford



On 4/12/23 04:02, Jakub Jelinek wrote:
> Hi!
> 
> On Wed, Apr 12, 2023 at 08:21:26AM +0200, Jakub Jelinek via Gcc-patches wrote:
>> I would have expected something like
>> WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
>> as the condition to use word_mode, rather than just
>> WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
>> used as is, not a narrower word_mode instead of them.
> 
> In patch form that would be following (given that the combine.cc change
> had scalar_int_mode mode we can as well just use normal comparison, and
> simplify-rtx.cc has it guarded on HWI_COMPUTABLE_MODE_P, which is also only
> true for scalar int modes).
> 
> I've tried the pr108947.c testcase, but I see no differences in the assembly
> before/after the patch (but dunno if I'm using the right options).
> The pr109040.c testcase from the patch I don't see the expected zero
> extension without the patch and do see it with it.
> 
> As before, I can only test this easily on non-WORD_REGISTER_OPERATIONS
> targets.
> 
> 2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
> 	    Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/109040
> 	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> 	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> 	smaller than word_mode.
> 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> 	<case AND>: Likewise.
> 
> 	* gcc.c-torture/execute/pr109040.c: New test.
Looks pretty sensible.  It'll take most of the day, but I'll do a 
bootstrap and regression test with this variant.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-12 14:17                               ` Jeff Law
@ 2023-04-12 14:30                                 ` Jakub Jelinek
  0 siblings, 0 replies; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-12 14:30 UTC (permalink / raw)
  To: Jeff Law
  Cc: Segher Boessenkool, Eric Botcazou, gcc-patches, Richard Biener,
	Richard Sandiford

On Wed, Apr 12, 2023 at 08:17:46AM -0600, Jeff Law wrote:
> Looks pretty sensible.  It'll take most of the day, but I'll do a bootstrap
> and regression test with this variant.

Thanks.  Note, it bootstraps/regtests on x86_64-linux and i686-linux fine,
though that is not WORD_REGISTER_OPERATIONS target.  And it builds cc1 on
aarch64-linux (just wanted to make sure I didn't break anything on 2
coefficient poly-int arches).

	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine, v3: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-12 10:02                             ` [PATCH] combine, v3: Fix " Jakub Jelinek
  2023-04-12 14:17                               ` Jeff Law
@ 2023-04-12 15:24                               ` Segher Boessenkool
  2023-04-12 16:58                               ` [PATCH] combine, v4: " Jakub Jelinek
  2 siblings, 0 replies; 37+ messages in thread
From: Segher Boessenkool @ 2023-04-12 15:24 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jeff Law, Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford

On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 12, 2023 at 08:21:26AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > I would have expected something like
> > WORD_REGISTER_OPERATIONS && known_le (GET_MODE_PRECISION (mode), BITS_PER_WORD)
> > as the condition to use word_mode, rather than just
> > WORD_REGISTER_OPERATIONS.  In both spots.  Because larger modes should be
> > used as is, not a narrower word_mode instead of them.
> 
> In patch form that would be following (given that the combine.cc change
> had scalar_int_mode mode we can as well just use normal comparison, and
> simplify-rtx.cc has it guarded on HWI_COMPUTABLE_MODE_P, which is also only
> true for scalar int modes).
> 
> I've tried the pr108947.c testcase, but I see no differences in the assembly
> before/after the patch (but dunno if I'm using the right options).
> The pr109040.c testcase from the patch I don't see the expected zero
> extension without the patch and do see it with it.
> 
> As before, I can only test this easily on non-WORD_REGISTER_OPERATIONS
> targets.

There are no doubt tens more similar WORD_REGISTER_OPERATIONS problems
lurking.  We would be much better off if this wart was removed and we
handled such things properly.

That said:

> 	PR target/109040
> 	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> 	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> 	smaller than word_mode.
> 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> 	<case AND>: Likewise.
> 
> 	* gcc.c-torture/execute/pr109040.c: New test.

Okay for trunk.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-12 10:02                             ` [PATCH] combine, v3: Fix " Jakub Jelinek
  2023-04-12 14:17                               ` Jeff Law
  2023-04-12 15:24                               ` Segher Boessenkool
@ 2023-04-12 16:58                               ` Jakub Jelinek
  2023-04-13  4:05                                 ` Jeff Law
  2 siblings, 1 reply; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-12 16:58 UTC (permalink / raw)
  To: Segher Boessenkool, Jeff Law, Eric Botcazou, gcc-patches,
	Richard Biener, Richard Sandiford

On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek via Gcc-patches wrote:
> I've tried the pr108947.c testcase, but I see no differences in the assembly
> before/after the patch (but dunno if I'm using the right options).
> The pr109040.c testcase from the patch I don't see the expected zero
> extension without the patch and do see it with it.

Seems my cross defaulted to 32-bit compilation, reproduced it with
additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
So, let's include that test in the patch too:

2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
	    Jakub Jelinek  <jakub@redhat.com>

	PR target/108947
	PR target/109040
	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
	smaller than word_mode.
	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
	<case AND>: Likewise.

	* gcc.dg/pr108947.c: New test.
	* gcc.c-torture/execute/pr109040.c: New test.

--- gcc/combine.cc.jj	2023-04-07 16:02:06.668051629 +0200
+++ gcc/combine.cc	2023-04-12 11:24:18.458240028 +0200
@@ -10055,9 +10055,12 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
-     MODE.  */
+     MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  scalar_int_mode tmode = mode;
+  if (WORD_REGISTER_OPERATIONS && GET_MODE_BITSIZE (mode) < BITS_PER_WORD)
+    tmode = word_mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
      Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
@@ -10071,7 +10074,7 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* If VAROP is a NEG of something known to be zero or 1 and CONSTOP is
      a power of two, we can replace this with an ASHIFT.  */
-  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), mode) == 1
+  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), tmode) == 1
       && (i = exact_log2 (constop)) >= 0)
     return simplify_shift_const (NULL_RTX, ASHIFT, mode, XEXP (varop, 0), i);
 
--- gcc/simplify-rtx.cc.jj	2023-03-02 19:09:45.459594212 +0100
+++ gcc/simplify-rtx.cc	2023-04-12 11:26:26.027400305 +0200
@@ -3752,7 +3752,13 @@ simplify_context::simplify_binary_operat
 	return op0;
       if (HWI_COMPUTABLE_MODE_P (mode))
 	{
-	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+	  /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+	     nonzero bits in WORD_MODE rather than MODE.  */
+          scalar_int_mode tmode = as_a <scalar_int_mode> (mode);
+          if (WORD_REGISTER_OPERATIONS
+	      && GET_MODE_BITSIZE (tmode) < BITS_PER_WORD)
+	    tmode = word_mode;
+	  HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, tmode);
 	  HOST_WIDE_INT nzop1;
 	  if (CONST_INT_P (trueop1))
 	    {
--- gcc/testsuite/gcc.dg/pr108947.c.jj	2023-04-12 18:54:13.115630365 +0200
+++ gcc/testsuite/gcc.dg/pr108947.c	2023-04-12 18:53:21.166372386 +0200
@@ -0,0 +1,21 @@
+/* PR target/108947 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-forward-propagate -Wno-psabi" } */
+
+typedef unsigned short __attribute__((__vector_size__ (2 * sizeof (short)))) V;
+
+__attribute__((__noipa__)) V
+foo (V v)
+{
+  V w = 3 > (v & 3992);
+  return w;
+}
+
+int
+main ()
+{
+  V w = foo ((V) { 0, 9 });
+  if (w[0] != 0xffff || w[1] != 0)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj	2023-04-12 11:11:56.728938344 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109040.c	2023-04-12 11:11:56.728938344 +0200
@@ -0,0 +1,23 @@
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
+  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+    __builtin_abort ();
+  return 0;
+}


	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-12 16:58                               ` [PATCH] combine, v4: " Jakub Jelinek
@ 2023-04-13  4:05                                 ` Jeff Law
  2023-04-13 10:57                                   ` Segher Boessenkool
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Law @ 2023-04-13  4:05 UTC (permalink / raw)
  To: Jakub Jelinek, Segher Boessenkool, Eric Botcazou, gcc-patches,
	Richard Biener, Richard Sandiford



On 4/12/23 10:58, Jakub Jelinek wrote:
> On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek via Gcc-patches wrote:
>> I've tried the pr108947.c testcase, but I see no differences in the assembly
>> before/after the patch (but dunno if I'm using the right options).
>> The pr109040.c testcase from the patch I don't see the expected zero
>> extension without the patch and do see it with it.
> 
> Seems my cross defaulted to 32-bit compilation, reproduced it with
> additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
> So, let's include that test in the patch too:
> 
> 2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
> 	    Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/108947
> 	PR target/109040
> 	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> 	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> 	smaller than word_mode.
> 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> 	<case AND>: Likewise.
> 
> 	* gcc.dg/pr108947.c: New test.
> 	* gcc.c-torture/execute/pr109040.c: New test.
Bootstrap of the v3 patch has completed.  Regression testing is still 
spinning.   It should be done and waiting for me when I wake up in the 
morning.

jeff-

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-13  4:05                                 ` Jeff Law
@ 2023-04-13 10:57                                   ` Segher Boessenkool
  2023-04-13 12:35                                     ` Jeff Law
  0 siblings, 1 reply; 37+ messages in thread
From: Segher Boessenkool @ 2023-04-13 10:57 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Biener,
	Richard Sandiford

On Wed, Apr 12, 2023 at 10:05:08PM -0600, Jeff Law wrote:
> On 4/12/23 10:58, Jakub Jelinek wrote:
> >Seems my cross defaulted to 32-bit compilation, reproduced it with
> >additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
> >So, let's include that test in the patch too:
> >
> >2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
> >	    Jakub Jelinek  <jakub@redhat.com>
> >
> >	PR target/108947
> >	PR target/109040
> >	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> >	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> >	smaller than word_mode.
> >	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> >	<case AND>: Likewise.
> >
> >	* gcc.dg/pr108947.c: New test.
> >	* gcc.c-torture/execute/pr109040.c: New test.
> Bootstrap of the v3 patch has completed.  Regression testing is still 
> spinning.   It should be done and waiting for me when I wake up in the 
> morning.

It's still okay for trunk (of course) if the bootstrap doesn't fail (of
course).  Thanks guys!


Segher

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
  2023-04-13 10:57                                   ` Segher Boessenkool
@ 2023-04-13 12:35                                     ` Jeff Law
  2023-04-13 13:45                                       ` [PATCH] loop-iv: Fix up bounds computation Jakub Jelinek
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Law @ 2023-04-13 12:35 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jakub Jelinek, Eric Botcazou, gcc-patches, Richard Biener,
	Richard Sandiford



On 4/13/23 04:57, Segher Boessenkool wrote:
> On Wed, Apr 12, 2023 at 10:05:08PM -0600, Jeff Law wrote:
>> On 4/12/23 10:58, Jakub Jelinek wrote:
>>> Seems my cross defaulted to 32-bit compilation, reproduced it with
>>> additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
>>> So, let's include that test in the patch too:
>>>
>>> 2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
>>> 	    Jakub Jelinek  <jakub@redhat.com>
>>>
>>> 	PR target/108947
>>> 	PR target/109040
>>> 	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
>>> 	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
>>> 	smaller than word_mode.
>>> 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
>>> 	<case AND>: Likewise.
>>>
>>> 	* gcc.dg/pr108947.c: New test.
>>> 	* gcc.c-torture/execute/pr109040.c: New test.
>> Bootstrap of the v3 patch has completed.  Regression testing is still
>> spinning.   It should be done and waiting for me when I wake up in the
>> morning.
> 
> It's still okay for trunk (of course) if the bootstrap doesn't fail (of
> course).  Thanks guys!
Bootstrap was successful with v3, but there's hundreds of testsuite 
failures due to the simplify-rtx hunk.  compile/20070520-1.c for example 
when compiled with:  -O3 -funroll-loops -march=rv64gc -mabi=lp64d

Thursdays are my hell day.  It's unlikely I'd be able to look at this at 
all today.


typedef unsigned char uint8_t;
extern uint8_t ff_cropTbl[256 + 2 * 1024];

void ff_pred8x8_plane_c(uint8_t *src, int stride){
   int j, k;
   int a;
   uint8_t *cm = ff_cropTbl + 1024;
   const uint8_t * const src0 = src+3-stride;
   const uint8_t *src1 = src+4*stride-1;
   const uint8_t *src2 = src1-2*stride;
   int H = src0[1] - src0[-1];
   int V = src1[0] - src2[ 0];
   for(k=2; k<=4; ++k) {
     src1 += stride; src2 -= stride;
     H += k*(src0[k] - src0[-k]);
     V += k*(src1[0] - src2[ 0]);
   }
   H = ( 17*H+16 ) >> 5;
   V = ( 17*V+16 ) >> 5;

   a = 16*(src1[0] + src2[8]+1) - 3*(V+H);
   for(j=8; j>0; --j) {
     int b = a;
     a += V;
     src[0] = cm[ (b ) >> 5 ];
     src[1] = cm[ (b+ H) >> 5 ];
     src[2] = cm[ (b+2*H) >> 5 ];
     src[3] = cm[ (b+3*H) >> 5 ];
     src[4] = cm[ (b+4*H) >> 5 ];
     src[5] = cm[ (b+5*H) >> 5 ];
     src[6] = cm[ (b+6*H) >> 5 ];
     src[7] = cm[ (b+7*H) >> 5 ];
     src += stride;
   }
}

Jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] loop-iv: Fix up bounds computation
  2023-04-13 12:35                                     ` Jeff Law
@ 2023-04-13 13:45                                       ` Jakub Jelinek
  2023-04-13 15:07                                         ` Jeff Law
  2023-04-13 19:37                                         ` Jeff Law
  0 siblings, 2 replies; 37+ messages in thread
From: Jakub Jelinek @ 2023-04-13 13:45 UTC (permalink / raw)
  To: Jeff Law
  Cc: Segher Boessenkool, Eric Botcazou, gcc-patches, Richard Biener,
	Richard Sandiford

On Thu, Apr 13, 2023 at 06:35:07AM -0600, Jeff Law wrote:
> Bootstrap was successful with v3, but there's hundreds of testsuite failures
> due to the simplify-rtx hunk.  compile/20070520-1.c for example when
> compiled with:  -O3 -funroll-loops -march=rv64gc -mabi=lp64d
> 
> Thursdays are my hell day.  It's unlikely I'd be able to look at this at all
> today.

So, seems to me this is because loop-iv.cc asks for invalid RTL to be
simplified, it calls simplify_gen_binary (AND, SImode,
(subreg:SI (plus:DI (reg:DI 289 [ ivtmp_312 ])
        (const_int 4294967295 [0xffffffff])) 0),
(const_int 4294967295 [0xffffffff]))
but 0xffffffff is not valid SImode CONST_INT, and unlike previously
we no longer on WORD_REGISTER_OPERATIONS targets which have DImode
word_mode optimize that into the op0, so the invalid constant is emitted
into the IL and checking fails.

The following patch fixes that (and we optimize that & -1 away even earlier
with that).

Could you please just quickly try to apply this patch, make in the stage3
directory followed by
make check-gcc RUNTESTFLAGS="... compile.exp='20070520-1.c ...'"
(with all tests that regressed previously), whether this is the only spot
or whether we need to fix some other place too?

2023-04-13  Jakub Jelinek  <jakub@redhat.com>

	* loop-iv.cc (iv_number_of_iterations): Use gen_int_mode instead
	of GEN_INT.

--- gcc/loop-iv.cc.jj	2023-01-02 09:32:23.000000000 +0100
+++ gcc/loop-iv.cc	2023-04-13 15:34:11.939045804 +0200
@@ -2617,7 +2617,7 @@ iv_number_of_iterations (class loop *loo
 	  d *= 2;
 	  size--;
 	}
-      bound = GEN_INT (((uint64_t) 1 << (size - 1 ) << 1) - 1);
+      bound = gen_int_mode (((uint64_t) 1 << (size - 1) << 1) - 1, mode);
 
       tmp1 = lowpart_subreg (mode, iv1.base, comp_mode);
       tmp = simplify_gen_binary (UMOD, mode, tmp1, gen_int_mode (d, mode));


	Jakub


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] loop-iv: Fix up bounds computation
  2023-04-13 13:45                                       ` [PATCH] loop-iv: Fix up bounds computation Jakub Jelinek
@ 2023-04-13 15:07                                         ` Jeff Law
  2023-04-13 19:37                                         ` Jeff Law
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-13 15:07 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Segher Boessenkool, Eric Botcazou, gcc-patches, Richard Biener,
	Richard Sandiford



On 4/13/23 07:45, Jakub Jelinek wrote:
> On Thu, Apr 13, 2023 at 06:35:07AM -0600, Jeff Law wrote:
>> Bootstrap was successful with v3, but there's hundreds of testsuite failures
>> due to the simplify-rtx hunk.  compile/20070520-1.c for example when
>> compiled with:  -O3 -funroll-loops -march=rv64gc -mabi=lp64d
>>
>> Thursdays are my hell day.  It's unlikely I'd be able to look at this at all
>> today.
> 
> So, seems to me this is because loop-iv.cc asks for invalid RTL to be
> simplified, it calls simplify_gen_binary (AND, SImode,
> (subreg:SI (plus:DI (reg:DI 289 [ ivtmp_312 ])
>          (const_int 4294967295 [0xffffffff])) 0),
> (const_int 4294967295 [0xffffffff]))
> but 0xffffffff is not valid SImode CONST_INT, and unlike previously
> we no longer on WORD_REGISTER_OPERATIONS targets which have DImode
> word_mode optimize that into the op0, so the invalid constant is emitted
> into the IL and checking fails.
> 
> The following patch fixes that (and we optimize that & -1 away even earlier
> with that).
> 
> Could you please just quickly try to apply this patch, make in the stage3
> directory followed by
> make check-gcc RUNTESTFLAGS="... compile.exp='20070520-1.c ...'"
> (with all tests that regressed previously), whether this is the only spot
> or whether we need to fix some other place too?
> 
> 2023-04-13  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* loop-iv.cc (iv_number_of_iterations): Use gen_int_mode instead
> 	of GEN_INT.
I'll try to apply this and do just an incremental build & test to see if 
it resolves all the regressions.  It should complete while I'm in my 
meeting hell.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] loop-iv: Fix up bounds computation
  2023-04-13 13:45                                       ` [PATCH] loop-iv: Fix up bounds computation Jakub Jelinek
  2023-04-13 15:07                                         ` Jeff Law
@ 2023-04-13 19:37                                         ` Jeff Law
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Law @ 2023-04-13 19:37 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Segher Boessenkool, Eric Botcazou, gcc-patches, Richard Biener,
	Richard Sandiford



On 4/13/23 07:45, Jakub Jelinek wrote:
> On Thu, Apr 13, 2023 at 06:35:07AM -0600, Jeff Law wrote:
>> Bootstrap was successful with v3, but there's hundreds of testsuite failures
>> due to the simplify-rtx hunk.  compile/20070520-1.c for example when
>> compiled with:  -O3 -funroll-loops -march=rv64gc -mabi=lp64d
>>
>> Thursdays are my hell day.  It's unlikely I'd be able to look at this at all
>> today.
> 
> So, seems to me this is because loop-iv.cc asks for invalid RTL to be
> simplified, it calls simplify_gen_binary (AND, SImode,
> (subreg:SI (plus:DI (reg:DI 289 [ ivtmp_312 ])
>          (const_int 4294967295 [0xffffffff])) 0),
> (const_int 4294967295 [0xffffffff]))
> but 0xffffffff is not valid SImode CONST_INT, and unlike previously
> we no longer on WORD_REGISTER_OPERATIONS targets which have DImode
> word_mode optimize that into the op0, so the invalid constant is emitted
> into the IL and checking fails.
> 
> The following patch fixes that (and we optimize that & -1 away even earlier
> with that).
> 
> Could you please just quickly try to apply this patch, make in the stage3
> directory followed by
> make check-gcc RUNTESTFLAGS="... compile.exp='20070520-1.c ...'"
> (with all tests that regressed previously), whether this is the only spot
> or whether we need to fix some other place too?
> 
> 2023-04-13  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* loop-iv.cc (iv_number_of_iterations): Use gen_int_mode instead
> 	of GEN_INT.
That fixes all the regressions and looks OK to me.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2023-04-13 19:37 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-05  9:16 [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040] Jakub Jelinek
2023-04-05 13:14 ` Jeff Law
2023-04-05 14:51   ` Jakub Jelinek
2023-04-05 16:17     ` Jeff Law
2023-04-05 16:48       ` Jakub Jelinek
2023-04-05 17:31         ` Jeff Law
2023-04-06  9:31           ` Richard Sandiford
2023-04-06  9:37             ` Li, Pan2
2023-04-06 14:49               ` Jeff Law
2023-04-06 14:45             ` Jeff Law
2023-04-06 10:15           ` Eric Botcazou
2023-04-06 10:31             ` [PATCH] combine: Fix simplify_comparison AND handling " Jakub Jelinek
2023-04-06 10:51               ` Eric Botcazou
2023-04-06 11:37                 ` Jakub Jelinek
2023-04-06 14:21                   ` Eric Botcazou
2023-04-09  0:25                     ` Jeff Law
2023-04-10  7:10                       ` Jakub Jelinek
2023-04-12  1:26                         ` Jeff Law
2023-04-12  6:21                           ` Jakub Jelinek
2023-04-12 10:02                             ` [PATCH] combine, v3: Fix " Jakub Jelinek
2023-04-12 14:17                               ` Jeff Law
2023-04-12 14:30                                 ` Jakub Jelinek
2023-04-12 15:24                               ` Segher Boessenkool
2023-04-12 16:58                               ` [PATCH] combine, v4: " Jakub Jelinek
2023-04-13  4:05                                 ` Jeff Law
2023-04-13 10:57                                   ` Segher Boessenkool
2023-04-13 12:35                                     ` Jeff Law
2023-04-13 13:45                                       ` [PATCH] loop-iv: Fix up bounds computation Jakub Jelinek
2023-04-13 15:07                                         ` Jeff Law
2023-04-13 19:37                                         ` Jeff Law
2023-04-12 13:29                             ` [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040] Jeff Law
2023-04-09  1:15                   ` Jeff Law
2023-04-10  5:13                     ` Hongtao Liu
2023-04-10  5:15                       ` Hongtao Liu
2023-04-06 14:35               ` Jeff Law
2023-04-06 15:06               ` Jeff Law
2023-04-06 14:53             ` [PATCH] dse: Handle SUBREGs of word REGs differently " Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).