[Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu
@ 2020-04-12  2:49 qrzhang at gatech dot edu
  2020-04-14  6:41 ` [Bug rtl-optimization/94567] [10 Regression] " marxin at gcc dot gnu.org
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: qrzhang at gatech dot edu @ 2020-04-12  2:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

            Bug ID: 94567
           Summary: wrong code at -O2 and -O3 on x86_64-linux-gnu
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: qrzhang at gatech dot edu
  Target Milestone: ---

It's a recent regression. Bisection points to g:529ea7d9596b26ba103578eeab


$ gcc-trunk -v
gcc version 10.0.1 20200411 (experimental) [master revision
bb87d5cc77d:75961caccb7:f883c46b4877f637e0fa5025b4d6b5c9040ec566] (GCC)


$ gcc-trunk abc.c ; ./a.out
4

$ gcc-trunk -O2 abc.c ; ./a.out
0


$ cat abc.c
volatile int a = 1, b;
short c, d = 4, e = 53736, f = 2, g;
int(h)(int i, int j) { return i && j ? 0 : i + j; }
int main() {
  for (; a; a = 0) {
    unsigned short k = e;
    g = k >> 3;
    if (h(g < (f || c), b))
      d = 0;
  }
  printf("%X\n", d);
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug rtl-optimization/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
@ 2020-04-14  6:41 ` marxin at gcc dot gnu.org
  2020-04-14 11:28 ` law at redhat dot com
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-04-14  6:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org
            Summary|wrong code at -O2 and -O3   |[10 Regression] wrong code
                   |on x86_64-linux-gnu         |at -O2 and -O3 on
                   |                            |x86_64-linux-gnu
      Known to work|                            |9.3.0
      Known to fail|                            |10.0
   Last reconfirmed|                            |2020-04-14
     Ever confirmed|0                           |1
           Priority|P3                          |P1
   Target Milestone|---                         |10.0
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Confirmed, slightly modified test-case:

$ cat pr94567.c
volatile int a = 1, b;
short c, d = 4, e = 53736, f = 2, g;
int(h)(int i, int j) { return i && j ? 0 : i + j; }
int main() {
  for (; a; a = 0) {
    unsigned short k = e;
    g = k >> 3;
    if (h(g < (f || c), b))
      d = 0;
  }
  if (d != 4)
    __builtin_abort();
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug rtl-optimization/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
  2020-04-14  6:41 ` [Bug rtl-optimization/94567] [10 Regression] " marxin at gcc dot gnu.org
@ 2020-04-14 11:28 ` law at redhat dot com
  2020-04-14 11:45 ` [Bug target/94567] " rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: law at redhat dot com @ 2020-04-14 11:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #2 from Jeffrey A. Law <law at redhat dot com> ---
THanks Martin.  I'm already well into this one :-)

I'm pretty sure the problem is testqi_ext_3 and the code we generate when
splitting it.

Consider (from my own slightly reduced testcase, but I'm sure yours will have
something similar):

(insn 25 22 26 3 (set (reg:CCNO 17 flags)
        (compare:CCNO (zero_extract:SI (reg:HI 104 [ e ])
                (const_int 13 [0xd])
                (const_int 3 [0x3]))
            (const_int 0 [0]))) "j.c":15:17 449 {*testqi_ext_3}
     (expr_list:REG_DEAD (reg:HI 104 [ e ])
        (nil)))

Which gets "split" into:

(insn 124 22 26 3 (set (reg:CCNO 17 flags)
        (compare:CCNO (and:HI (reg:HI 104 [ e ])
                (const_int -8 [0xfffffffffffffff8]))
            (const_int 0 [0]))) "j.c":15:17 444 {*testhi_1}
     (expr_list:REG_DEAD (reg:HI 104 [ e ])
        (nil)))

But consider the affect on the flags, particularly if the HImode sign bit is on
in (reg 104).   In the pre-split we extract a 13 bit field starting at bit 3
and zero extend it to SImode, then compare it against zero.  No matter what the
value in (reg 104) the sign flag bit will be off.

In the post-split version we do a HImode AND and thus the sign flag bit will be
on if bit 16 in (reg 104) is on.

The pr90275 changes indirectly enabled generation of testqi_ext_3, but after a
day of looking at the dumps, I'm confident the 90275 changes are correct/safe
and that we're really dealing with a latent issue in testqi_ext_3.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
  2020-04-14  6:41 ` [Bug rtl-optimization/94567] [10 Regression] " marxin at gcc dot gnu.org
  2020-04-14 11:28 ` law at redhat dot com
@ 2020-04-14 11:45 ` rguenth at gcc dot gnu.org
  2020-04-14 11:48 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-14 11:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-* i?86-*-*
          Component|rtl-optimization            |target

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Target issue then.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (2 preceding siblings ...)
  2020-04-14 11:45 ` [Bug target/94567] " rguenth at gcc dot gnu.org
@ 2020-04-14 11:48 ` rguenth at gcc dot gnu.org
  2020-04-14 11:59 ` law at redhat dot com
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-14 11:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Wonder if the simplest thing is to turn the zero_extract into a zero_extend
so we can maintain the and in SImode.  And whether combine will ever
generate a zero_extract that extracts the SImode MSB.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (3 preceding siblings ...)
  2020-04-14 11:48 ` rguenth at gcc dot gnu.org
@ 2020-04-14 11:59 ` law at redhat dot com
  2020-04-14 12:45 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: law at redhat dot com @ 2020-04-14 11:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #5 from Jeffrey A. Law <law at redhat dot com> ---
I've pondered just killing that pattern, but I'm pretty sure there'll be
notable regressions.  There was a clear regression we fixed in gcc-6 due to not
handling QImode operands in that pattern.

What I'm playing with is looking at pos + len and if it hits the sign bit in
the operand's mode, then widening the mode.  Something like this:

+  /* If the mask is going to have the sign bit set in the mode
+     we want to do the comparison, then we must widen the target
+     mode.  Otherwise the flags will be incorrect when we split
+     this into a (compare (and (op0) (mask))) and a subsequent
+     test like LE will get the wrong result.  */
+  if (mode < E_DImode
+      && pos + len == GET_MODE_PRECISION (mode))
+    {
+      mode = GET_MODE_WIDER_MODE (mode).require ();
+      val = gen_lowpart (mode, val);
+    }
+

Which I think is roughly what you were suggesting.  Mine does it with a SUBREG,
so it matches existing patterns...  A ZERO_EXTEND may well require new
patterns.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (4 preceding siblings ...)
  2020-04-14 11:59 ` law at redhat dot com
@ 2020-04-14 12:45 ` rguenth at gcc dot gnu.org
  2020-04-14 13:20 ` law at redhat dot com
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-14 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #5)
> I've pondered just killing that pattern, but I'm pretty sure there'll be
> notable regressions.  There was a clear regression we fixed in gcc-6 due to
> not handling QImode operands in that pattern.
> 
> What I'm playing with is looking at pos + len and if it hits the sign bit in
> the operand's mode, then widening the mode.  Something like this:
> 
> +  /* If the mask is going to have the sign bit set in the mode
> +     we want to do the comparison, then we must widen the target
> +     mode.  Otherwise the flags will be incorrect when we split
> +     this into a (compare (and (op0) (mask))) and a subsequent
> +     test like LE will get the wrong result.  */
> +  if (mode < E_DImode
> +      && pos + len == GET_MODE_PRECISION (mode))
> +    {
> +      mode = GET_MODE_WIDER_MODE (mode).require ();
> +      val = gen_lowpart (mode, val);
> +    }
> +
> 
> Which I think is roughly what you were suggesting.  Mine does it with a
> SUBREG, so it matches existing patterns...  A ZERO_EXTEND may well require
> new patterns.

I see.  A subreg should work as well indeed.  Note I think simply using
the original mode combine used is safer than using some random other(?)
then we know the comparison semantics are unchanged.  Of course your
suggested change above will cause less changes at this point.

There's also

   && ix86_match_ccmode (insn,
                         /* *testdi_1 requires CCZmode if the mask has bit
                            31 set and all bits above it clear.  */
                         GET_MODE (operands[2]) == DImode
                         && INTVAL (operands[3]) + INTVAL (operands[4]) == 32
                         ? CCZmode : CCNOmode)"

which is likely where the original correctness issue lies - it fails
to check for the similar cases and HImode (and SImode and QImode).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (5 preceding siblings ...)
  2020-04-14 12:45 ` rguenth at gcc dot gnu.org
@ 2020-04-14 13:20 ` law at redhat dot com
  2020-04-15  9:57 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: law at redhat dot com @ 2020-04-14 13:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #7 from Jeffrey A. Law <law at redhat dot com> ---
I think it's trying to use smaller modes because the encodings can be smaller. 
In other cases it changes the mode to avoid partial register stalls.  It's a
bit of a mess.

WRT the fragment you mentioned, I looked at that repeatedly trying to ascertain
the real motivation and whether or not that code needed generalization to
handle this case or was a misguided attempt to fix another instance of this
issue.

The conclusion I came to was that hunk of code may well be working around
another instance of this same problem, but it was neither generalizable to this
BZ nor would my approach totally fix that instance.

We may be able to remove the hack in the testqi_ext_3 pattern, but I think the
corresponding hack in testdi_1 would have to stay unless we found a way to
merge testdi_1 into the more general test<mode>_1 pattern.  Neither of those
seems terribly appropriate right now.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (6 preceding siblings ...)
  2020-04-14 13:20 ` law at redhat dot com
@ 2020-04-15  9:57 ` rguenth at gcc dot gnu.org
  2020-04-15 16:57 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-15  9:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #7)
> I think it's trying to use smaller modes because the encodings can be
> smaller.  In other cases it changes the mode to avoid partial register
> stalls.  It's a bit of a mess.
> 
> WRT the fragment you mentioned, I looked at that repeatedly trying to
> ascertain the real motivation and whether or not that code needed
> generalization to handle this case or was a misguided attempt to fix another
> instance of this issue.
> 
> The conclusion I came to was that hunk of code may well be working around
> another instance of this same problem, but it was neither generalizable to
> this BZ nor would my approach totally fix that instance.
> 
> We may be able to remove the hack in the testqi_ext_3 pattern, but I think
> the corresponding hack in testdi_1 would have to stay unless we found a way
> to merge testdi_1 into the more general test<mode>_1 pattern.  Neither of
> those seems terribly appropriate right now.

agreed

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (7 preceding siblings ...)
  2020-04-15  9:57 ` rguenth at gcc dot gnu.org
@ 2020-04-15 16:57 ` jakub at gcc dot gnu.org
  2020-04-15 17:10 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-04-15 16:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> There's also
> 
>    && ix86_match_ccmode (insn,
>                          /* *testdi_1 requires CCZmode if the mask has bit
>                             31 set and all bits above it clear.  */
>                          GET_MODE (operands[2]) == DImode
>                          && INTVAL (operands[3]) + INTVAL (operands[4]) == 32
>                          ? CCZmode : CCNOmode)"
> 
> which is likely where the original correctness issue lies - it fails
> to check for the similar cases and HImode (and SImode and QImode).

At least in my understanding of this PR, the above addition in PR94088 isn't
where the problem is, but the problem is that we shouldn't require CCZmode just
in that case, but also in other cases where the splitting would change the
meaning of the sign flag.
If (GET_MODE_PRECISION (<MODE>mode) < INTVAL (operands[3])), then the unsplit
pattern will have SF always false, so we need to ensure that either the pattern
is using CCZmode in which case we don't care about the sign, or that the mask
we'll use doesn't have the most significant bit set.
If (GET_MODE_PRECISION (<MODE>mode) == INTVAL (operands[3])), then the SF will
be equal to the most significant bit before splitting.  But then we'd need the
resulting insn to use that exact mode, which would mean only trivial zero
extraction from position 0 and all bits, combiner should have simplified that
IMNSHO.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (8 preceding siblings ...)
  2020-04-15 16:57 ` jakub at gcc dot gnu.org
@ 2020-04-15 17:10 ` jakub at gcc dot gnu.org
  2020-04-15 17:14 ` law at redhat dot com
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-04-15 17:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So something like:
--- gcc/config/i386/i386.md.jj  2020-03-16 22:56:55.556043275 +0100
+++ gcc/config/i386/i386.md     2020-04-15 19:07:04.405933639 +0200
@@ -8732,8 +8732,20 @@
    && ix86_match_ccmode (insn,
                         /* *testdi_1 requires CCZmode if the mask has bit
                            31 set and all bits above it clear.  */
-                        GET_MODE (operands[2]) == DImode
-                        && INTVAL (operands[3]) + INTVAL (operands[4]) == 32
+                        (GET_MODE (operands[2]) == DImode
+                         && INTVAL (operands[3]) + INTVAL (operands[4]) == 32)
+                        /* If zero_extract mode precision is the same
+                           as len, the SF of the zero_extract
+                           comparison will be the most significant
+                           extracted bit, but this could be matched
+                           after splitting only for pos 0 len all bits
+                           trivial extractions.  Require CCZmode.  */
+                        || (GET_MODE_PRECISION (<MODE>mode)
+                            == INTVAL (operands[3]))
+                        /* Otherwise, require CCZmode if we'd use a mask
+                           with the most significant bit set.  */
+                        || (INTVAL (operands[3]) + INTVAL (operands[4])
+                            == GET_MODE (operands[2]))
                         ? CCZmode : CCNOmode)"
   "#"
   "&& 1"
@@ -8758,7 +8770,12 @@
     }

   /* Small HImode tests can be converted to QImode.  */
-  if (register_operand (val, HImode) && pos + len <= 8)
+  if (register_operand (val, HImode)
+      && (pos + len < 8
+         /* If the mask would include all bits, ensure we don't
+            care about the SF.  */
+         || (pos + len == 8
+             && GET_MODE (operands[0]) == CCZmode)))
     {
       val = gen_lowpart (QImode, val);
       mode = QImode;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (9 preceding siblings ...)
  2020-04-15 17:10 ` jakub at gcc dot gnu.org
@ 2020-04-15 17:14 ` law at redhat dot com
  2020-04-15 17:48 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: law at redhat dot com @ 2020-04-15 17:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #11 from Jeffrey A. Law <law at redhat dot com> ---
Rather than extending that hack, I think just widening the mode when the sign
bit is being tested (c#5) is simpler and easier to understand.  The bits you're
changing should be killed rather than extended to handle more cases.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (10 preceding siblings ...)
  2020-04-15 17:14 ` law at redhat dot com
@ 2020-04-15 17:48 ` jakub at gcc dot gnu.org
  2020-04-15 23:12 ` law at redhat dot com
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-04-15 17:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #11)
> Rather than extending that hack, I think just widening the mode when the
> sign bit is being tested (c#5) is simpler and easier to understand.  The
> bits you're changing should be killed rather than extended to handle more
> cases.

That is not a hack.  x86_64 doesn't have an instruction that ands a 64-bit
reg/mem with a 32-bit zero extended immediate (nor 64-bit immediate), so when
we want to use reg against 32-bit zero extended, we need to use 32-bit
instruction and that has different behavior for the SF, so we must ensure we
don't care about that most significant bit.

Now for these other cases, perhaps we could instead user wider mode if
possible, but it isn't possible always.
So, combine my patch with the last condition changed into >= 32 from
== GET_MODE (operands[2]) and then apply what you have in #c5, but with
hardcoded SImode instead of wider mode - we want to avoid introducing HImode
stuff when there wasn't before, it is both larger and slower - and only do it
if GET_MODE (operands[0]) isn't CCZmode.
If pos + len >= 32, such in the pos + len == 64 special case where we turn
testing bits pos (< 32) up to most significant into a testq which has that
sign-extended 32-bit immediate, then the pattern before splitting would have
always SF zero (unless pos is 0) but split pattern would always copy there the
MSB of the 64-bit register.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (11 preceding siblings ...)
  2020-04-15 17:48 ` jakub at gcc dot gnu.org
@ 2020-04-15 23:12 ` law at redhat dot com
  2020-04-16 12:30 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: law at redhat dot com @ 2020-04-15 23:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #13 from Jeffrey A. Law <law at redhat dot com> ---
Sigh.  That code is good in that it's rejecting matching the pattern for the
SImode sign bit that we can't implement.   For some dumb reason I was thinking
it was changing how we split, but it's actually the main condition.  So calling
it a "hack" was a mistake.

The only time we have to widen is when the pos + len exactly hits the sign bit
in the operand's mode, which is what I thought my patch did.  Certainly we
don't want to be changing sizes unnecessarily, that's a given.  I guess what I
did could be refined to allow that case for CCZmode.

Your test of:

+                        || (INTVAL (operands[3]) + INTVAL (operands[4])
+                            == GET_MODE (operands[2]))

Looks wrong.  Didn't you mean to get the precision of the mode of operand2?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (12 preceding siblings ...)
  2020-04-15 23:12 ` law at redhat dot com
@ 2020-04-16 12:30 ` jakub at gcc dot gnu.org
  2020-04-16 22:45 ` law at redhat dot com
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-04-16 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 48288
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48288&action=edit
gcc10-pr94567.patch

So perhaps this?  In the condition exclude cases where we can't widen the
problematic cases (and, as I figured out today, we can't widen if val is a MEM
because we may not read from memory more than was read originally and also
paradoxical subregs of MEM which are accepted as register_operand before reload
could be problematic) and during splitting ensure we don't narrow if we
couldn't widen again and widen if possible?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (13 preceding siblings ...)
  2020-04-16 12:30 ` jakub at gcc dot gnu.org
@ 2020-04-16 22:45 ` law at redhat dot com
  2020-04-17 14:58 ` cvs-commit at gcc dot gnu.org
  2020-04-17 15:11 ` jakub at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: law at redhat dot com @ 2020-04-16 22:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #15 from Jeffrey A. Law <law at redhat dot com> ---
Comment on attachment 48288
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48288
gcc10-pr94567.patch

I think that'll work. If it passes, consider it approved.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (14 preceding siblings ...)
  2020-04-16 22:45 ` law at redhat dot com
@ 2020-04-17 14:58 ` cvs-commit at gcc dot gnu.org
  2020-04-17 15:11 ` jakub at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-04-17 14:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:1dfc50232dcb703454db4f54c538042a32be2138

commit r10-7773-g1dfc50232dcb703454db4f54c538042a32be2138
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Apr 17 16:56:12 2020 +0200

    i386: Fix up *testqi_ext_3 define_insn_and_split [PR94567]

    As the testcase shows, there are unfortunately more problematic cases
    in *testqi_ext_3 if the mode is not CCZmode, because the sign flag might
    not behave the same between the insn with zero_extract and what we split it
    into.

    The previous fix to the insn condition was because *testdi_1 for mask with
    upper 32-bits clear and bit 31 set is implemented using SImode test and
thus
    SF is set depending on that bit 31 rather than on always cleared.

    But we can have other cases.  On the zero_extract (which has <MODE>mode),
    we can have either the pos + len == precision of <MODE>mode, or
    pos + len < precision of <MODE>mode cases.  The former one copies the most
    significant bit into SF, the latter will have SF always cleared.

    For the former case, either it is a zero_extract from a larger mode, but
    then when we perform test in that larger mode, SF will be always clear and
    thus mismatch from the zero_extract case (so we need to enforce CCZmode),
    or it will be a zero_extract from same mode with pos 0 and len equal to
    mode precision, such zero_extracts should have been really simplified
    into their first operand.

    For the latter case, when SF is always clear on the define_insn with
    zero_extract, we need to split into something that doesn't sometimes set
    SF, i.e. it has to be a test with mask that doesn't have the most
    significant bit set.  In some cases it can be achieved through using test
    in a wider mode (e.g. in the testcase, there is
    (zero_extract:SI (reg:HI) (const_int 13) (const_int 3))
    which will always set SF to 0, but we split it into
    (and:HI (reg:HI) (const_int -8))
    which will copy the MSB of (reg:HI) into SF, but we can do:
    (and:SI (subreg:SI (reg:HI) 0) (const_int 0xfff8))
    which will keep SF always cleared), but there are various cases where we
    can't (when already using DImode, or when SImode and we'd turned it into
    the problematic *testdi_1 implemented using SImode test, or when
    the val operand is a MEM (we don't want to read from memory more than
    the user originally wanted), paradoxical subreg of MEM could be problematic
    too if we through the narrowing end up with a MEM).

    So, the patch attempts to require CCZmode (and not CCNOmode) if it can't
    really ensure the SF will have same meaning between the define_insn and
what
    we split it into, and if we decide we allow CCNOmode, it needs to avoid
    performing narrowing and/or widen if pos + len would indicate we'd have MSB
    set in the mask.

    2020-04-17  Jakub Jelinek  <jakub@redhat.com>
                Jeff Law  <law@redhat.com>

            PR target/94567
            * config/i386/i386.md (*testqi_ext_3): Use CCZmode rather than
            CCNOmode in ix86_match_ccmode if len is equal to <MODE>mode
precision,
            or pos + len >= 32, or pos + len is equal to operands[2] precision
            and operands[2] is not a register operand.  During splitting
perform
            SImode AND if operands[0] doesn't have CCZmode and pos + len is
            equal to mode precision.

            * gcc.c-torture/execute/pr94567.c: New test.

    Co-Authored-By: Jeff Law <law@redhat.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu
  2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
                   ` (15 preceding siblings ...)
  2020-04-17 14:58 ` cvs-commit at gcc dot gnu.org
@ 2020-04-17 15:11 ` jakub at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-04-17 15:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Should be fixed now.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-04-17 15:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-12  2:49 [Bug tree-optimization/94567] New: wrong code at -O2 and -O3 on x86_64-linux-gnu qrzhang at gatech dot edu
2020-04-14  6:41 ` [Bug rtl-optimization/94567] [10 Regression] " marxin at gcc dot gnu.org
2020-04-14 11:28 ` law at redhat dot com
2020-04-14 11:45 ` [Bug target/94567] " rguenth at gcc dot gnu.org
2020-04-14 11:48 ` rguenth at gcc dot gnu.org
2020-04-14 11:59 ` law at redhat dot com
2020-04-14 12:45 ` rguenth at gcc dot gnu.org
2020-04-14 13:20 ` law at redhat dot com
2020-04-15  9:57 ` rguenth at gcc dot gnu.org
2020-04-15 16:57 ` jakub at gcc dot gnu.org
2020-04-15 17:10 ` jakub at gcc dot gnu.org
2020-04-15 17:14 ` law at redhat dot com
2020-04-15 17:48 ` jakub at gcc dot gnu.org
2020-04-15 23:12 ` law at redhat dot com
2020-04-16 12:30 ` jakub at gcc dot gnu.org
2020-04-16 22:45 ` law at redhat dot com
2020-04-17 14:58 ` cvs-commit at gcc dot gnu.org
2020-04-17 15:11 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).