public inbox for gcc-patches@gcc.gnu.org
* [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
@ 2015-02-18 11:36 Kyrill Tkachov
  2015-02-18 12:33 ` Maxim Kuvyrkov
  2015-02-20 14:03 ` Marcus Shawcroft
  0 siblings, 2 replies; 7+ messages in thread
From: Kyrill Tkachov @ 2015-02-18 11:36 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 2970 bytes --]

Hi all,

This patch fixes a wrong-code bug in the *aarch64_lshr_sisd_or_int_<mode>3
pattern and its associated splitters. The problem is that for the 2nd
alternative the splitter turns a right shift into a SISD left shift by the
negated shift amount (the ushl instruction supports these semantics).
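
As background, the ushl semantics the split relies on can be sketched in
scalar C (a sketch only; the real instruction operates on SIMD registers
and this ignores its saturating/rounding variants):

```c
#include <stdint.h>

/* Scalar sketch of USHL's variable-shift behaviour: a non-negative
   shift amount shifts left, a negative shift amount shifts right by
   the absolute value of the amount. */
static uint32_t ushl_scalar(uint32_t x, int8_t amount)
{
    return amount >= 0 ? x << amount : x >> -amount;
}
```

This is why a right shift `a >> b` can be implemented as ushl by the
negated amount `-b`.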
The splitter generates this RTL:

(set (match_dup 2)
        (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
(set (match_dup 0)
        (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))

The problem here is that the shift amount register is negated without
telling the register allocator about it (and it cannot figure this out
itself). So if the register that operand 2 is assigned to is used later
on, it holds the negated shift amount instead!

The testcase in the patch demonstrates the simple code that can get
miscompiled due to this behaviour.

The solution in this patch is to negate the shift amount into the output
operand (operand 0) and mark it as an earlyclobber in that alternative.
This is actually exactly what the very similar
*aarch64_ashr_sisd_or_int_<mode>3 pattern does below.
I believe this is the safest and simplest fix at this stage.

This bug was exposed on the Linaro 4.9 branch, which happened to have the
perfect storm of costs and register pressure: the generated Linaro
compiler miscompiled the TEST_BIT macro in ira-costs.c during a build of
trunk, generating essentially code like:

.L141:
     neg    d8, d8               //d8 negated!
     ushl    v0.2s, v11.2s, v8.2s // shift right => shift left by neg amount
     fmov    w0, s0
         <...irrelevant code...>
         b    .L140
<...>
.L140:
     fmov    w0, s8  // s8/d8 used and incremented assuming it had not changed at .L141
     add    w0, w0, 1
     fmov    s8, w0
     fmov    w1, s10
     cmp    w0, w1
     bne    .L141


Basically, d8 is negated and later used at .L140 as if it had not been,
leading to completely wrong behaviour.
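
The hazard can be sketched in plain C (hypothetical function names; the
real problem is at the RTL/register-allocation level): the broken split
negates the shift-amount register in place, so any later use of that
register sees the negated value, while the fixed split negates into the
output register instead.

```c
/* Broken split: 'neg d8, d8' negates the shift amount in place,
   without telling the register allocator. */
static unsigned lshr_broken(unsigned a, int *shift_reg)
{
    *shift_reg = -*shift_reg;   /* neg  d8, d8 */
    /* ushl: a left shift by a negative amount acts as a right shift. */
    return a >> -*shift_reg;
}

/* Fixed split: negate into the (earlyclobbered) output register,
   leaving the shift-amount register intact for later uses. */
static unsigned lshr_fixed(unsigned a, int *shift_reg)
{
    int out = -*shift_reg;      /* neg  d0, d8               */
    return a >> -out;           /* ushl v0.2s, v11.2s, v0.2s */
}
```

Both return the correct shift result; only the broken variant corrupts
the caller's shift amount.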

With this patch, that part of the assembly at .L141 now contains:
         neg     d0, d8
         ushl    v0.2s, v11.2s, v0.2s
         fmov    w0, s0

This leaves the original shift amount in d8 intact.

This bug occurs on FSF trunk and the 4.9 branch (not on 4.8, as the
offending pattern was introduced in 4.9).
Bootstrapped and tested on trunk and 4.9.

Ok for trunk and 4.9?

Thanks,
Kyrill

2015-02-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (*aarch64_lshr_sisd_or_int_<mode>3):
     Mark operand 0 as earlyclobber in 2nd alternative.
     (1st define_split below *aarch64_lshr_sisd_or_int_<mode>3):
     Write negated shift amount into QI lowpart operand 0 and use it
     in the shift step.
     (2nd define_split below *aarch64_lshr_sisd_or_int_<mode>3): Likewise.

2015-02-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/sisd-shft-neg_1.c: New test.

[-- Attachment #2: aarch64-sisd-shiftrt.patch --]
[-- Type: text/x-patch; name=aarch64-sisd-shiftrt.patch, Size: 2746 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1f4169e..8f157ce 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3360,7 +3360,7 @@ (define_insn "*aarch64_ashl_sisd_or_int_<mode>3"
 
 ;; Logical right shift using SISD or Integer instruction
 (define_insn "*aarch64_lshr_sisd_or_int_<mode>3"
-  [(set (match_operand:GPI 0 "register_operand" "=w,w,r")
+  [(set (match_operand:GPI 0 "register_operand" "=w,&w,r")
         (lshiftrt:GPI
           (match_operand:GPI 1 "register_operand" "w,w,r")
           (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))]
@@ -3379,11 +3379,13 @@ (define_split
            (match_operand:DI 1 "aarch64_simd_register")
            (match_operand:QI 2 "aarch64_simd_register")))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
         (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
    (set (match_dup 0)
-        (unspec:DI [(match_dup 1) (match_dup 2)] UNSPEC_SISD_USHL))]
-  ""
+        (unspec:DI [(match_dup 1) (match_dup 3)] UNSPEC_SISD_USHL))]
+  {
+    operands[3] = gen_lowpart (QImode, operands[0]);
+  }
 )
 
 (define_split
@@ -3392,11 +3394,13 @@ (define_split
            (match_operand:SI 1 "aarch64_simd_register")
            (match_operand:QI 2 "aarch64_simd_register")))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
         (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
    (set (match_dup 0)
-        (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))]
-  ""
+        (unspec:SI [(match_dup 1) (match_dup 3)] UNSPEC_USHL_2S))]
+  {
+    operands[3] = gen_lowpart (QImode, operands[0]);
+  }
 )
 
 ;; Arithmetic right shift using SISD or Integer instruction
diff --git a/gcc/testsuite/gcc.target/aarch64/sisd-shft-neg_1.c b/gcc/testsuite/gcc.target/aarch64/sisd-shft-neg_1.c
new file mode 100644
index 0000000..c091657
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sisd-shft-neg_1.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
+
+extern void abort (void);
+
+#define force_simd_si(v) asm volatile ("mov %s0, %1.s[0]" :"=w" (v) :"w" (v) :)
+
+unsigned int
+shft_add (unsigned int a, unsigned int b)
+{
+  unsigned int c;
+
+  force_simd_si (a);
+  force_simd_si (b);
+  c = a >> b;
+  force_simd_si (c);
+
+  return c + b;
+}
+
+int
+main (void)
+{
+  unsigned int i = 0;
+  unsigned int a = 0xdeadbeef;
+
+  for (i = 0; i < 32; i++)
+  {
+    unsigned int exp = (a / (1 << i) + i);
+    unsigned int got = shft_add (a, i);
+
+    if (exp != got)
+      abort ();
+  }
+
+  return 0;
+}
+


* Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
  2015-02-18 11:36 [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns Kyrill Tkachov
@ 2015-02-18 12:33 ` Maxim Kuvyrkov
  2015-02-18 12:38   ` Maxim Kuvyrkov
  2015-02-18 13:42   ` Kyrill Tkachov
  2015-02-20 14:03 ` Marcus Shawcroft
  1 sibling, 2 replies; 7+ messages in thread
From: Maxim Kuvyrkov @ 2015-02-18 12:33 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 3286 bytes --]

On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> [original patch description snipped]
> Ok for trunk and 4.9?

First of all, applause!  I realize how difficult it was to reduce this problem.

Your patch looks OK to me, but I can't shake off the feeling that it will pessimize cases where d8 is not used afterwards.  In particular, your patch makes it impossible to use the same register for the output (operand 0) and the inputs (operands 1 and 2).

Did you consider using SCRATCHes instead of reusing operand 0 with an early clobber, as in the attached [untested] patch?  If I got it all correct, the register allocator will get more freedom in deciding which register to use for the negated shift temporary, while still allowing the register from operand 0 to be reused for one of the inputs.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org


[-- Attachment #2: bz1149.patch --]
[-- Type: application/octet-stream, Size: 1868 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1f4169e..65fdec9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3363,7 +3363,8 @@
   [(set (match_operand:GPI 0 "register_operand" "=w,w,r")
         (lshiftrt:GPI
           (match_operand:GPI 1 "register_operand" "w,w,r")
-          (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))]
+          (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))
+   (clobber (match_scratch:QI 3 "=X,w,X"))]
   ""
   "@
    ushr\t%<rtn>0<vas>, %<rtn>1<vas>, %2
@@ -3377,12 +3378,13 @@
   [(set (match_operand:DI 0 "aarch64_simd_register")
         (lshiftrt:DI
            (match_operand:DI 1 "aarch64_simd_register")
-           (match_operand:QI 2 "aarch64_simd_register")))]
+           (match_operand:QI 2 "aarch64_simd_register")))
+   (clobber (match_scratch:QI 3))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
         (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
    (set (match_dup 0)
-        (unspec:DI [(match_dup 1) (match_dup 2)] UNSPEC_SISD_USHL))]
+        (unspec:DI [(match_dup 1) (match_dup 3)] UNSPEC_SISD_USHL))]
   ""
 )
 
@@ -3390,12 +3392,13 @@
   [(set (match_operand:SI 0 "aarch64_simd_register")
         (lshiftrt:SI
            (match_operand:SI 1 "aarch64_simd_register")
-           (match_operand:QI 2 "aarch64_simd_register")))]
+           (match_operand:QI 2 "aarch64_simd_register")))
+   (clobber (match_scratch:DI 3))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
         (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
    (set (match_dup 0)
-        (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))]
+        (unspec:SI [(match_dup 1) (match_dup 3)] UNSPEC_USHL_2S))]
   ""
 )
 


* Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
  2015-02-18 12:33 ` Maxim Kuvyrkov
@ 2015-02-18 12:38   ` Maxim Kuvyrkov
  2015-02-18 13:42   ` Kyrill Tkachov
  1 sibling, 0 replies; 7+ messages in thread
From: Maxim Kuvyrkov @ 2015-02-18 12:38 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: Kyrill Tkachov, GCC Patches, Marcus Shawcroft, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 907 bytes --]

On Feb 18, 2015, at 3:32 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:

> First of all, applauses!  I realize how difficult it was to reduce this problem.
> 
> Your patch looks OK to me, but I can't shake off feeling that it will pessimize cases when d8 is not used afterwards.  In particular, your patch makes it impossible to use same register for output (operand 0) and inputs (operands 1 and 2).
> 
> Did you consider using SCRATCHes instead of re-using operand 0 with early clobber like in the attached [untested] patch?  If I got it all correct, register allocator will get more freedom in deciding which register to use for negated shift temporary, while still allowing reusing register from operand 0 for one of the inputs.

There is a typo in the patch I sent (mode for last match_scratch should be QI, not DI).  Corrected patch attached.

--
Maxim Kuvyrkov
www.linaro.org



[-- Attachment #2: bz1149.patch --]
[-- Type: application/octet-stream, Size: 1868 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1f4169e..2725e78 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3363,7 +3363,8 @@
   [(set (match_operand:GPI 0 "register_operand" "=w,w,r")
         (lshiftrt:GPI
           (match_operand:GPI 1 "register_operand" "w,w,r")
-          (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))]
+          (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))
+   (clobber (match_scratch:QI 3 "=X,w,X"))]
   ""
   "@
    ushr\t%<rtn>0<vas>, %<rtn>1<vas>, %2
@@ -3377,12 +3378,13 @@
   [(set (match_operand:DI 0 "aarch64_simd_register")
         (lshiftrt:DI
            (match_operand:DI 1 "aarch64_simd_register")
-           (match_operand:QI 2 "aarch64_simd_register")))]
+           (match_operand:QI 2 "aarch64_simd_register")))
+   (clobber (match_scratch:QI 3))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
         (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
    (set (match_dup 0)
-        (unspec:DI [(match_dup 1) (match_dup 2)] UNSPEC_SISD_USHL))]
+        (unspec:DI [(match_dup 1) (match_dup 3)] UNSPEC_SISD_USHL))]
   ""
 )
 
@@ -3390,12 +3392,13 @@
   [(set (match_operand:SI 0 "aarch64_simd_register")
         (lshiftrt:SI
            (match_operand:SI 1 "aarch64_simd_register")
-           (match_operand:QI 2 "aarch64_simd_register")))]
+           (match_operand:QI 2 "aarch64_simd_register")))
+   (clobber (match_scratch:QI 3))]
   "TARGET_SIMD && reload_completed"
-  [(set (match_dup 2)
+  [(set (match_dup 3)
         (unspec:QI [(match_dup 2)] UNSPEC_SISD_NEG))
    (set (match_dup 0)
-        (unspec:SI [(match_dup 1) (match_dup 2)] UNSPEC_USHL_2S))]
+        (unspec:SI [(match_dup 1) (match_dup 3)] UNSPEC_USHL_2S))]
   ""
 )
 


* Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
  2015-02-18 12:33 ` Maxim Kuvyrkov
  2015-02-18 12:38   ` Maxim Kuvyrkov
@ 2015-02-18 13:42   ` Kyrill Tkachov
  2015-02-18 13:46     ` Maxim Kuvyrkov
  1 sibling, 1 reply; 7+ messages in thread
From: Kyrill Tkachov @ 2015-02-18 13:42 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw


On 18/02/15 12:32, Maxim Kuvyrkov wrote:
> On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>> [original patch description snipped]
> First of all, applauses!  I realize how difficult it was to reduce this problem.

Thanks!
>
> Your patch looks OK to me, but I can't shake off feeling that it will pessimize cases when d8 is not used afterwards.  In particular, your patch makes it impossible to use same register for output (operand 0) and inputs (operands 1 and 2).
>
> Did you consider using SCRATCHes instead of re-using operand 0 with early clobber like in the attached [untested] patch?  If I got it all correct, register allocator will get more freedom in deciding which register to use for negated shift temporary, while still allowing reusing register from operand 0 for one of the inputs.

I considered it (but didn't try it) because we would end up demanding a
scratch register unnecessarily for the two alternatives that don't split,
which might pessimize register allocation.

For stage 4 I think my proposed fix is the minimal one, and it is
consistent with the other patterns in that area, which were all added
together with:
https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01130.html

Kyrill

>
> Thank you,
>
> --
> Maxim Kuvyrkov
> www.linaro.org



* Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
  2015-02-18 13:42   ` Kyrill Tkachov
@ 2015-02-18 13:46     ` Maxim Kuvyrkov
  2015-02-18 13:49       ` Kyrill Tkachov
  0 siblings, 1 reply; 7+ messages in thread
From: Maxim Kuvyrkov @ 2015-02-18 13:46 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

On Feb 18, 2015, at 4:42 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> 
> On 18/02/15 12:32, Maxim Kuvyrkov wrote:
>> On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> 
>>> [original patch description snipped]
>> First of all, applauses!  I realize how difficult it was to reduce this problem.
> 
> Thanks!
>> 
>> Your patch looks OK to me, but I can't shake off feeling that it will pessimize cases when d8 is not used afterwards.  In particular, your patch makes it impossible to use same register for output (operand 0) and inputs (operands 1 and 2).
>> 
>> Did you consider using SCRATCHes instead of re-using operand 0 with early clobber like in the attached [untested] patch?  If I got it all correct, register allocator will get more freedom in deciding which register to use for negated shift temporary, while still allowing reusing register from operand 0 for one of the inputs.
> 
> I considered it (but didn't try it) because we end up demanding a scratch register unnecessarily for the two alternatives that don't split which might pessimize register allocation.

That's not the case.  The "X" constraint in a (match_scratch) is special; it tells the RA not to allocate a register for it.

> 
> For stage 4 I think my proposed fix is the minimal one and it keeps consistent with the other patterns in that area that were added all together with:
> https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01130.html

I think this approach is OK, as long as we revisit the possibility of using SCRATCHes in these and similar patterns at stage 1.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org



* Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
  2015-02-18 13:46     ` Maxim Kuvyrkov
@ 2015-02-18 13:49       ` Kyrill Tkachov
  0 siblings, 0 replies; 7+ messages in thread
From: Kyrill Tkachov @ 2015-02-18 13:49 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw


On 18/02/15 13:46, Maxim Kuvyrkov wrote:
> On Feb 18, 2015, at 4:42 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>> On 18/02/15 12:32, Maxim Kuvyrkov wrote:
>>> On Feb 18, 2015, at 2:35 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>>
>>>> [original patch description snipped]
>>> First of all, applauses!  I realize how difficult it was to reduce this problem.
>> Thanks!
>>> Your patch looks OK to me, but I can't shake off feeling that it will pessimize cases when d8 is not used afterwards.  In particular, your patch makes it impossible to use same register for output (operand 0) and inputs (operands 1 and 2).
>>>
>>> Did you consider using SCRATCHes instead of re-using operand 0 with early clobber like in the attached [untested] patch?  If I got it all correct, register allocator will get more freedom in deciding which register to use for negated shift temporary, while still allowing reusing register from operand 0 for one of the inputs.
>> I considered it (but didn't try it) because we end up demanding a scratch register unnecessarily for the two alternatives that don't split which might pessimize register allocation.
> That's not the case.  The "X" constraint in (match_scratch) is special; it tells RA to not allocate register.
>
>> For stage 4 I think my proposed fix is the minimal one and it keeps consistent with the other patterns in that area that were added all together with:
>> https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01130.html
> I think this approach is OK, as long as we revisit the possibility of using SCRATCHes in these and similar patterns at stage 1.

Ok, these patterns could do with some refactoring anyway (I think some
could be merged into a define_insn_and_split). We can look at them next
stage 1.

Thanks,
Kyrill

>
> Thanks,
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>
>



* Re: [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns
  2015-02-18 11:36 [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns Kyrill Tkachov
  2015-02-18 12:33 ` Maxim Kuvyrkov
@ 2015-02-20 14:03 ` Marcus Shawcroft
  1 sibling, 0 replies; 7+ messages in thread
From: Marcus Shawcroft @ 2015-02-20 14:03 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, Maxim Kuvyrkov

On 18 February 2015 at 11:35, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> Ok for trunk and 4.9?
>
> Thanks,
> Kyrill
>
> 2015-02-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/aarch64.md (*aarch64_lshr_sisd_or_int_<mode>3):
>     Mark operand 0 as earlyclobber in 2nd alternative.
>     (1st define_split below *aarch64_lshr_sisd_or_int_<mode>3):
>     Write negated shift amount into QI lowpart operand 0 and use it
>     in the shift step.
>     (2nd define_split below *aarch64_lshr_sisd_or_int_<mode>3): Likewise.
>
> 2015-02-17  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/sisd-shft-neg_1.c: New test.

OK /Marcus


end of thread, other threads:[~2015-02-20 13:56 UTC | newest]

Thread overview: 7+ messages
2015-02-18 11:36 [PATCH][AArch64] Fix wrong-code bug in right-shift SISD patterns Kyrill Tkachov
2015-02-18 12:33 ` Maxim Kuvyrkov
2015-02-18 12:38   ` Maxim Kuvyrkov
2015-02-18 13:42   ` Kyrill Tkachov
2015-02-18 13:46     ` Maxim Kuvyrkov
2015-02-18 13:49       ` Kyrill Tkachov
2015-02-20 14:03 ` Marcus Shawcroft
