public inbox for gcc-patches@gcc.gnu.org
* [Patch/combine] PR64304 wrong bitmask passed to force_to_mode in combine_simplify_rtx
@ 2015-01-09 13:45 Jiong Wang
  2015-01-09 21:18 ` Jeff Law
  0 siblings, 1 reply; 4+ messages in thread
From: Jiong Wang @ 2015-01-09 13:45 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2695 bytes --]

This was reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64304.

Given the following test case:

unsigned char byte = 0;

void set_bit(unsigned int bit, unsigned char value) {
     unsigned char mask = (unsigned char)(1 << (bit & 7));
     if (!value) {
         byte &= (unsigned char)~mask;
     } else {
         byte |= mask;
     }
}

we should generate something like:

   set_bit:
         and     w0, w0, 7
         mov     w2, 1
         lsl     w2, w2, w0

while we are currently generating:
         mov     w2, 1
         lsl     w2, w2, w0


the necessary "and     w0, w0, 7" deleted wrongly.

That is because the following insns

   (insn 2 5 3 2 (set (reg/v:SI 82 [ bit ])
         (reg:SI 0 x0 [ bit ])) bug.c:3 38 {*movsi_aarch64}
      (expr_list:REG_DEAD (reg:SI 0 x0 [ bit ])
         (nil)))
   (insn 7 4 8 2 (set (reg:SI 84 [ D.1482 ])
         (and:SI (reg/v:SI 82 [ bit ])
             (const_int 7 [0x7]))) bug.c:4 399 {andsi3}
      (expr_list:REG_DEAD (reg/v:SI 82 [ bit ])
         (nil)))
   (insn 9 8 10 2 (set (reg:SI 85 [ D.1482 ])
         (ashift:SI (reg:SI 86)
             (subreg:QI (reg:SI 84 [ D.1482 ]) 0))) bug.c:4 539 {*aarch64_ashl_sisd_or_int_si3}
      (expr_list:REG_DEAD (reg:SI 86)
         (expr_list:REG_DEAD (reg:SI 84 [ D.1482 ])
             (expr_list:REG_EQUAL (ashift:SI (const_int 1 [0x1])
                     (subreg:QI (reg:SI 84 [ D.1482 ]) 0))
                 (nil)))))

are wrongly combined into

   (insn 9 8 10 2 (set (reg:QI 85 [ D.1482 ])
         (ashift:QI (subreg:QI (reg:SI 86) 0)
             (reg:QI 0 x0 [ bit ]))) bug.c:4 556 {*ashlqi3_insn}
      (expr_list:REG_DEAD (reg:SI 0 x0 [ bit ])
         (expr_list:REG_DEAD (reg:SI 86)
             (nil))))

Thus the generated assembly lacks the necessary "and w0, w0, 7".

The root cause is that at one place in the combine pass we pass the wrong bitmask to force_to_mode.

In this particular case, for QImode we should pass (1 << 8) - 1, while we are passing (1 << 3) - 1.
The combiner therefore thinks that only the lower 3 bits are needed and that the "x & 7" is
unnecessary, whereas for QImode we want the lower 8 bits.  The patch removes the exact_log2 call.
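
To make that arithmetic concrete (a stand-alone sketch, not part of the patch): for QImode,
GET_MODE_BITSIZE is 8 and exact_log2 (8) is 3, so the two masks are:

   #include <stdio.h>

   int
   main (void)
   {
     /* Mask the current code builds: (1 << exact_log2 (8)) - 1.  */
     unsigned long long current_mask  = (1ULL << 3) - 1;   /* 0x7: keeps 3 bits.  */

     /* Mask the patched code would build: (1 << 8) - 1.  */
     unsigned long long proposed_mask = (1ULL << 8) - 1;   /* 0xff: keeps 8 bits.  */

     printf ("current = 0x%llx, proposed = 0x%llx\n", current_mask, proposed_mask);
     return 0;
   }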

This looks like a long-standing bug in the combine pass, although it is only triggered on targets
where SHIFT_COUNT_TRUNCATED is true.  It has stayed hidden for a long time mostly because x86/ARM
do not exercise this part of the code.

Bootstrap on x86 and GCC check OK.
Bootstrap on aarch64 and bare-metal regression OK.
OK for trunk?

gcc/
   PR64304
   * combine.c (combine_simplify_rtx): Correct the bitmask passed to force_to_mode.
   
gcc/testsuite/
   PR64304
   * gcc.target/aarch64/pr64304.c: New testcase.

[-- Attachment #2: fix-combine.patch --]
[-- Type: text/x-patch; name=fix-combine.patch, Size: 1144 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index 1808f97..31a7fd0 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5922,7 +5922,7 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
 	SUBST (XEXP (x, 1),
 	       force_to_mode (XEXP (x, 1), GET_MODE (XEXP (x, 1)),
 			      ((unsigned HOST_WIDE_INT) 1
-			       << exact_log2 (GET_MODE_BITSIZE (GET_MODE (x))))
+			       << GET_MODE_BITSIZE (GET_MODE (x)))
 			      - 1,
 			      0));
       break;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr64304.c b/gcc/testsuite/gcc.target/aarch64/pr64304.c
new file mode 100644
index 0000000..5423bb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr64304.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 --save-temps" } */
+
+unsigned char byte = 0;
+
+void
+set_bit (unsigned int bit, unsigned char value)
+{
+  unsigned char mask = (unsigned char) (1 << (bit & 7));
+
+  if (! value)
+    byte &= (unsigned char)~mask;
+  else
+    byte |= mask;
+  /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, 7" } } */
+}
+
+/* { dg-final { cleanup-saved-temps } } */


* Re: [Patch/combine] PR64304 wrong bitmask passed to force_to_mode in combine_simplify_rtx
  2015-01-09 13:45 [Patch/combine] PR64304 wrong bitmask passed to force_to_mode in combine_simplify_rtx Jiong Wang
@ 2015-01-09 21:18 ` Jeff Law
  2015-01-09 21:53   ` Andrew Pinski
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Law @ 2015-01-09 21:18 UTC (permalink / raw)
  To: Jiong Wang, gcc-patches

On 01/09/15 06:39, Jiong Wang wrote:
> [test case, RTL dumps and ChangeLog snipped]
>
> The root cause is that at one place in the combine pass we pass the
> wrong bitmask to force_to_mode.  In this particular case, for QImode we
> should pass (1 << 8) - 1, while we are passing (1 << 3) - 1, so the
> combiner thinks that only the lower 3 bits are needed and that the
> "x & 7" is unnecessary.
I don't think this is correct.

When I put a breakpoint on the code in question I see the following RTL 
prior to the call to DO_SUBST:

(ashift:QI (const_int 1 [0x1])
     (subreg:QI (and:SI (reg:SI 0 x0 [ bit ])
             (const_int 7 [0x7])) 0))


Note carefully the QImode for the ASHIFT.  That clearly indicates that 
just the low 8 bits are meaningful and on a SHIFT_COUNT_TRUNCATED target 
the masking of the count with 0x7 is redundant as the only valid shift 
counts are 0-7 (again because of the QImode for the ASHIFT).  Thus 
that's equivalent to:


(ashift:QI (const_int 1 [0x1])
     (reg:QI 0 x0 [ bit ]))


Similarly for the case:


(ashift:QI (subreg:QI (reg:SI 85) 0)
     (subreg:QI (and:SI (reg:SI 0 x0 [ bit ])
             (const_int 7 [0x7])) 0))


Again, QImode ASHIFT, so the masking of the shift count is redundant 
resulting in:

(ashift:QI (subreg:QI (reg:SI 85) 0)
     (reg:QI 0 x0 [ bit ]))
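
That equivalence can be spelled out as a small stand-alone sketch (assuming, as
SHIFT_COUNT_TRUNCATED promises, that the target truncates a QImode shift count
modulo 8; qi_shift_truncated below is only a model, not GCC code):

   #include <assert.h>

   /* Model of a QImode shift on a SHIFT_COUNT_TRUNCATED target: the
      hardware only looks at the count modulo 8.  */
   static unsigned char
   qi_shift_truncated (unsigned char x, unsigned int count)
   {
     return (unsigned char) (x << (count % 8));
   }

   int
   main (void)
   {
     for (unsigned int n = 0; n < 64; n++)
       /* Masking the count with 7 beforehand changes nothing on such a
          target, so the AND may legitimately be dropped.  */
       assert (qi_shift_truncated (1, n & 7) == qi_shift_truncated (1, n));
     return 0;
   }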


I think you need to do some further analysis.  Is it perhaps the case 
that SHIFT_COUNT_TRUNCATED is nonzero when in fact it should be zero?

Jeff



* Re: [Patch/combine] PR64304 wrong bitmask passed to force_to_mode in combine_simplify_rtx
  2015-01-09 21:18 ` Jeff Law
@ 2015-01-09 21:53   ` Andrew Pinski
  2015-01-09 22:40     ` Jiong Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Pinski @ 2015-01-09 21:53 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jiong Wang, gcc-patches

On Fri, Jan 9, 2015 at 12:40 PM, Jeff Law <law@redhat.com> wrote:
>> [full bug report, RTL dumps and ChangeLog snipped]
>
> I don't think this is correct.
>
> [analysis of the QImode ASHIFT snipped -- see Jeff's reply above]
>
> I think you need to do some further analysis.  Is it perhaps the case that
> SHIFT_COUNT_TRUNCATED is nonzero when in fact it should be zero?

Jeff is correct here.  SHIFT_COUNT_TRUNCATED cannot be true when the
aarch64 back-end provides shift patterns for modes smaller than 32/64
bits, because the aarch64 hardware only truncates the shift count for
32-bit and 64-bit shifts.
The middle-end is doing the correct thing: with SHIFT_COUNT_TRUNCATED
true and a pattern for a QImode shift, the shift count does not need to
be truncated before use.
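
A small sketch of the mismatch (purely illustrative): a QImode shift implemented
with AArch64's 32-bit lsl truncates the count modulo 32, not modulo 8, so
dropping the "& 7" changes the result once the count reaches 8:

   #include <stdio.h>

   int
   main (void)
   {
     unsigned int count = 8;

     /* What SHIFT_COUNT_TRUNCATED plus a QImode shift pattern promises:
        the count is reduced modulo 8, so 1 << (8 % 8) == 1.  */
     unsigned int promised = 1u << (count % 8);

     /* What a 32-bit lsl actually does: the count is reduced modulo 32,
        so 1 << (8 % 32) == 256, whose low byte is 0.  */
     unsigned int actual = 1u << (count % 32);

     printf ("promised = %u, actual = %u\n", promised, actual);
     return 0;
   }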

Thanks,
Andrew Pinski



* Re: [Patch/combine] PR64304 wrong bitmask passed to force_to_mode in combine_simplify_rtx
  2015-01-09 21:53   ` Andrew Pinski
@ 2015-01-09 22:40     ` Jiong Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Jiong Wang @ 2015-01-09 22:40 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Jeff Law, Jiong Wang, gcc-patches

2015-01-09 21:29 GMT+00:00 Andrew Pinski <pinskia@gmail.com>:
> On Fri, Jan 9, 2015 at 12:40 PM, Jeff Law <law@redhat.com> wrote:
>> On 01/09/15 06:39, Jiong Wang wrote:
>>> [full bug report, RTL dumps and ChangeLog snipped]
>>
>> I don't think this is correct.
>>
>> When I put a breakpoint on the code in question I see the following RTL
>> prior to the call to DO_SUBST:
>>
>> (ashift:QI (const_int 1 [0x1])
>>     (subreg:QI (and:SI (reg:SI 0 x0 [ bit ])
>>             (const_int 7 [0x7])) 0))
>>
>>
>> Note carefully the QImode for the ASHIFT.  That clearly indicates that just
>> the low 8 bits are meaningful and on a SHIFT_COUNT_TRUNCATED target the
>> masking of the count with 0x7 is redundant as the only valid shift counts
>> are 0-7 (again because of the QImode for the ASHIFT).

Thanks for the explanation; I had misunderstood some key points here.

> Jeff is correct here.  SHIFT_COUNT_TRUNCATED cannot be true when the
> aarch64 back-end provides shift patterns for modes smaller than 32/64
> bits, because the aarch64 hardware only truncates the shift count for
> 32-bit and 64-bit shifts.

Pinski, thanks for pointing this out.

Agreed.  AArch64 defining shift patterns for types narrower than 32 bits is likely the problem.

Regards,
Jiong


