public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* The arm patch 161344 to transform TST into LSLS
@ 2010-07-02  0:53 Carrot Wei
  2010-07-02  9:16 ` Richard Earnshaw
  0 siblings, 1 reply; 4+ messages in thread
From: Carrot Wei @ 2010-07-02  0:53 UTC (permalink / raw)
  To: Richard Earnshaw, gcc-patches

Hi Richard

The new peephole2 and the old pattern does different optimization. As
you have described the peephole2 can optimize the cases that test a
single bit in a word. But the old pattern tests if the bit fields at
the low end of a word is equal or not equal to zero, the bit field may
contain more than 1 bit. Interestingly the test case with the old
pattern can fit in both situations. If we change the test case as
following, it can show the regression.

struct A
{
  int v:2;
};


int bar();
int foo(struct A* p)
{
  if (p->v)
    return 1;
  return bar();
}

So we need another peephole2 to bring that optimization back.

thanks
Guozhi

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: The arm patch 161344 to transform TST into LSLS
  2010-07-02  0:53 The arm patch 161344 to transform TST into LSLS Carrot Wei
@ 2010-07-02  9:16 ` Richard Earnshaw
  2010-07-02 22:06   ` Carrot Wei
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Earnshaw @ 2010-07-02  9:16 UTC (permalink / raw)
  To: Carrot Wei; +Cc: gcc-patches


On Fri, 2010-07-02 at 08:53 +0800, Carrot Wei wrote:
> Hi Richard
> 
> The new peephole2 and the old pattern does different optimization. As
> you have described the peephole2 can optimize the cases that test a
> single bit in a word. But the old pattern tests if the bit fields at
> the low end of a word is equal or not equal to zero, the bit field may
> contain more than 1 bit. Interestingly the test case with the old
> pattern can fit in both situations. If we change the test case as
> following, it can show the regression.
> 
> struct A
> {
>   int v:2;
> };
> 
> 
> int bar();
> int foo(struct A* p)
> {
>   if (p->v)
>     return 1;
>   return bar();
> }
> 
> So we need another peephole2 to bring that optimization back.
> 
> thanks
> Guozhi

Yes, a peep2 for that should be pretty straight-forward to generate.
Simply transform the code into a left-shift and compare with 0, then a
branch if eq/ne.  

R.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: The arm patch 161344 to transform TST into LSLS
  2010-07-02  9:16 ` Richard Earnshaw
@ 2010-07-02 22:06   ` Carrot Wei
  2010-07-03  9:18     ` Richard Earnshaw
  0 siblings, 1 reply; 4+ messages in thread
From: Carrot Wei @ 2010-07-02 22:06 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

Hi Richard

The following patch has been tested on arm qemu. Could you take a look?

thanks
Guozhi

ChangeLog:
2010-07-02  Wei Guozhi  <carrot@google.com>

        * thumb2.md (peephole2 to convert zero_extract/compare of lowest bits
        to lshift/compare): New.


Index: thumb2.md
===================================================================
--- thumb2.md   (revision 161725)
+++ thumb2.md   (working copy)
@@ -1501,3 +1501,29 @@
                                VOIDmode, operands[0], const0_rtx);
   ")

+(define_peephole2
+  [(set (match_operand:CC_NOOV 0 "cc_register" "")
+       (compare:CC_NOOV (zero_extract:SI
+                         (match_operand:SI 1 "low_register_operand" "")
+                         (match_operand:SI 2 "const_int_operand" "")
+                         (const_int 0))
+                        (const_int 0)))
+   (match_scratch:SI 3 "l")
+   (set (pc)
+       (if_then_else (match_operator:CC_NOOV 4 "equality_operator"
+                      [(match_dup 0) (const_int 0)])
+                     (match_operand 5 "" "")
+                     (match_operand 6 "" "")))]
+  "TARGET_THUMB2
+   && (INTVAL (operands[2]) > 0 && INTVAL (operands[2]) <= 32)"
+  [(parallel [(set (match_dup 0)
+                  (compare:CC_NOOV (ashift:SI (match_dup 1) (match_dup 2))
+                                   (const_int 0)))
+             (clobber (match_dup 3))])
+   (set (pc)
+       (if_then_else (match_op_dup 4 [(match_dup 0) (const_int 0)])
+                     (match_dup 5) (match_dup 6)))]
+  "
+  operands[2] = GEN_INT (32 - INTVAL (operands[2]));
+  ")
+

On Fri, Jul 2, 2010 at 5:15 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Fri, 2010-07-02 at 08:53 +0800, Carrot Wei wrote:
>> Hi Richard
>>
>> The new peephole2 and the old pattern does different optimization. As
>> you have described the peephole2 can optimize the cases that test a
>> single bit in a word. But the old pattern tests if the bit fields at
>> the low end of a word is equal or not equal to zero, the bit field may
>> contain more than 1 bit. Interestingly the test case with the old
>> pattern can fit in both situations. If we change the test case as
>> following, it can show the regression.
>>
>> struct A
>> {
>>   int v:2;
>> };
>>
>>
>> int bar();
>> int foo(struct A* p)
>> {
>>   if (p->v)
>>     return 1;
>>   return bar();
>> }
>>
>> So we need another peephole2 to bring that optimization back.
>>
>> thanks
>> Guozhi
>
> Yes, a peep2 for that should be pretty straight-forward to generate.
> Simply transform the code into a left-shift and compare with 0, then a
> branch if eq/ne.
>
> R.
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: The arm patch 161344 to transform TST into LSLS
  2010-07-02 22:06   ` Carrot Wei
@ 2010-07-03  9:18     ` Richard Earnshaw
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Earnshaw @ 2010-07-03  9:18 UTC (permalink / raw)
  To: Carrot Wei; +Cc: Richard Earnshaw, gcc-patches

On 02/07/10 23:06, Carrot Wei wrote:
> Hi Richard
>
> The following patch has been tested on arm qemu. Could you take a look?
>
> thanks
> Guozhi
>
> ChangeLog:
> 2010-07-02  Wei Guozhi<carrot@google.com>
>
>          * thumb2.md (peephole2 to convert zero_extract/compare of lowest bits
>          to lshift/compare): New.
>

This is basically fine.  But one minor nit.

A lsl ...,#0 in Thumb1 isn't a left shift at all, but a movs (and a 
picky assembler might reject the construct, as some versions of the ARM 
ARM state that the range of the shift should be in the range 1...31). 
Fortunately, I doubt this pattern would ever be generated for that case, 
since a zero_extract of all 32-bits of an SImode value would simplify 
into the original operand.

So please change the limit on op2 to be less than 32.

OK with that change.

R.
>
> Index: thumb2.md
> ===================================================================
> --- thumb2.md   (revision 161725)
> +++ thumb2.md   (working copy)
> @@ -1501,3 +1501,29 @@
>                                  VOIDmode, operands[0], const0_rtx);
>     ")
>
> +(define_peephole2
> +  [(set (match_operand:CC_NOOV 0 "cc_register" "")
> +       (compare:CC_NOOV (zero_extract:SI
> +                         (match_operand:SI 1 "low_register_operand" "")
> +                         (match_operand:SI 2 "const_int_operand" "")
> +                         (const_int 0))
> +                        (const_int 0)))
> +   (match_scratch:SI 3 "l")
> +   (set (pc)
> +       (if_then_else (match_operator:CC_NOOV 4 "equality_operator"
> +                      [(match_dup 0) (const_int 0)])
> +                     (match_operand 5 "" "")
> +                     (match_operand 6 "" "")))]
> +  "TARGET_THUMB2
> +&&  (INTVAL (operands[2])>  0&&  INTVAL (operands[2])<= 32)"
> +  [(parallel [(set (match_dup 0)
> +                  (compare:CC_NOOV (ashift:SI (match_dup 1) (match_dup 2))
> +                                   (const_int 0)))
> +             (clobber (match_dup 3))])
> +   (set (pc)
> +       (if_then_else (match_op_dup 4 [(match_dup 0) (const_int 0)])
> +                     (match_dup 5) (match_dup 6)))]
> +  "
> +  operands[2] = GEN_INT (32 - INTVAL (operands[2]));
> +  ")
> +
>
> On Fri, Jul 2, 2010 at 5:15 PM, Richard Earnshaw<rearnsha@arm.com>  wrote:
>>
>> On Fri, 2010-07-02 at 08:53 +0800, Carrot Wei wrote:
>>> Hi Richard
>>>
>>> The new peephole2 and the old pattern does different optimization. As
>>> you have described the peephole2 can optimize the cases that test a
>>> single bit in a word. But the old pattern tests if the bit fields at
>>> the low end of a word is equal or not equal to zero, the bit field may
>>> contain more than 1 bit. Interestingly the test case with the old
>>> pattern can fit in both situations. If we change the test case as
>>> following, it can show the regression.
>>>
>>> struct A
>>> {
>>>    int v:2;
>>> };
>>>
>>>
>>> int bar();
>>> int foo(struct A* p)
>>> {
>>>    if (p->v)
>>>      return 1;
>>>    return bar();
>>> }
>>>
>>> So we need another peephole2 to bring that optimization back.
>>>
>>> thanks
>>> Guozhi
>>
>> Yes, a peep2 for that should be pretty straight-forward to generate.
>> Simply transform the code into a left-shift and compare with 0, then a
>> branch if eq/ne.
>>
>> R.
>>
>>
>
>



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-07-03  9:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-02  0:53 The arm patch 161344 to transform TST into LSLS Carrot Wei
2010-07-02  9:16 ` Richard Earnshaw
2010-07-02 22:06   ` Carrot Wei
2010-07-03  9:18     ` Richard Earnshaw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).