* The arm patch 161344 to transform TST into LSLS
@ 2010-07-02 0:53 Carrot Wei
2010-07-02 9:16 ` Richard Earnshaw
0 siblings, 1 reply; 4+ messages in thread
From: Carrot Wei @ 2010-07-02 0:53 UTC (permalink / raw)
To: Richard Earnshaw, gcc-patches
Hi Richard
The new peephole2 and the old pattern does different optimization. As
you have described the peephole2 can optimize the cases that test a
single bit in a word. But the old pattern tests if the bit fields at
the low end of a word is equal or not equal to zero, the bit field may
contain more than 1 bit. Interestingly the test case with the old
pattern can fit in both situations. If we change the test case as
following, it can show the regression.
struct A
{
int v:2;
};
int bar();
int foo(struct A* p)
{
if (p->v)
return 1;
return bar();
}
So we need another peephole2 to bring that optimization back.
thanks
Guozhi
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: The arm patch 161344 to transform TST into LSLS
2010-07-02 0:53 The arm patch 161344 to transform TST into LSLS Carrot Wei
@ 2010-07-02 9:16 ` Richard Earnshaw
2010-07-02 22:06 ` Carrot Wei
0 siblings, 1 reply; 4+ messages in thread
From: Richard Earnshaw @ 2010-07-02 9:16 UTC (permalink / raw)
To: Carrot Wei; +Cc: gcc-patches
On Fri, 2010-07-02 at 08:53 +0800, Carrot Wei wrote:
> Hi Richard
>
> The new peephole2 and the old pattern does different optimization. As
> you have described the peephole2 can optimize the cases that test a
> single bit in a word. But the old pattern tests if the bit fields at
> the low end of a word is equal or not equal to zero, the bit field may
> contain more than 1 bit. Interestingly the test case with the old
> pattern can fit in both situations. If we change the test case as
> following, it can show the regression.
>
> struct A
> {
> int v:2;
> };
>
>
> int bar();
> int foo(struct A* p)
> {
> if (p->v)
> return 1;
> return bar();
> }
>
> So we need another peephole2 to bring that optimization back.
>
> thanks
> Guozhi
Yes, a peep2 for that should be pretty straight-forward to generate.
Simply transform the code into a left-shift and compare with 0, then a
branch if eq/ne.
R.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: The arm patch 161344 to transform TST into LSLS
2010-07-02 9:16 ` Richard Earnshaw
@ 2010-07-02 22:06 ` Carrot Wei
2010-07-03 9:18 ` Richard Earnshaw
0 siblings, 1 reply; 4+ messages in thread
From: Carrot Wei @ 2010-07-02 22:06 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: gcc-patches
Hi Richard
The following patch has been tested on arm qemu. Could you take a look?
thanks
Guozhi
ChangeLog:
2010-07-02 Wei Guozhi <carrot@google.com>
* thumb2.md (peephole2 to convert zero_extract/compare of lowest bits
to lshift/compare): New.
Index: thumb2.md
===================================================================
--- thumb2.md (revision 161725)
+++ thumb2.md (working copy)
@@ -1501,3 +1501,29 @@
VOIDmode, operands[0], const0_rtx);
")
+(define_peephole2
+ [(set (match_operand:CC_NOOV 0 "cc_register" "")
+ (compare:CC_NOOV (zero_extract:SI
+ (match_operand:SI 1 "low_register_operand" "")
+ (match_operand:SI 2 "const_int_operand" "")
+ (const_int 0))
+ (const_int 0)))
+ (match_scratch:SI 3 "l")
+ (set (pc)
+ (if_then_else (match_operator:CC_NOOV 4 "equality_operator"
+ [(match_dup 0) (const_int 0)])
+ (match_operand 5 "" "")
+ (match_operand 6 "" "")))]
+ "TARGET_THUMB2
+ && (INTVAL (operands[2]) > 0 && INTVAL (operands[2]) <= 32)"
+ [(parallel [(set (match_dup 0)
+ (compare:CC_NOOV (ashift:SI (match_dup 1) (match_dup 2))
+ (const_int 0)))
+ (clobber (match_dup 3))])
+ (set (pc)
+ (if_then_else (match_op_dup 4 [(match_dup 0) (const_int 0)])
+ (match_dup 5) (match_dup 6)))]
+ "
+ operands[2] = GEN_INT (32 - INTVAL (operands[2]));
+ ")
+
On Fri, Jul 2, 2010 at 5:15 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Fri, 2010-07-02 at 08:53 +0800, Carrot Wei wrote:
>> Hi Richard
>>
>> The new peephole2 and the old pattern does different optimization. As
>> you have described the peephole2 can optimize the cases that test a
>> single bit in a word. But the old pattern tests if the bit fields at
>> the low end of a word is equal or not equal to zero, the bit field may
>> contain more than 1 bit. Interestingly the test case with the old
>> pattern can fit in both situations. If we change the test case as
>> following, it can show the regression.
>>
>> struct A
>> {
>> int v:2;
>> };
>>
>>
>> int bar();
>> int foo(struct A* p)
>> {
>> if (p->v)
>> return 1;
>> return bar();
>> }
>>
>> So we need another peephole2 to bring that optimization back.
>>
>> thanks
>> Guozhi
>
> Yes, a peep2 for that should be pretty straight-forward to generate.
> Simply transform the code into a left-shift and compare with 0, then a
> branch if eq/ne.
>
> R.
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: The arm patch 161344 to transform TST into LSLS
2010-07-02 22:06 ` Carrot Wei
@ 2010-07-03 9:18 ` Richard Earnshaw
0 siblings, 0 replies; 4+ messages in thread
From: Richard Earnshaw @ 2010-07-03 9:18 UTC (permalink / raw)
To: Carrot Wei; +Cc: Richard Earnshaw, gcc-patches
On 02/07/10 23:06, Carrot Wei wrote:
> Hi Richard
>
> The following patch has been tested on arm qemu. Could you take a look?
>
> thanks
> Guozhi
>
> ChangeLog:
> 2010-07-02 Wei Guozhi<carrot@google.com>
>
> * thumb2.md (peephole2 to convert zero_extract/compare of lowest bits
> to lshift/compare): New.
>
This is basically fine. But one minor nit.
A lsl ...,#0 in Thumb1 isn't a left shift at all, but a movs (and a
picky assembler might reject the construct, as some versions of the ARM
ARM state that the range of the shift should be in the range 1...31).
Fortunately, I doubt this pattern would ever be generated for that case,
since a zero_extract of all 32-bits of an SImode value would simplify
into the original operand.
So please change the limit on op2 to be less than 32.
OK with that change.
R.
>
> Index: thumb2.md
> ===================================================================
> --- thumb2.md ï¼revision 161725ï¼
> +++ thumb2.md ï¼working copyï¼
> @@ -1501,3 +1501,29 @@
> VOIDmode, operands[0], const0_rtx);
> ")
>
> +(define_peephole2
> + [(set (match_operand:CC_NOOV 0 "cc_register" "")
> + (compare:CC_NOOV (zero_extract:SI
> + (match_operand:SI 1 "low_register_operand" "")
> + (match_operand:SI 2 "const_int_operand" "")
> + (const_int 0))
> + (const_int 0)))
> + (match_scratch:SI 3 "l")
> + (set (pc)
> + (if_then_else (match_operator:CC_NOOV 4 "equality_operator"
> + [(match_dup 0) (const_int 0)])
> + (match_operand 5 "" "")
> + (match_operand 6 "" "")))]
> + "TARGET_THUMB2
> +&& (INTVAL (operands[2])> 0&& INTVAL (operands[2])<= 32)"
> + [(parallel [(set (match_dup 0)
> + (compare:CC_NOOV (ashift:SI (match_dup 1) (match_dup 2))
> + (const_int 0)))
> + (clobber (match_dup 3))])
> + (set (pc)
> + (if_then_else (match_op_dup 4 [(match_dup 0) (const_int 0)])
> + (match_dup 5) (match_dup 6)))]
> + "
> + operands[2] = GEN_INT (32 - INTVAL (operands[2]));
> + ")
> +
>
> On Fri, Jul 2, 2010 at 5:15 PM, Richard Earnshaw<rearnsha@arm.com> wrote:
>>
>> On Fri, 2010-07-02 at 08:53 +0800, Carrot Wei wrote:
>>> Hi Richard
>>>
>>> The new peephole2 and the old pattern does different optimization. As
>>> you have described the peephole2 can optimize the cases that test a
>>> single bit in a word. But the old pattern tests if the bit fields at
>>> the low end of a word is equal or not equal to zero, the bit field may
>>> contain more than 1 bit. Interestingly the test case with the old
>>> pattern can fit in both situations. If we change the test case as
>>> following, it can show the regression.
>>>
>>> struct A
>>> {
>>> int v:2;
>>> };
>>>
>>>
>>> int bar();
>>> int foo(struct A* p)
>>> {
>>> if (p->v)
>>> return 1;
>>> return bar();
>>> }
>>>
>>> So we need another peephole2 to bring that optimization back.
>>>
>>> thanks
>>> Guozhi
>>
>> Yes, a peep2 for that should be pretty straight-forward to generate.
>> Simply transform the code into a left-shift and compare with 0, then a
>> branch if eq/ne.
>>
>> R.
>>
>>
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2010-07-03 9:18 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-02 0:53 The arm patch 161344 to transform TST into LSLS Carrot Wei
2010-07-02 9:16 ` Richard Earnshaw
2010-07-02 22:06 ` Carrot Wei
2010-07-03 9:18 ` Richard Earnshaw
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).