public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant
@ 2012-07-03 11:29 Carrot Wei
  2012-07-17 13:47 ` Ramana Radhakrishnan
  0 siblings, 1 reply; 6+ messages in thread
From: Carrot Wei @ 2012-07-03 11:29 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>> Hi
>>
>> This is the second part of the patches that deals with 64bit and. It directly
>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>> constant operands.
>>
>
> Comments about const_di_ok_for_op still apply from my review of your add patch.
>
> However I don't see why and /ior / xor with constants that have either
> the low or high parts set can't be expanded directly into ands of
> subregs with moves of zero's or the original value especially if you
> aren't looking at doing 64 bit operations in neon .With Neon being
> used for 64 bit arithmetic it gets more interesting.
>
> Finally this should target PR target/53189.
>

Hi Ramana

Thanks for the review. Following is the updated patch according to
your comments.

Tested on arm qemu with all arm/thumb neon/non-neon mode combination
without regression.

thanks
Carrot


2012-07-03  Wei Guozhi  <carrot@google.com>

	PR target/53189
	* gcc.target/arm/pr53189-1.c: New testcase.
	* gcc.target/arm/pr53189-2.c: New testcase.
	* gcc.target/arm/pr53189-3.c: New testcase.


2012-07-03  Wei Guozhi  <carrot@google.com>

	PR target/53189
	* config/arm/arm.c (const_ok_for_dimode_op): Handle AND op.
	* config/arm/constraints.md (De): New constraint.
	* config/arm/predicates.md (arm_anddi_operand): New predicate.
	(arm_immediate_anddi_operand): Likewise.
	(anddi_operand): Likewise.
	* config/arm/arm.md (arm_andsi3_insn): Optimization for special
	constants.
	(anddi3): Extend it to handle 64bit constants.
	(anddi3_insn): Likewise.
	* config/arm/neon.md (anddi3_neon): Likewise.



Index: testsuite/gcc.target/arm/pr53189-2.c
===================================================================
--- testsuite/gcc.target/arm/pr53189-2.c	(revision 0)
+++ testsuite/gcc.target/arm/pr53189-2.c	(revision 0)
@@ -0,0 +1,9 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-times "mov" 1 } } */
+/* { dg-final { scan-assembler-times "and" 1 } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x100000000;
+}
Index: testsuite/gcc.target/arm/pr53189-3.c
===================================================================
--- testsuite/gcc.target/arm/pr53189-3.c	(revision 0)
+++ testsuite/gcc.target/arm/pr53189-3.c	(revision 0)
@@ -0,0 +1,9 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+/* { dg-final { scan-assembler-times "and" 1 } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x1FFFFFFFF;
+}
Index: testsuite/gcc.target/arm/pr53189-1.c
===================================================================
--- testsuite/gcc.target/arm/pr53189-1.c	(revision 0)
+++ testsuite/gcc.target/arm/pr53189-1.c	(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-options "-O2" }  */
+/* { dg-require-effective-target arm32 } */
+/* { dg-final { scan-assembler-not "mov" } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x100000002;
+}
Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 189107)
+++ config/arm/arm.c	(working copy)
@@ -2524,6 +2524,10 @@
     case PLUS:
       return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode);

+    case AND:
+      return ((const_ok_for_arm (lo_val) || const_ok_for_arm (~lo_val))
+	      && (const_ok_for_arm (hi_val) || const_ok_for_arm (~hi_val)));
+
     default:
       return 0;
     }
Index: config/arm/neon.md
===================================================================
--- config/arm/neon.md	(revision 189107)
+++ config/arm/neon.md	(working copy)
@@ -776,9 +776,9 @@
 )

 (define_insn "anddi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?w,?w")
-        (and:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r,w,0")
-		(match_operand:DI 2 "neon_inv_logic_op2" "w,DL,r,r,w,DL")))]
+  [(set (match_operand:DI 0 "s_register_operand" "=w,w,?&r,?&r,?w,?w,?&r,?&r")
+        (and:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,r,w,0,0,r")
+		(match_operand:DI 2 "anddi_operand" "w,DL,r,r,w,DL,De,De")))]
   "TARGET_NEON"
 {
   switch (which_alternative)
@@ -790,12 +790,14 @@
     		     DImode, 1, VALID_NEON_QREG_MODE (DImode));
     case 2: return "#";
     case 3: return "#";
+    case 6: return "#";
+    case 7: return "#";
     default: gcc_unreachable ();
     }
 }
-  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
-   (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+  [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1,*,*")
+   (set_attr "length" "*,*,8,8,*,*,8,8")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8,*,*")]
 )

 (define_insn "orn<mode>3_neon"
Index: config/arm/constraints.md
===================================================================
--- config/arm/constraints.md	(revision 189107)
+++ config/arm/constraints.md	(working copy)
@@ -31,7 +31,7 @@
 ;; 'H' was previously used for FPA.

 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, De, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py

@@ -248,6 +248,12 @@
  (and (match_code "const_int")
       (match_test "TARGET_32BIT && const_ok_for_dimode_op (ival, PLUS)")))

+(define_constraint "De"
+ "@internal
+  In ARM/Thumb-2 state a const_int that can be used by anddi3_insn."
+ (and (match_code "const_int")
+      (match_test "TARGET_32BIT && const_ok_for_dimode_op (ival, AND)")))
+
 (define_constraint "Di"
  "@internal
   In ARM/Thumb-2 state a const_int or const_double where both the high
Index: config/arm/predicates.md
===================================================================
--- config/arm/predicates.md	(revision 189107)
+++ config/arm/predicates.md	(working copy)
@@ -104,6 +104,14 @@
   (and (match_code "const_int,const_double")
        (match_test "arm_const_double_by_immediates (op)")))

+(define_predicate "arm_immediate_anddi_operand"
+  (and (match_code "const_int")
+       (match_test "const_ok_for_dimode_op (INTVAL (op), AND)")))
+
+(define_predicate "arm_anddi_operand"
+  (ior (match_operand 0 "arm_immediate_anddi_operand")
+       (match_operand 0 "s_register_operand")))
+
 (define_predicate "arm_neg_immediate_operand"
   (and (match_code "const_int")
        (match_test "const_ok_for_arm (-INTVAL (op))")))
@@ -524,6 +532,10 @@
   (ior (match_operand 0 "imm_for_neon_inv_logic_operand")
        (match_operand 0 "s_register_operand")))

+(define_predicate "anddi_operand"
+  (ior (match_operand 0 "neon_inv_logic_op2")
+       (match_operand 0 "arm_anddi_operand")))
+
 ;; TODO: We could check lane numbers more precisely based on the mode.
 (define_predicate "neon_lane_number"
   (and (match_code "const_int")
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 189107)
+++ config/arm/arm.md	(working copy)
@@ -2046,17 +2046,30 @@
 (define_expand "anddi3"
   [(set (match_operand:DI         0 "s_register_operand" "")
 	(and:DI (match_operand:DI 1 "s_register_operand" "")
-		(match_operand:DI 2 "neon_inv_logic_op2" "")))]
+		(match_operand:DI 2 "anddi_operand" "")))]
   "TARGET_32BIT"
   ""
 )

-(define_insn "*anddi3_insn"
-  [(set (match_operand:DI         0 "s_register_operand" "=&r,&r")
-	(and:DI (match_operand:DI 1 "s_register_operand"  "%0,r")
-		(match_operand:DI 2 "s_register_operand"   "r,r")))]
+(define_insn_and_split "*anddi3_insn"
+  [(set (match_operand:DI         0 "s_register_operand" "=&r,&r,&r,&r")
+	(and:DI (match_operand:DI 1 "s_register_operand"  "%0,r,0,r")
+		(match_operand:DI 2 "arm_anddi_operand"   "r,r,De,De")))]
   "TARGET_32BIT && !TARGET_IWMMXT && !TARGET_NEON"
   "#"
+  "TARGET_32BIT && !TARGET_IWMMXT && reload_completed
+   && !(TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
+  [(set (match_dup 0) (and:SI (match_dup 1) (match_dup 2)))
+   (set (match_dup 3) (and:SI (match_dup 4) (match_dup 5)))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[4] = gen_highpart (SImode, operands[1]);
+    operands[1] = gen_lowpart (SImode, operands[1]);
+    operands[5] = gen_highpart_mode (SImode, DImode, operands[2]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
   [(set_attr "length" "8")]
 )

@@ -2176,10 +2189,29 @@
 	(and:SI (match_operand:SI 1 "s_register_operand" "r,r,r")
 		(match_operand:SI 2 "reg_or_int_operand" "rI,K,?n")))]
   "TARGET_32BIT"
-  "@
-   and%?\\t%0, %1, %2
-   bic%?\\t%0, %1, #%B2
-   #"
+  "*
+  {
+    if (CONST_INT_P (operands[2]))
+      {
+	HOST_WIDE_INT i = INTVAL (operands[2]) & 0xFFFFFFFF;
+	if (i == 0)
+	  return \"mov%?\\t%0, #0\";
+	if (i == 0xFFFFFFFF)
+	  {
+	    if (!rtx_equal_p (operands[0], operands[1]))
+	      return \"mov%?\\t%0, %1\";
+	    else
+	      return \"\";
+	  }
+      }
+
+    switch (which_alternative)
+      {
+      case 0: return \"and%?\\t%0, %1, %2\";
+      case 1: return \"bic%?\\t%0, %1, #%B2\";
+      case 2: return \"#\";
+      }
+  }"
   "TARGET_32BIT
    && GET_CODE (operands[2]) == CONST_INT
    && !(const_ok_for_arm (INTVAL (operands[2]))

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant
  2012-07-03 11:29 [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant Carrot Wei
@ 2012-07-17 13:47 ` Ramana Radhakrishnan
  2012-07-18  8:20   ` Carrot Wei
  0 siblings, 1 reply; 6+ messages in thread
From: Ramana Radhakrishnan @ 2012-07-17 13:47 UTC (permalink / raw)
  To: Carrot Wei; +Cc: gcc-patches

Carrot,

Sorry about the delayed response.

On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
> <ramana.radhakrishnan@linaro.org> wrote:
>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>> Hi
>>>
>>> This is the second part of the patches that deals with 64bit and. It directly
>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>> constant operands.
>>>
>>
>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>
>> However I don't see why and /ior / xor with constants that have either
>> the low or high parts set can't be expanded directly into ands of
>> subregs with moves of zero's or the original value especially if you
>> aren't looking at doing 64 bit operations in neon .With Neon being
>> used for 64 bit arithmetic it gets more interesting.
>>
>> Finally this should target PR target/53189.
>>
>
> Hi Ramana
>
> Thanks for the review. Following is the updated patch according to
> your comments.

You've missed answering this part of my review :)

>> However I don't see why and /ior / xor with constants that have either
>> the low or high parts set can't be expanded directly into ands of
>> subregs with moves of zero's or the original value especially if you
>> aren't looking at doing 64 bit operations in neon .With Neon being
>> used for 64 bit arithmetic it gets more interesting.

Is there any reason why we don't split such cases earlier into the
constituent moves and the associated ands earlier than reload in the
non-Neon case?

 In addition, it would be good to have some tests for Thumb2 that deal
with the replicated constants for Thumb2 . Can you have a look at
creating some tests similar to the thumb2*replicated*.c tests in
gcc.target/arm but for 64 bit constants ?


regards,
Ramana

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant
  2012-07-17 13:47 ` Ramana Radhakrishnan
@ 2012-07-18  8:20   ` Carrot Wei
  2012-07-18  9:39     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 6+ messages in thread
From: Carrot Wei @ 2012-07-18  8:20 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> Carrot,
>
> Sorry about the delayed response.
>
> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>> <ramana.radhakrishnan@linaro.org> wrote:
>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>>> Hi
>>>>
>>>> This is the second part of the patches that deals with 64bit and. It directly
>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>> constant operands.
>>>>
>>>
>>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>>
>>> However I don't see why and /ior / xor with constants that have either
>>> the low or high parts set can't be expanded directly into ands of
>>> subregs with moves of zero's or the original value especially if you
>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>> used for 64 bit arithmetic it gets more interesting.
>>>
>>> Finally this should target PR target/53189.
>>>
>>
>> Hi Ramana
>>
>> Thanks for the review. Following is the updated patch according to
>> your comments.
>
> You've missed answering this part of my review :)
>
>>> However I don't see why and /ior / xor with constants that have either
>>> the low or high parts set can't be expanded directly into ands of
>>> subregs with moves of zero's or the original value especially if you
>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>> used for 64 bit arithmetic it gets more interesting.
>
It has been handled by the const_ok_for_dimode_op and the output part
of corresponding SI mode insn. Let's take the IOR case as an example.

In the const_ok_for_dimode_op patch

--- arm.c	(revision 189278)
+++ arm.c	(working copy)
@@ -2524,6 +2524,16 @@
     case PLUS:
       return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode);

+    case IOR:
+      if ((const_ok_for_arm (hi_val) || (hi_val == 0xFFFFFFFF))
+	  && (const_ok_for_arm (lo_val) || (lo_val == 0xFFFFFFFF)))
+	return 1;
+      if (TARGET_THUMB2
+	  && (const_ok_for_arm (lo_val) || const_ok_for_arm (~lo_val))
+	  && (const_ok_for_arm (hi_val) || const_ok_for_arm (~hi_val)))
+	return 1;
+      return 0;
+
     default:
       return 0;
     }

The 0xFFFFFFFF is not valid arm mode immediate, but ior 0XFFFFFFFF
results in all 1's, so it is also allowed in an iordi3 insn. And the
patch in iorsi3_insn pattern explicitly check the all 0's and all 1's
cases, and output either a simple register mov instruction or
instruction mov all 1's to the destination.

@@ -2902,10 +2915,29 @@
 	(ior:SI (match_operand:SI 1 "s_register_operand" "%r,r,r")
 		(match_operand:SI 2 "reg_or_int_operand" "rI,K,?n")))]
   "TARGET_32BIT"
-  "@
-   orr%?\\t%0, %1, %2
-   orn%?\\t%0, %1, #%B2
-   #"
+  "*
+  {
+    if (CONST_INT_P (operands[2]))
+      {
+	HOST_WIDE_INT i = INTVAL (operands[2]) & 0xFFFFFFFF;
+	if (i == 0xFFFFFFFF)
+	  return \"mvn%?\\t%0, #0\";
+	if (i == 0)
+	  {
+	    if (!rtx_equal_p (operands[0], operands[1]))
+	      return \"mov%?\\t%0, %1\";
+	    else
+	      return \"\";
+	  }
+      }
+
+    switch (which_alternative)
+      {
+      case 0: return \"orr%?\\t%0, %1, %2\";
+      case 1: return \"orn%?\\t%0, %1, #%B2\";
+      case 2: return \"#\";
+      }
+  }"
   "TARGET_32BIT
    && GET_CODE (operands[2]) == CONST_INT
    && !(const_ok_for_arm (INTVAL (operands[2]))


> Is there any reason why we don't split such cases earlier into the
> constituent moves and the associated ands earlier than reload in the
> non-Neon case?
>
I referenced pattern arm_adddi3 which is split after reload_completed.
And the pattern arm_subdi3 is even not split. I guess they keep the
original constant longer may benefit some optimizations involving
constants. But it may also lose flexibility in other cases.

>  In addition, it would be good to have some tests for Thumb2 that deal
> with the replicated constants for Thumb2 . Can you have a look at
> creating some tests similar to the thumb2*replicated*.c tests in
> gcc.target/arm but for 64 bit constants ?
>

The new test cases involving thumb2 replicated constants are added as following.

thanks
Carrot



2012-07-18  Wei Guozhi  <carrot@google.com>

	PR target/53189
	* gcc.target/arm/pr53189-10.c: New testcase.
	* gcc.target/arm/pr53189-11.c: New testcase.
	* gcc.target/arm/pr53189-12.c: New testcase.



Index: pr53189-10.c
===================================================================
--- pr53189-10.c	(revision 0)
+++ pr53189-10.c	(revision 0)
@@ -0,0 +1,9 @@
+/* { dg-options "-mthumb -O2" }  */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler-not "mov" } } */
+/* { dg-final { scan-assembler "and.*#-16843010" } } */
+
+void t0p(long long * p)
+{
+  *p &= 0x9fefefefe;
+}


Index: pr53189-11.c
===================================================================
--- pr53189-11.c	(revision 0)
+++ pr53189-11.c	(revision 0)
@@ -0,0 +1,9 @@
+/* { dg-options "-mthumb -O2" }  */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler-not "mov" } } */
+/* { dg-final { scan-assembler "eor.*#-1426019584" } } */
+
+void t0p(long long * p)
+{
+  *p ^= 0x7ab00ab00;
+}


Index: pr53189-12.c
===================================================================
--- pr53189-12.c	(revision 0)
+++ pr53189-12.c	(revision 0)
@@ -0,0 +1,9 @@
+/* { dg-options "-mthumb -O2" }  */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler-not "mov" } } */
+/* { dg-final { scan-assembler "orr.*#13435085" } } */
+
+void t0p(long long * p)
+{
+  *p |= 0x500cd00cd;
+}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant
  2012-07-18  8:20   ` Carrot Wei
@ 2012-07-18  9:39     ` Ramana Radhakrishnan
  2012-07-18 10:13       ` Carrot Wei
  0 siblings, 1 reply; 6+ messages in thread
From: Ramana Radhakrishnan @ 2012-07-18  9:39 UTC (permalink / raw)
  To: Carrot Wei; +Cc: gcc-patches

On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote:
> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
> <ramana.radhakrishnan@linaro.org> wrote:
>> Carrot,
>>
>> Sorry about the delayed response.
>>
>> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>>> <ramana.radhakrishnan@linaro.org> wrote:
>>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>>>> Hi
>>>>>
>>>>> This is the second part of the patches that deals with 64bit and. It directly
>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>>> constant operands.
>>>>>
>>>>
>>>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>>>
>>>> However I don't see why and /ior / xor with constants that have either
>>>> the low or high parts set can't be expanded directly into ands of
>>>> subregs with moves of zero's or the original value especially if you
>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>> used for 64 bit arithmetic it gets more interesting.
>>>>
>>>> Finally this should target PR target/53189.
>>>>
>>>
>>> Hi Ramana
>>>
>>> Thanks for the review. Following is the updated patch according to
>>> your comments.
>>
>> You've missed answering this part of my review :)
>>
>>>> However I don't see why and /ior / xor with constants that have either
>>>> the low or high parts set can't be expanded directly into ands of
>>>> subregs with moves of zero's or the original value especially if you
>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>> used for 64 bit arithmetic it gets more interesting.
>>
> It has been handled by the const_ok_for_dimode_op and the output part
> of corresponding SI mode insn. Let's take the IOR case as an example.
>

I noticed that - If I wasn't clear enough, the question was more
towards generating a subreg move at expand time rather than a split
and handling while outputting asm if you see what I mean.

regards,
Ramana

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant
  2012-07-18  9:39     ` Ramana Radhakrishnan
@ 2012-07-18 10:13       ` Carrot Wei
  2012-08-03 18:58         ` Ramana Radhakrishnan
  0 siblings, 1 reply; 6+ messages in thread
From: Carrot Wei @ 2012-07-18 10:13 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches

On Wed, Jul 18, 2012 at 5:39 PM, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote:
>> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
>> <ramana.radhakrishnan@linaro.org> wrote:
>>> Carrot,
>>>
>>> Sorry about the delayed response.
>>>
>>> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
>>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>>>> <ramana.radhakrishnan@linaro.org> wrote:
>>>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>>>>> Hi
>>>>>>
>>>>>> This is the second part of the patches that deals with 64bit and. It directly
>>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>>>> constant operands.
>>>>>>
>>>>>
>>>>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>>>>
>>>>> However I don't see why and /ior / xor with constants that have either
>>>>> the low or high parts set can't be expanded directly into ands of
>>>>> subregs with moves of zero's or the original value especially if you
>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>> used for 64 bit arithmetic it gets more interesting.
>>>>>
>>>>> Finally this should target PR target/53189.
>>>>>
>>>>
>>>> Hi Ramana
>>>>
>>>> Thanks for the review. Following is the updated patch according to
>>>> your comments.
>>>
>>> You've missed answering this part of my review :)
>>>
>>>>> However I don't see why and /ior / xor with constants that have either
>>>>> the low or high parts set can't be expanded directly into ands of
>>>>> subregs with moves of zero's or the original value especially if you
>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>> used for 64 bit arithmetic it gets more interesting.
>>>
>> It has been handled by the const_ok_for_dimode_op and the output part
>> of corresponding SI mode insn. Let's take the IOR case as an example.
>>
>
> I noticed that - If I wasn't clear enough, the question was more
> towards generating a subreg move at expand time rather than a split
> and handling while outputting asm if you see what I mean.
>
I see your point now. I don't know how much better if we handle it
earlier. Even if I generates subreg move for non-neon code at expand
time, the latter output handling is still necessary for neon insns. Do
you think it deserves the extra expand handling?

thanks
Carrot

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant
  2012-07-18 10:13       ` Carrot Wei
@ 2012-08-03 18:58         ` Ramana Radhakrishnan
  0 siblings, 0 replies; 6+ messages in thread
From: Ramana Radhakrishnan @ 2012-08-03 18:58 UTC (permalink / raw)
  To: Carrot Wei; +Cc: gcc-patches

On 18 July 2012 11:12, Carrot Wei <carrot@google.com> wrote:
> On Wed, Jul 18, 2012 at 5:39 PM, Ramana Radhakrishnan
> <ramana.radhakrishnan@linaro.org> wrote:
>> On 18 July 2012 09:20, Carrot Wei <carrot@google.com> wrote:
>>> On Tue, Jul 17, 2012 at 9:47 PM, Ramana Radhakrishnan
>>> <ramana.radhakrishnan@linaro.org> wrote:
>>>> Carrot,
>>>>
>>>> Sorry about the delayed response.
>>>>
>>>> On 3 July 2012 12:28, Carrot Wei <carrot@google.com> wrote:
>>>>> On Thu, Jun 28, 2012 at 12:14 AM, Ramana Radhakrishnan
>>>>> <ramana.radhakrishnan@linaro.org> wrote:
>>>>>> On 28 May 2012 11:08, Carrot Wei <carrot@google.com> wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> This is the second part of the patches that deals with 64bit and. It directly
>>>>>>> extends the patterns anddi3, anddi3_insn and anddi3_neon to handle 64bit
>>>>>>> constant operands.
>>>>>>>
>>>>>>
>>>>>> Comments about const_di_ok_for_op still apply from my review of your add patch.
>>>>>>
>>>>>> However I don't see why and /ior / xor with constants that have either
>>>>>> the low or high parts set can't be expanded directly into ands of
>>>>>> subregs with moves of zero's or the original value especially if you
>>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>>> used for 64 bit arithmetic it gets more interesting.
>>>>>>
>>>>>> Finally this should target PR target/53189.
>>>>>>
>>>>>
>>>>> Hi Ramana
>>>>>
>>>>> Thanks for the review. Following is the updated patch according to
>>>>> your comments.
>>>>
>>>> You've missed answering this part of my review :)
>>>>
>>>>>> However I don't see why and /ior / xor with constants that have either
>>>>>> the low or high parts set can't be expanded directly into ands of
>>>>>> subregs with moves of zero's or the original value especially if you
>>>>>> aren't looking at doing 64 bit operations in neon .With Neon being
>>>>>> used for 64 bit arithmetic it gets more interesting.
>>>>
>>> It has been handled by the const_ok_for_dimode_op and the output part
>>> of corresponding SI mode insn. Let's take the IOR case as an example.
>>>
>>
>> I noticed that - If I wasn't clear enough, the question was more
>> towards generating a subreg move at expand time rather than a split
>> and handling while outputting asm if you see what I mean.
>>
> I see your point now. I don't know how much better if we handle it
> earlier. Even if I generates subreg move for non-neon code at expand
> time, the latter output handling is still necessary for neon insns.

Just a quick note to let you know that I had a look this week and I'd
rather have the splitters deal with the idioms properly rather than
change the SImode patterns to deal with mov's . In theory with the
proper idiom type splitting dead loads can disappear and a number of
your tests that do and with the upper 32 bits as 0 don't really need a
load from memory and so on. I'd rather have it split before reload in
that form if possible. The neon case is something I don't yet have an
answer to but I'm gonna take a look.



regards,
ramana

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-08-03 18:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-03 11:29 [ARM Patch 1/3]PR53189: optimizations of 64bit logic operation with constant Carrot Wei
2012-07-17 13:47 ` Ramana Radhakrishnan
2012-07-18  8:20   ` Carrot Wei
2012-07-18  9:39     ` Ramana Radhakrishnan
2012-07-18 10:13       ` Carrot Wei
2012-08-03 18:58         ` Ramana Radhakrishnan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).