* [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate @ 2010-12-09 16:45 Andrew Stubbs 2011-01-27 2:09 ` Ramana Radhakrishnan 2011-01-27 19:22 ` Richard Earnshaw 0 siblings, 2 replies; 27+ messages in thread From: Andrew Stubbs @ 2010-12-09 16:45 UTC (permalink / raw) To: gcc-patches [-- Attachment #1: Type: text/plain, Size: 1033 bytes --] The attached patch fixes a bug that prevented smlalbb being generated, even though it was present in the machine description. The problem was that the MULT portion of the maddhidi4 pattern did not match the mulhisi3 pattern, and so combine could not build matching patterns. My patch simply adjusts the (internal) modes and extends in the maddhidi4 pattern so that they match. The externally visible modes remain unaltered. Test case: long long foolong (long long x, short *a, short *b) { return x + *a * *b; } Before, compiled with "-O2 -mcpu=cortex-a8 -mthumb": ldrh r2, [r2, #0] ldrh r3, [r3, #0] smulbb r3, r2, r3 adds r0, r0, r3 adc r1, r1, r3, asr #31 bx lr The compiler is forced to do the multiply and addition operation separately. After: ldrh r2, [r2, #0] ldrh r3, [r3, #0] smlalbb r0, r1, r2, r3 bx lr OK to commit, once stage 1 opens again? Andrew [-- Attachment #2: arm-maddhidi4.patch --] [-- Type: text/x-patch, Size: 1737 bytes --] 2010-12-09 Andrew Stubbs <ams@codesourcery.com> gcc/ * config/arm/arm.md (*maddhidi4): Make mult mode match mulhisi3 for combine. gcc/testsuite/ * gcc.target/arm/smlalbb.c: New file. 
--- src/gcc-mainline/gcc/config/arm/arm.md | 9 +++++---- .../gcc/testsuite/gcc.target/arm/smlalbb.c | 8 ++++++++ 2 files changed, 13 insertions(+), 4 deletions(-) create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/smlalbb.c diff --git a/src/gcc-mainline/gcc/config/arm/arm.md b/src/gcc-mainline/gcc/config/arm/arm.md index c816126..dc9da52 100644 --- a/src/gcc-mainline/gcc/config/arm/arm.md +++ b/src/gcc-mainline/gcc/config/arm/arm.md @@ -1814,10 +1814,11 @@ (define_insn "*maddhidi4" [(set (match_operand:DI 0 "s_register_operand" "=r") (plus:DI - (mult:DI (sign_extend:DI - (match_operand:HI 1 "s_register_operand" "%r")) - (sign_extend:DI - (match_operand:HI 2 "s_register_operand" "r"))) + (sign_extend:DI + (mult:SI (sign_extend:SI + (match_operand:HI 1 "s_register_operand" "%r")) + (sign_extend:SI + (match_operand:HI 2 "s_register_operand" "r")))) (match_operand:DI 3 "s_register_operand" "0")))] "TARGET_DSP_MULTIPLY" "smlalbb%?\\t%Q0, %R0, %1, %2" diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/smlalbb.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/smlalbb.c new file mode 100644 index 0000000..0e4dd70 --- /dev/null +++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/smlalbb.c @@ -0,0 +1,8 @@ +/* Ensure the smlalbb pattern works. */ +/* { dg-options "-O2 -march=armv7-a" } */ +/* { dg-final { scan-assembler "smlalbb" } } */ + +long long foolong (long long x, short *a, short *b) +{ + return x + *a * *b; +} ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2010-12-09 16:45 [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate Andrew Stubbs @ 2011-01-27 2:09 ` Ramana Radhakrishnan 2011-01-27 10:46 ` Andrew Stubbs 2011-01-27 19:22 ` Richard Earnshaw 1 sibling, 1 reply; 27+ messages in thread From: Ramana Radhakrishnan @ 2011-01-27 2:09 UTC (permalink / raw) To: Andrew Stubbs; +Cc: gcc-patches Hi Andrew, > > OK to commit, once stage 1 opens again? I can't approve or reject your patch but is there any reason why this shouldn't be covered under the same rules as the smulbb patch that you submitted around the same time ? http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00772.html Ramana ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-01-27 2:09 ` Ramana Radhakrishnan @ 2011-01-27 10:46 ` Andrew Stubbs 0 siblings, 0 replies; 27+ messages in thread From: Andrew Stubbs @ 2011-01-27 10:46 UTC (permalink / raw) To: Ramana Radhakrishnan; +Cc: gcc-patches On 27/01/11 01:32, Ramana Radhakrishnan wrote: > I can't approve or reject your patch but is there any reason why this > shouldn't be covered under the same rules as the smulbb patch that you > submitted around the same time ? > > http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00772.html Only that that was stage 3, and we're now in stage 4, I think? Otherwise it's exactly the same sort of issue. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2010-12-09 16:45 [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate Andrew Stubbs 2011-01-27 2:09 ` Ramana Radhakrishnan @ 2011-01-27 19:22 ` Richard Earnshaw 2011-01-28 13:06 ` Andrew Stubbs 1 sibling, 1 reply; 27+ messages in thread From: Richard Earnshaw @ 2011-01-27 19:22 UTC (permalink / raw) To: Andrew Stubbs; +Cc: gcc-patches On Thu, 2010-12-09 at 16:19 +0000, Andrew Stubbs wrote: > The attached patch fixes a bug that prevented smlalbb being generated, > even though it was present in the machine description. > > The problem was that the MULT portion of the maddhidi4 pattern did not > match the mulhisi3 pattern, and so combine could not build matching > patterns. > > My patch simply adjusts the (internal) modes and extends in the > maddhidi4 pattern so that they match. The externally visible modes > remain unaltered. > > Test case: > > long long foolong (long long x, short *a, short *b) > { > return x + *a * *b; > } > > Before, compiled with "-O2 -mcpu=cortex-a8 -mthumb": > > ldrh r2, [r2, #0] > ldrh r3, [r3, #0] > smulbb r3, r2, r3 > adds r0, r0, r3 > adc r1, r1, r3, asr #31 > bx lr > > The compiler is forced to do the multiply and addition operation separately. > > After: > ldrh r2, [r2, #0] > ldrh r3, [r3, #0] > smlalbb r0, r1, r2, r3 > bx lr > > > OK to commit, once stage 1 opens again? > > Andrew I'm worried about this change. The proposed RTL might be what GCC generates for this particular case, but the pattern hardly looks canonical: a sign-extend of a multiply that contained sign-extended operands. Are you sure that fixing this particular case doesn't introduce a regression elsewhere? Alternatively, is there some part of the manual that describes the form you have as the correct canonical form? R. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-01-27 19:22 ` Richard Earnshaw @ 2011-01-28 13:06 ` Andrew Stubbs 2011-01-28 15:08 ` Richard Earnshaw 0 siblings, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-01-28 13:06 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches On 27/01/11 18:40, Richard Earnshaw wrote: > I'm worried about this change. The proposed RTL might be what GCC > generates for this particular case, but the pattern hardly looks > canonical: a sign-extend of a multiply that contained sign-extended > operands. > > Are you sure that fixing this particular case doesn't introduce a > regression elsewhere? > > Alternatively, is there some part of the manual that describes the form > you have as the correct canonical form? I don't think I said it was the canonical form (although the other similar smulbb patch was about that), but if it weren't in the canonical form then combine would not work (which it does with my patch), so I don't believe that's the problem here. Anyway, combine only canonicalizes the *order* of the expressions, not the modes, which is what I'm adjusting here. Or have I missed something? The problem here is that the expand pass used mulhisi3 which has the 'internal' modes set one way, and then combine tries to insert that into a multiply-and-accumulate form, but recog rejects it because the 'internal' modes of maddhidi4 are set a different way. Now, mulhisi3 is currently written using MULT:SI mode, and I can't see how this could be any other way - it's an SI mode insn. If we want to be able to insert that into maddhidi4, and we do, I don't see any way to do that if it uses a MULT:DI - it just isn't compatible. I have regression tested the patch for correctness, but I haven't proven that I haven't produced worse code in some other case. I could introduce a second pattern, say "*maddhidi4_2", so that both patterns match and then there could be no regressions, right?
However, I don't believe it's necessary. Andrew
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-01-28 13:06 ` Andrew Stubbs @ 2011-01-28 15:08 ` Richard Earnshaw 2011-01-28 15:55 ` Andrew Stubbs 2011-02-09 6:31 ` [patch][ARM] " Hans-Peter Nilsson 0 siblings, 2 replies; 27+ messages in thread From: Richard Earnshaw @ 2011-01-28 15:08 UTC (permalink / raw) To: Andrew Stubbs; +Cc: gcc-patches On Fri, 2011-01-28 at 10:07 +0000, Andrew Stubbs wrote: > On 27/01/11 18:40, Richard Earnshaw wrote: > > I'm worried about this change. The proposed RTL might be what GCC > > generates for this particular case, but the pattern hardly looks > > canonical: a sign-extend of a multiply that contained sign-extended > > operands. > > > > Are you sure that fixing this particular case doesn't introduce a > > regression elsewhere? > > > > Alternatively, is there some part of the manual that describes the form > > you have as the correct canonical form? > > I don't think I said it was the canonical form (although the other > similar smulbb patch was about that), but if it weren't in the canonical > form then combine would not work (which it does with my patch), so I > don't believe that's the problem here. > > Anyway, combine only canonicalizes the *order* or the expressions, not > the modes, which is what I'm adjusting here. Or have I missed something? > Canonical RTL is not just about the order of the operands its about not having multiple ways of representing the same operation. Order is part of that, but not the end of it. For example, the canonical form of (lt:SI (reg:SI) (const_int 0)) is (lshiftrt:SI (reg:SI) (const_int 31)) As many machines have an instruction for the latter but not the former. My question really, is about whether (sign_extend:DI (mult:SI (sign_extend:SI (reg:HI)) (sign_extend:SI (reg:HI))))) should be canonical, or whether (mult:DI (sign_extend:DI (reg:HI) (sign_extend:DI (reg:HI)) is a better expression of this operation. 
Ultimately this comes down to what is the correct thing to do in GCC. If its the latter, then simplify RTX may need some work to make it do the right thing, rather than tweaking the ARM back-end. > The problem here is that the expand pass used mulhisi3 which has the > 'internal' modes set one way, and then combine tries to insert that into > a multiply-and-accumulate form, but recog rejects it because the > 'internal' modes of maddhidi4 are set a different way. > > Now, mulhisi3 is currently written using MULT:SI mode, and I can't see > how this could be any other way - it's an SI mode insn. If we want to be > able to insert that into maddhidi4, and we do, I don't see any way to do > that if it uses a MULT:DI - it just isn't compatible. > > I have regression tested the patch for correctness, but I haven't proven > that I haven't produced worse code in some other case. > > I could introduce a second pattern, say "*maddhidi4_2", so that both > patterns match and then there could be no regressions, right? However, I > don't believe it's necessary. > So what happens to a variation of your testcase: long long foolong (long long x, short *a, short *b) { return x + (long long)*a * (long long)*b; } With your patch? This should generate identical code to your original test-case. R. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-01-28 15:08 ` Richard Earnshaw @ 2011-01-28 15:55 ` Andrew Stubbs 2011-01-28 16:01 ` Richard Earnshaw 2011-02-09 6:31 ` [patch][ARM] " Hans-Peter Nilsson 1 sibling, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-01-28 15:55 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches On 28/01/11 14:12, Richard Earnshaw wrote: > So what happens to a variation of your testcase: > > long long foolong (long long x, short *a, short *b) > { > return x + (long long)*a * (long long)*b; > } > > With your patch? This should generate identical code to your original > test-case. > The patch has no effect on that testcase - it's broken in some other way, I think, and the same with and without my patch: ldrsh r3, [r3, #0] ldrsh r2, [r2, #0] push {r4, r5} asrs r4, r3, #31 asrs r5, r2, #31 mul r4, r2, r4 mla r4, r3, r5, r4 umull r2, r3, r2, r3 adds r3, r4, r3 adds r0, r0, r2 adc r1, r1, r3 pop {r4, r5} bx lr Hmmm, that probably doesn't add anything useful to the discussion. :( I'll add that one to the todo list ... Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-01-28 15:55 ` Andrew Stubbs @ 2011-01-28 16:01 ` Richard Earnshaw 2011-03-25 16:27 ` Andrew Stubbs 0 siblings, 1 reply; 27+ messages in thread From: Richard Earnshaw @ 2011-01-28 16:01 UTC (permalink / raw) To: Andrew Stubbs; +Cc: gcc-patches On Fri, 2011-01-28 at 15:13 +0000, Andrew Stubbs wrote: > On 28/01/11 14:12, Richard Earnshaw wrote: > > So what happens to a variation of your testcase: > > > > long long foolong (long long x, short *a, short *b) > > { > > return x + (long long)*a * (long long)*b; > > } > > > > With your patch? This should generate identical code to your original > > test-case. > > > > The patch has no effect on that testcase - it's broken in some other > way, I think, and the same with and without my patch: > > ldrsh r3, [r3, #0] > ldrsh r2, [r2, #0] > push {r4, r5} > asrs r4, r3, #31 > asrs r5, r2, #31 > mul r4, r2, r4 > mla r4, r3, r5, r4 > umull r2, r3, r2, r3 > adds r3, r4, r3 > adds r0, r0, r2 > adc r1, r1, r3 > pop {r4, r5} > bx lr > > Hmmm, that probably doesn't add anything useful to the discussion. :( > > I'll add that one to the todo list ... > > Andrew > Ouch! I though that used to work :-( R. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-01-28 16:01 ` Richard Earnshaw @ 2011-03-25 16:27 ` Andrew Stubbs 2011-04-15 11:23 ` Andrew Stubbs 0 siblings, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-03-25 16:27 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches On 28/01/11 15:20, Richard Earnshaw wrote: > > On Fri, 2011-01-28 at 15:13 +0000, Andrew Stubbs wrote: >> On 28/01/11 14:12, Richard Earnshaw wrote: >>> So what happens to a variation of your testcase: >>> >>> long long foolong (long long x, short *a, short *b) >>> { >>> return x + (long long)*a * (long long)*b; >>> } >>> >>> With your patch? This should generate identical code to your original >>> test-case. >>> >> >> The patch has no effect on that testcase - it's broken in some other >> way, I think, and the same with and without my patch: >> >> ldrsh r3, [r3, #0] >> ldrsh r2, [r2, #0] >> push {r4, r5} >> asrs r4, r3, #31 >> asrs r5, r2, #31 >> mul r4, r2, r4 >> mla r4, r3, r5, r4 >> umull r2, r3, r2, r3 >> adds r3, r4, r3 >> adds r0, r0, r2 >> adc r1, r1, r3 >> pop {r4, r5} >> bx lr >> >> Hmmm, that probably doesn't add anything useful to the discussion. :( >> >> I'll add that one to the todo list ... >> >> Andrew >> > > Ouch! I though that used to work :-( I looked at this one again, but on a second inspection I'm not sure there's much wrong with it? When I wrote the above I thought that there was a 64-bit multiply instruction, but now I look more closely I see there isn't, hence the above. It does two 16-bit loads, sign-extends the inputs to 64-bit, does a 64-bit -> 64-bit multiply, and then adds 'x'. Can the umull/add/add/adc be optimized using umlal? It's too late on a Friday to workout what's going on with the carries .... Also, I don't fully understand why the compiler can't just disregard the casts and use maddhidi4? Isn't that mathematically equivalent in this case? When you say it used to work, what did it use to look like? 
Thanks Andrew
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-03-25 16:27 ` Andrew Stubbs @ 2011-04-15 11:23 ` Andrew Stubbs 2011-05-03 9:08 ` Bernd Schmidt 0 siblings, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-04-15 11:23 UTC (permalink / raw) Cc: Richard Earnshaw, gcc-patches Ping. On 25/03/11 16:19, Andrew Stubbs wrote: > On 28/01/11 15:20, Richard Earnshaw wrote: >> >> On Fri, 2011-01-28 at 15:13 +0000, Andrew Stubbs wrote: >>> On 28/01/11 14:12, Richard Earnshaw wrote: >>>> So what happens to a variation of your testcase: >>>> >>>> long long foolong (long long x, short *a, short *b) >>>> { >>>> return x + (long long)*a * (long long)*b; >>>> } >>>> >>>> With your patch? This should generate identical code to your original >>>> test-case. >>>> >>> >>> The patch has no effect on that testcase - it's broken in some other >>> way, I think, and the same with and without my patch: >>> >>> ldrsh r3, [r3, #0] >>> ldrsh r2, [r2, #0] >>> push {r4, r5} >>> asrs r4, r3, #31 >>> asrs r5, r2, #31 >>> mul r4, r2, r4 >>> mla r4, r3, r5, r4 >>> umull r2, r3, r2, r3 >>> adds r3, r4, r3 >>> adds r0, r0, r2 >>> adc r1, r1, r3 >>> pop {r4, r5} >>> bx lr >>> >>> Hmmm, that probably doesn't add anything useful to the discussion. :( >>> >>> I'll add that one to the todo list ... >>> >>> Andrew >>> >> >> Ouch! I though that used to work :-( > > > I looked at this one again, but on a second inspection I'm not sure > there's much wrong with it? > > When I wrote the above I thought that there was a 64-bit multiply > instruction, but now I look more closely I see there isn't, hence the > above. It does two 16-bit loads, sign-extends the inputs to 64-bit, does > a 64-bit -> 64-bit multiply, and then adds 'x'. > > Can the umull/add/add/adc be optimized using umlal? It's too late on a > Friday to workout what's going on with the carries .... > > Also, I don't fully understand why the compiler can't just disregard the > casts and use maddhidi4? 
Isn't that mathematically equivalent in this case? > > When you say it used to work, what did it use to look like? > > Thanks > > Andrew >
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-04-15 11:23 ` Andrew Stubbs @ 2011-05-03 9:08 ` Bernd Schmidt 2011-05-03 9:36 ` Andrew Stubbs ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Bernd Schmidt @ 2011-05-03 9:08 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Richard Earnshaw, gcc-patches [-- Attachment #1: Type: text/plain, Size: 798 bytes --] On 04/15/2011 12:54 PM, Andrew Stubbs wrote: > Ping. Trying to unblock this... I think the point is that both examples: long long foolong (long long x, short *a, short *b) { return x + (long long)*a * (long long)*b; } and long long foolong (long long x, short *a, short *b) { return x + *a * *b; } should produce the same code using the maddhidi pattern, and for that to happen, simplify_rtx must produce a canonical form of the 16->64 bit widening multiply. The form that already exists in arm.md looks better so we should pick that. I tried to fix it with the patch below, which unfortunately doesn't work since during combine we don't see the SIGN_EXTEND operations inside the MULT, but two shift operations instead. Maybe you can complete it from here? Bernd [-- Attachment #2: srtx-widen.diff --] [-- Type: text/plain, Size: 2105 bytes --] Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c (revision 173242) +++ gcc/simplify-rtx.c (working copy) @@ -1000,6 +1000,23 @@ simplify_unary_operation_1 (enum rtx_cod && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF) return XEXP (op, 0); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. 
*/ + if (GET_CODE (op) == MULT + && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND + && GET_CODE (XEXP (op, 1)) == SIGN_EXTEND) + { + rtx op0 = XEXP (XEXP (op, 0), 0); + rtx op1 = XEXP (XEXP (op, 1), 0); + enum machine_mode op0_mode = GET_MODE (op0); + enum machine_mode op1_mode = GET_MODE (op1); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (SIGN_EXTEND, mode, + op0, op0_mode), + simplify_gen_unary (SIGN_EXTEND, mode, + op1, op1_mode)); + } + /* Check for a sign extension of a subreg of a promoted variable, where the promotion is sign-extended, and the target mode is the same as the variable's promotion. */ @@ -1071,6 +1088,23 @@ simplify_unary_operation_1 (enum rtx_cod && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0)))) return rtl_hooks.gen_lowpart_no_emit (mode, op); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. */ + if (GET_CODE (op) == MULT + && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND + && GET_CODE (XEXP (op, 1)) == ZERO_EXTEND) + { + rtx op0 = XEXP (XEXP (op, 0), 0); + rtx op1 = XEXP (XEXP (op, 1), 0); + enum machine_mode op0_mode = GET_MODE (op0); + enum machine_mode op1_mode = GET_MODE (op1); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (ZERO_EXTEND, mode, + op0, op0_mode), + simplify_gen_unary (ZERO_EXTEND, mode, + op1, op1_mode)); + } + /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>). */ if (GET_CODE (op) == ZERO_EXTEND) return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0), ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-03 9:08 ` Bernd Schmidt @ 2011-05-03 9:36 ` Andrew Stubbs 2011-05-18 15:32 ` [patch][combine] " Andrew Stubbs 2011-05-24 17:51 ` [patch][simplify-rtx] " Andrew Stubbs 2 siblings, 0 replies; 27+ messages in thread From: Andrew Stubbs @ 2011-05-03 9:36 UTC (permalink / raw) To: Bernd Schmidt; +Cc: Richard Earnshaw, gcc-patches On 03/05/11 10:07, Bernd Schmidt wrote: > I tried to fix it with the patch below, which unfortunately doesn't work > since during combine we don't see the SIGN_EXTEND operations inside the > MULT, but two shift operations instead. Maybe you can complete it from here? I'll take a look. Thanks for the help! :) Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][combine] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-03 9:08 ` Bernd Schmidt 2011-05-03 9:36 ` Andrew Stubbs @ 2011-05-18 15:32 ` Andrew Stubbs 2011-05-19 13:06 ` Bernd Schmidt 2011-05-24 17:51 ` [patch][simplify-rtx] " Andrew Stubbs 2 siblings, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-05-18 15:32 UTC (permalink / raw) To: Bernd Schmidt; +Cc: Richard Earnshaw, gcc-patches [-- Attachment #1: Type: text/plain, Size: 2094 bytes --] On 03/05/11 10:07, Bernd Schmidt wrote: > On 04/15/2011 12:54 PM, Andrew Stubbs wrote: >> Ping. > > Trying to unblock this... > > I think the point is that both examples: > > long long foolong (long long x, short *a, short *b) > { > return x + (long long)*a * (long long)*b; > } > > and > > > long long foolong (long long x, short *a, short *b) > { > return x + *a * *b; > } > > should produce the same code using the maddhidi pattern, and for that to > happen, simplify_rtx must produce a canonical form of the 16->64 bit > widening multiply. The form that already exists in arm.md looks better > so we should pick that. > > I tried to fix it with the patch below, which unfortunately doesn't work > since during combine we don't see the SIGN_EXTEND operations inside the > MULT, but two shift operations instead. Maybe you can complete it from here? This patch fixes my original testcase (the second one above), and so can be a complete replacement for that patch, if preferred? I did originally want to do it without exporting the function from combine.c (since nothing else appears to be exported like that), but it required moving too much code between files. The patch does not fix the first test case above because combine does not recognise the testcase as a HI -> DI multiply-and-add. It can't do it because it has already combined the HI load with a HI->DI sign-extend, and there's neither a multiply that supports mem arguments, nor a multiply with DI mode inputs. 
I could add define_and_split patterns to catch these cases, but that seems the wrong way to do it ... The real problem is that the widening-multiply tree optimization pass does not support multiplies that widen by more than one mode. As far as I can tell, ARM is the only backend that tries to do this, although cris has a comment to the effect that it would use it if it worked. I am investigating a way to fix this by introducing an optab matrix akin to the existing one for conversions, but that's another patch. Is this patch OK? (Assuming my tests pass.) Andrew [-- Attachment #2: canonical-mla.patch --] [-- Type: text/x-patch, Size: 3926 bytes --] 2011-05-18 Bernd Schmidt <bernds@codesourcery.com> Andrew Stubbs <ams@codesourcery.com> gcc/ * combine.c (make_compound_operation): Remove 'static'. * rtl.h (make_compound_operation): New prototype. * simplify-rtx.c (simplify_unary_operation_1): Create a new canonical form for widening multiplies. --- a/gcc/combine.c +++ b/gcc/combine.c @@ -427,7 +427,6 @@ static const_rtx expand_field_assignment (const_rtx); static rtx make_extraction (enum machine_mode, rtx, HOST_WIDE_INT, rtx, unsigned HOST_WIDE_INT, int, int, int); static rtx extract_left_shift (rtx, int); -static rtx make_compound_operation (rtx, enum rtx_code); static int get_pos_from_mask (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT *); static rtx canon_reg_for_combine (rtx, rtx); @@ -7516,7 +7515,7 @@ extract_left_shift (rtx x, int count) being kludges), it is MEM. When processing the arguments of a comparison or a COMPARE against zero, it is COMPARE. 
*/ -static rtx +rtx make_compound_operation (rtx x, enum rtx_code in_code) { enum rtx_code code = GET_CODE (x); --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2372,6 +2372,7 @@ extern unsigned int extended_count (const_rtx, enum machine_mode, int); extern rtx remove_death (unsigned int, rtx); extern void dump_combine_stats (FILE *); extern void dump_combine_total_stats (FILE *); +extern rtx make_compound_operation (rtx, enum rtx_code); /* In cfgcleanup.c */ extern void delete_dead_jumptables (void); --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -989,6 +989,8 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) break; case SIGN_EXTEND: + op = make_compound_operation (op, SET); + /* (sign_extend (truncate (minus (label_ref L1) (label_ref L2)))) becomes just the MINUS if its mode is MODE. This allows folding switch statements on machines using casesi (such as @@ -1000,6 +1002,23 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF) return XEXP (op, 0); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. */ + if (GET_CODE (op) == MULT + && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND + && GET_CODE (XEXP (op, 1)) == SIGN_EXTEND) + { + rtx op0 = XEXP (XEXP (op, 0), 0); + rtx op1 = XEXP (XEXP (op, 1), 0); + enum machine_mode op0_mode = GET_MODE (op0); + enum machine_mode op1_mode = GET_MODE (op1); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (SIGN_EXTEND, mode, + op0, op0_mode), + simplify_gen_unary (SIGN_EXTEND, mode, + op1, op1_mode)); + } + /* Check for a sign extension of a subreg of a promoted variable, where the promotion is sign-extended, and the target mode is the same as the variable's promotion. 
*/ @@ -1071,6 +1090,23 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0)))) return rtl_hooks.gen_lowpart_no_emit (mode, op); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. */ + if (GET_CODE (op) == MULT + && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND + && GET_CODE (XEXP (op, 1)) == ZERO_EXTEND) + { + rtx op0 = XEXP (XEXP (op, 0), 0); + rtx op1 = XEXP (XEXP (op, 1), 0); + enum machine_mode op0_mode = GET_MODE (op0); + enum machine_mode op1_mode = GET_MODE (op1); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (ZERO_EXTEND, mode, + op0, op0_mode), + simplify_gen_unary (ZERO_EXTEND, mode, + op1, op1_mode)); + } + /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>). */ if (GET_CODE (op) == ZERO_EXTEND) return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0), ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][combine] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-18 15:32 ` [patch][combine] " Andrew Stubbs @ 2011-05-19 13:06 ` Bernd Schmidt 0 siblings, 0 replies; 27+ messages in thread From: Bernd Schmidt @ 2011-05-19 13:06 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Richard Earnshaw, gcc-patches On 05/18/2011 11:55 AM, Andrew Stubbs wrote: > > case SIGN_EXTEND: > + op = make_compound_operation (op, SET); > + I would have expected that bit to go into the MULT case (doing it for both operands), so that the effect is limited to just making widening multiplies more obvious. Also, using make_compound_operation unchanged is probably not safe - it calls SUBST which I think is only valid during combine. I think you may have to break out the non-recursive part into a make_compound_operation_1 (probably to be placed into simplify-rtx.c). Bernd ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-03 9:08 ` Bernd Schmidt 2011-05-03 9:36 ` Andrew Stubbs 2011-05-18 15:32 ` [patch][combine] " Andrew Stubbs @ 2011-05-24 17:51 ` Andrew Stubbs 2011-05-24 21:00 ` Joseph S. Myers 2 siblings, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-05-24 17:51 UTC (permalink / raw) To: Bernd Schmidt; +Cc: Richard Earnshaw, gcc-patches [-- Attachment #1: Type: text/plain, Size: 1359 bytes --] On 03/05/11 10:07, Bernd Schmidt wrote: > I tried to fix it with the patch below, which unfortunately doesn't work > since during combine we don't see the SIGN_EXTEND operations inside the > MULT, but two shift operations instead. Maybe you can complete it from here? I've tried to make this patch go various ways, and I always find a test case that doesn't quite work. I think we're approaching it from the wrong angle. The problem is that widening multiplies are required to be defined as (mult (extend ..) (extend ..)), but when the combiner tries to build a widening multiply pattern from a regular multiply it naturally ends up as (extend (mult .. ..)). The result is that the patch Benrd posted made existing widening multiplies wider, but failed to convert regular multiplies to widening ones. I've created this new, simpler patch that converts (extend (mult a b)) into (mult (extend a) (extend b)) regardless of what 'a' and 'b' might be. (These are then simplified and superfluous extends removed, of course.) I find that this patch fixes all the testcases I have, and permitted me to add support for ARM smlalbt/smlaltb/smlaltt also (I'll post that in a separate patch). It does assume that the outer sign_extend/zero_extend indicates the inner extend types though, so I'm not sure if there's a problem there? OK? 
Andrew [-- Attachment #2: canonical-mul.patch --] [-- Type: text/x-patch, Size: 3143 bytes --] 2011-05-24 Bernd Schmidt <bernds@codesourcery.com> Andrew Stubbs <ams@codesourcery.com> gcc/ * simplify-rtx.c (simplify_unary_operation_1): Create a new canonical form for widening multiplies. * doc/md.texi (Canonicalization of Instructions): Document widening multiply canonicalization. gcc/testsuite/ * gcc.target/arm/mla-2.c: New test. --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5840,6 +5840,11 @@ Equality comparisons of a group of bits (usually a single bit) with zero will be written using @code{zero_extract} rather than the equivalent @code{and} or @code{sign_extract} operations. +@cindex @code{mult}, canonicalization of +@item +@code{(sign_extend:@var{m1} (mult:@var{m2} @var{x} @var{y}))} is converted +to @code{(mult:@var{m1} (sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise for @code{zero_extract}. + @end itemize Further canonicalization rules are defined in the function --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -1000,6 +1000,21 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF) return XEXP (op, 0); + /* Convert (sign_extend (mult ..)) to a canonical widening + mulitplication (mult (sign_extend ..) (sign_extend ..)). */ + if (GET_CODE (op) == MULT && GET_MODE (op) < mode) + { + rtx lhs = XEXP (op, 0); + rtx rhs = XEXP (op, 1); + enum machine_mode lhs_mode = GET_MODE (lhs); + enum machine_mode rhs_mode = GET_MODE (rhs); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (SIGN_EXTEND, mode, + lhs, lhs_mode), + simplify_gen_unary (SIGN_EXTEND, mode, + rhs, rhs_mode)); + } + /* Check for a sign extension of a subreg of a promoted variable, where the promotion is sign-extended, and the target mode is the same as the variable's promotion. 
*/ @@ -1071,6 +1086,21 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0)))) return rtl_hooks.gen_lowpart_no_emit (mode, op); + /* Convert (zero_extend (mult ..)) to a canonical widening + mulitplication (mult (zero_extend ..) (zero_extend ..)). */ + if (GET_CODE (op) == MULT && GET_MODE (op) < mode) + { + rtx lhs = XEXP (op, 0); + rtx rhs = XEXP (op, 1); + enum machine_mode lhs_mode = GET_MODE (lhs); + enum machine_mode rhs_mode = GET_MODE (rhs); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (ZERO_EXTEND, mode, + lhs, lhs_mode), + simplify_gen_unary (ZERO_EXTEND, mode, + rhs, rhs_mode)); + } + /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>). */ if (GET_CODE (op) == ZERO_EXTEND) return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0), --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mla-2.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long foolong (long long x, short *a, short *b) +{ + return x + *a * *b; +} + +/* { dg-final { scan-assembler "smlalbb" } } */ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-24 17:51 ` [patch][simplify-rtx] " Andrew Stubbs @ 2011-05-24 21:00 ` Joseph S. Myers 2011-05-25 9:47 ` Andrew Stubbs 2011-05-25 13:42 ` Andrew Stubbs 0 siblings, 2 replies; 27+ messages in thread From: Joseph S. Myers @ 2011-05-24 21:00 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches On Tue, 24 May 2011, Andrew Stubbs wrote: > I've created this new, simpler patch that converts > > (extend (mult a b)) > > into > > (mult (extend a) (extend b)) > > regardless of what 'a' and 'b' might be. (These are then simplified and > superfluous extends removed, of course.) Are there some missing conditions here? The two aren't equivalent in general - (extend:SI (mult:HI a b)) multiplies the HImode values in HImode (with modulo arithmetic on overflow) before extending the possibly wrapped result to SImode. You'd need a and b themselves to be extended from narrower modes in such a way that if you interpret the extended values in the signedness of the outer extension, the result of the multiplication is exactly representable in the mode of the multiplication. (For example, if both values are extended from QImode, and all extensions have the same signedness, that would be OK. There are cases that are OK where not all extensions have the same signedness, e.g. (sign_extend:DI (mult:SI a b)) where a and b are zero-extended from HImode or QImode, at least one from QImode, though there the outer extension is equivalent to a zero-extension.) -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 27+ messages in thread
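Joseph's non-equivalence argument can be made concrete with a small model. The sketch below (hypothetical Python, not part of the original thread) emulates RTL's modulo semantics with masked arithmetic: multiplying in HImode and then sign-extending differs from the widening multiply as soon as the 16-bit product wraps, while operands that were themselves extended from QImode can never wrap a 16-bit multiply, so there the two forms agree.

```python
def sext(v, bits):
    """Interpret the low `bits` bits of v as a signed two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

# (sign_extend:SI (mult:HI a b)): multiply modulo 2**16, then widen.
a, b = 300, 300
narrow_then_widen = sext(a * b, 16)          # 90000 wraps to 24464
# (mult:SI (sign_extend:SI a) (sign_extend:SI b)): widen first, exact.
widen_then_mult = sext(a, 16) * sext(b, 16)  # 90000
assert narrow_then_widen != widen_then_mult

# If both operands are sign-extended from QImode, the 16-bit product is
# bounded by 128 * 128 = 16384 and never wraps, so the forms coincide.
for c in (-128, -100, 1, 127):
    for d in (-128, -3, 100, 127):
        p = sext(c, 8) * sext(d, 8)
        assert sext(p, 16) == p
```

This is exactly the overflow case the later revisions of the patch have to rule out before rewriting the RTL.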
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-24 21:00 ` Joseph S. Myers @ 2011-05-25 9:47 ` Andrew Stubbs 2011-05-25 12:34 ` Joseph S. Myers 2011-05-25 13:42 ` Andrew Stubbs 1 sibling, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-05-25 9:47 UTC (permalink / raw) To: Joseph S. Myers; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches On 24/05/11 20:35, Joseph S. Myers wrote: > On Tue, 24 May 2011, Andrew Stubbs wrote: > >> I've created this new, simpler patch that converts >> >> (extend (mult a b)) >> >> into >> >> (mult (extend a) (extend b)) >> >> regardless of what 'a' and 'b' might be. (These are then simplified and >> superfluous extends removed, of course.) > > Are there some missing conditions here? The two aren't equivalent in > general - (extend:SI (mult:HI a b)) multiplies the HImode values in HImode > (with modulo arithmetic on overflow) before extending the possibly wrapped > result to SImode. You'd need a and b themselves to be extended from > narrower modes in such a way that if you interpret the extended values in > the signedness of the outer extension, the result of the multiplication is > exactly representable in the mode of the multiplication. (For example, if > both values are extended from QImode, and all extensions have the same > signedness, that would be OK. There are cases that are OK where not all > extensions have the same signedness, e.g. (sign_extend:DI (mult:SI a b)) > where a and b are zero-extended from HImode or QImode, at least one from > QImode, though there the outer extension is equivalent to a > zero-extension.) So, you're saying that promoting a regular multiply to a widening multiply isn't a valid transformation anyway? I suppose that does make sense. I knew something was too easy. OK, I'll go try again. :) Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-25 9:47 ` Andrew Stubbs @ 2011-05-25 12:34 ` Joseph S. Myers 0 siblings, 0 replies; 27+ messages in thread From: Joseph S. Myers @ 2011-05-25 12:34 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches On Wed, 25 May 2011, Andrew Stubbs wrote: > So, you're saying that promoting a regular multiply to a widening multiply > isn't a valid transformation anyway? I suppose that does make sense. I knew In general, yes. RTL always has modulo semantics (except for division and remainder by -1); all optimizations based on undefinedness of overflow (in the absence of -fwrapv) happen at tree/GIMPLE level, where signed and unsigned types are still distinct. (So you could promote a regular multiply of signed types at GIMPLE level in the absence of -fwrapv/-ftrapv, but not at RTL level and not for unsigned types at GIMPLE level.) -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-24 21:00 ` Joseph S. Myers 2011-05-25 9:47 ` Andrew Stubbs @ 2011-05-25 13:42 ` Andrew Stubbs 2011-05-25 13:48 ` Joseph S. Myers 1 sibling, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-05-25 13:42 UTC (permalink / raw) To: Joseph S. Myers; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches [-- Attachment #1: Type: text/plain, Size: 1690 bytes --] On 24/05/11 20:35, Joseph S. Myers wrote: > On Tue, 24 May 2011, Andrew Stubbs wrote: > >> I've created this new, simpler patch that converts >> >> (extend (mult a b)) >> >> into >> >> (mult (extend a) (extend b)) >> >> regardless of what 'a' and 'b' might be. (These are then simplified and >> superfluous extends removed, of course.) > > Are there some missing conditions here? The two aren't equivalent in > general - (extend:SI (mult:HI a b)) multiplies the HImode values in HImode > (with modulo arithmetic on overflow) before extending the possibly wrapped > result to SImode. Ok, I've now modified my patch to prevent it widening regular multiplies. It now converts (extend (mult (extend a) (extend b))) to (mult (newextend a) (newextend b)) But I also have it convert (extend (mult (shift a) (extend b))) to (mult (newextend (shift a)) (newextend b)) The latter case is to catch widening multiplies that extract a subreg using a shift. I don't understand why it doesn't just use subreg in the first place, but apparently it doesn't, and changing it to do that would no doubt break many existing machine descriptions. The latter case also happens to catch cases where an extend is represented by (ashiftrt (ashift x)), which is nice. I know that, potentially, not all shifted operands are going to be widening multiplies, but I *think* this should be safe because other random shift values are unlikely to match a real widening mult instruction (and if they do then the code would already be broken). 
If somebody knows a reason why this isn't safe then I think I'm going to need some help figuring out what conditions to use. OK? Andrew [-- Attachment #2: canonical-mul.patch --] [-- Type: text/x-patch, Size: 4484 bytes --] 2011-05-25 Bernd Schmidt <bernds@codesourcery.com> Andrew Stubbs <ams@codesourcery.com> gcc/ * simplify-rtx.c (simplify_unary_operation_1): Canonicalize widening multiplies. * doc/md.texi (Canonicalization of Instructions): Document widening multiply canonicalization. gcc/testsuite/ * gcc.target/arm/mla-2.c: New test. --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5840,6 +5840,21 @@ Equality comparisons of a group of bits (usually a single bit) with zero will be written using @code{zero_extract} rather than the equivalent @code{and} or @code{sign_extract} operations. +@cindex @code{mult}, canonicalization of +@item +@code{(sign_extend:@var{m1} (mult:@var{m2} (sign_extend:@var{m2} @var{x}) +(sign_extend:@var{m2} @var{y})))} is converted to @code{(mult:@var{m1} +(sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise +for @code{zero_extend}. + +@item +@code{(sign_extend:@var{m1} (mult:@var{m2} (ashiftrt:@var{m2} +@var{x} @var{s}) (sign_extend:@var{m2} @var{y})))} is converted to +@code{(mult:@var{m1} (sign_extend:@var{m1} (ashiftrt:@var{m2} @var{x} @var{s})) +(sign_extend:@var{m1} @var{y}))}, and likewise for patterns using @code{zero_extend} +and @code{lshiftrt}. If the second operand of @code{mult} is also a shift, +then that is extended also. + @end itemize Further canonicalization rules are defined in the function --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -1000,6 +1000,34 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF) return XEXP (op, 0); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. 
*/ + if (GET_CODE (op) == MULT) + { + rtx lhs = XEXP (op, 0); + rtx rhs = XEXP (op, 1); + enum rtx_code lcode = GET_CODE (lhs); + enum rtx_code rcode = GET_CODE (rhs); + + /* Widening multiplies usually extend both operands, but sometimes + they use a shift to extract a portion of a register. We assume + it is safe to widen all such operands because other examples + won't match real instructions. */ + if ((lcode == SIGN_EXTEND || lcode == ASHIFTRT) + && (rcode == SIGN_EXTEND || rcode == ASHIFTRT)) + { + enum machine_mode lmode = GET_MODE (lhs); + enum machine_mode rmode = GET_MODE (lhs); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (SIGN_EXTEND, + mode, + lhs, lmode), + simplify_gen_unary (SIGN_EXTEND, + mode, + rhs, rmode)); + } + } + /* Check for a sign extension of a subreg of a promoted variable, where the promotion is sign-extended, and the target mode is the same as the variable's promotion. */ @@ -1071,6 +1099,34 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0)))) return rtl_hooks.gen_lowpart_no_emit (mode, op); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. */ + if (GET_CODE (op) == MULT) + { + rtx lhs = XEXP (op, 0); + rtx rhs = XEXP (op, 1); + enum rtx_code lcode = GET_CODE (lhs); + enum rtx_code rcode = GET_CODE (rhs); + + /* Widening multiplies usually extend both operands, but sometimes + they use a shift to extract a portion of a register. We assume + it is safe to widen all such operands because other examples + won't match real instructions. 
*/ + if ((lcode == ZERO_EXTEND || lcode == LSHIFTRT) + && (rcode == ZERO_EXTEND || rcode == LSHIFTRT)) + { + enum machine_mode lmode = GET_MODE (lhs); + enum machine_mode rmode = GET_MODE (lhs); + return simplify_gen_binary (MULT, mode, + simplify_gen_unary (ZERO_EXTEND, + mode, + lhs, lmode), + simplify_gen_unary (ZERO_EXTEND, + mode, + rhs, rmode)); + } + } + /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>). */ if (GET_CODE (op) == ZERO_EXTEND) return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0), --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mla-2.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long foolong (long long x, short *a, short *b) +{ + return x + *a * *b; +} + +/* { dg-final { scan-assembler "smlalbb" } } */ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-25 13:42 ` Andrew Stubbs @ 2011-05-25 13:48 ` Joseph S. Myers 2011-05-25 14:01 ` Andrew Stubbs 0 siblings, 1 reply; 27+ messages in thread From: Joseph S. Myers @ 2011-05-25 13:48 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches On Wed, 25 May 2011, Andrew Stubbs wrote: > I know that, potentially, not all shifted operands are going to be widening > multiplies, but I *think* this should be safe because other random shift > values are unlikely to match a real widening mult instruction (and if they do > then the code would already be broken). If somebody knows a reason why this > isn't safe then I think I'm going to need some help figuring out what > conditions to use. Random supposition like that is not a sensible basis for modifying GCC. I haven't managed to produce an example of code demonstrating the problem, but that's probably because I'm not sufficiently familiar with all the RTL optimizers. Where is the guarantee that the inputs to these functions must represent real instructions, or that the outputs will only be used if they represent real instructions? Where are the assertions to ensure that wrong code is not quietly generated if this is not the case? Where is the documentation of what instruction patterns it is not permitted to put in .md files because they would violate the assumptions about what instructions you are permitted to represent in RTL? How have you checked there are no existing problematic instruction patterns? RTL has defined abstract semantics and RTL transformations should be ones that are valid in accordance with those semantics, with proper assertions if there are additional constraints on the input passed to a function. This means actually counting the numbers of variable bits in the operands to determine whether the multiplication could overflow. -- Joseph S. 
Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-25 13:48 ` Joseph S. Myers @ 2011-05-25 14:01 ` Andrew Stubbs 2011-05-25 14:27 ` Joseph S. Myers 0 siblings, 1 reply; 27+ messages in thread From: Andrew Stubbs @ 2011-05-25 14:01 UTC (permalink / raw) To: Joseph S. Myers; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches On 25/05/11 14:19, Joseph S. Myers wrote: > RTL has defined abstract semantics and RTL transformations should be ones > that are valid in accordance with those semantics, with proper assertions > if there are additional constraints on the input passed to a function. > This means actually counting the numbers of variable bits in the operands > to determine whether the multiplication could overflow. Ok, fair enough, so how can I identify a valid subreg extraction that is defined in terms of shifts? The case that I care about is simple enough: (mult:SI (ashiftrt:SI (reg:SI rM) (const_int 16)) (sign_extend:SI (subreg:HI (reg:SI rN) 0))) I guess that's just equivalent to this: (mult:SI (sign_extend:SI (subreg:HI (reg:SI rM) 4))) (sign_extend:SI (subreg:HI (reg:SI rN) 0))) but it chooses not to represent it that way, which is less than helpful in this case. So I could just scan for that exact pattern, or perhaps look for shift sizes that are half the size of the register, or some such thing, but is that general enough? Or is it too general again? Is there anything else I've missed? Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
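The subreg-extraction equivalence Andrew describes can be checked numerically. The sketch below (hypothetical Python, not from the thread) models a 32-bit register: an arithmetic right shift by 16 yields the same value as sign-extending the top halfword, which is why combine's shift-based form and the subreg form denote the same operand.

```python
def sext(v, bits):
    """Interpret the low `bits` bits of v as a signed two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

def ashiftrt(v, bits, shift):
    # (ashiftrt:SI x shift): Python's >> on a sign-extended int is already
    # an arithmetic shift.
    return sext(v, bits) >> shift

def top_half_extended(v):
    # (sign_extend:SI (subreg:HI x <top half>)): bits 16..31, sign-extended.
    return sext(v >> 16, 16)

for x in (0, 1, 0x12345678, 0x8001FFFF, 0xFFFF0000, 0x7FFF1234):
    assert ashiftrt(x, 32, 16) == top_half_extended(x)
```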
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-25 14:01 ` Andrew Stubbs @ 2011-05-25 14:27 ` Joseph S. Myers 2011-05-26 13:57 ` Andrew Stubbs 0 siblings, 1 reply; 27+ messages in thread From: Joseph S. Myers @ 2011-05-25 14:27 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches On Wed, 25 May 2011, Andrew Stubbs wrote: > On 25/05/11 14:19, Joseph S. Myers wrote: > > RTL has defined abstract semantics and RTL transformations should be ones > > that are valid in accordance with those semantics, with proper assertions > > if there are additional constraints on the input passed to a function. > > This means actually counting the numbers of variable bits in the operands > > to determine whether the multiplication could overflow. > > Ok, fair enough, so how can I identify a valid subreg extraction that is > defined in terms of shifts? The shift must be by a positive constant amount, strictly less than the precision (GET_MODE_PRECISION) of the mode (of the value being shifted). If that applies, the relevant number of bits is the precision of the mode minus the number of bits of the shift. For an extension, just take the number of bits in the inner mode. Add the two numbers of bits; if the result does not exceed the number of bits in the mode (of the operands and the multiplication) then the multiplication won't overflow. As in your patch, either all the operands must be sign-extensions / arithmetic shifts (and then the result is equivalent to a widening signed multiply), or all must be zero-extensions / logical shifts (and the result is a widening unsigned multiply). -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 27+ messages in thread
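Joseph's bit-counting rule is easy to restate as code. The sketch below (illustrative Python; names are not GCC's) computes, for each multiply operand, how many significant bits it can carry — the inner mode's precision for an extension, the mode precision minus the shift count for a constant right shift — and declares the widening rewrite safe only when the sum fits the multiplication's mode.

```python
def significant_bits(op, mode_bits):
    """Bits that can vary in one multiply operand.

    op is ('extend', inner_bits) for a sign/zero extension from an
    inner_bits-wide mode, or ('shift', amount) for a right shift by a
    constant amount."""
    kind, n = op
    if kind == 'extend':
        return n
    if kind == 'shift':
        # The shift must be by a positive constant strictly less than
        # the mode's precision.
        assert 0 < n < mode_bits
        return mode_bits - n
    raise ValueError(kind)

def widening_mult_safe(lhs, rhs, mode_bits):
    # Overflow is impossible iff the operands' significant bits together
    # fit in the multiplication's mode.
    return (significant_bits(lhs, mode_bits)
            + significant_bits(rhs, mode_bits)) <= mode_bits

# Two HImode values extended into an SImode multiply: 16 + 16 <= 32, safe.
assert widening_mult_safe(('extend', 16), ('extend', 16), 32)
# A shift by 16 also leaves 16 significant bits: still safe.
assert widening_mult_safe(('shift', 16), ('extend', 16), 32)
# A shift by only 8 leaves 24 bits: 24 + 16 > 32, could overflow.
assert not widening_mult_safe(('shift', 8), ('extend', 16), 32)
```

This is the check the final version of the patch performs with GET_MODE_PRECISION before calling simplify_gen_binary.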
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-25 14:27 ` Joseph S. Myers @ 2011-05-26 13:57 ` Andrew Stubbs 2011-05-26 14:47 ` Joseph S. Myers 2011-06-02 9:46 ` Richard Earnshaw 0 siblings, 2 replies; 27+ messages in thread From: Andrew Stubbs @ 2011-05-26 13:57 UTC (permalink / raw) To: Joseph S. Myers; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches, patches [-- Attachment #1: Type: text/plain, Size: 643 bytes --] On 25/05/11 14:47, Joseph S. Myers wrote: > The shift must be by a positive constant amount, strictly less than the > precision (GET_MODE_PRECISION) of the mode (of the value being shifted). > If that applies, the relevant number of bits is the precision of the mode > minus the number of bits of the shift. For an extension, just take the > number of bits in the inner mode. Add the two numbers of bits; if the > result does not exceed the number of bits in the mode (of the operands and > the multiplication) then the multiplication won't overflow. I believe the attached should implement what you describe. Is the patch OK now? Andrew [-- Attachment #2: canonical-mul.patch --] [-- Type: text/x-patch, Size: 5764 bytes --] 2011-05-26 Bernd Schmidt <bernds@codesourcery.com> Andrew Stubbs <ams@codesourcery.com> gcc/ * simplify-rtx.c (simplify_unary_operation_1): Canonicalize widening multiplies. * doc/md.texi (Canonicalization of Instructions): Document widening multiply canonicalization. gcc/testsuite/ * gcc.target/arm/mla-2.c: New test. --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5840,6 +5840,23 @@ Equality comparisons of a group of bits (usually a single bit) with zero will be written using @code{zero_extract} rather than the equivalent @code{and} or @code{sign_extract} operations. 
+@cindex @code{mult}, canonicalization of +@item +@code{(sign_extend:@var{m1} (mult:@var{m2} (sign_extend:@var{m2} @var{x}) +(sign_extend:@var{m2} @var{y})))} is converted to @code{(mult:@var{m1} +(sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise +for @code{zero_extend}. + +@item +@code{(sign_extend:@var{m1} (mult:@var{m2} (ashiftrt:@var{m2} +@var{x} @var{s}) (sign_extend:@var{m2} @var{y})))} is converted +to @code{(mult:@var{m1} (sign_extend:@var{m1} (ashiftrt:@var{m2} +@var{x} @var{s})) (sign_extend:@var{m1} @var{y}))}, and likewise for +patterns using @code{zero_extend} and @code{lshiftrt}. If the second +operand of @code{mult} is also a shift, then that is extended also. +This transformation is only applied when it can be proven that the +original operation had sufficient precision to prevent overflow. + @end itemize Further canonicalization rules are defined in the function --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -1000,6 +1000,48 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF) return XEXP (op, 0); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. */ + if (GET_CODE (op) == MULT) + { + rtx lhs = XEXP (op, 0); + rtx rhs = XEXP (op, 1); + enum rtx_code lcode = GET_CODE (lhs); + enum rtx_code rcode = GET_CODE (rhs); + + /* Widening multiplies usually extend both operands, but sometimes + they use a shift to extract a portion of a register. */ + if ((lcode == SIGN_EXTEND + || (lcode == ASHIFTRT && CONST_INT_P (XEXP (lhs, 1)))) + && (rcode == SIGN_EXTEND + || (rcode == ASHIFTRT && CONST_INT_P (XEXP (rhs, 1))))) + { + enum machine_mode lmode = GET_MODE (lhs); + enum machine_mode rmode = GET_MODE (rhs); + int bits; + + if (lcode == ASHIFTRT) + /* Number of bits not shifted off the end. 
*/ + bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1)); + else /* lcode == SIGN_EXTEND */ + /* Size of inner mode. */ + bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0))); + + if (rcode == ASHIFTRT) + bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1)); + else /* rcode == SIGN_EXTEND */ + bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0))); + + /* We can only widen multiplies if the result is mathematiclly + equivalent. I.e. if overflow was impossible. */ + if (bits <= GET_MODE_PRECISION (GET_MODE (op))) + return simplify_gen_binary + (MULT, mode, + simplify_gen_unary (SIGN_EXTEND, mode, lhs, lmode), + simplify_gen_unary (SIGN_EXTEND, mode, rhs, rmode)); + } + } + /* Check for a sign extension of a subreg of a promoted variable, where the promotion is sign-extended, and the target mode is the same as the variable's promotion. */ @@ -1071,6 +1113,48 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op) && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0)))) return rtl_hooks.gen_lowpart_no_emit (mode, op); + /* Extending a widening multiplication should be canonicalized to + a wider widening multiplication. */ + if (GET_CODE (op) == MULT) + { + rtx lhs = XEXP (op, 0); + rtx rhs = XEXP (op, 1); + enum rtx_code lcode = GET_CODE (lhs); + enum rtx_code rcode = GET_CODE (rhs); + + /* Widening multiplies usually extend both operands, but sometimes + they use a shift to extract a portion of a register. */ + if ((lcode == ZERO_EXTEND + || (lcode == LSHIFTRT && CONST_INT_P (XEXP (lhs, 1)))) + && (rcode == ZERO_EXTEND + || (rcode == LSHIFTRT && CONST_INT_P (XEXP (rhs, 1))))) + { + enum machine_mode lmode = GET_MODE (lhs); + enum machine_mode rmode = GET_MODE (rhs); + int bits; + + if (lcode == LSHIFTRT) + /* Number of bits not shifted off the end. */ + bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1)); + else /* lcode == ZERO_EXTEND */ + /* Size of inner mode. 
*/ + bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0))); + + if (rcode == LSHIFTRT) + bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1)); + else /* rcode == ZERO_EXTEND */ + bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0))); + + /* We can only widen multiplies if the result is mathematiclly + equivalent. I.e. if overflow was impossible. */ + if (bits <= GET_MODE_PRECISION (GET_MODE (op))) + return simplify_gen_binary + (MULT, mode, + simplify_gen_unary (ZERO_EXTEND, mode, lhs, lmode), + simplify_gen_unary (ZERO_EXTEND, mode, rhs, rmode)); + } + } + /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>). */ if (GET_CODE (op) == ZERO_EXTEND) return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0), --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mla-2.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long foolong (long long x, short *a, short *b) +{ + return x + *a * *b; +} + +/* { dg-final { scan-assembler "smlalbb" } } */ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-26 13:57 ` Andrew Stubbs @ 2011-05-26 14:47 ` Joseph S. Myers 2011-06-02 9:46 ` Richard Earnshaw 1 sibling, 0 replies; 27+ messages in thread From: Joseph S. Myers @ 2011-05-26 14:47 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches, patches On Thu, 26 May 2011, Andrew Stubbs wrote: > On 25/05/11 14:47, Joseph S. Myers wrote: > > The shift must be by a positive constant amount, strictly less than the > > precision (GET_MODE_PRECISION) of the mode (of the value being shifted). > > If that applies, the relevant number of bits is the precision of the mode > > minus the number of bits of the shift. For an extension, just take the > > number of bits in the inner mode. Add the two numbers of bits; if the > > result does not exceed the number of bits in the mode (of the operands and > > the multiplication) then the multiplication won't overflow. > > I believe the attached should implement what you describe. > > Is the patch OK now? It appears to implement the logic I described, so I have no further comments on it (but can't review it). -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-05-26 13:57 ` Andrew Stubbs 2011-05-26 14:47 ` Joseph S. Myers @ 2011-06-02 9:46 ` Richard Earnshaw 2011-06-07 11:05 ` Andrew Stubbs 1 sibling, 1 reply; 27+ messages in thread From: Richard Earnshaw @ 2011-06-02 9:46 UTC (permalink / raw) To: Andrew Stubbs; +Cc: Joseph S. Myers, Bernd Schmidt, gcc-patches, patches On Thu, 2011-05-26 at 14:35 +0100, Andrew Stubbs wrote: > On 25/05/11 14:47, Joseph S. Myers wrote: > > The shift must be by a positive constant amount, strictly less than the > > precision (GET_MODE_PRECISION) of the mode (of the value being shifted). > > If that applies, the relevant number of bits is the precision of the mode > > minus the number of bits of the shift. For an extension, just take the > > number of bits in the inner mode. Add the two numbers of bits; if the > > result does not exceed the number of bits in the mode (of the operands and > > the multiplication) then the multiplication won't overflow. > > I believe the attached should implement what you describe. > > Is the patch OK now? > > Andrew OK. R. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate 2011-06-02 9:46 ` Richard Earnshaw @ 2011-06-07 11:05 ` Andrew Stubbs 0 siblings, 0 replies; 27+ messages in thread From: Andrew Stubbs @ 2011-06-07 11:05 UTC (permalink / raw) To: Richard Earnshaw; +Cc: Joseph S. Myers, Bernd Schmidt, gcc-patches, patches On 02/06/11 10:46, Richard Earnshaw wrote: > OK. Committed, thanks. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [patch][ARM] Fix 16-bit -> 64-bit multiply and accumulate 2011-01-28 15:08 ` Richard Earnshaw 2011-01-28 15:55 ` Andrew Stubbs @ 2011-02-09 6:31 ` Hans-Peter Nilsson 1 sibling, 0 replies; 27+ messages in thread From: Hans-Peter Nilsson @ 2011-02-09 6:31 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches On Fri, 28 Jan 2011, Richard Earnshaw wrote: > Canonical RTL is not just about the order of the operands its about not > having multiple ways of representing the same operation. Order is part > of that, but not the end of it. For example, the canonical form of > > (lt:SI (reg:SI) (const_int 0)) > > is > (lshiftrt:SI (reg:SI) (const_int 31)) I kind of doubt that's canon. At least it shouldn't be that way, except as a variant attempted by combine. If so, ISTM i'd be a complicating factor for the cmpM patterns of most ports. :) (Likewise unnamed variants for combine.) brgds, H-P ^ permalink raw reply [flat|nested] 27+ messages in thread
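Whatever the canonicalization status H-P questions here, the two forms Richard gives are value-equivalent for a 32-bit operand: a logical right shift by 31 exposes the sign bit, which is the 0/1 result of the less-than-zero comparison. A quick model (hypothetical Python, not from the thread):

```python
def sext(v, bits):
    """Interpret the low `bits` bits of v as a signed two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

def lt_zero(x):
    # (lt:SI (reg:SI) (const_int 0)) taken as a 0/1 flag value.
    return 1 if sext(x, 32) < 0 else 0

def lshiftrt31(x):
    # (lshiftrt:SI (reg:SI) (const_int 31)): the sign bit lands in bit 0.
    return (x & 0xFFFFFFFF) >> 31

for x in (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 12345, -1):
    assert lt_zero(x) == lshiftrt31(x)
```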