public inbox for gcc-patches@gcc.gnu.org
* [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
@ 2007-05-16 20:25 Uros Bizjak
  2007-05-16 20:34 ` Chris Lattner
  0 siblings, 1 reply; 36+ messages in thread
From: Uros Bizjak @ 2007-05-16 20:25 UTC (permalink / raw)
  To: GCC Patches


Hello!

This patch adds 128bit operations for x86_64 to longlong.h to speed up 
TImode and TFmode arithmetic. The patch also redefines i386's 
definitions of count_trailing/leading_zeros from asm to the 
__builtin_ctz/__builtin_clz builtins provided by the i386 back end.

The patch was bootstrapped on x86_64-pc-linux-gnu and regression tested 
for all default languages with and without -m32.

2007-05-16  Uros Bizjak  <ubizjak@gmail.com>

        * longlong.h (__x86_64__): Add definitions for add_ssaaaa,
        sub_ddmmss, umul_ppmm, udiv_qrnnd, count_leading_zeros and
        count_trailing_zeros.
        (__i386__): Implement count_leading_zeros using __builtin_clz().
        Implement count_trailing_zeros using __builtin_ctz().

Uros.

[-- Attachment #2: x86_64-longlong.diff --]

Index: longlong.h
===================================================================
--- longlong.h	(revision 124771)
+++ longlong.h	(working copy)
@@ -341,19 +341,48 @@
 	   : "0" ((USItype) (n0)),					\
 	     "1" ((USItype) (n1)),					\
 	     "rm" ((USItype) (dv)))
-#define count_leading_zeros(count, x) \
-  do {									\
-    USItype __cbtmp;							\
-    __asm__ ("bsrl %1,%0"						\
-	     : "=r" (__cbtmp) : "rm" ((USItype) (x)));			\
-    (count) = __cbtmp ^ 31;						\
-  } while (0)
-#define count_trailing_zeros(count, x) \
-  __asm__ ("bsfl %1,%0" : "=r" (count) : "rm" ((USItype)(x)))
+#define count_leading_zeros(count, x)	((count) = __builtin_clz (x))
+#define count_trailing_zeros(count, x)	((count) = __builtin_ctz (x))
 #define UMUL_TIME 40
 #define UDIV_TIME 40
 #endif /* 80x86 */
 
+#if (defined (__x86_64__) || defined (__i386__)) && W_TYPE_SIZE == 64
+#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+  __asm__ ("addq %5,%1\n\tadcq %3,%0"					\
+	   : "=r" ((UDItype) (sh)),					\
+	     "=&r" ((UDItype) (sl))					\
+	   : "%0" ((UDItype) (ah)),					\
+	     "rem" ((UDItype) (bh)),					\
+	     "%1" ((UDItype) (al)),					\
+	     "rem" ((UDItype) (bl)))
+#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+  __asm__ ("subq %5,%1\n\tsbbq %3,%0"					\
+	   : "=r" ((UDItype) (sh)),					\
+	     "=&r" ((UDItype) (sl))					\
+	   : "0" ((UDItype) (ah)),					\
+	     "rem" ((UDItype) (bh)),					\
+	     "1" ((UDItype) (al)),					\
+	     "rem" ((UDItype) (bl)))
+#define umul_ppmm(w1, w0, u, v) \
+  __asm__ ("mulq %3"							\
+	   : "=a" ((UDItype) (w0)),					\
+	     "=d" ((UDItype) (w1))					\
+	   : "%0" ((UDItype) (u)),					\
+	     "rm" ((UDItype) (v)))
+#define udiv_qrnnd(q, r, n1, n0, dv) \
+  __asm__ ("divq %4"							\
+	   : "=a" ((UDItype) (q)),					\
+	     "=d" ((UDItype) (r))					\
+	   : "0" ((UDItype) (n0)),					\
+	     "1" ((UDItype) (n1)),					\
+	     "rm" ((UDItype) (dv)))
+#define count_leading_zeros(count, x)	((count) = __builtin_clzl (x))
+#define count_trailing_zeros(count, x)	((count) = __builtin_ctzl (x))
+#define UMUL_TIME 40
+#define UDIV_TIME 40
+#endif /* x86_64 */
+
 #if defined (__i960__) && W_TYPE_SIZE == 32
 #define umul_ppmm(w1, w0, u, v) \
   ({union {UDItype __ll;						\


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-16 20:25 [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations Uros Bizjak
@ 2007-05-16 20:34 ` Chris Lattner
  2007-05-17  5:57   ` Uros Bizjak
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Lattner @ 2007-05-16 20:34 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches

On May 16, 2007, at 1:23 PM, Uros Bizjak wrote:
> Hello!
> This patch adds 128bit operations for x86_64 to longlong.h to speed  
> up TImode and TFmode arithmetic.

Why not implement these in terms of TImode operations themselves?   
Inline asm should be avoided when it can be :)

-Chris




* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-16 20:34 ` Chris Lattner
@ 2007-05-17  5:57   ` Uros Bizjak
  2007-05-17  6:09     ` Eric Christopher
  2007-05-17  6:43     ` Chris Lattner
  0 siblings, 2 replies; 36+ messages in thread
From: Uros Bizjak @ 2007-05-17  5:57 UTC (permalink / raw)
  To: Chris Lattner; +Cc: GCC Patches

On 5/16/07, Chris Lattner <clattner@apple.com> wrote:

> > This patch adds 128bit operations for x86_64 to longlong.h to speed
> > up TImode and TFmode arithmetic.
>
> Why not implement these in terms of TImode operations themselves?
> Inline asm should be avoided when it can be :)

Unfortunately, the longlong.h definitions require their operands to be
in DImode (for the x86_64 target). I guess it is no coincidence that
these definitions can be implemented with exactly one x86 insn (two
for carry-propagating addition and subtraction). Asm with one insn is
not harmful.
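
To make the intended use concrete, here is an illustrative sketch (not
part of the patch; the name is made up, and the real libgcc code
differs in detail) of a __multi3-style TImode multiply built on the
DImode umul_ppmm primitive:

typedef unsigned DI __attribute__ ((mode (DI)));
typedef unsigned TI __attribute__ ((mode (TI)));

/* Sketch only: (uh:ul) * (vh:vl) mod 2^128
   = ul*vl + ((uh*vl + ul*vh) << 64).  */
TI
multi3_sketch (TI u, TI v)
{
  DI ul = (DI) u, uh = (DI) (u >> 64);
  DI vl = (DI) v, vh = (DI) (v >> 64);
  DI w1, w0;

  umul_ppmm (w1, w0, ul, vl);   /* the 64x64->128 part: one mulq */
  w1 += uh * vl + ul * vh;      /* cross terms: plain imulq */
  return ((TI) w1 << 64) | w0;
}

This is the shape of code where a single widening "mul" replaces the
"imul" sequence the compiler would otherwise emit.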

Uros.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17  5:57   ` Uros Bizjak
@ 2007-05-17  6:09     ` Eric Christopher
  2007-05-17  6:12       ` Andrew Pinski
  2007-05-17  6:43     ` Chris Lattner
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Christopher @ 2007-05-17  6:09 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Chris Lattner, GCC Patches


On May 16, 2007, at 10:57 PM, Uros Bizjak wrote:

> On 5/16/07, Chris Lattner <clattner@apple.com> wrote:
>
>> > This patch adds 128bit operations for x86_64 to longlong.h to speed
>> > up TImode and TFmode arithmetic.
>>
>> Why not implement these in terms of TImode operations themselves?
>> Inline asm should be avoided when it can be :)
>
> Unfortunately, the longlong.h definitions require their operands to be
> in DImode (for the x86_64 target). I guess it is no coincidence that
> these definitions can be implemented with exactly one x86 insn (two
> for carry-propagating addition and subtraction). Asm with one insn is
> not harmful.

Don't forget that asms are often barriers to optimization, so the fewer
asms, the better.

That said, I don't off the top of my head have a better  
implementation :)

-eric


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17  6:09     ` Eric Christopher
@ 2007-05-17  6:12       ` Andrew Pinski
  2007-05-17  6:22         ` Uros Bizjak
  2007-05-17 11:13         ` Uros Bizjak
  0 siblings, 2 replies; 36+ messages in thread
From: Andrew Pinski @ 2007-05-17  6:12 UTC (permalink / raw)
  To: Eric Christopher; +Cc: Uros Bizjak, Chris Lattner, GCC Patches

On 5/16/07, Eric Christopher <echristo@apple.com> wrote:
>
> Don't forget that asms are often barriers to optimization, so the fewer
> asms, the better.
>
> That said, I don't off the top of my head have a better
> implementation :)

Oh, and longlong.h belongs to GMP, so you should submit your changes
upstream too.  Plus, GCC only uses longlong.h in its support library,
and who uses TImode anyway :).


Thanks,
Andrew Pinski


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17  6:12       ` Andrew Pinski
@ 2007-05-17  6:22         ` Uros Bizjak
  2007-05-17 11:13         ` Uros Bizjak
  1 sibling, 0 replies; 36+ messages in thread
From: Uros Bizjak @ 2007-05-17  6:22 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Eric Christopher, Chris Lattner, GCC Patches

On 5/17/07, Andrew Pinski <pinskia@gmail.com> wrote:

> > Don't forget that asms are often barriers to optimization, so the fewer
> > asms, the better.

Yes, but please note that these are not __volatile__.

> > That said, I don't off the top of my head have a better
> > implementation :)
>
> Oh and longlong.h belongs to GMP so you should submit your changes
> upstream too.  Plus the uses of what is located in longlong.h for GCC

Thanks, I'll submit it there.

> and who uses TImode anyway :).

And 640k should be enough for everybody ;]

Uros.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17  5:57   ` Uros Bizjak
  2007-05-17  6:09     ` Eric Christopher
@ 2007-05-17  6:43     ` Chris Lattner
  2007-05-17  7:22       ` Paolo Bonzini
  1 sibling, 1 reply; 36+ messages in thread
From: Chris Lattner @ 2007-05-17  6:43 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches


On May 16, 2007, at 10:57 PM, Uros Bizjak wrote:

> On 5/16/07, Chris Lattner <clattner@apple.com> wrote:
>
>> > This patch adds 128bit operations for x86_64 to longlong.h to speed
>> > up TImode and TFmode arithmetic.
>>
>> Why not implement these in terms of TImode operations themselves?
>> Inline asm should be avoided when it can be :)
>
> Unfortunately, the longlong.h definitions require their operands to be
> in DImode (for the x86_64 target). I guess it is no coincidence that
> these definitions can be implemented with exactly one x86 insn (two
> for carry-propagating addition and subtraction). Asm with one insn is
> not harmful.

Just combine them with logical ops.  Something like this:

typedef unsigned DI __attribute__((mode(DI)));
typedef unsigned TI __attribute__((mode(TI)));

void test_add(DI AL, DI AH, DI BL, DI BH, DI *RL, DI *RH) {
   TI A = AL | ((TI)AH << 64);
   TI B = BL | ((TI)BH << 64);

   TI C = A + B;
   *RL = (DI)C;
   *RH = (DI)(C >> 64);
}

void test_sub(DI AL, DI AH, DI BL, DI BH, DI *RL, DI *RH) {
   TI A = AL | ((TI)AH << 64);
   TI B = BL | ((TI)BH << 64);

   TI C = A - B;
   *RL = (DI)C;
   *RH = (DI)(C >> 64);
}


Should produce something like this:

_test_add:
         addq %rdx, %rdi
         adcq %rcx, %rsi
         movq %rdi, (%r8)
         movq %rsi, (%r9)
         ret


         .align  4
         .globl  _test_sub
_test_sub:
         subq %rdx, %rdi
         sbbq %rcx, %rsi
         movq %rdi, (%r8)
         movq %rsi, (%r9)
         ret

         .subsections_via_symbols


Inline asms are bad for a number of reasons: as Andrew mentioned,
they are scheduling barriers; also, the compiler can't reason about
them, constant fold or simplify them, and can't know the sizes of the
instructions precisely, etc.
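
A minimal illustration of the constant-folding point (a hypothetical
example, not from the original mail; assumes x86_64 and an
add_ssaaaa-style asm like the one in the patch):

typedef unsigned DI __attribute__ ((mode (DI)));
typedef unsigned TI __attribute__ ((mode (TI)));

/* Pure C: the optimizers see through this, and the whole function
   folds to returning the constant 7.  */
DI fold_c (void)
{
  TI c = (TI) 3 + (TI) 4;
  return (DI) c;
}

/* Inline asm: the template is opaque, so the addq/adcq pair is
   emitted as written even though every input is a constant.  */
DI fold_asm (void)
{
  DI sh, sl;
  __asm__ ("addq %5,%1\n\tadcq %3,%0"
           : "=r" (sh), "=&r" (sl)
           : "0" ((DI) 0), "r" ((DI) 0), "1" ((DI) 3), "r" ((DI) 4));
  return sl;
}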

If your compiler isn't producing decent code for simple idioms like
this, it seems profitable to fix the compiler and then write portable[1]
C code to express these things.  The compiler already knows how to
efficiently implement 128-bit arithmetic; why re-implement the smarts
of the compiler in a header?

-Chris

[1] portable across 64-bit GCC targets, not across other compilers of  
course.  However, gcc-style inline asm isn't portable to other  
compilers either.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17  6:43     ` Chris Lattner
@ 2007-05-17  7:22       ` Paolo Bonzini
  2007-05-17  9:25         ` Uros Bizjak
  0 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2007-05-17  7:22 UTC (permalink / raw)
  To: Chris Lattner; +Cc: Uros Bizjak, GCC Patches

> void test_add(DI AL, DI AH, DI BL, DI BH, DI *RL, DI *RH) {
>   TI A = AL | ((TI)AH << 64);
>   TI B = BL | ((TI)BH << 64);
> 
>   TI C = A + B;
>   *RL = (DI)C;
>   *RH = (DI)(C >> 64);
> } 
>
> Should produce something like this:
> 
> _test_add:
>         addq %rdx, %rdi
>         adcq %rcx, %rsi
>         movq %rdi, (%r8)
>         movq %rsi, (%r9)
>         ret

Right.  This probably wasn't true before the lower-subreg pass.

Paolo


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17  7:22       ` Paolo Bonzini
@ 2007-05-17  9:25         ` Uros Bizjak
  2007-05-17 11:15           ` Rask Ingemann Lambertsen
  2007-05-17 16:49           ` Chris Lattner
  0 siblings, 2 replies; 36+ messages in thread
From: Uros Bizjak @ 2007-05-17  9:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Chris Lattner, GCC Patches

On 5/17/07, Paolo Bonzini <bonzini@gnu.org> wrote:

> > void test_add(DI AL, DI AH, DI BL, DI BH, DI *RL, DI *RH) {
> >   TI A = AL | ((TI)AH << 64);
> >   TI B = BL | ((TI)BH << 64);
> >
> >   TI C = A + B;
> >   *RL = (DI)C;
> >   *RH = (DI)(C >> 64);
> > }
> >
> > Should produce something like this:
> >
> > _test_add:
> >         addq %rdx, %rdi
> >         adcq %rcx, %rsi
> >         movq %rdi, (%r8)
> >         movq %rsi, (%r9)
> >         ret
>
> Right.  This probably wasn't true before the lower-subreg pass.

Unfortunately, this approach confuses gcc a lot. The attached testcase
illustrates the problem on i686 (substitute SI with DI and DI with
TI on 64-bit, and change the shift values, for the same result):

--cut here--
typedef unsigned SI __attribute__ ((mode (SI)));
typedef unsigned DI __attribute__ ((mode (DI)));

#define add_ssaaaa_c(sh, sl, ah, al, bh, bl)	\
  {						\
    DI __a = (al) | ((DI) ah << 32);		\
    DI __b = (bl) | ((DI) bh << 32);		\
						\
    DI __c = __a + __b;				\
						\
    (sl) = (SI) (__c & 0xffffffff);		\
    (sh) = (SI) (__c >> 32);			\
  }

#define add_ssaaaa_asm(sh, sl, ah, al, bh, bl)	\
  __asm__ ("addl %5,%1\n\tadcl %3,%0"		\
	   : "=r" ((SI) (sh)),			\
	     "=&r" ((SI) (sl))			\
	   : "%0" ((SI) (ah)),			\
	     "g" ((SI) (bh)),			\
	     "%1" ((SI) (al)),			\
	     "g" ((SI) (bl)))

void test_c (SI a, SI b, SI c, SI d)
{
  volatile SI x, y;

  add_ssaaaa_c (x, y, a, b, c, d);
}

void test_asm (SI a, SI b, SI c, SI d)
{
  volatile SI x, y;

  add_ssaaaa_asm (x, y, a, b, c, d);
}
--cut here--

I compared the two approaches head-to-head on the i686 target; gcc -O2
-fomit-frame-pointer generates:

test_c:
        subl    $28, %esp       #,
        xorl    %edx, %edx      #
        movl    40(%esp), %eax  # c, tmp66
        movl    44(%esp), %ecx  # d, d
        movl    %esi, 20(%esp)  #,
        movl    %ebx, 16(%esp)  #,
        movl    %eax, %edx      # tmp66,
        movl    $0, %eax        #, tmp66
        movl    %eax, %esi      #, tmp74
        movl    32(%esp), %eax  # a, tmp70
        movl    %edx, %ebx      # tmp75, __c
        orl     %ecx, %esi      # d, tmp74
        xorl    %edx, %edx      #
        movl    %esi, %ecx      # tmp74, __c
        movl    36(%esp), %esi  # b, b
        movl    %edi, 24(%esp)  #,
        movl    24(%esp), %edi  #,
        movl    %eax, %edx      # tmp70,
        movl    $0, %eax        #, tmp70
        orl     %esi, %eax      # b, tmp72
        movl    20(%esp), %esi  #,
        addl    %eax, %ecx      # tmp72, __c
        adcl    %edx, %ebx      #, __c
        movl    %ecx, 8(%esp)   # __c, y
        movl    %ebx, %ecx      # __c, __c
        xorl    %ebx, %ebx      # __c
        movl    16(%esp), %ebx  #,
        movl    %ecx, 12(%esp)  # __c, x
        addl    $28, %esp       #,
        ret

and, on the other hand, for the asm approach:

test_asm:
        subl    $16, %esp       #,
        movl    20(%esp), %eax  # a, a
        movl    24(%esp), %edx  # b, b
#APP
        addl 32(%esp),%edx      # d, tmp63
        adcl 28(%esp),%eax      # c, tmp62
#NO_APP
        movl    %eax, 12(%esp)  # tmp62, x
        movl    %edx, 8(%esp)   # tmp63, y
        addl    $16, %esp       #,
        ret

Although the addition itself generates the expected code, assembling
SImode values into DImode really confuses gcc. Also, the restore of %ebx
from the stack interferes with the final DImode->SImode split in a
really bad way.

Looking at this mess, I think that gcc still has a long way to go
with respect to DImode values on 32-bit targets (and TImode values on
64-bit targets), and the only solution for the near future is using asm
here...

Uros.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17  6:12       ` Andrew Pinski
  2007-05-17  6:22         ` Uros Bizjak
@ 2007-05-17 11:13         ` Uros Bizjak
  1 sibling, 0 replies; 36+ messages in thread
From: Uros Bizjak @ 2007-05-17 11:13 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Eric Christopher, Chris Lattner, GCC Patches

On 5/17/07, Andrew Pinski <pinskia@gmail.com> wrote:

> Oh, and longlong.h belongs to GMP, so you should submit your changes
> upstream too.  Plus, GCC only uses longlong.h in its support library,
> and who uses TImode anyway :).

Hm, in gmp-4.2's longlong.h, the same code to support x86_64 is already
present. Perhaps it is time to update longlong.h from the GMP source.

Uros.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-17  9:25         ` Uros Bizjak
@ 2007-05-17 11:15           ` Rask Ingemann Lambertsen
  2007-05-17 21:38             ` Rask Ingemann Lambertsen
  2007-05-17 16:49           ` Chris Lattner
  1 sibling, 1 reply; 36+ messages in thread
From: Rask Ingemann Lambertsen @ 2007-05-17 11:15 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Paolo Bonzini, Chris Lattner, GCC Patches

On Thu, May 17, 2007 at 11:25:34AM +0200, Uros Bizjak wrote:
 
> Unfortunately, this approach confuses gcc a lot. The attached testcase
> illustrates the problem on i686 (substitute SI with DI and DI with
> TI on 64-bit, and change the shift values, for the same result):
[snip]
> I compared the two approaches head-to-head on the i686 target; gcc -O2
> -fomit-frame-pointer generates:
> 
> test_c:
>        subl    $28, %esp       #,
>        xorl    %edx, %edx      #
>        movl    40(%esp), %eax  # c, tmp66
>        movl    44(%esp), %ecx  # d, d
>        movl    %esi, 20(%esp)  #,
>        movl    %ebx, 16(%esp)  #,
>        movl    %eax, %edx      # tmp66,
>        movl    $0, %eax        #, tmp66
>        movl    %eax, %esi      #, tmp74
>        movl    32(%esp), %eax  # a, tmp70
>        movl    %edx, %ebx      # tmp75, __c
>        orl     %ecx, %esi      # d, tmp74
>        xorl    %edx, %edx      #
>        movl    %esi, %ecx      # tmp74, __c
>        movl    36(%esp), %esi  # b, b
>        movl    %edi, 24(%esp)  #,
>        movl    24(%esp), %edi  #,
>        movl    %eax, %edx      # tmp70,
>        movl    $0, %eax        #, tmp70
>        orl     %esi, %eax      # b, tmp72
>        movl    20(%esp), %esi  #,
>        addl    %eax, %ecx      # tmp72, __c
>        adcl    %edx, %ebx      #, __c
>        movl    %ecx, 8(%esp)   # __c, y
>        movl    %ebx, %ecx      # __c, __c
>        xorl    %ebx, %ebx      # __c
>        movl    16(%esp), %ebx  #,
>        movl    %ecx, 12(%esp)  # __c, x
>        addl    $28, %esp       #,
>        ret

   You must be running into some sort of target specific problem, because
x86 16-bit -O2 -fno-omit-frame-pointer gets

test_c:
	pushw	%cx		;# 62	*pushhi1
	pushw	%di		;# 63	*pushhi1
	enterw	$6-2,	$0	;# 64	_enter
	movw	14(%bp),%ax	;# 18	*movhi/1
	movw	12(%bp),%di	;# 19	*movhi/1
	movw	10(%bp),%dx	;# 30	*movhi/1
	movw	8(%bp),	%cx	;# 31	*movhi/1
	addw	%dx,	%ax	;# 56	_addhi3_cc_for_carry/1
	movw	%di,	%dx	;# 61	*movhi/1
	adcw	%cx,	%dx	;# 57	_addhi3_carry/1
	movw	%ax,	-4(%bp)	;# 35	*movhi/2
	movw	%dx,	-2(%bp)	;# 40	*movhi/2
	leavew			;# 67	_leave
	popw	%di		;# 68	*pophi1
	popw	%cx		;# 69	*pophi1
	ret			;# 70	*return

with the obvious s/SI/HI/, s/DI/SI/, etc. changes to your testcase. While
the above could be improved a little, notice the complete absence of "or"
and "xor" instructions.

(Why do you have both "movl $0, %reg" and "xorl %reg, %reg"?)

-- 
Rask Ingemann Lambertsen
741 unexpected failures, and counting...


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17  9:25         ` Uros Bizjak
  2007-05-17 11:15           ` Rask Ingemann Lambertsen
@ 2007-05-17 16:49           ` Chris Lattner
  2007-05-17 17:05             ` Eric Christopher
  1 sibling, 1 reply; 36+ messages in thread
From: Chris Lattner @ 2007-05-17 16:49 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Paolo Bonzini, GCC Patches

> Unfortunately, this approach confuses gcc a lot. The attached testcase
> illustrates the problem on i686 (substitute SI with DI and DI with
> TI on 64-bit, and change the shift values, for the same result):
> I compared the two approaches head-to-head on the i686 target; gcc -O2
> -fomit-frame-pointer generates:
>
> test_c:
>       subl    $28, %esp       #,
>       xorl    %edx, %edx      #
>       movl    40(%esp), %eax  # c, tmp66
>       movl    44(%esp), %ecx  # d, d
>       movl    %esi, 20(%esp)  #,
>       movl    %ebx, 16(%esp)  #,
>       movl    %eax, %edx      # tmp66,
>       movl    $0, %eax        #, tmp66
>       movl    %eax, %esi      #, tmp74
>       movl    32(%esp), %eax  # a, tmp70
>       movl    %edx, %ebx      # tmp75, __c
>       orl     %ecx, %esi      # d, tmp74
>       xorl    %edx, %edx      #
>       movl    %esi, %ecx      # tmp74, __c
>       movl    36(%esp), %esi  # b, b
>       movl    %edi, 24(%esp)  #,
>       movl    24(%esp), %edi  #,
>       movl    %eax, %edx      # tmp70,
>       movl    $0, %eax        #, tmp70
>       orl     %esi, %eax      # b, tmp72
>       movl    20(%esp), %esi  #,
>       addl    %eax, %ecx      # tmp72, __c
>       adcl    %edx, %ebx      #, __c
>       movl    %ecx, 8(%esp)   # __c, y
>       movl    %ebx, %ecx      # __c, __c
>       xorl    %ebx, %ebx      # __c
>       movl    16(%esp), %ebx  #,
>       movl    %ecx, 12(%esp)  # __c, x
>       addl    $28, %esp       #,
>       ret
>
> Looking at this mess, I think that gcc still has a long way to go
> with respect to DImode values on 32-bit targets (and TImode values on
> 64-bit targets), and the only solution for the near future is using
> asm here...

At some point, hacking around these limitations becomes less useful  
than just fixing them once and for all.  LLVM compiles this:

volatile SI x, y;
void test_c (SI a, SI b, SI c, SI d)
{
  add_ssaaaa_c (x, y, a, b, c, d);
}


into this:

_test_c:
	movl 12(%esp), %eax
	movl 16(%esp), %ecx
	addl 8(%esp), %ecx
	adcl 4(%esp), %eax
	movl %ecx, _y
	movl %eax, _x
	ret

with llvm-gcc t.c -S -O -static -fomit-frame-pointer

Inline asm really is bad.  It's much better to fix the compiler so  
that users don't have to rely on it to get performance.

-Chris


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17 16:49           ` Chris Lattner
@ 2007-05-17 17:05             ` Eric Christopher
  2007-05-18 10:23               ` Uros Bizjak
  2007-05-18 19:43               ` Uros Bizjak
  0 siblings, 2 replies; 36+ messages in thread
From: Eric Christopher @ 2007-05-17 17:05 UTC (permalink / raw)
  To: GCC Patches; +Cc: Uros Bizjak, Paolo Bonzini, Ian Lance Taylor, Chris Lattner

>
>
> Inline asm really is bad.  It's much better to fix the compiler so  
> that users don't have to rely on it to get performance.

Agreed, Ian's patches that take advantage of lower-subreg for x86 help  
this quite a bit. He's waiting AFAIK on the df branch to finish them  
up, but from what I recall they helped this particular case.

-eric


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-17 11:15           ` Rask Ingemann Lambertsen
@ 2007-05-17 21:38             ` Rask Ingemann Lambertsen
  2007-05-18  6:54               ` Ian Lance Taylor
  0 siblings, 1 reply; 36+ messages in thread
From: Rask Ingemann Lambertsen @ 2007-05-17 21:38 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Paolo Bonzini, Chris Lattner, GCC Patches

On Thu, May 17, 2007 at 01:15:48PM +0200, Rask Ingemann Lambertsen wrote:
 
>    You must be running into some sort of target specific problem,

   The i386 back end is splitting adddi3 after reload, which is too late.
I'm not sure that splitting before reload is early enough, but that is the
easiest change to make.

-- 
Rask Ingemann Lambertsen


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-17 21:38             ` Rask Ingemann Lambertsen
@ 2007-05-18  6:54               ` Ian Lance Taylor
  2007-05-18  7:10                 ` Eric Botcazou
  0 siblings, 1 reply; 36+ messages in thread
From: Ian Lance Taylor @ 2007-05-18  6:54 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen
  Cc: Uros Bizjak, Paolo Bonzini, Chris Lattner, GCC Patches

Rask Ingemann Lambertsen <rask@sygehus.dk> writes:

> On Thu, May 17, 2007 at 01:15:48PM +0200, Rask Ingemann Lambertsen wrote:
>  
> >    You must be running into some sort of target specific problem,
> 
>    The i386 back end is splitting adddi3 after reload, which is too late.
> I'm not sure that splitting before reload is early enough, but that is the
> easiest change to make.

With the new lower-subreg patch, insns like adddi3 should split before
reload when possible.  But you can't use the existing split in
i386.md, because it relies on FLAGS_REG, and reload may accidentally
clobber FLAGS_REG.  In my uncommitted follow-on patch to lower-subreg,
I did it like this:

(define_split
  [(set (match_operand:DI 0 "nonimmediate_operand" "")
	(plus:DI (match_operand:DI 1 "nonimmediate_operand" "")
		 (match_operand:DI 2 "general_operand" "")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT"
  [(parallel [(set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))
	      (set (match_dup 3)
		   (plus:SI (plus:SI (match_dup 4) (match_dup 5))
			    (truncate:SI
			     (lshiftrt:DI
			      (plus:DI
			       (zero_extend:DI (match_dup 1))
			       (zero_extend:DI (match_dup 2)))
			      (const_int 32)))))
	      (clobber (reg:CC FLAGS_REG))])]
{
  split_di (operands + 0, 1, operands + 0, operands + 3);
  split_di (operands + 1, 1, operands + 1, operands + 4);
  split_di (operands + 2, 1, operands + 2, operands + 5);
})

(define_insn_and_split "*adddi3_2"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r,rm,r")
	(plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,0,0,0")
		 (match_operand:SI 2 "general_operand" "ri,rm,ri,rm")))
   (set (match_operand:SI 3 "nonimmediate_operand" "=rm,r,rm,r")
	(plus:SI (plus:SI (match_operand:SI 4 "general_operand" "3,3,ri,rm")
			  (match_operand:SI 5 "general_operand" "ri,rm,3,3"))
		 (truncate:SI
		  (lshiftrt:DI
		   (plus:DI
		    (zero_extend:DI (match_dup 1))
		    (zero_extend:DI (match_dup 2)))
		   (const_int 32)))))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT"
  "#"
  "&& reload_completed"
  [(parallel [(set (reg:CC FLAGS_REG) (unspec:CC [(match_dup 1) (match_dup 2)]
					  UNSPEC_ADD_CARRY))
	      (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
   (parallel [(set (match_dup 3)
		   (plus:SI (plus:SI (ltu:SI (reg:CC FLAGS_REG) (const_int 0))
				     (match_dup 4))
			    (match_dup 5)))
	      (clobber (reg:CC FLAGS_REG))])]
{
  /* We can only have one pair of commutative operands in an insn, so
     we commute operands 4 and 5 manually.  */
  if (rtx_equal_p (operands[5], operands[3]))
    {
      rtx tmp = operands[4];
      operands[4] = operands[5];
      operands[5] = tmp;
    }
})

Ian


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-18  6:54               ` Ian Lance Taylor
@ 2007-05-18  7:10                 ` Eric Botcazou
  2007-05-18 16:48                   ` Ian Lance Taylor
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Botcazou @ 2007-05-18  7:10 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: gcc-patches, Rask Ingemann Lambertsen, Uros Bizjak,
	Paolo Bonzini, Chris Lattner

> With the new lower-subreg patch, insns like adddi3 should split before
> reload when possible.

If you haven't already done it, could you sketch the modifications that should 
be made to a 32-bit back-end, now that the lower-subreg patch is integrated?

-- 
Eric Botcazou


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17 17:05             ` Eric Christopher
@ 2007-05-18 10:23               ` Uros Bizjak
  2007-05-18 18:31                 ` Chris Lattner
  2007-05-18 19:43               ` Uros Bizjak
  1 sibling, 1 reply; 36+ messages in thread
From: Uros Bizjak @ 2007-05-18 10:23 UTC (permalink / raw)
  To: Eric Christopher
  Cc: GCC Patches, Paolo Bonzini, Ian Lance Taylor, Chris Lattner

On 5/17/07, Eric Christopher <echristo@apple.com> wrote:

> > Inline asm really is bad.  It's much better to fix the compiler so
> > that users don't have to rely on it to get performance.
>
> Agreed, Ian's patches that take advantage of lower-subreg for x86 help
> this quite a bit. He's waiting AFAIK on the df branch to finish them
> up, but from what I recall they helped this particular case.

OK, I agree with that. However, I'd like to propose a path to the
final solution:

- for now, we implement the asm solution, as it optimizes library code
_now_ and follows what the master copy of longlong.h does. Also, there
is no simple solution for the 128-bit "mul" and "div" insns, which are
of most interest here.
- an enhancement PR is opened with the comments from this thread (also
linking to Ian's follow-on patch), so we won't forget this issue.
- once the generated code is equal to or better than the asm, we change
longlong.h, probably for both i386 and x86_64.

My opinion is that the asm solution is currently "good enough" for the
current uses (libgcc), and code inspection confirms this. It is not "the
best solution", but it is definitely better than the current state. For
the record, three "imul" insns are replaced by just one "mul" insn.

Uros.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-18  7:10                 ` Eric Botcazou
@ 2007-05-18 16:48                   ` Ian Lance Taylor
  2007-05-18 20:37                     ` Eric Botcazou
  2007-05-22 17:27                     ` Roman Zippel
  0 siblings, 2 replies; 36+ messages in thread
From: Ian Lance Taylor @ 2007-05-18 16:48 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Rask Ingemann Lambertsen, Uros Bizjak,
	Paolo Bonzini, Chris Lattner

Eric Botcazou <ebotcazou@libertysurf.fr> writes:

> > With the new lower-subreg patch, insns like adddi3 should split before
> > reload when possible.
> 
> If you haven't already done it, could you sketch the modifications that should 
> be made to a 32-bit back-end, now that the lower-subreg patch is integrated?

First I'll say that the lower-subreg patch is not yet completely
integrated.  There is another part of it, which I have working, but
which I'm holding in the hopes that it will work better in the DF
framework.  That is the part which tracks subreg lifetimes during
register allocation.

From the backend perspective, the best way to take advantage of
lower-subreg is conceptually simple if sometimes awkward in practice:
for multi-word operations, use define_insn_and_split and split the
insns before reload.

For CC0-style machines, in which reload may clobber the flags
register, some multi-word operations will need two
define_insn_and_splits.  The first split should run before reload, and
should generate one or more insns which express the computation
entirely in terms of register-sized subregs.  The second split should
run after reload and should generate the actual instructions which use
the flags register.

The idea is that the first split into subregs will permit the
lower-subreg pass and the register allocator to allocate the values
involved as separate single registers rather than as a single
multi-word register.  The second split, which will be similar to
existing code, will then generate the real machine instructions for
the second scheduling pass.

For a non-CC0 style machine things are simpler.  Just split multi-word
operations before reload.  There was never any particular need to wait
until after reload to split them in the past.  And now you really
don't want to wait.

For clarity, the relevant distinction between a CC0 machine and a
non-CC0 machine is whether reload is going to generate loads and
stores which clobber the flags register.  On a non-CC0 machine the
flags register is not set by memory moves.  On a CC0 machine it is.

Ian


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-18 10:23               ` Uros Bizjak
@ 2007-05-18 18:31                 ` Chris Lattner
  2007-05-18 19:58                   ` Uros Bizjak
  2007-05-18 20:07                   ` Uros Bizjak
  0 siblings, 2 replies; 36+ messages in thread
From: Chris Lattner @ 2007-05-18 18:31 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Eric Christopher, GCC Patches, Paolo Bonzini, Ian Lance Taylor


On May 18, 2007, at 3:22 AM, Uros Bizjak wrote:

> On 5/17/07, Eric Christopher <echristo@apple.com> wrote:
>
>> > Inline asm really is bad.  It's much better to fix the compiler so
>> > that users don't have to rely on it to get performance.
>>
>> Agreed, Ian's patches that take advantage of lower-subreg for x86  
>> help
>> this quite a bit. He's waiting AFAIK on the df branch to finish them
>> up, but from what I recall they helped this particular case.
>
> OK, I agree with that. However, I'd like to propose a path to the
> final solution:
>
> - for now, we implement the asm solution, as it optimizes library code
> - an enhancement PR is opened with the comments from this thread (also
> linking to Ian's follow-on patch), so we won't forget this issue.
> - once the generated code is equal to or better than the asm, we change
> longlong.h, probably for both i386 and x86_64.

This all sounds fine to me.

> _now_ and follows what the master copy of longlong.h does. Also, there
> is no simple solution for the 128-bit "mul" and "div" insns, which are
> of most interest here.

What is the issue here?

-Chris




* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-17 17:05             ` Eric Christopher
  2007-05-18 10:23               ` Uros Bizjak
@ 2007-05-18 19:43               ` Uros Bizjak
  1 sibling, 0 replies; 36+ messages in thread
From: Uros Bizjak @ 2007-05-18 19:43 UTC (permalink / raw)
  To: Eric Christopher
  Cc: GCC Patches, Paolo Bonzini, Ian Lance Taylor, Chris Lattner

Eric Christopher wrote:

>> Inline asm really is bad.  It's much better to fix the compiler so 
>> that users don't have to rely on it to get performance.
>
> Agreed, Ian's patches that take advantage of lower-subreg for x86 help 
> this quite a bit. He's waiting AFAIK on the df branch to finish them 
> up, but from what I recall they helped this particular case.
Here are some representative x86_64 code sizes with and without the
patch to longlong.h (these are actually TImode and TFmode functions):

(+++ are patched, --- are unpatched libgcc.a sizes)

-    238              0       0     238      ee _muldi3.o (ex libgcc.a)
+    138              0       0     138      8a _muldi3.o (ex libgcc.a)

-   1529              0       0    1529     5f9 _divdi3.o (ex libgcc.a)
-   1650              0       0    1650     672 _moddi3.o (ex libgcc.a)
-   1322              0       0    1322     52a _udivdi3.o (ex libgcc.a)
-   1418              0       0    1418     58a _umoddi3.o (ex libgcc.a)
+    541              0       0     541     21d _divdi3.o (ex libgcc.a)
+    650              0       0     650     28a _moddi3.o (ex libgcc.a)
+    358              0       0     358     166 _udivdi3.o (ex libgcc.a)
+    445              0       0     445     1bd _umoddi3.o (ex libgcc.a)

-   1669              0       0    1669     685 _udivmoddi4.o (ex libgcc.a)
+    610              0       0     610     262 _udivmoddi4.o (ex libgcc.a)

-   3030              0       0    3030     bd6 divtf3.o (ex libgcc.a)
+   2604              0       0    2604     a2c divtf3.o (ex libgcc.a)

-   2772              0       0    2772     ad4 multf3.o (ex libgcc.a)
+   2508              0       0    2508     9cc multf3.o (ex libgcc.a)

The current divdi/moddi code looks quite pathological...

I think that this clears the way for the patch to be committed to 
mainline, even as an interim solution.

Uros.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-18 18:31                 ` Chris Lattner
@ 2007-05-18 19:58                   ` Uros Bizjak
  2007-05-18 20:15                     ` Chris Lattner
  2007-05-18 20:07                   ` Uros Bizjak
  1 sibling, 1 reply; 36+ messages in thread
From: Uros Bizjak @ 2007-05-18 19:58 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Eric Christopher, GCC Patches, Paolo Bonzini, Ian Lance Taylor

Chris Lattner wrote:

>> _now_ and follows what the master copy of longlong.h does. Also, there
>> is no simple solution for the 128-bit "mul" and "div" insns, which are
>> of most interest here.
>
> What is the issue here?
unsigned long long int t(unsigned int a, unsigned int b)
{
        return a * b;
}

doesn't compile to "mul" as one would expect (but to "imul"):

gcc -O2 -m32 -fomit-frame-pointer:

t:
        movl    8(%esp), %eax
        xorl    %edx, %edx
        imull   4(%esp), %eax
        ret

Uros.



* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-18 18:31                 ` Chris Lattner
  2007-05-18 19:58                   ` Uros Bizjak
@ 2007-05-18 20:07                   ` Uros Bizjak
  1 sibling, 0 replies; 36+ messages in thread
From: Uros Bizjak @ 2007-05-18 20:07 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Eric Christopher, GCC Patches, Paolo Bonzini, Ian Lance Taylor

Chris Lattner wrote:

>> OK, I agree with that. However, I'd like to propose a path to the
>> final solution:
>>
>> - for now, we implement the asm solution, as it optimizes library code
>> - an enhancement PR is opened with the comments from this thread (also
>> linking to Ian's follow-on patch), so we won't forget this issue.
>> - once the generated code is equal to or better than the asm, we change
>> longlong.h, probably for both i386 and x86_64.
>
> This all sounds fine to me.
The bugreport is at PR target/31985.

(BTW: It would be nice if someone could confirm it, so it won't be 
self-confirmed ;)

Uros.



* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-18 19:58                   ` Uros Bizjak
@ 2007-05-18 20:15                     ` Chris Lattner
  2007-05-18 21:17                       ` Uros Bizjak
  2007-05-18 21:49                       ` Uros Bizjak
  0 siblings, 2 replies; 36+ messages in thread
From: Chris Lattner @ 2007-05-18 20:15 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Eric Christopher, GCC Patches, Paolo Bonzini, Ian Lance Taylor


On May 18, 2007, at 12:58 PM, Uros Bizjak wrote:

> Chris Lattner wrote:
>
>>> _now_ and follows what the master copy of longlong.h does. Also,
>>> there is no simple solution for the 128-bit "mul" and "div" insns,
>>> which are of most interest here.
>>
>> What is the issue here?
> unsigned long long int t(unsigned int a, unsigned int b)
> {
>        return a * b;
> }
>
> doesn't compile to "mul" as one would expect (but to "imul"):
>
> gcc -O2 -m32 -fomit-frame-pointer:
>
> t:
>        movl    8(%esp), %eax
>        xorl    %edx, %edx
>        imull   4(%esp), %eax
>        ret

Don't you want:

unsigned long long int t2(unsigned int a, unsigned int b)
{
        return (unsigned long long)a * b;
}

which, since the cast makes the multiply a full 32x32->64 widening
multiply instead of a 32-bit one, produces (gcc 4.0.1):

_t2:
         movl    8(%esp), %eax
         mull    4(%esp)
         ret

?

-Chris


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-18 16:48                   ` Ian Lance Taylor
@ 2007-05-18 20:37                     ` Eric Botcazou
  2007-05-18 22:32                       ` Ian Lance Taylor
  2007-05-22 17:27                     ` Roman Zippel
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Botcazou @ 2007-05-18 20:37 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: gcc-patches, Rask Ingemann Lambertsen, Uros Bizjak,
	Paolo Bonzini, Chris Lattner

> From the backend perspective, the best way to take advantage of
> lower-subreg is conceptually simple if sometimes awkward in practice:
> for multi-word operations, use define_insn_and_split and split the
> insns before reload.

OK, I see, thanks.

> For a non-CC0 style machine things are simpler.  Just split multi-word
> operations before reload.  There was never any particular need to wait
> until after reload to split them in the past.  And now you really
> don't want to wait.

Of course SPARC-V8 is the perfect counter-example since you have a pairing 
instruction "ldd" for consecutive integer registers. :-)

-- 
Eric Botcazou


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-18 20:15                     ` Chris Lattner
@ 2007-05-18 21:17                       ` Uros Bizjak
  2007-05-18 21:49                       ` Uros Bizjak
  1 sibling, 0 replies; 36+ messages in thread
From: Uros Bizjak @ 2007-05-18 21:17 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Eric Christopher, GCC Patches, Paolo Bonzini, Ian Lance Taylor

Chris Lattner wrote:

> Don't you want:
>
> unsigned long long int t2(unsigned int a, unsigned int b)
> {
>        return (unsigned long long)a * b;
> }

Ouch... yes, indeed.

Thanks,
Uros.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-18 20:15                     ` Chris Lattner
  2007-05-18 21:17                       ` Uros Bizjak
@ 2007-05-18 21:49                       ` Uros Bizjak
  2007-05-18 22:49                         ` Chris Lattner
  1 sibling, 1 reply; 36+ messages in thread
From: Uros Bizjak @ 2007-05-18 21:49 UTC (permalink / raw)
  To: Chris Lattner
  Cc: Eric Christopher, GCC Patches, Paolo Bonzini, Ian Lance Taylor

Chris Lattner wrote:

> Don't you want:
>
> unsigned long long int t2(unsigned int a, unsigned int b)
> {
>        return (unsigned long long)a * b;
> }

Please find a testcase for muls below, implemented in asm and in C.
Again, a couple of strange moves are present at the end of the function
implemented in pure C.

--cut here--
typedef unsigned SI __attribute__ ((mode (SI)));
typedef unsigned DI __attribute__ ((mode (DI)));

#define umul_ppmm_c(w1, w0, u, v)               \
  {                                             \
    DI __c = (DI) u * v;                        \
                                                \
    (w0) = (SI) (__c & 0xffffffff);             \
    (w1) = (SI) (__c >> 32);                    \
  }

#define umul_ppmm_asm(w1, w0, u, v)                                     \
  __asm__ ("mull %3"                                                    \
           : "=a" ((SI) (w0)),                                          \
           "=d" ((SI) (w1))                                             \
           : "%0" ((SI) (u)),                                           \
           "rm" ((SI) (v)))

void test_c (SI a, SI b)
{
 volatile SI x, y;

 umul_ppmm_c (x, y, a, b);
}

void test_asm (SI a, SI b)
{
 volatile SI x, y;

 umul_ppmm_asm (x, y, a, b);
}
--cut here--

gcc -O2 -fomit-frame-pointer:

test_c:
        subl    $16, %esp
        movl    24(%esp), %eax
        mull    20(%esp)
        movl    %eax, 8(%esp)
        movl    %edx, %eax
        xorl    %edx, %edx
        movl    %eax, 12(%esp)
        addl    $16, %esp
        ret

test_asm:
        subl    $16, %esp
        movl    20(%esp), %eax
#APP
        mull 24(%esp)
#NO_APP
        movl    %eax, 8(%esp)
        movl    %edx, 12(%esp)
        addl    $16, %esp
        ret

So even for a wide multiply, it is better to stay with the asm implementation ATM.

Uros.


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-18 20:37                     ` Eric Botcazou
@ 2007-05-18 22:32                       ` Ian Lance Taylor
  0 siblings, 0 replies; 36+ messages in thread
From: Ian Lance Taylor @ 2007-05-18 22:32 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: gcc-patches, Rask Ingemann Lambertsen, Uros Bizjak,
	Paolo Bonzini, Chris Lattner

Eric Botcazou <ebotcazou@libertysurf.fr> writes:

> > For a non-CC0 style machine things are simpler.  Just split multi-word
> > operations before reload.  There was never any particular need to wait
> > until after reload to split them in the past.  And now you really
> > don't want to wait.
> 
> Of course SPARC-V8 is the perfect counter-example since you have a pairing 
> instruction "ldd" for consecutive integer registers. :-)

Yes, good point, you probably don't want to split a DImode load or
store there.  You can still split everything else, though, and the
right thing should happen.  Unfortunately this means that if one of
the halves of the load is unused, we won't convert back to an SImode
load.

Ian


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-18 21:49                       ` Uros Bizjak
@ 2007-05-18 22:49                         ` Chris Lattner
  0 siblings, 0 replies; 36+ messages in thread
From: Chris Lattner @ 2007-05-18 22:49 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Eric Christopher, GCC Patches, Paolo Bonzini, Ian Lance Taylor


On May 18, 2007, at 2:48 PM, Uros Bizjak wrote:

> Chris Lattner wrote:
>
>> Don't you want:
>>
>> unsigned long long int t2(unsigned int a, unsigned int b)
>> {
>>        return (unsigned long long)a * b;
>> }
>
> Please find a testcase for muls below, implemented in asm and in C.
> Again, a couple of strange moves are present at the end of the
> function implemented in pure C.
>
> --cut here--
> typedef unsigned SI __attribute__ ((mode (SI)));
> typedef unsigned DI __attribute__ ((mode (DI)));
>
> #define umul_ppmm_c(w1, w0, u, v)               \
>  {                                             \
>    DI __c = (DI) u * v;                        \
>                                                \
>    (w0) = (SI) (__c & 0xffffffff);             \
>    (w1) = (SI) (__c >> 32);                    \
>  }
>
> test_c:
>        subl    $16, %esp
>        movl    24(%esp), %eax
>        mull    20(%esp)
>        movl    %eax, 8(%esp)
>        movl    %edx, %eax
>        xorl    %edx, %edx
>        movl    %eax, 12(%esp)
>        addl    $16, %esp
>        ret
>
> So even for a wide multiply, it is better to stay with the asm
> implementation ATM.

Yuck.  Please file a bugzilla.

-Chris


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit   operations
  2007-05-18 16:48                   ` Ian Lance Taylor
  2007-05-18 20:37                     ` Eric Botcazou
@ 2007-05-22 17:27                     ` Roman Zippel
  2007-05-22 17:36                       ` Ian Lance Taylor
  2007-05-22 22:19                       ` Rask Ingemann Lambertsen
  1 sibling, 2 replies; 36+ messages in thread
From: Roman Zippel @ 2007-05-22 17:27 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Eric Botcazou, gcc-patches, Rask Ingemann Lambertsen,
	Uros Bizjak, Paolo Bonzini, Chris Lattner

Hi,

On Fri, 18 May 2007, Ian Lance Taylor wrote:

> From the backend perspective, the best way to take advantage of
> lower-subreg is conceptually simple if sometimes awkward in practice:
> for multi-word operations, use define_insn_and_split and split the
> insns before reload.

I'm playing around with it a little on m68k, and there is sometimes a bit
of a conflict. E.g. I would split something like adddi3/subdi3 already
before combine, because the addx/subx instruction is much more limited in
its addressing modes (it only accepts registers), so combine can generate
the right instructions.
The problem is now that other optimizations can't do very much with these 
instructions, for example something like this:

	unsigned long long x, y;
	return x + (y >> 63);

is better handled as:

	unsigned long long x, y;
	return x - ((long long)y >> 63);

It's not a very common thing, so it wouldn't be a big loss, but something 
like this is currently part of the m68k description (and something very 
early already converts the second form into the first form...).
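
Spelled out (an illustrative aside, not part of the original mail; it
relies on GCC implementing >> on signed types as an arithmetic shift):
both forms add the sign bit of y to x, since the unsigned shift yields
0 or 1 while the arithmetic shift yields 0 or -1:

unsigned long long
form_a (unsigned long long x, unsigned long long y)
{
  return x + (y >> 63);                 /* y >> 63 is 0 or 1 */
}

unsigned long long
form_b (unsigned long long x, unsigned long long y)
{
  /* (long long) y >> 63 is 0 or -1; subtracting it adds 0 or 1.  */
  return x - (unsigned long long) ((long long) y >> 63);
}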
The point is that as soon as we expose the two separate adds,
transformations like the above, which require changing both adds to
subs, pretty much become impossible. The only alternative I can think of
is to have a port-specific tree pass and change this before the RTL is
generated.

In this context I'm curious about another problem: how should it work in
general (i.e. independent of cc0) to combine a subdi/cmpdi or subdi/tstdi
pattern?

> For clarity, the relevant distinction between a CC0 machine and a
> non-CC0 machine is whether reload is going to generate loads and
> stores which clobber the flags register.  On a non-CC0 machine the
> flags register is not set by memory moves.  On a CC0 machine it is.

BTW I was thinking about this a little (possible conversion of m68k from 
cc0 in general and this reload problem specifically). 
I don't really like the idea of hiding cc0 until after reload, as it 
prevents many optimizations (e.g. smi is heavily used for sign extensions, 
but it needs the cc flags from the previous instructions, so one would 
need many patterns to combine a <op>/set/smi sequence).

I think the reload problem may not be that bad: while a move instruction
sets the cc register, it often just sets it in the same way as the
previous instructions, especially the Z and N flags, which is enough for
the common test against zero. For these tests the move wouldn't really
clobber the flags, but just recreate them.
A bit more difficult is a sub/cmp combination; here a move would really
clobber the flags. But luckily on m68k the sub instruction is quite
flexible and e.g. also accepts memory arguments as its destination, so
it would be possible to convert an output reload into one or two input
reloads.
So what would basically be needed is a port-specific check in reload as
to whether an output reload is acceptable, possibly rejecting the
alternative and trying another one.
That's currently the idea I would favour, at least for m68k; does that
sound reasonable (or at least understandable :) )?

bye, Roman


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-22 17:27                     ` Roman Zippel
@ 2007-05-22 17:36                       ` Ian Lance Taylor
  2007-05-22 22:27                         ` Rask Ingemann Lambertsen
  2007-05-23 17:27                         ` Roman Zippel
  2007-05-22 22:19                       ` Rask Ingemann Lambertsen
  1 sibling, 2 replies; 36+ messages in thread
From: Ian Lance Taylor @ 2007-05-22 17:36 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Eric Botcazou, gcc-patches, Rask Ingemann Lambertsen,
	Uros Bizjak, Paolo Bonzini, Chris Lattner

Roman Zippel <zippel@linux-m68k.org> writes:

> I'm playing around with it a little on m68k, and there is sometimes a bit
> of a conflict. E.g. I would split something like adddi3/subdi3 already
> before combine, because the addx/subx instruction is much more limited in
> its addressing modes (it only accepts registers), so combine can generate
> the right instructions.

That's right.  That's why I recommend a define_insn_and_split.  The
insn will be complete for most of the RTL optimization passes,
including combine.  It will then be split before the second lower
subreg pass and before register allocation.

Of course, that won't help your case, because you want to split before
combine for other reasons.


> I think the reload problem may not be that bad: while a move instruction
> sets the cc register, it often just sets it in the same way as the
> previous instructions, especially the Z and N flags, which is enough for
> the common test against zero. For these tests the move wouldn't really
> clobber the flags, but just recreate them.

It's unfortunately not that simple.  In some cases reload will need to
load some part of an address.  When it does that the flags will no
longer have any relation to what they are supposed to have.

> A bit more difficult is a sub/cmp combination; here a move would really
> clobber the flags. But luckily on m68k the sub instruction is quite
> flexible and e.g. also accepts memory arguments as its destination, so
> it would be possible to convert an output reload into one or two input
> reloads.
> So what would basically be needed is a port-specific check in reload as
> to whether an output reload is acceptable, possibly rejecting the
> alternative and trying another one.
> That's currently the idea I would favour, at least for m68k; does that
> sound reasonable (or at least understandable :) )?

That sounds unreasonable.  reload is already very complicated, too
complicated, and there isn't going to be any way to drop in something
like that without making it significantly more complicated.

Ian


* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-22 17:27                     ` Roman Zippel
  2007-05-22 17:36                       ` Ian Lance Taylor
@ 2007-05-22 22:19                       ` Rask Ingemann Lambertsen
  1 sibling, 0 replies; 36+ messages in thread
From: Rask Ingemann Lambertsen @ 2007-05-22 22:19 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ian Lance Taylor, Eric Botcazou, gcc-patches, Uros Bizjak,
	Paolo Bonzini, Chris Lattner

On Tue, May 22, 2007 at 07:26:50PM +0200, Roman Zippel wrote:
> Hi,
> 
> On Fri, 18 May 2007, Ian Lance Taylor wrote:
> 
> > From the backend perspective, the best way to take advantage of
> > lower-subreg is conceptually simple if sometimes awkward in practice:
> > for multi-word operations, use define_insn_and_split and split the
> > insns before reload.
> 
> I'm playing around with it a little on m68k, and there is sometimes a bit
> of a conflict. E.g. I would split something like adddi3/subdi3 already
> before combine, because the addx/subx instruction is much more limited in
> its addressing modes (it only accepts registers), so combine can generate
> the right instructions.

   It can be done:

1) Define predicates and constraints which match the available addressing
modes.
2) Define an insn(_and_split) which uses those predicates and constraints.
3) Define an insn similar to the one in 2), but where the (mem ...)
expression is not an operand. Instead, you have something like

	... (mem:SI (match_operand:SI "register_operand" "a"))

4) Split the insn in 2) to the one in 3) before reload.
5) Put the insn in 3) before the one in 2) in the .md file.

   I think that since the address is so simple in this case, you can skip
parts 1), 2), 4) and 5). Just be careful that you don't use the template in
3) as an output template, or you'll lose the alias set.

(The danger of losing the alias set really ought to be documented.)

> 	unsigned long long x, y;
> 	return x + (y >> 63);
> 
> is better handled as:
> 
> 	unsigned long long x, y;
> 	return x - ((long long)y >> 63);

   Interesting optimization when signed shifts are faster than unsigned
ones.

> In this context I'm curious about another problem: how should it work in
> general (i.e. independent of cc0) to combine a subdi/cmpdi or subdi/tstdi
> pattern?

   Combine will do this automatically if you define the insn to do so. In
this case, it will just be a define_insn_and_split instead of the usual
define_insn, but that makes no difference to combine. If your question is
how to write the combined patterns in general, see
<URL:http://gcc.gnu.org/ml/gcc/2004-02/msg00903.html>.

   You will have to define a CCmode for the multi-word instructions since
the zero flag won't be usable. You'll therefore also have to delay the
output of the comparison instruction until you know the comparison operator.
This is documented in the GCC Internals manual.
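
   As a sketch only (the mode name, CC_REGNUM and the output template are
all hypothetical, not taken from a real port), the pieces would look
something like this. In <target>-modes.def:

	CC_MODE (CCX);	/* flags mode in which Z must not be relied on */

and the combined pattern that combine can then form from subdi3 + cmpdi:

(define_insn "*subdi3_cc"
  [(set (reg:CCX CC_REGNUM)
        (compare:CCX (match_operand:DI 1 "register_operand" "0")
                     (match_operand:DI 2 "register_operand" "d")))
   (set (match_operand:DI 0 "register_operand" "=d")
        (minus:DI (match_dup 1) (match_dup 2)))]
  ""
  "sub%.l %R2,%R0\;subx%.l %2,%0")

The CCXmode on the flags register is what lets the branch patterns know
which comparison operators are still valid.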

   (Also check that your overflow flag is set to something useful.)

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-22 17:36                       ` Ian Lance Taylor
@ 2007-05-22 22:27                         ` Rask Ingemann Lambertsen
  2007-05-22 22:32                           ` Ian Lance Taylor
  2007-05-23 17:27                         ` Roman Zippel
  1 sibling, 1 reply; 36+ messages in thread
From: Rask Ingemann Lambertsen @ 2007-05-22 22:27 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Roman Zippel, Eric Botcazou, gcc-patches, Uros Bizjak,
	Paolo Bonzini, Chris Lattner

On Tue, May 22, 2007 at 10:35:25AM -0700, Ian Lance Taylor wrote:
> Roman Zippel <zippel@linux-m68k.org> writes:
> 
> > I think the reload problem may not be that bad: while a move instruction 
> > sets the cc register, it often just sets it in the same way as the 
> > previous instructions, especially the Z and M flags, which is enough for 
> > the common test against zero. For these tests the move wouldn't really 
> > clobber the flags, but just recreate them.
> 
> It's unfortunately not that simple.  In some cases reload will need to
> load some part of an address.  When it does that the flags will no
> longer have any relation to what they are supposed to have.

   On the m68k, instructions with an address register destination don't set
the flags. It might actually work.

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations
  2007-05-22 22:27                         ` Rask Ingemann Lambertsen
@ 2007-05-22 22:32                           ` Ian Lance Taylor
  2007-05-24 20:00                             ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 36+ messages in thread
From: Ian Lance Taylor @ 2007-05-22 22:32 UTC (permalink / raw)
  To: Rask Ingemann Lambertsen
  Cc: Roman Zippel, Eric Botcazou, gcc-patches, Uros Bizjak,
	Paolo Bonzini, Chris Lattner

Rask Ingemann Lambertsen <rask@sygehus.dk> writes:

> On Tue, May 22, 2007 at 10:35:25AM -0700, Ian Lance Taylor wrote:
> > Roman Zippel <zippel@linux-m68k.org> writes:
> > 
> > > I think the reload problem may not be that bad: while a move instruction 
> > > sets the cc register, it often just sets it in the same way as the 
> > > previous instructions, especially the Z and M flags, which is enough for 
> > > the common test against zero. For these tests the move wouldn't really 
> > > clobber the flags, but just recreate them.
> > 
> > It's unfortunately not that simple.  In some cases reload will need to
> > load some part of an address.  When it does that the flags will no
> > longer have any relation to what they are supposed to have.
> 
>    On the m68k, instructions with an address register destination don't set
> the flags. It might actually work.

That is true, but addresses can include data registers, as in
%a0@(%d0) (aka 0(a0,d0)).

Ian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit   operations
  2007-05-22 17:36                       ` Ian Lance Taylor
  2007-05-22 22:27                         ` Rask Ingemann Lambertsen
@ 2007-05-23 17:27                         ` Roman Zippel
  2007-05-23 17:48                           ` Ian Lance Taylor
  1 sibling, 1 reply; 36+ messages in thread
From: Roman Zippel @ 2007-05-23 17:27 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Eric Botcazou, gcc-patches, Rask Ingemann Lambertsen,
	Uros Bizjak, Paolo Bonzini, Chris Lattner

Hi,

On Tue, 22 May 2007, Ian Lance Taylor wrote:

> Roman Zippel <zippel@linux-m68k.org> writes:
> 
> > I'm playing around with it a little on m68k and there is sometimes a bit 
> > of a conflict. E.g. I would split something like adddi3/subdi3 already 
> > before combine, because the addx/subx instruction is much more limited in 
> > its addressing mode (only accepts register), so combine can generate the 
> > right instructions.
> 
> That's right.  That's why I recommend a define_insn_and_split.  The
> insn will be complete for most of the RTL optimization passes,
> including combine.  It will then be split before the second lower
> subreg pass and before register allocation.

Somehow I think this can't be everything. Let's take a different example, 
which is also relevant for other ports: x + (y << 32). Here one argument 
is partially constant (the low word of the shifted value is zero), but 
one doesn't really know that until after splitting the shift. This could 
be optimized to a simple move/add (but currently isn't).
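
For illustration, the word-level form one would like to end up with is
something like this (a sketch only; the condition and operand handling
are simplified, e.g. overlap between operands 0 and 2 is ignored):

(define_split
  [(set (match_operand:DI 0 "register_operand" "")
        (plus:DI (match_operand:DI 1 "register_operand" "")
                 (ashift:DI (match_operand:DI 2 "register_operand" "")
                            (const_int 32))))]
  ""
  [(set (match_dup 3) (match_dup 4))
   (set (match_dup 5) (plus:SI (match_dup 6) (match_dup 7)))]
{
  /* Low word: (y << 32) contributes nothing, so it is a plain copy;
     high word: a single SImode add, no carry possible from below.  */
  operands[3] = gen_lowpart (SImode, operands[0]);
  operands[4] = gen_lowpart (SImode, operands[1]);
  operands[5] = gen_highpart (SImode, operands[0]);
  operands[6] = gen_highpart (SImode, operands[1]);
  operands[7] = gen_lowpart (SImode, operands[2]);
})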
With a post-combine split one could now generate a combined add/shift 
pattern which could be split into the simpler operations, but this is 
too late for combine. A pre-combine split is also problematic, as the 
information that the carry flag is zero can't get to the next 
instruction: it's usually hidden in a rather complex parallel pattern.

To handle this one could either rerun combine after the split (which may 
have to be done more than once) or add a combined combine/split pass 
(which selectively combines just the newly created instructions with 
other instructions).

> Of course, that won't help your case, because you want to split before
> combine for other reasons.

There are more reasons, e.g. splitting post_inc/pre_dec operations later 
is rather painful; it would be so much easier if one only had to deal with 
simple memory operands and leave the bulk of the work to combine.

> > I think the reload problem may not be that bad: while a move instruction 
> > sets the cc register, it often just sets it in the same way as the 
> > previous instructions, especially the Z and M flags, which is enough for 
> > the common test against zero. For these tests the move wouldn't really 
> > clobber the flags, but just recreate them.
> 
> It's unfortunately not that simple.  In some cases reload will need to
> load some part of an address.  When it does that the flags will no
> longer have any relation to what they are supposed to have.

This requires an output reload, doesn't it? If we only have a single 
output, it would mean the final move would recreate the right flags.
For floating-point values this would be even easier: the post-processing 
for setting the condition codes is always the same, and a move to memory 
doesn't even change them.

> > A bit more difficult is a sub/cmp combination, here move would really 
> > clobber flags, but luckily on m68k the sub instruction is quite flexible 
> > and e.g. also accepts memory arguments as destination, so it would be 
> > possible to convert an output reload into one or two input reloads. 
> > So what basically would be needed is a port-specific check in reload as to 
> > whether an output reload is acceptable and possibly reject the 
> > alternative and try it with another one.
> > That's currently the idea I would favour at least for m68k, does that 
> > sound reasonable (or at least understandable :) ).
> 
> That sounds unreasonable.  reload is already very complicated, too
> complicated, and there isn't going to be any way to drop in something
> like that without making it significantly more complicated.

Considering the alternatives, I don't want to dismiss this option too 
quickly. What is suggested in the wiki would only trade one problem for 
another: cc0 might be gone this way, but one would also have none of the 
benefits and would need a lot of extra work to get anywhere near where 
things were before.
The problem _is_ reload, so I'd prefer to look at the real problem instead 
of wasting a lot of effort working around it.
Why are you so sure that there is not "any way"? Yes, reload is 
complicated, but I don't think these changes would make it significantly 
more complicated than it already is, and I'm pretty sure it would be 
less complicated than any of the alternatives.

BTW, there is a move instruction which doesn't modify the flags - movem -
but I would really prefer to use it only as a last resort.

bye, Roman

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-23 17:27                         ` Roman Zippel
@ 2007-05-23 17:48                           ` Ian Lance Taylor
  0 siblings, 0 replies; 36+ messages in thread
From: Ian Lance Taylor @ 2007-05-23 17:48 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Eric Botcazou, gcc-patches, Rask Ingemann Lambertsen,
	Uros Bizjak, Paolo Bonzini, Chris Lattner

Roman Zippel <zippel@linux-m68k.org> writes:

> To handle this one could either rerun combine after the split (which may 
> have to be done more than once) or add a combined combine/split pass 
> (which selectively combines just the newly created instructions with 
> other instructions).

I agree that we don't handle all cases at present.


> > > I think the reload problem may not be that bad: while a move instruction 
> > > sets the cc register, it often just sets it in the same way as the 
> > > previous instructions, especially the Z and M flags, which is enough for 
> > > the common test against zero. For these tests the move wouldn't really 
> > > clobber the flags, but just recreate them.
> > 
> > It's unfortunately not that simple.  In some cases reload will need to
> > load some part of an address.  When it does that the flags will no
> > longer have any relation to what they are supposed to have.
> 
> This requires an output reload, doesn't it? If we only have a single 
> output, it would mean the final move would recreate the right flags.
> For floating-point values this would be even easier: the post-processing 
> for setting the condition codes is always the same, and a move to memory 
> doesn't even change them.

No, reloading an address can be for either an input or an output.  It
will typically but not exclusively be expressed as
RELOAD_FOR_INPUT_ADDRESS or RELOAD_FOR_OUTPUT_ADDRESS.


> > > A bit more difficult is a sub/cmp combination, here move would really 
> > > clobber flags, but luckily on m68k the sub instruction is quite flexible 
> > > and e.g. also accepts memory arguments as destination, so it would be 
> > > possible to convert an output reload into one or two input reloads. 
> > > So what basically would be needed is a port-specific check in reload as to 
> > > whether an output reload is acceptable and possibly reject the 
> > > alternative and try it with another one.
> > > That's currently the idea I would favour at least for m68k, does that 
> > > sound reasonable (or at least understandable :) ).
> > 
> > That sounds unreasonable.  reload is already very complicated, too
> > complicated, and there isn't going to be any way to drop in something
> > like that without making it significantly more complicated.
> 
> Considering the alternatives, I don't want to dismiss this option too 
> quickly. What is suggested in the wiki would only trade one problem for 
> another: cc0 might be gone this way, but one would also have none of the 
> benefits and would need a lot of extra work to get anywhere near where 
> things were before.
> The problem _is_ reload, so I'd prefer to look at the real problem instead 
> of wasting a lot of effort working around it.
> Why are you so sure that there is not "any way"? Yes, reload is 
> complicated, but I don't think these changes would make it significantly 
> more complicated than it already is, and I'm pretty sure it would be 
> less complicated than any of the alternatives.

Certainly the problem is reload and the answer is to eliminate it.
That is extremely hard but will solve the problem for good.

If you want to look at target-specific routines to guide instruction
selection in reload, that is your choice.  I do not think it would be
a wise way to approach this problem.

Ian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, x86_64]: Provide longlong.h definitions for 128bit  operations
  2007-05-22 22:32                           ` Ian Lance Taylor
@ 2007-05-24 20:00                             ` Rask Ingemann Lambertsen
  0 siblings, 0 replies; 36+ messages in thread
From: Rask Ingemann Lambertsen @ 2007-05-24 20:00 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Roman Zippel, Eric Botcazou, gcc-patches, Uros Bizjak,
	Paolo Bonzini, Chris Lattner

On Tue, May 22, 2007 at 03:31:36PM -0700, Ian Lance Taylor wrote:
> Rask Ingemann Lambertsen <rask@sygehus.dk> writes:
>
> >    On the m68k, instructions with an address register destination don't set
> > the flags. It might actually work.
> 
> That is true, but addresses can include data registers, as in
> %a0@(%d0) (aka 0(a0,d0) ).

   Yes, but you can tell reload not to use data registers in addresses.
IIRC, wherever disp(An,Dn) is valid, so is disp(An,Am).
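
   For instance (a sketch of the old target macro interface, with hard
register numbers following the usual m68k numbering of d0-d7 = 0-7 and
a0-a7 = 8-15; the reg_renumber handling for pseudos is omitted for
brevity):

#define REGNO_OK_FOR_INDEX_P(REGNO) \
  ((unsigned) (REGNO) - 8 < 8)	/* address registers only */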

-- 
Rask Ingemann Lambertsen

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2007-05-24 19:56 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
2007-05-16 20:25 [PATCH, x86_64]: Provide longlong.h definitions for 128bit operations Uros Bizjak
2007-05-16 20:34 ` Chris Lattner
2007-05-17  5:57   ` Uros Bizjak
2007-05-17  6:09     ` Eric Christopher
2007-05-17  6:12       ` Andrew Pinski
2007-05-17  6:22         ` Uros Bizjak
2007-05-17 11:13         ` Uros Bizjak
2007-05-17  6:43     ` Chris Lattner
2007-05-17  7:22       ` Paolo Bonzini
2007-05-17  9:25         ` Uros Bizjak
2007-05-17 11:15           ` Rask Ingemann Lambertsen
2007-05-17 21:38             ` Rask Ingemann Lambertsen
2007-05-18  6:54               ` Ian Lance Taylor
2007-05-18  7:10                 ` Eric Botcazou
2007-05-18 16:48                   ` Ian Lance Taylor
2007-05-18 20:37                     ` Eric Botcazou
2007-05-18 22:32                       ` Ian Lance Taylor
2007-05-22 17:27                     ` Roman Zippel
2007-05-22 17:36                       ` Ian Lance Taylor
2007-05-22 22:27                         ` Rask Ingemann Lambertsen
2007-05-22 22:32                           ` Ian Lance Taylor
2007-05-24 20:00                             ` Rask Ingemann Lambertsen
2007-05-23 17:27                         ` Roman Zippel
2007-05-23 17:48                           ` Ian Lance Taylor
2007-05-22 22:19                       ` Rask Ingemann Lambertsen
2007-05-17 16:49           ` Chris Lattner
2007-05-17 17:05             ` Eric Christopher
2007-05-18 10:23               ` Uros Bizjak
2007-05-18 18:31                 ` Chris Lattner
2007-05-18 19:58                   ` Uros Bizjak
2007-05-18 20:15                     ` Chris Lattner
2007-05-18 21:17                       ` Uros Bizjak
2007-05-18 21:49                       ` Uros Bizjak
2007-05-18 22:49                         ` Chris Lattner
2007-05-18 20:07                   ` Uros Bizjak
2007-05-18 19:43               ` Uros Bizjak
