Date: Tue, 09 Oct 2007 16:53:00 -0000
Subject: [Bug rtl-optimization/33717] New: slow code generated for 64-bit arithmetic
From: "felix-gcc at fefe dot de"
To: gcc-bugs@gcc.gnu.org

gcc generates very poor code for some bignum code I wrote. I put the sample code at http://dl.fefe.de/bignum-add.c for you to look at.

The crucial loop is this (x, y and z are arrays of unsigned int):

  for (i=0; i<100; ++i) {
    l += (unsigned long long)x[i] + y[i];
    z[i]=l;
    l>>=32;
  }

gcc code (-O3 -march=athlon64):

        movl    -820(%ebp,%esi,4), %eax
        movl    -420(%ebp,%esi,4), %ecx
        xorl    %edx, %edx
        xorl    %ebx, %ebx
        addl    %ecx, %eax
        adcl    %ebx, %edx
        addl    -1224(%ebp), %eax
        adcl    -1220(%ebp), %edx
        movl    %eax, -4(%edi,%esi,4)
        incl    %esi
        movl    %edx, %eax
        xorl    %edx, %edx
        cmpl    $101, %esi
        movl    %eax, -1224(%ebp)
        movl    %edx, -1220(%ebp)
        jne     .L4

As you can see, gcc keeps the long long accumulator in memory (-1224(%ebp) and -1220(%ebp)). icc keeps it in registers instead:

        movl      4(%esp,%edx,4), %eax                          #25.30
        xorl      %ebx, %ebx                                    #25.5
        addl      404(%esp,%edx,4), %eax                        #25.5
        adcl      $0, %ebx                                      #25.5
        addl      %esi, %eax                                    #25.37
        movl      %ebx, %esi                                    #25.37
        adcl      $0, %esi                                      #25.37
        movl      %eax, 804(%esp,%edx,4)                        #26.5
        addl      $1, %edx                                      #24.22
        cmpl      $100, %edx                                    #24.15
        jb        ..B1.4        # Prob 99%                      #24.15

The difference is staggering: 2000 cycles for gcc, 1000 for icc.

This only happens on x86, btw. On amd64 there are enough registers, so gcc and icc are closer (840 vs 924 cycles; icc still generates better code here).

Still: both compilers could generate even better code.
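For readers without the linked file, here is a minimal self-contained sketch of what the loop computes, assuming limbs are stored least significant first (the original bignum-add.c is not reproduced here, and the function names bignum_add_acc64 and bignum_add_carry32 are hypothetical). The second function restates the same addition so that only a 32-bit limb and a 32-bit carry are live across iterations, which is the shape of the icc listing above; it is an illustration only, and whether a given compiler turns it into addl/adcl is not guaranteed.

```c
#include <stdint.h>
#include <stddef.h>

/* The loop from the report: a 64-bit accumulator l carries the high
   half of each limb sum into the next iteration. Limbs are assumed to
   be stored least significant first. Returns the final carry (0 or 1). */
static uint32_t bignum_add_acc64(uint32_t *z, const uint32_t *x,
                                 const uint32_t *y, size_t n)
{
    uint64_t l = 0;
    for (size_t i = 0; i < n; ++i) {
        l += (uint64_t)x[i] + y[i];
        z[i] = (uint32_t)l;   /* low 32 bits become the result limb */
        l >>= 32;             /* high part (0 or 1) is the carry */
    }
    return (uint32_t)l;
}

/* Same result, but the only loop-carried state is a 32-bit carry,
   mirroring how icc keeps the carry in %esi. Carries are detected
   with unsigned wraparound comparisons (hypothetical helper name). */
static uint32_t bignum_add_carry32(uint32_t *z, const uint32_t *x,
                                   const uint32_t *y, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t s = x[i] + y[i];
        uint32_t c1 = s < x[i];   /* carry out of x[i] + y[i] */
        uint32_t t = s + carry;
        uint32_t c2 = t < s;      /* carry out of adding the old carry */
        z[i] = t;
        carry = c1 + c2;          /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

Both functions should produce the same z[] and the same final carry for any input. The point of the second form is that the accumulator never needs to exist as a 64-bit pair, so on 32-bit x86 it needs one register for the carry instead of two for a long long.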
For comparison, the file also contains an inline asm version, which could be improved further by loop unrolling.


-- 
           Summary: slow code generated for 64-bit arithmetic
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: felix-gcc at fefe dot de
 GCC build triplet: i386-pc-linux-gnu
  GCC host triplet: i386-pc-linux-gnu
GCC target triplet: i386-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33717