Date: Tue, 12 Sep 2006 20:28:00 -0000
From: "guillaume dot melquiond at ens-lyon dot fr"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/29042] New: Useless floating-point stores and loads on x86

This is the same testcase as PR26778. That bug is marked as resolved, and its patch does prevent GCC from using useless MMX registers. For the integer operations, the generated assembly is now even better than with GCC 3.4: the values are incremented directly in memory instead of being loaded and stored back, and no callee-saved register is used anymore.

Unfortunately, there is still a regression in the handling of the floating-point stack. Testcase, compiled with -march=pentium3 -O3:

typedef union { long long l; double d; } db_number;

double test(double x[3]) {
  double th = x[1] + x[2];
  if (x[2] != th - x[1]) {
    db_number thdb;
    thdb.d = th;
    thdb.l++;
    th = thdb.d;
  }
  return x[0] + th;
}

The problem is easier to see in a unified diff between the assembly generated by GCC 3.4.6 and by GCC 4.2.0 (svn 2006-09-12). Note: the 3.4 assembly was edited by hand to reduce the noise caused by the two versions using different integer registers.
     pushl %ebp
     movl %esp, %ebp
     subl $16, %esp
     movl 8(%ebp), %eax
     fldl 8(%eax)
     fldl 16(%eax)
     fld %st(1)
     fadd %st(1), %st
+    fstl -8(%ebp)
     fsub %st, %st(2)
     fxch %st(1)
     fucomip %st(2), %st
     fstp %st(1)
     jp .L7
-    je .L2
+    je .L9
 .L7:
     fstpl -16(%ebp)
     addl $1, -16(%ebp)
     adcl $0, -12(%ebp)
     fldl -16(%ebp)
+    fstpl -8(%ebp)
+    jmp .L2
+    .p2align 4,,7
+.L9:
+    fstp %st(0)
+    .p2align 4,,15
 .L2:
+    fldl -8(%ebp)
     faddl (%eax)
     leave
     ret

The 3.4 code never stores the value of th: it stays at the top of the floating-point stack, which in my opinion is optimal. This is no longer the case with the 4.2 code. The value of th is stored in -8(%ebp). Then, on one branch (.L7), that stack slot is overwritten with the incremented value loaded from -16(%ebp). On the other branch (.L9), the value is popped from the top of the floating-point stack and then immediately (.L2) reloaded from memory.

Every line prefixed by + is useless: without them (and with "je .L9" reverted to "je .L2"), the code still behaves correctly and contains five fewer assembly instructions.

--
Summary: Useless floating-point stores and loads on x86
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: guillaume dot melquiond at ens-lyon dot fr

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29042