From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 17649 invoked by alias); 4 Dec 2013 01:49:53 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 15772 invoked by uid 48); 4 Dec 2013 01:49:47 -0000
From: "macro@linux-mips.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/59371] [4.8/4.9 Regression] Performance regression in GCC 4.8 and later versions.
Date: Wed, 04 Dec 2013 01:49:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.9.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: macro@linux-mips.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.8.3
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-12/txt/msg00246.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59371

--- Comment #5 from Maciej W. Rozycki ---
(In reply to Andrew Pinski from comment #4)
> Well that corrects how i++ is done.

Old MIPS assembly code produced was AFAICT correct.  The loop termination
condition was expressed as:

    bne     $3,$6,$L3

that represented (i != c) rather than (i < c), but we start `i' from 0 and
increment by one at a time, so both expressions are equivalent in this
context.

Here I believe the following C language standard clause applies[1]:

"Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type."

so for both operands the expression is supposed to use the "unsigned short"
type, which is 16-bit on the MIPS target.  There are no 16-bit ALU operations
defined in the MIPS architecture though, so at the assembly (and therefore
machine) level both `c' and `i' were sign-extended to 32 bits:

    andi    $5,$5,0xffff
    seh     $6,$5

and:

    seh     $3,$3

respectively (of course ANDI is redundant here, there's no need to
zero-extend before sign-extending, SEH does not require it), before the BNE
comparison quoted above was made.  That correctly mimicked 16-bit operations
required by the language here (of course zero-extension of both `c' and `i'
would do as well).

Now after the change `c' is zero-extended only (no sign-extension
afterwards):

    andi    $5,$5,0xffff

while `i' is still sign-extended:

    seh     $3,$3

Then the loop termination condition is expressed as:

    slt     $6,$3,$5
    bne     $6,$0,$L3

instead.  Notice the SLT instruction, which accurately represents the (i < c)
termination condition, however using *signed* arithmetic.  This means that
for `c' equal e.g. to 32768 the loop will never terminate, because the
sign-extended `i' never exceeds 32767 and so always compares less than the
zero-extended `c'.  I believe this is not what the clause of the C language
standard quoted above implies.  For unsigned arithmetic SLTU would have to be
used instead.

So it looks to me like the performance regression merely happens to be a
visible sign of a bigger correctness problem.  Have I missed anything?

[1] "Programming languages -- C", ISO/IEC 9899:1999(E), Section 6.3.1.8
    "Usual arithmetic conversions".
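
To make the difference between the instruction sequences quoted above
concrete, here is a minimal C sketch that models them on 32-bit register
values.  It is an illustration only, not compiler output: the helper names
(seh, andi_ffff, slt, sltu, run_loop), the do-while loop shape, the
iteration cap and the NEW_ANDI_SLTU variant are all my assumptions.  It says
nothing about which behaviour the C standard requires, it only shows what
each sequence does for a given `c'.

```c
#include <stdint.h>

/* SEH: sign-extend the low 16 bits of a register to 32 bits. */
int32_t seh(uint32_t r)
{
    int32_t low = (int32_t)(r & 0xffffu);
    return (low & 0x8000) ? low - 0x10000 : low;
}

/* ANDI reg,reg,0xffff: zero-extend the low 16 bits. */
uint32_t andi_ffff(uint32_t r) { return r & 0xffffu; }

/* SLT: signed "set on less than". */
int slt(int32_t a, int32_t b) { return a < b; }

/* SLTU: unsigned "set on less than". */
int sltu(int32_t a, int32_t b) { return (uint32_t)a < (uint32_t)b; }

enum seq {
    OLD_SEH_BNE,    /* c: andi + seh, i: seh, loop while i != c    */
    NEW_ANDI_SLT,   /* c: andi only,  i: seh, loop while slt(i, c) */
    NEW_ANDI_SLTU   /* hypothetical: as above, but with sltu       */
};

/*
 * Run the modelled loop for a nonzero c (assumption: the loop body is
 * entered once before the first test, as with a branch at the bottom).
 * Returns the number of iterations, or -1 if more than `cap' iterations
 * would be needed, which stands in for "never terminates".
 */
long run_loop(uint16_t c_arg, enum seq s, long cap)
{
    int32_t c = (s == OLD_SEH_BNE) ? seh(andi_ffff(c_arg))
                                   : (int32_t)andi_ffff(c_arg);
    int32_t i = 0;
    long n = 0;
    int again;

    do {
        if (++n > cap)
            return -1;
        i = seh((uint32_t)(i + 1));     /* i++ followed by seh */
        switch (s) {
        case OLD_SEH_BNE:  again = (i != c);   break;
        case NEW_ANDI_SLT: again = slt(i, c);  break;
        default:           again = sltu(i, c); break;
        }
    } while (again);
    return n;
}
```

With a cap of 200000, run_loop(32768, OLD_SEH_BNE, 200000) returns 32768,
whereas run_loop(32768, NEW_ANDI_SLT, 200000) returns -1, i.e. the modelled
SLT loop does not terminate within the cap.  For small values such as
c = 5 both sequences run 5 iterations.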