From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-436591-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 17649 invoked by alias); 4 Dec 2013 01:49:53 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 15772 invoked by uid 48); 4 Dec 2013 01:49:47 -0000
From: "macro@linux-mips.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/59371] [4.8/4.9 Regression] Performance regression in GCC 4.8 and later versions.
Date: Wed, 04 Dec 2013 01:49:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.9.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: macro@linux-mips.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.8.3
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-59371-4-kbIYqHEqxn@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-59371-4@http.gcc.gnu.org/bugzilla/>
References: <bug-59371-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-12/txt/msg00246.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59371

--- Comment #5 from Maciej W. Rozycki <macro@linux-mips.org> ---
(In reply to Andrew Pinski from comment #4)
> Well that corrects how i++ is done.

The MIPS assembly code produced previously was AFAICT correct.  The loop
termination condition was expressed as:

    bne    $3,$6,$L3

which represents (i != c) rather than (i < c), but `i' starts at 0 and
is incremented by one at a time, so the two expressions are equivalent
in this context.
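
For illustration only, here is a small C sketch of that equivalence.  It
is not the test case from this report: the function names are made up,
and it deliberately uses 32-bit unsigned counters so that no wraparound
can interfere with the comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations of a loop that starts at 0, steps by one, and stops
   when (i != c) becomes false. */
uint32_t count_with_ne(uint32_t c)
{
    uint32_t n = 0;
    for (uint32_t i = 0; i != c; i++)
        n++;
    return n;
}

/* The same loop, but stopping when (i < c) becomes false. */
uint32_t count_with_lt(uint32_t c)
{
    uint32_t n = 0;
    for (uint32_t i = 0; i < c; i++)
        n++;
    return n;
}

/* Hypothetical self-check: both forms run the same number of times
   for every bound tried here. */
void check_equivalence(void)
{
    for (uint32_t c = 0; c <= 1000u; c++)
        assert(count_with_ne(c) == count_with_lt(c));
}
```

Calling check_equivalence() from a test program should pass silently;
the point is only that, absent wraparound, `i' reaches `c' exactly when
it stops being less than `c'.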

Here I believe the following C language standard clause applies[1]:

"Otherwise, if the operand that has unsigned integer type has rank
greater or equal to the rank of the type of the other operand, then the
operand with signed integer type is converted to the type of the operand
with unsigned integer type."

so for both operands the expression is supposed to use the "unsigned
short" type, which is 16 bits wide on the MIPS target.  There are no
16-bit ALU operations defined in the MIPS architecture though, so at the
assembly (and therefore machine) level both `c' and `i' were
sign-extended to 32 bits:

    andi    $5,$5,0xffff
    seh    $6,$5

and:

    seh    $3,$3

respectively, before the BNE comparison quoted above was made.  (The
ANDI is of course redundant here: SEH only looks at the low 16 bits of
its source, so there is no need to zero-extend first.)  That correctly
mimicked the 16-bit operations required by the language here
(zero-extension of both `c' and `i' would of course do as well).
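
To make the two side remarks concrete, here is a rough C model of the
two extensions.  The helper names are mine, the registers are modelled
as plain 32-bit unsigned values, and the 0xabcd0000 "garbage" in the
upper half is an arbitrary choice; treat it as a sketch rather than a
statement about what GCC emits.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "andi reg,reg,0xffff": keep the low 16 bits, clear the rest. */
uint32_t andi_ffff(uint32_t reg)
{
    return reg & 0xffffu;
}

/* Model of SEH: copy bit 15 of the source into bits 16..31.
   Only the low 16 bits of the source are examined. */
uint32_t seh(uint32_t reg)
{
    uint32_t low = reg & 0xffffu;
    return (low ^ 0x8000u) - 0x8000u;
}

/* Hypothetical self-check over every 16-bit pattern. */
void check_extensions(void)
{
    for (uint32_t x = 0; x <= 0xffffu; x++) {
        uint32_t dirty = x | 0xabcd0000u;  /* arbitrary upper-half garbage */
        uint32_t y = (x + 1u) & 0xffffu;   /* some other 16-bit value */

        /* ANDI before SEH changes nothing. */
        assert(seh(andi_ffff(dirty)) == seh(dirty));

        /* A full-register equality test, as BNE performs, agrees with
           16-bit equality when both operands are extended the same way. */
        assert(seh(dirty) == seh(x));
        assert(seh(x) != seh(y));
        assert(andi_ffff(dirty) == andi_ffff(x));
        assert(andi_ffff(x) != andi_ffff(y));
    }
}
```

The checks only cover equality, which is all BNE needs; they say
nothing about ordering comparisons, where the choice of extension does
matter.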

Now after the change `c' is zero-extended only (no sign-extension
afterwards):

    andi    $5,$5,0xffff

while `i' is still sign-extended:

    seh    $3,$3

Then the loop termination condition is expressed as:

    slt    $6,$3,$5
    bne    $6,$0,$L3

instead.  Notice the SLT instruction, which accurately represents the
(i < c) termination condition, but using *signed* arithmetic.  This
means that for `c' equal e.g. to 32768 the loop will never terminate:
the sign-extended `i' never exceeds 32767, so SLT always sees it as
less than the zero-extended `c'.  I believe this is not what the clause
of the C language standard quoted above implies.  For unsigned
arithmetic SLTU would have to be used instead.
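
The `c' == 32768 case can be checked with a small C model of the
register contents.  This is a sketch under my own assumptions: the
helper names are invented, registers are modelled as 32-bit unsigned
values, and SLT is emulated by flipping the sign bit before an unsigned
compare.  It walks all 65536 bit patterns `i' can hold rather than
running the actual loop.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "andi reg,reg,0xffff" (zero-extension from 16 bits). */
uint32_t andi_ffff(uint32_t reg)
{
    return reg & 0xffffu;
}

/* Model of SEH (sign-extension from 16 bits). */
uint32_t seh(uint32_t reg)
{
    uint32_t low = reg & 0xffffu;
    return (low ^ 0x8000u) - 0x8000u;
}

/* Model of SLT: 1 if a < b as signed 32-bit values.  Flipping the
   sign bit maps signed order onto unsigned order. */
int slt(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/* Model of SLTU: 1 if a < b as unsigned 32-bit values. */
int sltu(uint32_t a, uint32_t b)
{
    return a < b;
}

/* For a given `c', count how many of the 65536 possible 16-bit values
   of `i' make the comparison true, i.e. keep the loop running, with
   `i' sign-extended and `c' zero-extended as in the new code. */
uint32_t count_continuing(uint32_t c, int (*cmp)(uint32_t, uint32_t))
{
    uint32_t rc = andi_ffff(c);
    uint32_t n = 0;
    for (uint32_t i = 0; i <= 0xffffu; i++)
        if (cmp(seh(i), rc))
            n++;
    return n;
}
```

With this model count_continuing(32768, slt) is 65536, that is, no value
of `i' ever makes the SLT result zero, whereas
count_continuing(32768, sltu) is 32768, so the comparison does become
false once `i' holds 0x8000.  This only contrasts the two instructions
on the register contents shown above; it is not meant as a complete
description of what the correct sequence should be.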

So it looks to me like the performance regression merely happens to be
a visible sign of a bigger correctness problem.  Have I missed anything?

[1] "Programming languages -- C", ISO/IEC 9899:1999(E), Section 6.3.1.8
    "Usual arithmetic conversions".