public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl
@ 2006-12-28 7:09 raeburn at raeburn dot org
2007-06-06 3:39 ` [Bug target/30315] " scovich at gmail dot com
` (18 more replies)
0 siblings, 19 replies; 21+ messages in thread
From: raeburn at raeburn dot org @ 2006-12-28 7:09 UTC (permalink / raw)
To: gcc-bugs
Using gcc-4.3-20061223 targeting i386-linux and options "-O9
-fomit-frame-pointer -fexpensive-optimizations -S", my test program to detect
unsigned integer overflow:
extern void abort(void);
unsigned foo(unsigned a, unsigned b) {
unsigned sum = a + b;
if (sum < a) abort(); /* check for overflow (wrapping) */
if (sum < b) abort(); /* redundant */
return sum;
}
compiles to:
foo:
subl $12, %esp
movl 16(%esp), %eax
movl 20(%esp), %edx
addl %eax, %edx
cmpl %edx, %eax
ja .L7
cmpl %edx, 20(%esp)
ja .L7
movl %edx, %eax
addl $12, %esp
ret
.L7:
call abort
After the addition, which sets the ALU flags, the compiler issues two compare
instructions and conditional branches. This sequence could be replaced by a
conditional branch following the addl, testing one of the flags (overflow?
carry? I forget which) set by it.
I realize for other processors (e.g., with -march=pentiumpro), the addl may be
replaced by something like leal, which may not set the flags the same way. But
if the flags are set when building for a certain architecture, use them....
And is leal+cmp with data dependence still going to be better than addl?
(Also, I think a different set of register selections could eliminate the last
"movl" instruction, by putting the result of the addition into the return-value
register.)
--
Summary: optimize unsigned-add overflow test on x86 to use cpu
flags from addl
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: raeburn at raeburn dot org
GCC build triplet: powerpc-apple-darwin
GCC host triplet: powerpc-apple-darwin
GCC target triplet: i386-unknown-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
@ 2007-06-06 3:39 ` scovich at gmail dot com
2007-06-06 8:16 ` ubizjak at gmail dot com
` (17 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: scovich at gmail dot com @ 2007-06-06 3:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from scovich at gmail dot com 2007-06-06 03:39 -------
Happens on x86_64-unknown-linux-gnu as well, for both 4.2.0 and 4.3 (20070605)
The problem is even worse for 128-bit arithmetic because it has to check two
registers (with associated branches) before making a decision. This in spite of
the fact that sbb sets the flags properly AFAIK:
bool sub128(__uint128_t &dest, __uint128_t a, __uint128_t b) {
dest = a - b;
if(dest > a) abort();
}
_Z6sub128Rooo:
.LFB557:
movq %rsi, %rax
movq %rdx, %r10
pushq %rbx
.LCFI0:
subq %rcx, %rax
sbbq %r8, %rdx
movq %rax, (%rdi)
cmpq %rdx, %r10
movq %rdx, 8(%rdi)
ja .L23
jae .L24
.L21:
call abort
.p2align 4,,7
.L24:
cmpq %rax, %rsi
.p2align 4,,6
jb .L21
.p2align 4,,7
.L23:
popq %rbx
.p2align 4,,5
ret
There's not really a way to work around it with inline asm, either, because of
the branch on overflow that will most likely come right afterward...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
2007-06-06 3:39 ` [Bug target/30315] " scovich at gmail dot com
@ 2007-06-06 8:16 ` ubizjak at gmail dot com
2007-06-06 9:10 ` Andrew Pinski
2007-06-06 9:10 ` pinskia at gmail dot com
` (16 subsequent siblings)
18 siblings, 1 reply; 21+ messages in thread
From: ubizjak at gmail dot com @ 2007-06-06 8:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from ubizjak at gmail dot com 2007-06-06 08:15 -------
(In reply to comment #0)
> After the addition, which sets the ALU flags, the compiler issues two compare
> instructions and conditional branches. This sequence could be replaced by a
> conditional branch following the addl, testing one of the flags (overflow?
> carry? I forget which) set by it.
in config/i386/i386-modes.def, documentation says:
Add CCGOC to indicate comparisons against zero that allows
unspecified garbage in the Carry and Overflow flag. This
mode is used to simulate comparisons of (a-b) and (a+b)
against zero using sub/cmp/add operations.
addl operates in CCGOCmode, but you are requesting GTU conditional jump that
requires Carry flag=0 and Zero flag=0. GTU is incompatible with CCGOCmode due
to Carry flag handling.
Change one of conditions to "if (sum == 0) abort();" and you will see insn
elimination in action.
And BTW - wrapping is undefined operation.
--
ubizjak at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |INVALID
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
2007-06-06 3:39 ` [Bug target/30315] " scovich at gmail dot com
2007-06-06 8:16 ` ubizjak at gmail dot com
@ 2007-06-06 9:10 ` pinskia at gmail dot com
2007-06-06 10:33 ` rask at sygehus dot dk
` (15 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: pinskia at gmail dot com @ 2007-06-06 9:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from pinskia at gmail dot com 2007-06-06 09:10 -------
Subject: Re: optimize unsigned-add overflow test on x86 to use cpu flags from
addl
> And BTW - wrapping is undefined operation.
One for signed integers for unsigned it is actually defined as
wrapping and the reporter used unsigned integers here.
Thanks,
Andrew Pinski
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2007-06-06 8:16 ` ubizjak at gmail dot com
@ 2007-06-06 9:10 ` Andrew Pinski
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Pinski @ 2007-06-06 9:10 UTC (permalink / raw)
To: gcc-bugzilla; +Cc: gcc-bugs
> And BTW - wrapping is undefined operation.
One for signed integers for unsigned it is actually defined as
wrapping and the reporter used unsigned integers here.
Thanks,
Andrew Pinski
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (2 preceding siblings ...)
2007-06-06 9:10 ` pinskia at gmail dot com
@ 2007-06-06 10:33 ` rask at sygehus dot dk
2007-06-06 13:00 ` ubizjak at gmail dot com
` (14 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: rask at sygehus dot dk @ 2007-06-06 10:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rask at sygehus dot dk 2007-06-06 10:33 -------
I see no reason to mark this enhancement request as invalid.
As to generating reasonable x86 code for overflow checks written in C, it
isn't completely hopeless. I did an experiment with my 16-bit x86 port. First,
I wrote the overflow check differently (and of course changed the data size to
16-bits):
#include <limits.h>
void abort (void);
unsigned short int foo (unsigned short int a, unsigned short int b)
{
unsigned short int sum;
sum = a + b;
if ((unsigned long int) a + (unsigned long int) b > USHRT_MAX)
abort ();
return sum;
}
Second, I added new instructions, of which combine was asking for the first
one:
(define_insn_and_split "*addsi3_cconly_highpart_ccz_nc"
[(set (reg:CCZ_NC CC_REG)
(compare:CCZ_NC
(subreg:HI (plus:SI (match_operand:SI 1 "general_operand" "0")
(match_operand:SI 2 "general_operand" "g")) 2)
(const_int 0)))
(clobber (match_scratch:HI 0 "=r"))]
"ia16_arith_operands_p (PLUS, operands)"
"#"
""
[(parallel
[(set (match_dup 0) (plus:HI (match_dup 3) (match_dup 4)))
(set (reg:CCZ_NC CC_REG)
(compare:CCZ_NC (subreg:HI (plus:SI (zero_extend:SI (match_dup 3))
(zero_extend:SI (match_dup 4))) 2)
(const_int 0)))]
)]
{
operands[3] = simplify_gen_subreg (HImode, operands[1], SImode, 0);
operands[4] = simplify_gen_subreg (HImode, operands[2], SImode, 0);
})
(define_insn "*addhi3_cc_carry_ccz_nc"
[(set (match_operand:HI 0 "nonimmediate_operand" "=qm,r,m")
(plus:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,0")
(match_operand:HI 2 "general_operand" "Uo,g,ri")))
(set (reg:CCZ_NC CC_REG)
(compare:CCZ_NC
(subreg:HI (plus:SI (zero_extend:SI (match_dup 1))
(zero_extend:SI (match_dup 2))) 2)
(const_int 0)))]
"ia16_arith_operands_p (PLUS, operands)"
"@
addb\t%H2,\t%H0
addw\t%2,\t%0
addw\t%2,\t%0"
)
(define_insn "*addhi3_cconly_carry_ccz_nc"
[(set (match_scratch:HI 0 "=qm,r,m")
(plus:HI (match_operand:HI 1 "nonimmediate_operand" "%0,0,0")
(match_operand:HI 2 "general_operand" "Uo,g,ri")))
(set (reg:CCZ_NC CC_REG)
(compare:CCZ_NC
(subreg:HI (plus:SI (zero_extend:SI (match_dup 1))
(zero_extend:SI (match_dup 2))) 2)
(const_int 0)))]
"ia16_arith_operands_p (PLUS, operands)"
"@
addb\t%H2,\t%H0
addw\t%2,\t%0
addw\t%2,\t%0"
)
(define_insn "*beq_ccz_nc"
[(set (pc)
(if_then_else (eq (reg:CCZ_NC CC_REG)
(const_int 0))
(label_ref (match_operand 0))
(pc)))]
""
"jnc\t%l0"
)
Third, I created CCZ_NCmode and modified my SELECT_CC_MODE callback to use
it for the comparison in "*addsi3_cconly_highpart_ccz_nc". Result:
foo:
pushw %cx ;# 69 *pushhi1_nonimm
pushw %si ;# 70 *pushhi1_nonimm
pushw %di ;# 71 *pushhi1_nonimm
pushw %bp ;# 72 *pushhi1_nonimm
movw %sp, %bp ;# 67 *movhi/1
movw 10(%bp),%di ;# 2 *movhi/1
movw %sp, %si ;# 68 *movhi/1
movw 12(%si),%bp ;# 3 *movhi/1
movw %bp, %ax ;# 7 *movhi/1
addw %di, %ax ;# 63 *addhi3_cc_carry_ccz_nc/2
jnc .L7 ;# 15 *beq_ccz_nc
call abort ;# 25 *call
.L7:
popw %bp ;# 75 *pophi1
popw %di ;# 76 *pophi1
popw %si ;# 77 *pophi1
popw %cx ;# 78 *pophi1
ret ;# 79 *return
I see no reason the i386 back end can't be improved in a similiar fashion.
The original example is a lot more difficult because it uses two
comparisons. The middle end would have to recognize that the two comparisons
test the same thing.
--
rask at sygehus dot dk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rask at sygehus dot dk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (3 preceding siblings ...)
2007-06-06 10:33 ` rask at sygehus dot dk
@ 2007-06-06 13:00 ` ubizjak at gmail dot com
2007-06-06 14:51 ` raeburn at raeburn dot org
` (13 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: ubizjak at gmail dot com @ 2007-06-06 13:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from ubizjak at gmail dot com 2007-06-06 13:00 -------
(In reply to comment #4)
> I see no reason the i386 back end can't be improved in a similiar fashion.
Interesting...
I agree that this bugreport can be an enhancement request for the code you
provided. We can perhaps operate in just introduced CCCmode, but it looks a lot
of added functionality to cover this corner case. I'll try to code something.
--
ubizjak at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |UNCONFIRMED
Resolution|INVALID |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (4 preceding siblings ...)
2007-06-06 13:00 ` ubizjak at gmail dot com
@ 2007-06-06 14:51 ` raeburn at raeburn dot org
2007-06-13 13:37 ` rask at sygehus dot dk
` (12 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: raeburn at raeburn dot org @ 2007-06-06 14:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from raeburn at raeburn dot org 2007-06-06 14:51 -------
Subject: Re: optimize unsigned-add overflow test on x86 to use cpu flags from
addl
On Jun 6, 2007, at 04:15, ubizjak at gmail dot com wrote:
> in config/i386/i386-modes.def, documentation says:
>
> Add CCGOC to indicate comparisons against zero that allows
> unspecified garbage in the Carry and Overflow flag. This
> mode is used to simulate comparisons of (a-b) and (a+b)
> against zero using sub/cmp/add operations.
>
> addl operates in CCGOCmode, but you are requesting GTU conditional
> jump that
> requires Carry flag=0 and Zero flag=0. GTU is incompatible with
> CCGOCmode due
> to Carry flag handling.
Actually, I think the test I want is just for the carry flag, which
can be done with one branch instruction using the flags set by addl.
If gcc's description of addl doesn't support that, well, then that's
what I'm requesting be enhanced. :-)
Rask's approach looks interesting, though I wouldn't want to have to
rewrite simple, portable tests for unsigned overflow to use wider
types, especially if I'm adding unsigned long long values, where I
don't have portable wider types.
Ken
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (5 preceding siblings ...)
2007-06-06 14:51 ` raeburn at raeburn dot org
@ 2007-06-13 13:37 ` rask at sygehus dot dk
2007-08-03 19:40 ` patchapp at dberlin dot org
` (11 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: rask at sygehus dot dk @ 2007-06-13 13:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from rask at sygehus dot dk 2007-06-13 13:36 -------
Looking at this again, I don't think the transformation I'm making with the
splitter is valid, because I'm making up a zero extension which wasn't there to
begin with. The upper part could have been non-zero before the instruction.
As to the original example, it should be possible to optimize
addl %eax, %edx
cmpl %edx, %eax
ja .L7
into
addl %eax, %edx
jc .L7
with a CCmode expressing that the carry flag is set if %edx is smaller than
%eax.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (6 preceding siblings ...)
2007-06-13 13:37 ` rask at sygehus dot dk
@ 2007-08-03 19:40 ` patchapp at dberlin dot org
2007-08-04 13:10 ` patchapp at dberlin dot org
` (10 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: patchapp at dberlin dot org @ 2007-08-03 19:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from patchapp at dberlin dot org 2007-08-03 19:40 -------
Subject: Bug number PR target/30315
A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00204.html
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (7 preceding siblings ...)
2007-08-03 19:40 ` patchapp at dberlin dot org
@ 2007-08-04 13:10 ` patchapp at dberlin dot org
2007-08-14 14:39 ` rask at gcc dot gnu dot org
` (9 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: patchapp at dberlin dot org @ 2007-08-04 13:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from patchapp at dberlin dot org 2007-08-04 13:10 -------
Subject: Bug number PR target/30315
A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00242.html
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (8 preceding siblings ...)
2007-08-04 13:10 ` patchapp at dberlin dot org
@ 2007-08-14 14:39 ` rask at gcc dot gnu dot org
2007-08-18 17:56 ` raeburn at raeburn dot org
` (8 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: rask at gcc dot gnu dot org @ 2007-08-14 14:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from rask at gcc dot gnu dot org 2007-08-14 14:39 -------
Subject: Bug 30315
Author: rask
Date: Tue Aug 14 14:39:24 2007
New Revision: 127481
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=127481
Log:
PR target/30315
* config/i386/i386.h (CANONICALIZE_COMPARISON): New.
* config/i386/i386.md (plusminus)(addsub)(SWI): New.
(*<addsub><mode>3_cc_overflow): New.
(*add<mode>3_cconly_overflow): New.
(*sub<mode>3_cconly_overflow): New.
(*<addsub>si3_zext_cc_overflow): New.
* config/i386/predicates.md (fcmov_comparison_operator): Accept
CCCmode for LTU, GTU, LEU and GEU.
(ix86_comparison_operator): Likewise.
(ix86_carry_flag_operator): Carry flag is set if LTU or GTU in CCCmode.
* gcc/config/i386/i386.c (put_condition_code): Support CCCmode.
(ix86_cc_mode): Use CCCmode when testing for overflow of PLUS
or MINUS expressions.
testsuite/
PR target/30315
* gcc.target/i386/pr30315.c: New.
Added:
trunk/gcc/testsuite/gcc.target/i386/pr30315.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/i386.h
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/predicates.md
trunk/gcc/testsuite/ChangeLog
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (9 preceding siblings ...)
2007-08-14 14:39 ` rask at gcc dot gnu dot org
@ 2007-08-18 17:56 ` raeburn at raeburn dot org
2007-08-19 13:01 ` rask at gcc dot gnu dot org
` (7 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: raeburn at raeburn dot org @ 2007-08-18 17:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from raeburn at raeburn dot org 2007-08-18 17:55 -------
Snapshot gcc-4.3-20070817 seems to do just fine with the sample code I
supplied.
Though perhaps I should've given a bigger test case... :-)
One of the places where I'd want to check for overflow is in computing memory
allocation sizes (hence my filing #30314 at the same time, and wanting a
carry-flag check there too). This patch lets gcc check the carry flag after
addition, which is great. However, it seems to prevent gcc from being able to
look at the zero flag afterwards. As in, "if the allocation size is zero, ask
for one byte so malloc won't return a null pointer":
void *foo(unsigned a, unsigned b) {
unsigned sum = a + b;
if (sum < a) return 0;
if (sum == 0) sum = 1;
return malloc(sum);
}
Without the overflow check, gcc just uses the zero flag resulting from the
addition. With the overflow check in there, gcc uses the carry flag for the
overflow check, but then re-tests the value to see if it's zero:
addl %eax, %edx
jb L3 ; L3 just returns null
testl %edx, %edx
movl $1, %eax
cmove %eax, %edx
Still, this is better than doing an explicit comparison for the overflow check
and then the test for zero.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (10 preceding siblings ...)
2007-08-18 17:56 ` raeburn at raeburn dot org
@ 2007-08-19 13:01 ` rask at gcc dot gnu dot org
2007-08-19 18:04 ` ubizjak at gmail dot com
` (6 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: rask at gcc dot gnu dot org @ 2007-08-19 13:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from rask at gcc dot gnu dot org 2007-08-19 13:00 -------
void *foo(unsigned a, unsigned b) {
unsigned sum = a + b;
if (sum < a) return 0;
if (sum == 0) sum = 1;
To be able to optimize both comparisons, we need to have the same comparison
operands. The only way I can see this happening is to canonicalize the overflow
check as "a + b < 0" with the obvious fear that since a and b are both
unsigned, GCC will optimize the comparison away as always false - unless taught
not to do so. I'll see what I can do.
Additionally, the resulting asm seems to be a bit stupid:
testl %edx, %edx
movl $1, %eax
cmove %eax, %edx
should be just
testl %edx, %edx
sete %dl
--
rask at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |rask at gcc dot gnu dot org
|dot org |
Status|UNCONFIRMED |ASSIGNED
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2007-08-19 13:00:47
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (11 preceding siblings ...)
2007-08-19 13:01 ` rask at gcc dot gnu dot org
@ 2007-08-19 18:04 ` ubizjak at gmail dot com
2007-08-30 19:45 ` patchapp at dberlin dot org
` (5 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: ubizjak at gmail dot com @ 2007-08-19 18:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from ubizjak at gmail dot com 2007-08-19 18:03 -------
(In reply to comment #12)
> if (sum == 0) sum = 1;
>
> Additionally, the resulting asm seems to be a bit stupid:
>
> testl %edx, %edx
> movl $1, %eax
> cmove %eax, %edx
>
> should be just
>
> testl %edx, %edx
> sete %dl
Not, because sete sets dl to 0 (zero) when condition is not met. The c code
above doesn't change sum when sum != 0.
Regarding the problem, described in comment #11:
Perhaps we can add CCCZmode as a common mode to otherwise orthogonal CCCmode
and CCZmode. Playing with i386_cc_modes_compatible() and ix86_match_cc_mode()
we can perhaps teach combine to CSE and merge two comparisons. To achieve this,
I think we should use (enhanced) ix86_match_cc_mode() in just added RTL
patterns.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (12 preceding siblings ...)
2007-08-19 18:04 ` ubizjak at gmail dot com
@ 2007-08-30 19:45 ` patchapp at dberlin dot org
2007-09-09 19:22 ` rask at gcc dot gnu dot org
` (4 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: patchapp at dberlin dot org @ 2007-08-30 19:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from patchapp at dberlin dot org 2007-08-30 19:45 -------
Subject: Bug number PR 30315
A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg02217.html
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (13 preceding siblings ...)
2007-08-30 19:45 ` patchapp at dberlin dot org
@ 2007-09-09 19:22 ` rask at gcc dot gnu dot org
2007-11-10 1:32 ` rask at gcc dot gnu dot org
` (3 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: rask at gcc dot gnu dot org @ 2007-09-09 19:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from rask at gcc dot gnu dot org 2007-09-09 19:22 -------
Subject: Bug 30315
Author: rask
Date: Sun Sep 9 19:21:59 2007
New Revision: 128305
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128305
Log:
PR target/30315
* config/i386/i386.h (CANONICALIZE_COMPARISON): Delete.
* simplify-rtx.c (simplify_relational_operation_1): Add the
canonicalization from i386.h.
* doc/md.texi (Canonicalization of Instructions): Document it.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.h
trunk/gcc/doc/md.texi
trunk/gcc/simplify-rtx.c
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (14 preceding siblings ...)
2007-09-09 19:22 ` rask at gcc dot gnu dot org
@ 2007-11-10 1:32 ` rask at gcc dot gnu dot org
2009-01-02 19:30 ` pinskia at gcc dot gnu dot org
` (2 subsequent siblings)
18 siblings, 0 replies; 21+ messages in thread
From: rask at gcc dot gnu dot org @ 2007-11-10 1:32 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from rask at gcc dot gnu dot org 2007-11-10 01:32 -------
Two testcases which aren't optimized:
unsigned int bad1 (unsigned int a)
{
unsigned int c = a - 1;
if (c > a)
abort ();
else
return c;
}
unsigned int bad2 (unsigned int a)
{
unsigned int c = a - 2;
if (c > a)
abort ();
else
return c;
}
See also <URL:http://gcc.gnu.org/ml/gcc-patches/2007-10/msg01359.html>.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (15 preceding siblings ...)
2007-11-10 1:32 ` rask at gcc dot gnu dot org
@ 2009-01-02 19:30 ` pinskia at gcc dot gnu dot org
2009-11-12 17:29 ` jparmele at wildbear dot com
2009-11-16 19:05 ` ubizjak at gmail dot com
18 siblings, 0 replies; 21+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-01-02 19:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from pinskia at gcc dot gnu dot org 2009-01-02 19:29 -------
>* gcc.target/i386/pr30315.c: New.
This testcase fails with -march=pentium-m turned on.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (16 preceding siblings ...)
2009-01-02 19:30 ` pinskia at gcc dot gnu dot org
@ 2009-11-12 17:29 ` jparmele at wildbear dot com
2009-11-16 19:05 ` ubizjak at gmail dot com
18 siblings, 0 replies; 21+ messages in thread
From: jparmele at wildbear dot com @ 2009-11-12 17:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from jparmele at wildbear dot com 2009-11-12 17:28 -------
This bug also still appears in 4.4.2 with --with-arch=pentium3.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug target/30315] optimize unsigned-add overflow test on x86 to use cpu flags from addl
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
` (17 preceding siblings ...)
2009-11-12 17:29 ` jparmele at wildbear dot com
@ 2009-11-16 19:05 ` ubizjak at gmail dot com
18 siblings, 0 replies; 21+ messages in thread
From: ubizjak at gmail dot com @ 2009-11-16 19:05 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from ubizjak at gmail dot com 2009-11-16 19:05 -------
(In reply to comment #18)
> This bug also still appears in 4.4.2 with --with-arch=pentium3.
pentium3 fails because it is not TARGET_HIMODE_MATH and TARGET_QIMODE_MATH.
So, it fails following testcase:
unsigned short
plusccsa (unsigned short a, unsigned short b)
{
unsigned short sum = a + b;
if (sum < a)
abort ();
return sum;
}
It used to generate:
plusccsa:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movzwl 8(%ebp), %edx
movzwl 12(%ebp), %eax
addl %edx, %eax
movzwl %ax, %eax
>> cmpw %ax, %dx
ja .L97
leave
ret
.L97:
call abort
And after the fix for wrong mode of compare insn for these targets [1]:
plusccsa:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movzwl 8(%ebp), %edx
movzwl 12(%ebp), %eax
addl %edx, %eax
movzwl %ax, %eax
>> cmpl %eax, %edx
ja .L52
leave
ret
.L52:
call abort
You don't want instructions that operate on bytes or words on pentium3...
At the end of the day, no bug here.
[1] http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00787.html
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30315
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2009-11-16 19:05 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-12-28 7:09 [Bug target/30315] New: optimize unsigned-add overflow test on x86 to use cpu flags from addl raeburn at raeburn dot org
2007-06-06 3:39 ` [Bug target/30315] " scovich at gmail dot com
2007-06-06 8:16 ` ubizjak at gmail dot com
2007-06-06 9:10 ` Andrew Pinski
2007-06-06 9:10 ` pinskia at gmail dot com
2007-06-06 10:33 ` rask at sygehus dot dk
2007-06-06 13:00 ` ubizjak at gmail dot com
2007-06-06 14:51 ` raeburn at raeburn dot org
2007-06-13 13:37 ` rask at sygehus dot dk
2007-08-03 19:40 ` patchapp at dberlin dot org
2007-08-04 13:10 ` patchapp at dberlin dot org
2007-08-14 14:39 ` rask at gcc dot gnu dot org
2007-08-18 17:56 ` raeburn at raeburn dot org
2007-08-19 13:01 ` rask at gcc dot gnu dot org
2007-08-19 18:04 ` ubizjak at gmail dot com
2007-08-30 19:45 ` patchapp at dberlin dot org
2007-09-09 19:22 ` rask at gcc dot gnu dot org
2007-11-10 1:32 ` rask at gcc dot gnu dot org
2009-01-02 19:30 ` pinskia at gcc dot gnu dot org
2009-11-12 17:29 ` jparmele at wildbear dot com
2009-11-16 19:05 ` ubizjak at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).