public inbox for gcc-bugs@sourceware.org
* [Bug target/49715] New: Could do more efficient unsigned-to-float to conversions based on range information
@ 2011-07-12 11:56 sgunderson at bigfoot dot com
2011-07-12 12:18 ` [Bug tree-optimization/49715] " rguenth at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: sgunderson at bigfoot dot com @ 2011-07-12 11:56 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
Summary: Could do more efficient unsigned-to-float to
conversions based on range information
Product: gcc
Version: 4.6.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: sgunderson@bigfoot.com
I have code that looks vaguely like this:
float func(unsigned x)
{
return (x & 0xfffff) * 0.01f;
}
When I compile it, GCC gives a long and relatively slow sequence:
fugl:~> gcc-4.6 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc-4.6
COLLECT_LTO_WRAPPER=/usr/lib/i386-linux-gnu/gcc/i486-linux-gnu/4.6.1/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.1-3'
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
--program-suffix=-4.6 --enable-shared --enable-multiarch
--with-multiarch-defaults=i386-linux-gnu --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib/i386-linux-gnu
--without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib/i386-linux-gnu
--enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
--enable-targets=all --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
--target=i486-linux-gnu
Thread model: posix
gcc version 4.6.1 (Debian 4.6.1-3)
fugl:~> gcc-4.6 -O2 -march=pentium3 -msse2 -mfpmath=sse -c test.c
fugl:~> objdump --disassemble test.o
test.o: file format elf32-i386
Disassembly of section .text:
00000000 <func>:
0: 83 ec 04 sub $0x4,%esp
3: 8b 54 24 08 mov 0x8(%esp),%edx
7: 89 d0 mov %edx,%eax
9: 81 e2 ff ff 00 00 and $0xffff,%edx
f: 25 ff ff 0f 00 and $0xfffff,%eax
14: c1 e8 10 shr $0x10,%eax
17: f3 0f 2a c0 cvtsi2ss %eax,%xmm0
1b: f3 0f 2a ca cvtsi2ss %edx,%xmm1
1f: f3 0f 59 05 00 00 00 mulss 0x0,%xmm0
26: 00
27: f3 0f 58 c1 addss %xmm1,%xmm0
2b: f3 0f 59 05 04 00 00 mulss 0x4,%xmm0
32: 00
33: f3 0f 11 04 24 movss %xmm0,(%esp)
38: d9 04 24 flds (%esp)
3b: 58 pop %eax
3c: c3 ret
3d: 8d 76 00 lea 0x0(%esi),%esi
I assume this is because x is unsigned (I cannot easily change this, as I
depend on wraparound). However, if I insert a cast to int after the AND
operation, I get the same results and a much better sequence:
00000040 <func2>:
40: 83 ec 04 sub $0x4,%esp
43: 8b 44 24 08 mov 0x8(%esp),%eax
47: 25 ff ff 0f 00 and $0xfffff,%eax
4c: f3 0f 2a c0 cvtsi2ss %eax,%xmm0
50: f3 0f 59 05 04 00 00 mulss 0x4,%xmm0
57: 00
58: f3 0f 11 04 24 movss %xmm0,(%esp)
5d: d9 04 24 flds (%esp)
60: 5a pop %edx
61: c3 ret
In other words, the modified code looks like this:
float func2(unsigned x)
{
return (int)(x & 0xfffff) * 0.01f;
}
This should be possible for GCC to do when it has range information that says
the sign bit cannot be set.
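A minimal sketch of why the two forms are interchangeable here: the mask 0xfffff keeps the value below 2^20, so the sign bit of an int can never be set and the signed and unsigned conversions see the same number. The helper check_agreement and its sample inputs are illustrative assumptions, not part of the report.

```c
#include <assert.h>

/* Original form from the report: the masked value is converted as unsigned. */
static float func(unsigned x)
{
    return (x & 0xfffff) * 0.01f;
}

/* Workaround from the report: cast to int after the mask, so the
   signed int -> float conversion is used instead. */
static float func2(unsigned x)
{
    return (int)(x & 0xfffff) * 0.01f;
}

/* Hypothetical helper (not from the report): asserts that both forms
   agree on a few sample inputs and returns how many were checked. */
static int check_agreement(void)
{
    static const unsigned samples[] = {
        0u, 1u, 0xfffffu, 0x100000u, 0x80000000u, 0xffffffffu
    };
    int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++)
        assert(func(samples[i]) == func2(samples[i]));
    return n;
}
```

The samples deliberately include inputs with the top bit set (0x80000000, 0xffffffff); the mask clears those bits before the conversion, which is the range fact the optimizer would need to prove.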
* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenth at gcc dot gnu.org @ 2011-07-12 12:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Keywords| |missed-optimization
Last reconfirmed| |2011.07.12 12:18:02
Component|target |tree-optimization
CC| |rguenth at gcc dot gnu.org
Ever Confirmed|0 |1
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-12 12:18:02 UTC ---
Confirmed. VRP could do this transformation. I'm not sure it's always
worthwhile, or whether there is a target that can do unsigned -> float
conversion faster than signed -> float conversion (though I doubt that).
A similar optimization can probably be applied to
float func (unsigned long long x)
{
return (x & 0xfffff) * 0.01f;
}
that is, introduce a truncation so that the int->float expander can use
floatsi instead of floatdi, which might not be available either.
It happens that i?86 defines floatunsssi, so depending on the availability
of an unsigned -> float expander isn't a good profitability check.
The odd thing is of course that VRP would _insert_ a conversion ...
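The 64-bit case described above can be sketched by hand at the source level. This is only an illustration of the truncation idea, not what VRP emits internally; the names func_ull and func_ull_truncated are assumptions made for this example.

```c
#include <assert.h>

/* 64-bit variant from the comment: as written, the masked value is an
   unsigned long long, so a 64-bit integer -> float conversion is requested. */
static float func_ull(unsigned long long x)
{
    return (x & 0xfffff) * 0.01f;
}

/* Hand-written truncation: the mask bounds the value by 0xfffff, so it
   fits in an int and a 32-bit signed conversion gives the same result. */
static float func_ull_truncated(unsigned long long x)
{
    return (int)(x & 0xfffff) * 0.01f;
}
```

Both functions should return identical values for every input, because the truncation to int discards only bits that the mask has already cleared.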
* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenth at gcc dot gnu.org @ 2011-07-12 12:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
AssignedTo|unassigned at gcc dot |rguenth at gcc dot gnu.org
|gnu.org |
--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-12 12:50:21 UTC ---
Created attachment 24743
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24743
proof of concept
Like this.
* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: sgunderson at bigfoot dot com @ 2011-07-12 15:21 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
--- Comment #3 from sgunderson at bigfoot dot com 2011-07-12 15:19:51 UTC ---
Wow, answer in record time :-)
I don't know anything about GCC internals, so I can't comment much on the
patch; my only worry here is what would happen if you had a very narrow mask,
e.g. (x & 0xf) and you try to coerce it into the minimum possible type (a
char); wouldn't you end up doing some sort of expansion with movzbl again?
* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenther at suse dot de @ 2011-07-12 15:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> 2011-07-12 15:21:51 UTC ---
On Tue, 12 Jul 2011, sgunderson at bigfoot dot com wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
>
> --- Comment #3 from sgunderson at bigfoot dot com 2011-07-12 15:19:51 UTC ---
> Wow, answer in record time :-)
>
> I don't know anything about GCC internals, so I can't comment much on the
> patch; my only worry here is what would happen if you had a very narrow mask,
> e.g. (x & 0xf) and you try to coerce it into the minimum possible type (a
> char); wouldn't you end up doing some sort of expansion with movzbl again?
That's why I limit it to SImode truncation (that should be
equivalent to an int). Quite lame ;)
* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenth at gcc dot gnu.org @ 2011-07-25 8:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
Target Milestone|--- |4.7.0
--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-25 08:31:25 UTC ---
Fixed.
* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenth at gcc dot gnu.org @ 2011-07-25 8:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-25 08:30:50 UTC ---
Author: rguenth
Date: Mon Jul 25 08:30:46 2011
New Revision: 176735
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176735
Log:
2011-07-25 Richard Guenther <rguenther@suse.de>
PR tree-optimization/49715
* tree-vrp.c: Include expr.h and optabs.h.
(range_fits_type_): New function.
(simplify_float_conversion_using_ranges): Likewise.
(simplify_stmt_using_ranges): Call it.
* Makefile.in (tree-vrp.o): Add $(EXPR_H) and $(OPTABS_H) dependencies.
* optabs.c (can_float_p): Export.
* optabs.h (can_float_p): Declare.
* gcc.target/i386/pr49715-1.c: New testcase.
* gcc.target/i386/pr49715-2.c: Likewise.
Added:
trunk/gcc/testsuite/gcc.target/i386/pr49715-1.c
trunk/gcc/testsuite/gcc.target/i386/pr49715-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/optabs.c
trunk/gcc/optabs.h
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vrp.c
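The ChangeLog names a range check (range_fits_type_) without showing it. The following is a rough, standalone sketch of the kind of test such a check has to perform, under the assumption that the known range of the operand is an unsigned interval [min, max]. It is not the code committed to tree-vrp.c; struct urange, range_of_mask and fits_signed are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the value range [min, max] of an unsigned quantity. */
struct urange {
    uint64_t min;
    uint64_t max;
};

/* Range of (x & mask) for arbitrary x: every result lies in [0, mask]. */
static struct urange range_of_mask(uint64_t mask)
{
    struct urange r = { 0, mask };
    return r;
}

/* Returns 1 if every value in r is representable in a signed integer of
   `precision` bits (1 <= precision <= 64), i.e. the sign bit is never set. */
static int fits_signed(struct urange r, unsigned precision)
{
    uint64_t signed_max = (precision >= 64)
        ? (uint64_t)INT64_MAX
        : (((uint64_t)1 << (precision - 1)) - 1);
    return r.min <= r.max && r.max <= signed_max;
}
```

With the mask from the report, fits_signed(range_of_mask(0xfffff), 32) holds, so a signed 32-bit conversion would be safe; with a mask of 0xffffffff it does not hold for 32 bits, and the unsigned conversion has to stay.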