[Bug target/49715] New: Could do more efficient unsigned-to-float to conversions based on range information

public inbox for gcc-bugs@sourceware.org

* [Bug target/49715] New: Could do more efficient unsigned-to-float to conversions based on range information
From: sgunderson at bigfoot dot com @ 2011-07-12 11:56 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

           Summary: Could do more efficient unsigned-to-float to
                    conversions based on range information
           Product: gcc
           Version: 4.6.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: sgunderson@bigfoot.com


I have code that looks vaguely like this:

float func(unsigned x)
{
    return (x & 0xfffff) * 0.01f;
}

When I compile it, GCC gives a long and relatively slow sequence:

fugl:~> gcc-4.6 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc-4.6
COLLECT_LTO_WRAPPER=/usr/lib/i386-linux-gnu/gcc/i486-linux-gnu/4.6.1/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.1-3'
--with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr
--program-suffix=-4.6 --enable-shared --enable-multiarch
--with-multiarch-defaults=i386-linux-gnu --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib/i386-linux-gnu
--without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib/i386-linux-gnu
--enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc
--enable-targets=all --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
--target=i486-linux-gnu
Thread model: posix
gcc version 4.6.1 (Debian 4.6.1-3) 

fugl:~> gcc-4.6 -O2 -march=pentium3 -msse2 -mfpmath=sse -c test.c
fugl:~> objdump --disassemble test.o                             

test.o:     file format elf32-i386


Disassembly of section .text:

00000000 <func>:
   0:    83 ec 04                 sub    $0x4,%esp
   3:    8b 54 24 08              mov    0x8(%esp),%edx
   7:    89 d0                    mov    %edx,%eax
   9:    81 e2 ff ff 00 00        and    $0xffff,%edx
   f:    25 ff ff 0f 00           and    $0xfffff,%eax
  14:    c1 e8 10                 shr    $0x10,%eax
  17:    f3 0f 2a c0              cvtsi2ss %eax,%xmm0
  1b:    f3 0f 2a ca              cvtsi2ss %edx,%xmm1
  1f:    f3 0f 59 05 00 00 00     mulss  0x0,%xmm0
  26:    00 
  27:    f3 0f 58 c1              addss  %xmm1,%xmm0
  2b:    f3 0f 59 05 04 00 00     mulss  0x4,%xmm0
  32:    00 
  33:    f3 0f 11 04 24           movss  %xmm0,(%esp)
  38:    d9 04 24                 flds   (%esp)
  3b:    58                       pop    %eax
  3c:    c3                       ret    
  3d:    8d 76 00                 lea    0x0(%esi),%esi

I assume this is because x is unsigned (I cannot easily change this, as I
depend on wraparound). However, if I insert a cast to int after the AND
operation, I get the same results and a much better sequence:

00000040 <func2>:
  40:    83 ec 04                 sub    $0x4,%esp
  43:    8b 44 24 08              mov    0x8(%esp),%eax
  47:    25 ff ff 0f 00           and    $0xfffff,%eax
  4c:    f3 0f 2a c0              cvtsi2ss %eax,%xmm0
  50:    f3 0f 59 05 04 00 00     mulss  0x4,%xmm0
  57:    00 
  58:    f3 0f 11 04 24           movss  %xmm0,(%esp)
  5d:    d9 04 24                 flds   (%esp)
  60:    5a                       pop    %edx
  61:    c3                       ret    

In other words, the modified code looks like this:

float func2(unsigned x)
{
    return (int)(x & 0xfffff) * 0.01f;
}

GCC should be able to do this itself whenever it has range information
showing that the sign bit cannot be set.
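
As a rough sanity check of the equivalence claimed above, the two forms can
be compared over every value the mask can produce. This is only a sketch:
count_mismatches and self_check are hypothetical helpers, not part of the
original test case, and the check says nothing about the generated code.

```c
#include <assert.h>

/* Original form from the report: the masked value is converted
   as an unsigned int. */
float func(unsigned x)
{
    return (x & 0xfffff) * 0.01f;
}

/* Variant from the report: the masked value is cast to int first,
   so the signed int -> float conversion is used. */
float func2(unsigned x)
{
    return (int)(x & 0xfffff) * 0.01f;
}

/* Hypothetical helper (not in the report): counts inputs for which
   the two forms disagree.  The low 20 bits are swept exhaustively,
   once with the upper bits clear and once with them all set. */
unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned low = 0; low <= 0xfffff; low++) {
        if (func(low) != func2(low))
            mismatches++;
        if (func(low | 0xfff00000u) != func2(low | 0xfff00000u))
            mismatches++;
    }
    return mismatches;
}

/* Hypothetical helper: aborts if any input gives different results. */
void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

The masked value never exceeds 0xfffff, so it is non-negative as an int and
small enough to be represented exactly as a float in either form.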



* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenth at gcc dot gnu.org @ 2011-07-12 12:18 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2011.07.12 12:18:02
          Component|target                      |tree-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-12 12:18:02 UTC ---
Confirmed.  VRP could do this transformation.  I'm not sure it's always
worthwhile, or whether there is a target that can do unsigned -> float
conversion faster than signed -> float conversion (though I doubt that).
A similar optimization can probably be applied for

float func (unsigned long long x)
{
  return (x & 0xfffff) * 0.01f;
}

that is, introduce a truncation so that the int->float expander can use
floatsi instead of floatdi which might not be available either.
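
Written by hand, the truncation described above might look like the sketch
below. The names func_wide, func_wide_truncated and check_wide are
hypothetical, chosen only to keep the two forms apart; the point is that the
cast to int is lossless once the mask has limited the value to 20 bits.

```c
#include <assert.h>

/* 64-bit variant from the comment above: the masked value is
   converted as an unsigned long long. */
float func_wide(unsigned long long x)
{
    return (x & 0xfffff) * 0.01f;
}

/* Hand-truncated form (hypothetical name): the mask leaves at most
   20 significant bits, so narrowing to int loses nothing and the
   conversion can be done at int width. */
float func_wide_truncated(unsigned long long x)
{
    return (int)(x & 0xfffff) * 0.01f;
}

/* Hypothetical helper: asserts that both forms agree for one input. */
void check_wide(unsigned long long x)
{
    assert(func_wide(x) == func_wide_truncated(x));
}
```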

It happens that i?86 defines floatunssi, so depending on the availability
of an unsigned -> float expander isn't a good profitability check.

The odd thing is of course that VRP would _insert_ a conversion ...



* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenth at gcc dot gnu.org @ 2011-07-12 12:51 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-12 12:50:21 UTC ---
Created attachment 24743
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24743
proof of concept

Like this.



* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: sgunderson at bigfoot dot com @ 2011-07-12 15:21 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

--- Comment #3 from sgunderson at bigfoot dot com 2011-07-12 15:19:51 UTC ---
Wow, answer in record time :-)

I don't know anything about GCC internals, so I can't comment much on the
patch; my only worry here is what would happen if you had a very narrow mask,
e.g. (x & 0xf) and you try to coerce it into the minimum possible type (a
char); wouldn't you end up doing some sort of expansion with movzbl again?



* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenther at suse dot de @ 2011-07-12 15:22 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> 2011-07-12 15:21:51 UTC ---
On Tue, 12 Jul 2011, sgunderson at bigfoot dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715
> 
> --- Comment #3 from sgunderson at bigfoot dot com 2011-07-12 15:19:51 UTC ---
> Wow, answer in record time :-)
> 
> I don't know anything about GCC internals, so I can't comment much on the
> patch; my only worry here is what would happen if you had a very narrow mask,
> e.g. (x & 0xf) and you try to coerce it into the minimum possible type (a
> char); wouldn't you end up doing some sort of expansion with movzbl again?

That's why I limit it to SImode truncation (that should be
equivalent to an int).  Quite lame ;)
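
A simplified model of that check, for illustration only: the function names
below are hypothetical and this is not the code in the patch. It assumes the
only question asked is whether the known range of the value fits in a
non-negative int, so a narrow mask such as 0xf is still converted at int
width rather than being squeezed into a char.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the range test; not GCC's actual code.
   Returns true when every value in [min, max] is representable as a
   non-negative int, i.e. the sign bit of an int-sized value is clear. */
bool range_fits_int(uint64_t min, uint64_t max)
{
    assert(min <= max);
    return max <= (uint64_t)INT_MAX;
}

/* For an unsigned x, (x & mask) always lies in [0, mask]. */
bool masked_value_fits_int(uint64_t mask)
{
    return range_fits_int(0, mask);
}
```

Under this model both 0xf and 0xfffff qualify for the int-width conversion,
while a mask that can leave bit 31 set does not (assuming a 32-bit int).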



* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenth at gcc dot gnu.org @ 2011-07-25  8:31 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.7.0

--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-25 08:31:25 UTC ---
Fixed.



* [Bug tree-optimization/49715] Could do more efficient unsigned-to-float to conversions based on range information
From: rguenth at gcc dot gnu.org @ 2011-07-25  8:31 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49715

--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-25 08:30:50 UTC ---
Author: rguenth
Date: Mon Jul 25 08:30:46 2011
New Revision: 176735

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176735
Log:
2011-07-25  Richard Guenther  <rguenther@suse.de>

    PR tree-optimization/49715
    * tree-vrp.c: Include expr.h and optabs.h.
    (range_fits_type_p): New function.
    (simplify_float_conversion_using_ranges): Likewise.
    (simplify_stmt_using_ranges): Call it.
    * Makefile.in (tree-vrp.o): Add $(EXPR_H) and $(OPTABS_H) dependencies.
    * optabs.c (can_float_p): Export.
    * optabs.h (can_float_p): Declare.

    * gcc.target/i386/pr49715-1.c: New testcase.
    * gcc.target/i386/pr49715-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr49715-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr49715-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/optabs.c
    trunk/gcc/optabs.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vrp.c

