public inbox for gcc-bugs@sourceware.org
* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
       [not found] <bug-30354-4@http.gcc.gnu.org/bugzilla/>
@ 2012-06-28 23:29 ` aldot at gcc dot gnu.org
  2013-01-18 10:29 ` vda.linux at googlemail dot com
  2021-12-21 11:42 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 17+ messages in thread
From: aldot at gcc dot gnu.org @ 2012-06-28 23:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354

Bernhard Reutner-Fischer <aldot at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|2009-06-30 13:36:11         |2012-06-29
               Host|i386-pc-linux-gnu           |
              Build|i386-pc-linux-gnu           |

--- Comment #15 from Bernhard Reutner-Fischer <aldot at gcc dot gnu.org> 2012-06-28 23:28:28 UTC ---
Honza, did you find time to have a look?

I think this regressed a lot in 4.6

$ for i in 3.4 4.2 4.4 4.5 4.6 4.7 4.8; do gcc-$i -fomit-frame-pointer -m32 -Os -o t-$i.o -c t.c; done
$ size t-*.o
   text       data        bss        dec        hex    filename
 254731          0          0     254731      3e30b    t-3.4.o
 257731          0          0     257731      3eec3    t-4.2.o
 242787          0          0     242787      3b463    t-4.4.o
 242787          0          0     242787      3b463    t-4.5.o
 542811          0          0     542811      8485b    t-4.6.o
 542811          0          0     542811      8485b    t-4.7.o
 542811          0          0     542811      8485b    t-4.8.o

where:
gcc version 3.4.6 (Debian 3.4.6-10)
gcc version 4.2.4 (Debian 4.2.4-6)
gcc version 4.4.7 (Debian 4.4.7-1) 
gcc version 4.5.3 (Debian 4.5.3-12) 
gcc version 4.6.3 (Debian 4.6.3-1) 
gcc version 4.7.1 (Debian 4.7.1-2)
4.8 was pristine (just unrelated fixups)
gcc version 4.8.0 20120514 (experimental) [fixups revision
19d3eef:c8f5cfb:cbf2756acd7df8cfb441025e4512b97b6ef2fd10] (GCC)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
       [not found] <bug-30354-4@http.gcc.gnu.org/bugzilla/>
  2012-06-28 23:29 ` [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size aldot at gcc dot gnu.org
@ 2013-01-18 10:29 ` vda.linux at googlemail dot com
  2021-12-21 11:42 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 17+ messages in thread
From: vda.linux at googlemail dot com @ 2013-01-18 10:29 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354

--- Comment #16 from Denis Vlasenko <vda.linux at googlemail dot com> 2013-01-18 10:29:12 UTC ---
(In reply to comment #15)
> Honza, did you find time to have a look?
> 
> I think this regressed a lot in 4.6

Not really - the difference is just the .eh_frame section.
I re-ran the tests with two gcc's I have here and sizes look like this:

   text       data        bss        dec        hex    filename
 257731          0          0     257731      3eec3    divmod-4.2.1-Os.o
 242787          0          0     242787      3b463    divmod-4.6.3-Os.o

Stock (unpatched) gcc has improved: it juggles registers better. For example:
int ib_100_x(int x) { return (100 / x) ^ (100 % x); }
    0:  b8 64 00 00 00          mov    $0x64,%eax
    5:  99                      cltd
    6:  f7 7c 24 04             idivl  0x4(%esp)
-   a:  31 c2                   xor    %eax,%edx
-   c:  89 d0                   mov    %edx,%eax
-   e:  c3                      ret
+   a:  31 d0                   xor    %edx,%eax
+   c:  c3                      ret

I believe my patch would still improve things - it is orthogonal to register
allocation.

BTW, just so that we are all on the same page wrt compiler options:
here's the script I use to compile, disassemble, and extract function sizes
from the test program in comment 3. Tweakable by setting $PREFIX and/or $CC:

gencode.sh
==========
#!/bin/sh

#PREFIX="i686-"

test "$PREFIX"  || PREFIX=""
test "$CC"      || CC="${PREFIX}gcc"
test "$OBJDUMP" || OBJDUMP="${PREFIX}objdump"
test "$NM"      || NM="${PREFIX}nm"

CC_VER=`$CC --version | sed -n 's/[^ ]* [^ ]* \([3-9]\.[1-9][^ ]*\).*/\1/p'`
test "$CC_VER" || exit 1

build()
{
        opt=$1
        bname=divmod-$CC_VER${opt}${nail}

        # -ffunction-sections makes disasm easier to understand
        # (insn offsets start from 0 within every function).
        # -fno-exceptions -fno-asynchronous-unwind-tables: die, .eh_frame, die!
        $CC \
                -m32 \
                -fomit-frame-pointer \
                -ffunction-sections \
                -fno-exceptions \
                -fno-asynchronous-unwind-tables \
                ${opt} t.c -c -o $bname.o \
        && $OBJDUMP -dr $bname.o >$bname.asm \
        && $NM --size-sort $bname.o  | sort -k3 >$bname.nm
}

build -Os
#build -O2  #not interesting
#build -O3  #not interesting
size *.o | tee SIZES



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
       [not found] <bug-30354-4@http.gcc.gnu.org/bugzilla/>
  2012-06-28 23:29 ` [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size aldot at gcc dot gnu.org
  2013-01-18 10:29 ` vda.linux at googlemail dot com
@ 2021-12-21 11:42 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 17+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-21 11:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (12 preceding siblings ...)
  2009-06-30 13:36 ` hubicka at gcc dot gnu dot org
@ 2010-01-08  9:06 ` steven at gcc dot gnu dot org
  13 siblings, 0 replies; 17+ messages in thread
From: steven at gcc dot gnu dot org @ 2010-01-08  9:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from steven at gcc dot gnu dot org  2010-01-08 09:06 -------
Honza, you said in comment #13 that you would look at this -- got news?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (11 preceding siblings ...)
  2009-06-21 16:48 ` vda dot linux at googlemail dot com
@ 2009-06-30 13:36 ` hubicka at gcc dot gnu dot org
  2010-01-08  9:06 ` steven at gcc dot gnu dot org
  13 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2009-06-30 13:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from hubicka at gcc dot gnu dot org  2009-06-30 13:36 -------
Hmm,
looking at the cases, it seems that the main reason for the win is that
idiv needs an integer load instruction with a long immediate, and we don't
optimize these well for -Os.

I suppose the following is wrong for -Os:
    case CONST_INT:
    case CONST: 
    case LABEL_REF:
    case SYMBOL_REF:
      if (TARGET_64BIT && !x86_64_immediate_operand (x, VOIDmode))
        *total = 3;
      else if (TARGET_64BIT && !x86_64_zext_immediate_operand (x, VOIDmode))
        *total = 2;
      else if (flag_pic && SYMBOLIC_CONST (x)
               && (!TARGET_64BIT
                   || (!GET_CODE (x) != LABEL_REF
                       && (GET_CODE (x) != SYMBOL_REF
                           || !SYMBOL_REF_LOCAL_P (x)))))
        *total = 1;
      else  
        *total = 0;
      return true;

It should probably return the actual size of a load instruction with a
full-sized immediate. The individual cases matching RTL codes should then know
where the instruction allows a cheap immediate operand encoding, and stop the
recursion from counting the operand size itself.

I will look into this but won't complain if someone beats me to it :))

Honza


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |hubicka at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2009-06-06 13:41:23         |2009-06-30 13:36:11
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (10 preceding siblings ...)
  2009-06-21 16:47 ` vda dot linux at googlemail dot com
@ 2009-06-21 16:48 ` vda dot linux at googlemail dot com
  2009-06-30 13:36 ` hubicka at gcc dot gnu dot org
  2010-01-08  9:06 ` steven at gcc dot gnu dot org
  13 siblings, 0 replies; 17+ messages in thread
From: vda dot linux at googlemail dot com @ 2009-06-21 16:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from vda dot linux at googlemail dot com  2009-06-21 16:48 -------
Created an attachment (id=18041)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18041&action=view)
Comparison of generated code with 4.4.svn.20090528 on i86


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (9 preceding siblings ...)
  2009-06-21 16:26 ` rguenth at gcc dot gnu dot org
@ 2009-06-21 16:47 ` vda dot linux at googlemail dot com
  2009-06-21 16:48 ` vda dot linux at googlemail dot com
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: vda dot linux at googlemail dot com @ 2009-06-21 16:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from vda dot linux at googlemail dot com  2009-06-21 16:47 -------
In 32-bit code, there are indeed a few cases of code growth.

Here is a full list (id_XXX are signed divides, ud_XXX are unsigned ones):
-00000000 0000000f T id_x_4
+00000000 00000012 T id_x_4
-00000000 0000000f T id_x_8
+00000000 00000012 T id_x_8
-00000000 0000000f T id_x_16
+00000000 00000012 T id_x_16
-00000000 0000000f T id_x_32
+00000000 00000012 T id_x_32

-00000000 00000010 T ud_x_28
+00000000 00000015 T ud_x_28
-00000000 00000010 T ud_x_56
+00000000 00000015 T ud_x_56
-00000000 00000010 T ud_x_13952
+00000000 00000015 T ud_x_13952

They fall into two groups. Signed divisions by a power of 2 grew by 3 bytes,
but they are *much faster* now. Considering how often people type "x / 4" and
think "this will be optimized to a shift", forgetting that their x is signed
and that they will therefore get a divide insn (!), I see it as a good trade.
Code comparison:

 00000000 <id_x_16>:
-   0:  8b 44 24 04             mov    0x4(%esp),%eax
-   4:  ba 10 00 00 00          mov    $0x10,%edx
-   9:  89 d1                   mov    %edx,%ecx
-   b:  99                      cltd
-   c:  f7 f9                   idiv   %ecx
-   e:  c3                      ret
+   0:  8b 54 24 04             mov    0x4(%esp),%edx
+   4:  89 d0                   mov    %edx,%eax
+   6:  c1 f8 1f                sar    $0x1f,%eax
+   9:  83 e0 0f                and    $0xf,%eax
+   c:  01 d0                   add    %edx,%eax
+   e:  c1 f8 04                sar    $0x4,%eax
+  11:  c3                      ret
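
For readers following the disassembly: the patched sequence can be read as the
usual branch-free way to make a shift round toward zero, as signed division
does. The C below is only a sketch of that reading, not code from the patch.
The names div16_bias and check_div16_bias are made up for illustration, and the
sketch assumes that >> on a negative int32_t is an arithmetic shift (GCC's
behavior; ISO C leaves it implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the sar/and/add/sar sequence for x / 16.
   Assumption: >> on a negative int32_t is an arithmetic shift. */
int32_t div16_bias(int32_t x)
{
    int32_t bias = (x >> 31) & 15;  /* 15 when x is negative, 0 otherwise */
    return (x + bias) >> 4;         /* bias makes the shift round toward zero */
}

/* Spot checks against the compiler's own signed division. */
void check_div16_bias(void)
{
    assert(div16_bias(-17) == -17 / 16);
    assert(div16_bias(-16) == -16 / 16);
    assert(div16_bias(-1) == -1 / 16);
    assert(div16_bias(31) == 31 / 16);
    assert(div16_bias(INT32_MIN) == INT32_MIN / 16);
}
```

Without the bias, a plain arithmetic shift would round negative values toward
minus infinity (-1 >> 4 is -1, while -1 / 16 is 0), which is why the extra
instructions are needed for signed x.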

The second group is just a few rare cases where the "multiply by reciprocal"
optimization happens to require more processing and the code is 5 bytes longer:

 00000000 <ud_x_56>:
-   0:  8b 44 24 04             mov    0x4(%esp),%eax
-   4:  ba 38 00 00 00          mov    $0x38,%edx
-   9:  89 d1                   mov    %edx,%ecx
-   b:  31 d2                   xor    %edx,%edx
-   d:  f7 f1                   div    %ecx
-   f:  c3                      ret
+   0:  53                      push   %ebx
+   1:  8b 4c 24 08             mov    0x8(%esp),%ecx
+   5:  bb 25 49 92 24          mov    $0x24924925,%ebx
+   a:  c1 e9 03                shr    $0x3,%ecx
+   d:  89 c8                   mov    %ecx,%eax
+   f:  f7 e3                   mul    %ebx
+  11:  5b                      pop    %ebx
+  12:  89 d0                   mov    %edx,%eax
+  14:  c3                      ret
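
My reading of the patched ud_x_56 code, as a C sketch: since 56 = 8 * 7, the
value is first shifted right by 3, and the remaining division by 7 is done by
multiplying with 0x24924925 (2^32 / 7 rounded up) and keeping the high 32 bits
of the product. The function names udiv56 and check_udiv56 are hypothetical;
the constant and the shift count are taken from the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring shr $3 / mul / "take %edx" for x / 56. */
uint32_t udiv56(uint32_t x)
{
    uint32_t q8 = x >> 3;                                   /* x / 8 */
    return (uint32_t)(((uint64_t)q8 * 0x24924925u) >> 32);  /* q8 / 7 */
}

/* Spot checks against the compiler's own unsigned division. */
void check_udiv56(void)
{
    assert(udiv56(0) == 0);
    assert(udiv56(55) == 55u / 56);
    assert(udiv56(56) == 56u / 56);
    assert(udiv56(UINT32_MAX) == UINT32_MAX / 56);
}
```

The extra shift, the extra register for the constant and the push/pop around it
are what make this variant longer than the plain div sequence.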

This is rare - only three cases in the entire t.c.bz2.

They are far outweighed by 474 cases where the code got smaller.

Most of them save only one byte. For example, unsigned_x / 100:

 00000000 <ud_x_100>:
-   0:  8b 44 24 04             mov    0x4(%esp),%eax
-   4:  ba 64 00 00 00          mov    $0x64,%edx
-   9:  89 d1                   mov    %edx,%ecx
-   b:  31 d2                   xor    %edx,%edx
-   d:  f7 f1                   div    %ecx
-   f:  c3                      ret
+   0:  b8 1f 85 eb 51          mov    $0x51eb851f,%eax
+   5:  f7 64 24 04             mull   0x4(%esp)
+   9:  89 d0                   mov    %edx,%eax
+   b:  c1 e8 05                shr    $0x5,%eax
+   e:  c3                      ret
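
In C terms, the patched code above amounts to a 32x32->64 bit multiply followed
by a shift: mull leaves the high half of the product in %edx (a shift by 32),
and the shr $5 adds another 5 bits, for a total shift of 37. A minimal sketch,
with hypothetical function names udiv100 and check_udiv100 and the constant
copied from the disassembly:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring mov $0x51eb851f / mull / shr $5 for x / 100.
   0x51eb851f is 2^37 / 100 rounded up. */
uint32_t udiv100(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x51eb851fu) >> 37);
}

/* Spot checks against the compiler's own unsigned division. */
void check_udiv100(void)
{
    assert(udiv100(0) == 0);
    assert(udiv100(99) == 99u / 100);
    assert(udiv100(100) == 100u / 100);
    assert(udiv100(UINT32_MAX) == UINT32_MAX / 100);
}
```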

Some cases got shorter by 2 or 4 bytes:
-00000000 00000010 T ud_x_3
+00000000 0000000e T ud_x_3
-00000000 00000010 T ud_x_9
+00000000 0000000e T ud_x_9
-00000000 00000010 T ud_x_67
+00000000 0000000e T ud_x_67
-00000000 00000010 T ud_x_641
+00000000 0000000c T ud_x_641
-00000000 00000010 T ud_x_6700417
+00000000 0000000c T ud_x_6700417

For example, unsigned_x / 9:

 00000000 <ud_x_9>:
-   0:  8b 44 24 04             mov    0x4(%esp),%eax
-   4:  ba 09 00 00 00          mov    $0x9,%edx
-   9:  89 d1                   mov    %edx,%ecx
-   b:  31 d2                   xor    %edx,%edx
-   d:  f7 f1                   div    %ecx
-   f:  c3                      ret
+   0:  b8 39 8e e3 38          mov    $0x38e38e39,%eax
+   5:  f7 64 24 04             mull   0x4(%esp)
+   9:  89 d0                   mov    %edx,%eax
+   b:  d1 e8                   shr    %eax
+   d:  c3                      ret

and unsigned_x / 641:

 00000000 <ud_x_641>:
-   0:  8b 44 24 04             mov    0x4(%esp),%eax
-   4:  ba 81 02 00 00          mov    $0x281,%edx
-   9:  89 d1                   mov    %edx,%ecx
-   b:  31 d2                   xor    %edx,%edx
-   d:  f7 f1                   div    %ecx
-   f:  c3                      ret
+   0:  b8 81 3d 66 00          mov    $0x663d81,%eax
+   5:  f7 64 24 04             mull   0x4(%esp)
+   9:  89 d0                   mov    %edx,%eax
+   b:  c3                      ret
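
A possible explanation for why 641 and 6700417 save the most: the two numbers
multiply to exactly 2^32 + 1, so each one is an almost exact 32-bit reciprocal
of the other and no trailing shift is needed. The sketch below checks this in
C. The function names are hypothetical, and using 641 as the multiplier for
x / 6700417 is my inference; the thread only shows the disassembly for
x / 641.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring mov $0x663d81 / mull / "take %edx". */
uint32_t udiv641(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x663d81u) >> 32);
}

/* Assumed counterpart: the roles of divisor and multiplier swapped. */
uint32_t udiv6700417(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 641u) >> 32);
}

/* Spot checks, including the identity the trick relies on. */
void check_fermat_pair(void)
{
    assert(641ull * 0x663d81ull == (1ull << 32) + 1);
    assert(udiv641(640) == 640u / 641);
    assert(udiv641(641) == 641u / 641);
    assert(udiv641(UINT32_MAX) == UINT32_MAX / 641);
    assert(udiv6700417(6700416u) == 6700416u / 6700417u);
    assert(udiv6700417(6700417u) == 6700417u / 6700417u);
    assert(udiv6700417(UINT32_MAX) == UINT32_MAX / 6700417u);
}
```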

I will attach t32.asm.diff now


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (8 preceding siblings ...)
  2009-06-21 16:12 ` vda dot linux at googlemail dot com
@ 2009-06-21 16:26 ` rguenth at gcc dot gnu dot org
  2009-06-21 16:47 ` vda dot linux at googlemail dot com
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-06-21 16:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rguenth at gcc dot gnu dot org  2009-06-21 16:25 -------
Do we have correct size estimates on idiv with a constant argument at all?
I don't see length attributes on it ...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (7 preceding siblings ...)
  2009-06-21 16:11 ` vda dot linux at googlemail dot com
@ 2009-06-21 16:12 ` vda dot linux at googlemail dot com
  2009-06-21 16:26 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: vda dot linux at googlemail dot com @ 2009-06-21 16:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from vda dot linux at googlemail dot com  2009-06-21 16:12 -------
Created an attachment (id=18040)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18040&action=view)
Comparison of generated code with 4.4.svn.20090528 on x86_64


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (6 preceding siblings ...)
  2009-06-06 13:41 ` hubicka at gcc dot gnu dot org
@ 2009-06-21 16:11 ` vda dot linux at googlemail dot com
  2009-06-21 16:12 ` vda dot linux at googlemail dot com
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: vda dot linux at googlemail dot com @ 2009-06-21 16:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from vda dot linux at googlemail dot com  2009-06-21 16:11 -------
(In reply to comment #7)
> It seems to make sense to bump the cost of idiv a bit, given the fact that
> there are register pressure implications.
> 
> However, I would like to understand which code sequences we produce that are
> estimated to be long but end up being shorter in practice.  Would it be
> possible to give me some examples of constants where it is important to bump
> the cost to 8?  It is possible we can simply fix the cost estimation in divmod
> expansion instead.

Attached t.c.bz2 is a good source file to experiment with.

With last month's svn snapshot of gcc, I did the following:

/usr/app/gcc-4.4.svn.20090528/bin/gcc -g0 -Os -fomit-frame-pointer
-ffunction-sections -c t.c
objdump -dr t.o >t.asm

with and without the patch, and compared results. (-ffunction-sections is used
merely because it makes "objdump -dr" output much more suitable for diffing).

Here is the diff between unpatched and patched gcc's code generated for
int_x / 16:

 Disassembly of section .text.id_x_16:
 0000000000000000 <id_x_16>:
-   0:  89 f8                   mov    %edi,%eax
-   2:  ba 10 00 00 00          mov    $0x10,%edx
-   7:  89 d1                   mov    %edx,%ecx
-   9:  99                      cltd
-   a:  f7 f9                   idiv   %ecx
-   c:  c3                      retq
+   0:  8d 47 0f                lea    0xf(%rdi),%eax
+   3:  85 ff                   test   %edi,%edi
+   5:  0f 49 c7                cmovns %edi,%eax
+   8:  c1 f8 04                sar    $0x4,%eax
+   b:  c3                      retq

int_x / 2:

 Disassembly of section .text.id_x_2:
 0000000000000000 <id_x_2>:
    0:  89 f8                   mov    %edi,%eax
-   2:  ba 02 00 00 00          mov    $0x2,%edx
-   7:  89 d1                   mov    %edx,%ecx
-   9:  99                      cltd
-   a:  f7 f9                   idiv   %ecx
-   c:  c3                      retq
+   2:  c1 e8 1f                shr    $0x1f,%eax
+   5:  01 f8                   add    %edi,%eax
+   7:  d1 f8                   sar    %eax
+   9:  c3                      retq

As you can see, the code became smaller and *much* faster (not even a mul insn
is used now).
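
For reference, here is how I would write the two patched x86_64 sequences in C.
This is a sketch of what the disassembly does, not code from the patch. The
names sdiv2, sdiv16_select and check_sdiv_pow2 are hypothetical, and the code
assumes that >> on a negative int32_t is an arithmetic shift (GCC's behavior;
ISO C leaves it implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors shr $31 / add / sar for x / 2: add 1 to negative x, then shift. */
int32_t sdiv2(int32_t x)
{
    int32_t sign = (int32_t)((uint32_t)x >> 31);  /* 1 if x < 0, else 0 */
    return (x + sign) >> 1;
}

/* Mirrors lea 0xf(%rdi) / test / cmovns / sar for x / 16:
   use x + 15 when x is negative, plain x otherwise, then shift. */
int32_t sdiv16_select(int32_t x)
{
    int32_t t = (x < 0) ? x + 15 : x;
    return t >> 4;
}

/* Spot checks against the compiler's own signed division. */
void check_sdiv_pow2(void)
{
    assert(sdiv2(-3) == -3 / 2);
    assert(sdiv2(-1) == -1 / 2);
    assert(sdiv2(7) == 7 / 2);
    assert(sdiv2(INT32_MIN) == INT32_MIN / 2);
    assert(sdiv16_select(-17) == -17 / 16);
    assert(sdiv16_select(-1) == -1 / 16);
    assert(sdiv16_select(INT32_MAX) == INT32_MAX / 16);
}
```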

Here is an example of unsigned_x / 641. In this case, code size is the same,
but the code is faster:

 Disassembly of section .text.ud_x_641:
 0000000000000000 <ud_x_641>:
-   0:  ba 81 02 00 00          mov    $0x281,%edx
-   5:  89 f8                   mov    %edi,%eax
-   7:  89 d1                   mov    %edx,%ecx
-   9:  31 d2                   xor    %edx,%edx
-   b:  f7 f1                   div    %ecx
+   0:  89 f8                   mov    %edi,%eax
+   2:  48 69 c0 81 3d 66 00    imul   $0x663d81,%rax,%rax
+   9:  48 c1 e8 20             shr    $0x20,%rax
    d:  c3                      retq

There is not a single instance of code growth. Either newer gcc is better or
maybe code growth cases are in 32-bit code only.

I will attach t64.asm.diff, take a look if you want to see all changes in
generated code.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (5 preceding siblings ...)
  2009-06-05 16:19 ` aldot at gcc dot gnu dot org
@ 2009-06-06 13:41 ` hubicka at gcc dot gnu dot org
  2009-06-21 16:11 ` vda dot linux at googlemail dot com
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2009-06-06 13:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from hubicka at gcc dot gnu dot org  2009-06-06 13:41 -------
It seems to make sense to bump the cost of idiv a bit, given the fact that
there are register pressure implications.

However, I would like to understand which code sequences we produce that are
estimated to be long but end up being shorter in practice.  Would it be
possible to give me some examples of constants where it is important to bump
the cost to 8?  It is possible we can simply fix the cost estimation in divmod
expansion instead.

Honza 


-- 

hubicka at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-06-06 13:41:23
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (4 preceding siblings ...)
  2007-07-25 15:22 ` vda dot linux at googlemail dot com
@ 2009-06-05 16:19 ` aldot at gcc dot gnu dot org
  2009-06-06 13:41 ` hubicka at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: aldot at gcc dot gnu dot org @ 2009-06-05 16:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from aldot at gcc dot gnu dot org  2009-06-05 16:19 -------
CC'ing honza as i386 maintainer


-- 

aldot at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldot at gcc dot gnu dot
                   |                            |org, hubicka at gcc dot gnu
                   |                            |dot org
OtherBugsDependingO|                            |37515
              nThis|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (3 preceding siblings ...)
  2007-07-25 15:17 ` vda dot linux at googlemail dot com
@ 2007-07-25 15:22 ` vda dot linux at googlemail dot com
  2009-06-05 16:19 ` aldot at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: vda dot linux at googlemail dot com @ 2007-07-25 15:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from vda dot linux at googlemail dot com  2007-07-25 15:22 -------
Forgot to mention:

* the generator tests signed and unsigned divisions and modulus, both const / x
and x / const, and also tests the "u = a / b; v = a % b;" construct.

* you need to filter gen_test output to weed out dups:

./gen_test | sort | uniq >t.c
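
To give an idea of what the generated t.c contains, here is a sketch of three
functions in the style described above. Only ib_100_x is quoted verbatim
elsewhere in this bug; the bodies of id_x_16 and ud_x_100 are assumptions
derived from the symbol names in the size listings (id_ for signed, ud_ for
unsigned), not the literal output of gen_test. The check function is
hypothetical as well.

```c
#include <assert.h>

/* Assumed shape: signed x / const. */
int id_x_16(int x) { return x / 16; }

/* Assumed shape: unsigned x / const. */
unsigned ud_x_100(unsigned x) { return x / 100; }

/* Combined div/mod of const by x, as quoted in this bug. */
int ib_100_x(int x) { return (100 / x) ^ (100 % x); }

/* Spot checks of the expected results. */
void check_generated_shapes(void)
{
    assert(id_x_16(-17) == -1);      /* signed division rounds toward zero */
    assert(ud_x_100(250u) == 2u);
    assert(ib_100_x(7) == (14 ^ 2)); /* 100 / 7 == 14, 100 % 7 == 2 */
}
```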


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
                   ` (2 preceding siblings ...)
  2007-07-25 15:09 ` vda dot linux at googlemail dot com
@ 2007-07-25 15:17 ` vda dot linux at googlemail dot com
  2007-07-25 15:22 ` vda dot linux at googlemail dot com
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: vda dot linux at googlemail dot com @ 2007-07-25 15:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from vda dot linux at googlemail dot com  2007-07-25 15:17 -------
Created an attachment (id=13975)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13975&action=view)
Test program generator

Test program was generated using gen_test.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
  2007-01-02 22:45 ` [Bug target/30354] " pinskia at gcc dot gnu dot org
  2007-07-25 15:05 ` vda dot linux at googlemail dot com
@ 2007-07-25 15:09 ` vda dot linux at googlemail dot com
  2007-07-25 15:17 ` vda dot linux at googlemail dot com
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: vda dot linux at googlemail dot com @ 2007-07-25 15:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from vda dot linux at googlemail dot com  2007-07-25 15:09 -------
Created an attachment (id=13974)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13974&action=view)
Auto-generated test program with 15000 constant divs/mods

Test program, bzipped.

Build with
gcc -fomit-frame-pointer -Os -c t.c -o t-Os.o
and see sizes:
nm --size-sort t-Os.o | sort -k3


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
  2007-01-02 22:45 ` [Bug target/30354] " pinskia at gcc dot gnu dot org
@ 2007-07-25 15:05 ` vda dot linux at googlemail dot com
  2007-07-25 15:09 ` vda dot linux at googlemail dot com
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: vda dot linux at googlemail dot com @ 2007-07-25 15:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from vda dot linux at googlemail dot com  2007-07-25 15:05 -------
Created an attachment (id=13973)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13973&action=view)
Fix: adjust div cost for -Os on i386 

The patch was tested with 4.2.1; I guess it will apply to other versions of gcc
as it is quite trivial.

A test program with lots of randomly generated constant divisions and %'s was
compiled by the patched gcc with different costs of division:

   text    data     bss     dec     hex filename
 257731       0       0  257731   3eec3 421.org/t-Os.o
 256788       0       0  256788   3eb14 421.7/t-Os.o
 256788       0       0  256788   3eb14 421.8/t-Os.o
 257377       0       0  257377   3ed61 421.9/t-Os.o
 257825       0       0  257825   3ef21 421.10/t-Os.o

It seems that (at least on 4.2.1) a cost of 8 gives the smallest code.

Among the 15000 divisions in the test program, only signed divisions by a
power of two grew in size (but they became MUCH faster, as they don't even use
multiply now, let alone div):

@@ -1703 +1703 @@
-0000000f T id_x_16
+00000012 T id_x_16
@@ -1836 +1836 @@
-0000000f T id_x_2
+0000000e T id_x_2
@@ -2030 +2030 @@
-0000000f T id_x_32
+00000012 T id_x_32

id_x_16 was:
        movl    4(%esp), %eax
        movl    $16, %edx
        movl    %edx, %ecx
        cltd
        idivl   %ecx
        ret
Now it is:
        movl    4(%esp), %edx
        movl    %edx, %eax
        sarl    $31, %eax
        andl    $15, %eax
        addl    %edx, %eax
        sarl    $4, %eax
        ret

and also unsigned_x / 28, unsigned_x / 13952, unsigned_x / 56 grew by 1 byte.

The rest either were not changed (and still use div insn) or shrank (typically
by 1 byte, record holders are "unsigned_x / 641" and "unsigned_x / 6700417" -
shrank by 4 bytes).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354



* [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
  2007-01-02 22:09 [Bug rtl-optimization/30354] New: " vda dot linux at googlemail dot com
@ 2007-01-02 22:45 ` pinskia at gcc dot gnu dot org
  2007-07-25 15:05 ` vda dot linux at googlemail dot com
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-01-02 22:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2007-01-02 22:45 -------
I think this is really a cost issue with the x86 back-end, rather than with the
middle-end.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354


