Date: Sun, 21 Jun 2009 16:47:00 -0000
Subject: [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
To: gcc-bugs@gcc.gnu.org
From: "vda dot linux at googlemail dot com"

------- Comment #11 from vda dot linux at googlemail dot com  2009-06-21 16:47 -------
In 32-bit code, there are indeed a few cases of code growth. Here is the
full list (id_XXX are signed divides, ud_XXX are unsigned ones):

-00000000 0000000f T id_x_4
+00000000 00000012 T id_x_4
-00000000 0000000f T id_x_8
+00000000 00000012 T id_x_8
-00000000 0000000f T id_x_16
+00000000 00000012 T id_x_16
-00000000 0000000f T id_x_32
+00000000 00000012 T id_x_32
-00000000 00000010 T ud_x_28
+00000000 00000015 T ud_x_28
-00000000 00000010 T ud_x_56
+00000000 00000015 T ud_x_56
-00000000 00000010 T ud_x_13952
+00000000 00000015 T ud_x_13952

They fall into two groups.

Signed divisions by a power of 2 grew by 3 bytes, but they are *much faster*
now. Considering how often people type "x / 4" and think "this will be
optimized to a shift", forgetting that their x is signed and that they will
therefore get a divide insn (!), I see it as a good trade. Code comparison
(a C sketch of what both new sequences compute follows after the second
listing):

00000000 <id_x_16>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 10 00 00 00          mov    $0x10,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   99                      cltd
-   c:   f7 f9                   idiv   %ecx
-   e:   c3                      ret
+   0:   8b 54 24 04             mov    0x4(%esp),%edx
+   4:   89 d0                   mov    %edx,%eax
+   6:   c1 f8 1f                sar    $0x1f,%eax
+   9:   83 e0 0f                and    $0xf,%eax
+   c:   01 d0                   add    %edx,%eax
+   e:   c1 f8 04                sar    $0x4,%eax
+  11:   c3                      ret

The second group is just a few rare cases where the "multiply by reciprocal"
optimization happens to require more processing, and the code is 5 bytes
longer:

00000000 <ud_x_56>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 38 00 00 00          mov    $0x38,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   31 d2                   xor    %edx,%edx
-   d:   f7 f1                   div    %ecx
-   f:   c3                      ret
+   0:   53                      push   %ebx
+   1:   8b 4c 24 08             mov    0x8(%esp),%ecx
+   5:   bb 25 49 92 24          mov    $0x24924925,%ebx
+   a:   c1 e9 03                shr    $0x3,%ecx
+   d:   89 c8                   mov    %ecx,%eax
+   f:   f7 e3                   mul    %ebx
+  11:   5b                      pop    %ebx
+  12:   89 d0                   mov    %edx,%eax
+  14:   c3                      ret

This is rare - only three cases in the entire t.c.bz2. They are far
outweighed by 474 cases where the code got smaller. Most of them save only
one byte.
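To make the two transformations explicit, here is a C sketch of what the new
sequences compute. The function names are mine and the code is illustrative
only (not GCC source); it assumes an arithmetic right shift for signed ints,
as on x86:

/* Signed x / 16: bias a negative x by 15 so that the shift rounds
   toward zero, exactly like idiv does.  */
int id_x_16_sketch(int x)
{
    int bias = (x >> 31) & 15;   /* sar $0x1f,%eax; and $0xf,%eax  */
    return (x + bias) >> 4;      /* add %edx,%eax;  sar $0x4,%eax  */
}

/* Unsigned x / 56: since 56 == 8 * 7, shift out the 8 first, then
   multiply by ceil(2^32 / 7) == 0x24924925 and keep the high 32 bits
   of the 64-bit product - that is what mul leaves in %edx.  */
unsigned ud_x_56_sketch(unsigned x)
{
    return (unsigned)(((unsigned long long)(x >> 3) * 0x24924925u) >> 32);
}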
For example, unsigned_x / 100:

00000000 <ud_x_100>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 64 00 00 00          mov    $0x64,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   31 d2                   xor    %edx,%edx
-   d:   f7 f1                   div    %ecx
-   f:   c3                      ret
+   0:   b8 1f 85 eb 51          mov    $0x51eb851f,%eax
+   5:   f7 64 24 04             mull   0x4(%esp)
+   9:   89 d0                   mov    %edx,%eax
+   b:   c1 e8 05                shr    $0x5,%eax
+   e:   c3                      ret

Some cases got shorter by 2 or 4 bytes:

-00000000 00000010 T ud_x_3
+00000000 0000000e T ud_x_3
-00000000 00000010 T ud_x_9
+00000000 0000000e T ud_x_9
-00000000 00000010 T ud_x_67
+00000000 0000000e T ud_x_67
-00000000 00000010 T ud_x_641
+00000000 0000000c T ud_x_641
-00000000 00000010 T ud_x_6700417
+00000000 0000000c T ud_x_6700417

For example, unsigned_x / 9:

00000000 <ud_x_9>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 09 00 00 00          mov    $0x9,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   31 d2                   xor    %edx,%edx
-   d:   f7 f1                   div    %ecx
-   f:   c3                      ret
+   0:   b8 39 8e e3 38          mov    $0x38e38e39,%eax
+   5:   f7 64 24 04             mull   0x4(%esp)
+   9:   89 d0                   mov    %edx,%eax
+   b:   d1 e8                   shr    %eax
+   d:   c3                      ret

and unsigned_x / 641:

00000000 <ud_x_641>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 81 02 00 00          mov    $0x281,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   31 d2                   xor    %edx,%edx
-   d:   f7 f1                   div    %ecx
-   f:   c3                      ret
+   0:   b8 81 3d 66 00          mov    $0x663d81,%eax
+   5:   f7 64 24 04             mull   0x4(%esp)
+   9:   89 d0                   mov    %edx,%eax
+   b:   c3                      ret

I will attach t32.asm.diff now.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354
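P.S. For anyone who wants to double-check the reciprocal constants above,
here is a small self-contained C test. This is my own sketch, not code from
the patch; it models mull's high 32 bits with a 64-bit shift and brute-forces
the whole 32-bit range:

#include <stdint.h>
#include <stdio.h>

/* Each helper mirrors the new code: take the high 32 bits of the
   32x32->64 product (what mull leaves in %edx), then apply the final
   shift, if any.  */
static uint32_t div100(uint32_t x) { return (uint32_t)(((uint64_t)x * 0x51EB851Fu) >> 32) >> 5; }
static uint32_t div9(uint32_t x)   { return (uint32_t)(((uint64_t)x * 0x38E38E39u) >> 32) >> 1; }
/* 641 needs no shift at all: 641 * 0x663d81 == 2^32 + 1.  */
static uint32_t div641(uint32_t x) { return (uint32_t)(((uint64_t)x * 0x663D81u) >> 32); }

int main(void)
{
    uint32_t x = 0;
    do {
        if (div100(x) != x / 100 || div9(x) != x / 9 || div641(x) != x / 641) {
            printf("mismatch at %u\n", x);
            return 1;
        }
    } while (++x != 0);
    puts("all 2^32 values OK");
    return 0;
}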