From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 18314 invoked by alias); 21 Jun 2009 16:11:32 -0000
Received: (qmail 18260 invoked by uid 48); 21 Jun 2009 16:11:15 -0000
Date: Sun, 21 Jun 2009 16:11:00 -0000
Message-ID: <20090621161115.18259.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "vda dot linux at googlemail dot com"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-06/txt/msg01506.txt.bz2

------- Comment #8 from vda dot linux at googlemail dot com 2009-06-21 16:11 -------
(In reply to comment #7)
> It seems to make sense to bump cost of idiv a bit, given the fact that there
> are register pressure implications.
>
> I would like to however understand what code sequences we produce that are
> estimated to be long but ends up being shorter in practice. Would be possible
> to try to give me some examples of constants where it is important to bump cost
> to 8? It is possible we can simply fix cost estimation in divmod expansion
> instead.

The attached t.c.bz2 is a good source file to experiment with. With last
month's svn snapshot of gcc, I did the following:

/usr/app/gcc-4.4.svn.20090528/bin/gcc -g0 -Os -fomit-frame-pointer -ffunction-sections -c t.c
objdump -dr t.o >t.asm

with and without the patch, and compared the results. (-ffunction-sections
is used merely because it makes "objdump -dr" output much more suitable
for diffing.)
Here is the diff between unpatched and patched gcc's code generated for
int_x / 16:

Disassembly of section .text.id_x_16:

0000000000000000 :
-   0:	89 f8                	mov    %edi,%eax
-   2:	ba 10 00 00 00       	mov    $0x10,%edx
-   7:	89 d1                	mov    %edx,%ecx
-   9:	99                   	cltd
-   a:	f7 f9                	idiv   %ecx
-   c:	c3                   	retq
+   0:	8d 47 0f             	lea    0xf(%rdi),%eax
+   3:	85 ff                	test   %edi,%edi
+   5:	0f 49 c7             	cmovns %edi,%eax
+   8:	c1 f8 04             	sar    $0x4,%eax
+   b:	c3                   	retq

int_x / 2:

Disassembly of section .text.id_x_2:

0000000000000000 :
    0:	89 f8                	mov    %edi,%eax
-   2:	ba 02 00 00 00       	mov    $0x2,%edx
-   7:	89 d1                	mov    %edx,%ecx
-   9:	99                   	cltd
-   a:	f7 f9                	idiv   %ecx
-   c:	c3                   	retq
+   2:	c1 e8 1f             	shr    $0x1f,%eax
+   5:	01 f8                	add    %edi,%eax
+   7:	d1 f8                	sar    %eax
+   9:	c3                   	retq

As you can see, the code becomes smaller and *much* faster (not even a mul
insn is needed now).

Here is an example of unsigned_x / 641. In this case, code size is the
same, but the code is faster:

Disassembly of section .text.ud_x_641:

0000000000000000 :
-   0:	ba 81 02 00 00       	mov    $0x281,%edx
-   5:	89 f8                	mov    %edi,%eax
-   7:	89 d1                	mov    %edx,%ecx
-   9:	31 d2                	xor    %edx,%edx
-   b:	f7 f1                	div    %ecx
+   0:	89 f8                	mov    %edi,%eax
+   2:	48 69 c0 81 3d 66 00 	imul   $0x663d81,%rax,%rax
+   9:	48 c1 e8 20          	shr    $0x20,%rax
    d:	c3                   	retq

There is not a single instance of code growth. Either newer gcc is better,
or maybe the code growth cases occur in 32-bit code only. I will attach
t64.asm.diff; take a look if you want to see all changes in the generated
code.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354