Date: Sun, 21 Jun 2009 16:47:00 -0000
Subject: [Bug target/30354] -Os doesn't optimize a/CONST even if it saves size.
To: gcc-bugs@gcc.gnu.org
From: "vda dot linux at googlemail dot com"

------- Comment #11 from vda dot linux at googlemail dot com  2009-06-21 16:47 -------
In 32-bit code, there are indeed a few cases of code growth. Here is the
full list (id_XXX are signed divides, ud_XXX are unsigned ones):

-00000000 0000000f T id_x_4
+00000000 00000012 T id_x_4
-00000000 0000000f T id_x_8
+00000000 00000012 T id_x_8
-00000000 0000000f T id_x_16
+00000000 00000012 T id_x_16
-00000000 0000000f T id_x_32
+00000000 00000012 T id_x_32
-00000000 00000010 T ud_x_28
+00000000 00000015 T ud_x_28
-00000000 00000010 T ud_x_56
+00000000 00000015 T ud_x_56
-00000000 00000010 T ud_x_13952
+00000000 00000015 T ud_x_13952

They fall into two groups.

Signed divisions by a power of 2 grew by 3 bytes, but they are *much faster*
now. Considering how often people type "x / 4" and think "this will be
optimized to a shift", forgetting that their x is signed and that they will
therefore get a divide insn (!), I see it as a good trade. Code comparison
(a C sketch of what both new sequences compute follows after the second
listing):

00000000 <id_x_16>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 10 00 00 00          mov    $0x10,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   99                      cltd
-   c:   f7 f9                   idiv   %ecx
-   e:   c3                      ret
+   0:   8b 54 24 04             mov    0x4(%esp),%edx
+   4:   89 d0                   mov    %edx,%eax
+   6:   c1 f8 1f                sar    $0x1f,%eax
+   9:   83 e0 0f                and    $0xf,%eax
+   c:   01 d0                   add    %edx,%eax
+   e:   c1 f8 04                sar    $0x4,%eax
+  11:   c3                      ret

The second group is just a few rare cases where the "multiply by reciprocal"
optimization happens to require more processing, and the code is 5 bytes
longer:

00000000 <ud_x_56>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 38 00 00 00          mov    $0x38,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   31 d2                   xor    %edx,%edx
-   d:   f7 f1                   div    %ecx
-   f:   c3                      ret
+   0:   53                      push   %ebx
+   1:   8b 4c 24 08             mov    0x8(%esp),%ecx
+   5:   bb 25 49 92 24          mov    $0x24924925,%ebx
+   a:   c1 e9 03                shr    $0x3,%ecx
+   d:   89 c8                   mov    %ecx,%eax
+   f:   f7 e3                   mul    %ebx
+  11:   5b                      pop    %ebx
+  12:   89 d0                   mov    %edx,%eax
+  14:   c3                      ret

This is rare - only three cases in the entire t.c.bz2. They are far
outweighed by 474 cases where the code got smaller. Most of them save only
one byte.
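To make the two transformations explicit, here is a C sketch of what the new
sequences compute. The function names are mine and the code is illustrative
only (not GCC source); it assumes an arithmetic right shift for signed ints,
as on x86:

/* Signed x / 16: bias a negative x by 15 so that the shift rounds
   toward zero, exactly like idiv does.  */
int id_x_16_sketch(int x)
{
    int bias = (x >> 31) & 15;   /* sar $0x1f,%eax; and $0xf,%eax  */
    return (x + bias) >> 4;      /* add %edx,%eax;  sar $0x4,%eax  */
}

/* Unsigned x / 56: since 56 == 8 * 7, shift out the 8 first, then
   multiply by ceil(2^32 / 7) == 0x24924925 and keep the high 32 bits
   of the 64-bit product - that is what mul leaves in %edx.  */
unsigned ud_x_56_sketch(unsigned x)
{
    return (unsigned)(((unsigned long long)(x >> 3) * 0x24924925u) >> 32);
}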
For example, unsigned_x / 100:

00000000 <ud_x_100>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 64 00 00 00          mov    $0x64,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   31 d2                   xor    %edx,%edx
-   d:   f7 f1                   div    %ecx
-   f:   c3                      ret
+   0:   b8 1f 85 eb 51          mov    $0x51eb851f,%eax
+   5:   f7 64 24 04             mull   0x4(%esp)
+   9:   89 d0                   mov    %edx,%eax
+   b:   c1 e8 05                shr    $0x5,%eax
+   e:   c3                      ret

Some cases got shorter by 2 or 4 bytes:

-00000000 00000010 T ud_x_3
+00000000 0000000e T ud_x_3
-00000000 00000010 T ud_x_9
+00000000 0000000e T ud_x_9
-00000000 00000010 T ud_x_67
+00000000 0000000e T ud_x_67
-00000000 00000010 T ud_x_641
+00000000 0000000c T ud_x_641
-00000000 00000010 T ud_x_6700417
+00000000 0000000c T ud_x_6700417

For example, unsigned_x / 9:

00000000 <ud_x_9>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 09 00 00 00          mov    $0x9,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   31 d2                   xor    %edx,%edx
-   d:   f7 f1                   div    %ecx
-   f:   c3                      ret
+   0:   b8 39 8e e3 38          mov    $0x38e38e39,%eax
+   5:   f7 64 24 04             mull   0x4(%esp)
+   9:   89 d0                   mov    %edx,%eax
+   b:   d1 e8                   shr    %eax
+   d:   c3                      ret

and unsigned_x / 641:

00000000 <ud_x_641>:
-   0:   8b 44 24 04             mov    0x4(%esp),%eax
-   4:   ba 81 02 00 00          mov    $0x281,%edx
-   9:   89 d1                   mov    %edx,%ecx
-   b:   31 d2                   xor    %edx,%edx
-   d:   f7 f1                   div    %ecx
-   f:   c3                      ret
+   0:   b8 81 3d 66 00          mov    $0x663d81,%eax
+   5:   f7 64 24 04             mull   0x4(%esp)
+   9:   89 d0                   mov    %edx,%eax
+   b:   c3                      ret

I will attach t32.asm.diff now.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30354
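P.S. For anyone who wants to double-check the reciprocal constants above,
here is a small self-contained C test. This is my own sketch, not code from
the patch; it models mull's high 32 bits with a 64-bit shift and brute-forces
the whole 32-bit range:

#include <stdint.h>
#include <stdio.h>

/* Each helper mirrors the new code: take the high 32 bits of the
   32x32->64 product (what mull leaves in %edx), then apply the final
   shift, if any.  */
static uint32_t div100(uint32_t x) { return (uint32_t)(((uint64_t)x * 0x51EB851Fu) >> 32) >> 5; }
static uint32_t div9(uint32_t x)   { return (uint32_t)(((uint64_t)x * 0x38E38E39u) >> 32) >> 1; }
/* 641 needs no shift at all: 641 * 0x663d81 == 2^32 + 1.  */
static uint32_t div641(uint32_t x) { return (uint32_t)(((uint64_t)x * 0x663D81u) >> 32); }

int main(void)
{
    uint32_t x = 0;
    do {
        if (div100(x) != x / 100 || div9(x) != x / 9 || div641(x) != x / 641) {
            printf("mismatch at %u\n", x);
            return 1;
        }
    } while (++x != 0);
    puts("all 2^32 values OK");
    return 0;
}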