From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 10882 invoked by alias); 26 Apr 2012 15:53:30 -0000 Received: (qmail 10873 invoked by uid 22791); 26 Apr 2012 15:53:29 -0000 X-SWARE-Spam-Status: No, hits=-4.2 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,KHOP_THREADED,TW_LZ,TW_SL,TW_SR,TW_TL X-Spam-Check-By: sourceware.org Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 26 Apr 2012 15:53:15 +0000 From: "bonzini at gnu dot org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/53087] [powerpc] Poor code from cstore expander Date: Thu, 26 Apr 2012 15:53:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Keywords: missed-optimization X-Bugzilla-Severity: normal X-Bugzilla-Who: bonzini at gnu dot org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: bonzini at gnu dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Changed-Fields: Status AssignedTo Message-ID: In-Reply-To: References: X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated Content-Type: text/plain; charset="UTF-8" MIME-Version: 1.0 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org X-SW-Source: 2012-04/txt/msg02362.txt.bz2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53087 Paolo Bonzini changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED AssignedTo|unassigned at gcc dot |bonzini at gnu dot org |gnu.org | --- Comment #10 from Paolo Bonzini 2012-04-26 15:52:32 UTC --- Created attachment 27248 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27248 tentative patch Any change in expmed.c is not enough to avoid the cr-setting instruction "and.", because we get that instruction by design: For SNE, we would prefer that the xor/abs sequence be used for integers. For SEQ, likewise, except that comparisons with zero should be done with an scc insns. However, due to the order that combine see the resulting insns, we must, in fact, allow SEQ for integers. The question is whether it is really true that "For SEQ, likewise, except that comparisons with zero should be done with an scc insns". In general, I think no. I tried disabling scc altogether on this testcase: int eq1_phi (unsigned long a) { return a ? 0 : 1; } and Steven's smaller testcase. They compile respectively to: eq1_phi: sradi 0,3,63 xor 3,0,3 subf 3,0,3 addi 3,3,-1 srdi 3,3,63 blr foo: li 9,1 lis 0,0xcf8 sld 3,9,3 ori 0,0,63 and 0,3,0 neg 3,0 srdi 3,3,63 blr In both cases the code is produced by the "abs"-based trick in emit_store_flag. The trick produces bad code in the first case, because combine finds nothing to simplify---unlike Steven's case, where abs effectively goes away completely. Since many current PPC processors lack the abs/nabs instructions and do implement clz/ctz, I tried teaching expmed.c the cntlz trick of comment 5. This improves eq1_phi noticeably and produces one instruction more than above (as expected, this is roughly the same code as XLC): eq1_phi: cntlzd 3,3 srdi 3,3,6 blr foo: li 9,1 lis 0,0xcf8 sld 3,9,3 ori 0,0,63 and 3,3,0 cntlzd 3,3 addi 3,3,-64 srdi 3,3,63 blr Overall, I think it is preferable to get the slight pessimization of "foo" and get better code when combine cannot optimize away the ABS. It would anyway fix the bug, and the slight pessimization can still be offset in other ways, as suggested below. Hence I'm attaching a patch that teaches expmed to use cntlz, removes abs_nopower patterns, and makes cstoresi4 do the same thing for EQ and NE. I will add testcases and bootstrap/regtest on x86 before submitting, but of course in the meanwhile it would help a lot if somebody can bootstrap/regtest it on ppc and/or ppc64. On top of this, one thing to look at is making switch expansion better. For example one way to solve this bug in a target-independent way could be to expand switch statements to jump if (mask << a) < 0 (with a bit-reversed mask). <0 and >=0 are a bit easier to convert to sCC than ==0 and !=0. On SPARC the bit-reversed mask can even be cheaper to set up (a single mov covers 14-bit immediates, a single sethi covers 22-bit immediates as long as the lowest 10 are zero); on PowerPC and ARM it should be roughly the same. I don't know about Thumb. Another thing to do is to use a smaller type if possible, in this case int instead of long. This is because loading 64-bit constants can be expensive. This is especially true with the bit-reversed mask, but also in general; on PowerPC, loading 0xCF80003F in DImode is one instruction more than 0xCF80003F in SImode, because of a rldicl instruction needed to clear the high word. This two improvements give for the same input bits as comment 0: int bar (unsigned long a) { return (((int)0xFC001F30 << a) < 0); } which compiles to even better code: bar: lis 0,0xfc00 ori 0,0,7984 slw 3,0,3 srwi 3,3,31 blr or on x86: movl %edi, %ecx movl $-67100880, %eax sall %cl, %eax shrl $31, %eax ret Finally, switch expansion could also pick a different jump direction (!= vs == for the current code, < vs. >= with the above suggestion) to minimize the number of bits set in the mask. On PPC for example, 0x0CF8FFFF is one instruction more expensive than 0xF3070000.