From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 6908 invoked by alias); 24 Apr 2013 00:00:32 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 6852 invoked by uid 48); 24 Apr 2013 00:00:27 -0000
From: "amodra at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/57052] New: missed optimization with rotate and mask
Date: Wed, 24 Apr 2013 00:00:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: amodra at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
X-SW-Source: 2013-04/txt/msg01994.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57052

             Bug #: 57052
           Summary: missed optimization with rotate and mask
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: amodra@gmail.com


/* -m32 -O -S */
int
foo (unsigned int x, int r)
{
  return ((x << r) | (x >> (32 - r))) & 0xff;
}

results in:

foo:
    rlwnm 3,3,4,0xffffffff
    rlwinm 3,3,0,24,31
    blr

Compiling the same code with -m32 -O -S -mlittle gives the properly optimized
result of:

foo:
    rlwnm 3,3,4,0xff
    blr

This is because many of the rs6000.md rotate/shift and mask patterns use
subregs with wrong byte offsets.  eg. rotlsi3_internal7, the insn that ought
to match here, has (subreg:QI (rotate:SI ...) 0).  The 0 selects the most
significant byte when BYTES_BIG_ENDIAN and the least significant when
!BYTES_BIG_ENDIAN.
Fortunately combine doesn't seem to generate subregs for high parts, so changing the testcase mask to 0xff000000 doesn't result in wrong code. Annoyingly, rotlsi3_internal4 would match here too if combine_simplify_rtx() didn't simplify (set (reg:SI) (and:SI () 255)) to use subregs.