From: "torvalds@linux-foundation.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug other/48696] New: Horrible bitfield code generation on x86
Date: Wed, 20 Apr 2011 04:25:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48696

           Summary: Horrible bitfield code generation on x86
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: torvalds@linux-foundation.org

gcc (tried 4.5.1 and 4.6.0) generates absolutely horrid code for some common
bitfield accesses due to minimizing the access size.

Trivial test case:

struct bad_gcc_code_generation {
        unsigned type:6,
                 pos:16,
                 stream:10;
};

int show_bug(struct bad_gcc_code_generation *a)
{
        a->type = 0;
        return a->pos;
}

will generate code like this on x86-64 with -O2:

        andb    $-64, (%rdi)
        movl    (%rdi), %eax
        shrl    $6, %eax
        movzwl  %ax, %eax
        ret

where the problem is the byte-sized write followed by a word-sized read.

Most (all?) modern x86 CPUs will come to a screeching halt when they see a
read that hits a store buffer entry but cannot be fully forwarded from it.
The penalty can be quite severe, and this is _very_ noticeable in profiles.

This code would be _much_ faster either by using an "andl" (making the store
size match the next load, and thus forwarding through the store buffer), or
by having the load be done first.

(The above code snippet is not the real code I noticed it on, obviously, but
real code definitely sees this, and profiling shows very clearly how the
32-bit load from memory basically stops cold due to the partial store buffer
hit.)

Using non-native accesses to memory is fine for loads (so narrowing the
access for a pure load is fine), but for store or read-modify-write
instructions it's almost always a serious performance problem to try to
"optimize" the memory operand size to something smaller.

Yes, the constants often shrink, but the code becomes *much* slower unless
you can guarantee that there are no loads of the original access size that
follow the write.
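
[Editorial sketch, not part of the original report] A minimal hand-written
equivalent of the 32-bit-wide variant the report suggests, assuming GCC's
usual little-endian bitfield layout for this struct (type in bits 0-5, pos in
bits 6-21, stream in bits 22-31) and taking a uint32_t view of the word
(ignoring strict-aliasing concerns); the function name and mask constants are
illustrative:

#include <stdint.h>

int show_bug_wide(uint32_t *a)
{
        uint32_t w = *a;           /* single 32-bit load                     */
        w &= ~(uint32_t)0x3F;      /* clear 'type' (bits 0..5), i.e. "andl"  */
        *a = w;                    /* 32-bit store, matching any later load  */
        return (w >> 6) & 0xFFFF;  /* 'pos' taken from the register, so no
                                      read hits a partial store buffer entry */
}

With the store widened to 32 bits and the field read taken from the value
already in a register, no load hits a partially forwardable store buffer
entry, which is exactly the stall the report describes.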