public inbox for gcc-bugs@sourceware.org
* [Bug other/48696] New: Horrible bitfield code generation on x86
@ 2011-04-20  4:25 torvalds@linux-foundation.org
  2011-04-20  4:28 ` [Bug other/48696] " torvalds@linux-foundation.org
                   ` (15 more replies)
  0 siblings, 16 replies; 18+ messages in thread
From: torvalds@linux-foundation.org @ 2011-04-20  4:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48696

           Summary: Horrible bitfield code generation on x86
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: torvalds@linux-foundation.org


gcc (tried 4.5.1 and 4.6.0) generates absolutely horrid code for some common
bitfield accesses due to minimizing the access size.

Trivial test case:

  struct bad_gcc_code_generation {
    unsigned type:6,
         pos:16,
         stream:10;
  };

  int show_bug(struct bad_gcc_code_generation *a)
  {
    a->type = 0;
    return a->pos;
  }

will generate code like this on x86-64 with -O2:

    andb    $-64, (%rdi)
    movl    (%rdi), %eax
    shrl    $6, %eax
    movzwl    %ax, %eax
    ret

where the problem is the byte-sized write followed by a 32-bit read.

Most (all?) modern x86 CPUs will come to a screeching halt when they see a
read that hits a store buffer entry but cannot be fully forwarded from it. The
penalty can be quite severe, and this is _very_ noticeable in profiles.

This code would be _much_ faster either using an "andl" (making the store size
match the next load, and thus forwarding through the store buffer), or by
having the load be done first. 
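For reference, the full-width version can be written by hand too (just a
sketch with a made-up helper name, assuming the usual x86-64 System V
bit-field layout where "type" occupies the low 6 bits and "pos" the next 16):

  #include <string.h>

  struct bad_gcc_code_generation {
    unsigned type:6,
         pos:16,
         stream:10;
  };

  int show_bug_workaround(struct bad_gcc_code_generation *a)
  {
    unsigned w;
    memcpy(&w, a, sizeof w);   /* one 32-bit load */
    w &= ~0x3fu;               /* clear 'type' at full width */
    memcpy(a, &w, sizeof w);   /* 32-bit store matches later 32-bit loads */
    return (w >> 6) & 0xffff;  /* 'pos' is the next 16 bits */
  }

With this shape there is no narrower access for the compiler to "optimize"
down to, so the store size matches any following 32-bit load.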

(The above code snippet is not the real code in which I noticed the problem,
obviously, but real code definitely exhibits it, and profiling shows very
clearly how the 32-bit load from memory basically stops cold due to the
partial store-buffer hit.)

Using non-native access sizes is fine for loads (so narrowing the access
for a pure load is fine), but for stores and read-modify-write instructions
it's almost always a serious performance problem to try to "optimize" the
memory operand size to something smaller.

Yes, the constants often shrink, but the code becomes *much* slower unless you
can guarantee that there are no loads of the original access size that follow
the write.
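
The distinction looks like this in C (hypothetical function names; the
byte-store variant assumes little-endian so the two routines compute the
same value):

  #include <stdint.h>

  /* Byte-sized RMW followed by a full-width load: the pattern that
     stalls in the store buffer (mirrors the andb/movl pair above). */
  uint32_t narrow_store_then_load(uint32_t *p)
  {
    *(uint8_t *)p &= 0xc0;  /* byte store, like "andb $-64, (%rdi)" */
    return *p;              /* 32-bit load overlaps the pending byte store */
  }

  /* Same result on little-endian, but the store is full width,
     so the following load can forward cleanly from the store buffer. */
  uint32_t full_width_store_then_load(uint32_t *p)
  {
    *p &= 0xffffffc0u;      /* 32-bit RMW, like "andl" */
    return *p;
  }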


^ permalink raw reply	[flat|nested] 18+ messages in thread

Thread overview: 18+ messages
2011-04-20  4:25 [Bug other/48696] New: Horrible bitfield code generation on x86 torvalds@linux-foundation.org
2011-04-20  4:28 ` [Bug other/48696] " torvalds@linux-foundation.org
2011-04-20  9:44 ` [Bug rtl-optimization/48696] " rguenth at gcc dot gnu.org
2011-04-20 11:48 ` ebotcazou at gcc dot gnu.org
2011-04-20 12:12 ` rguenth at gcc dot gnu.org
2011-04-20 12:15 ` rguenth at gcc dot gnu.org
2011-04-20 15:54   ` Jan Hubicka
2011-04-20 12:59 ` ebotcazou at gcc dot gnu.org
2011-04-20 15:31 ` torvalds@linux-foundation.org
2011-04-20 15:41 ` rguenth at gcc dot gnu.org
2011-04-20 15:41 ` rguenth at gcc dot gnu.org
2011-04-20 15:54 ` hubicka at ucw dot cz
2011-04-20 16:17 ` torvalds@linux-foundation.org
2011-04-20 16:20 ` jakub at gcc dot gnu.org
2011-04-21 15:44 ` joseph at codesourcery dot com
2011-04-21 15:49 ` rguenther at suse dot de
2011-05-03 11:26 ` rguenth at gcc dot gnu.org
2021-02-11 11:22 ` rguenth at gcc dot gnu.org
