public inbox for gcc-bugs@sourceware.org
* [Bug other/48696] New: Horrible bitfield code generation on x86
@ 2011-04-20  4:25 torvalds@linux-foundation.org
  2011-04-20  4:28 ` [Bug other/48696] " torvalds@linux-foundation.org
                   ` (15 more replies)
  0 siblings, 16 replies; 18+ messages in thread
From: torvalds@linux-foundation.org @ 2011-04-20  4:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48696

           Summary: Horrible bitfield code generation on x86
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: torvalds@linux-foundation.org


gcc (tried 4.5.1 and 4.6.0) generates absolutely horrid code for some common
bitfield accesses due to minimizing the access size.

Trivial test case:

  struct bad_gcc_code_generation {
    unsigned type:6,
         pos:16,
         stream:10;
  };

  int show_bug(struct bad_gcc_code_generation *a)
  {
    a->type = 0;
    return a->pos;
  }

will generate code like this on x86-64 with -O2:

    andb    $-64, (%rdi)
    movl    (%rdi), %eax
    shrl    $6, %eax
    movzwl    %ax, %eax
    ret

where the problem is the byte-sized write followed by a 32-bit read.

Most (all?) modern x86 CPUs will come to a screeching halt when they see a
read that hits a store buffer entry but cannot be fully forwarded from it. The
penalty can be quite severe, and this is _very_ noticeable in profiles.

This code would be _much_ faster either using an "andl" (making the store size
match the next load, and thus forwarding through the store buffer), or by
having the load be done first. 
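For reference, the full-width version can be written by hand too (just a
sketch with a made-up helper name, assuming the usual x86-64 System V
bit-field layout where "type" occupies the low 6 bits and "pos" the next 16):

  #include <string.h>

  struct bad_gcc_code_generation {
    unsigned type:6,
         pos:16,
         stream:10;
  };

  int show_bug_workaround(struct bad_gcc_code_generation *a)
  {
    unsigned w;
    memcpy(&w, a, sizeof w);   /* one 32-bit load */
    w &= ~0x3fu;               /* clear 'type' at full width */
    memcpy(a, &w, sizeof w);   /* 32-bit store matches later 32-bit loads */
    return (w >> 6) & 0xffff;  /* 'pos' is the next 16 bits */
  }

With this shape there is no narrower access for the compiler to "optimize"
down to, so the store size matches any following 32-bit load.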

(The above code snippet is not the real code in which I noticed the problem,
obviously, but real code definitely exhibits it, and profiling shows very
clearly how the 32-bit load from memory basically stops cold due to the
partial store-buffer hit.)

Using non-native access sizes is fine for loads (so narrowing the access
for a pure load is fine), but for stores and read-modify-write instructions
it's almost always a serious performance problem to try to "optimize" the
memory operand size to something smaller.

Yes, the constants often shrink, but the code becomes *much* slower unless you
can guarantee that there are no loads of the original access size that follow
the write.
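
The distinction looks like this in C (hypothetical function names; the
byte-store variant assumes little-endian so the two routines compute the
same value):

  #include <stdint.h>

  /* Byte-sized RMW followed by a full-width load: the pattern that
     stalls in the store buffer (mirrors the andb/movl pair above). */
  uint32_t narrow_store_then_load(uint32_t *p)
  {
    *(uint8_t *)p &= 0xc0;  /* byte store, like "andb $-64, (%rdi)" */
    return *p;              /* 32-bit load overlaps the pending byte store */
  }

  /* Same result on little-endian, but the store is full width,
     so the following load can forward cleanly from the store buffer. */
  uint32_t full_width_store_then_load(uint32_t *p)
  {
    *p &= 0xffffffc0u;      /* 32-bit RMW, like "andl" */
    return *p;
  }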


^ permalink raw reply	[flat|nested] 18+ messages in thread

Thread overview: 18+ messages
2011-04-20  4:25 [Bug other/48696] New: Horrible bitfield code generation on x86 torvalds@linux-foundation.org
2011-04-20  4:28 ` [Bug other/48696] " torvalds@linux-foundation.org
2011-04-20  9:44 ` [Bug rtl-optimization/48696] " rguenth at gcc dot gnu.org
2011-04-20 11:48 ` ebotcazou at gcc dot gnu.org
2011-04-20 12:12 ` rguenth at gcc dot gnu.org
2011-04-20 12:15 ` rguenth at gcc dot gnu.org
2011-04-20 15:54   ` Jan Hubicka
2011-04-20 12:59 ` ebotcazou at gcc dot gnu.org
2011-04-20 15:31 ` torvalds@linux-foundation.org
2011-04-20 15:41 ` rguenth at gcc dot gnu.org
2011-04-20 15:41 ` rguenth at gcc dot gnu.org
2011-04-20 15:54 ` hubicka at ucw dot cz
2011-04-20 16:17 ` torvalds@linux-foundation.org
2011-04-20 16:20 ` jakub at gcc dot gnu.org
2011-04-21 15:44 ` joseph at codesourcery dot com
2011-04-21 15:49 ` rguenther at suse dot de
2011-05-03 11:26 ` rguenth at gcc dot gnu.org
2021-02-11 11:22 ` rguenth at gcc dot gnu.org
