* [Bug target/15533] Missed move to partial register
From: pinskia at gcc dot gnu.org @ 2012-01-05 22:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15533

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-01-05 22:40:30 UTC ---
Trying 8 -> 9:
Failed to match this instruction:
(parallel [
        (set (reg:HI 68)
            (ior:HI (zero_extend:HI (mem/c/i:QI (symbol_ref:DI ("a")  <var_decl 0x7fd37fc73140 a>) [0 a+0 S1 A8]))
                (reg:HI 69)))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:HI 68)
    (ior:HI (zero_extend:HI (mem/c/i:QI (symbol_ref:DI ("a")  <var_decl 0x7fd37fc73140 a>) [0 a+0 S1 A8]))
        (reg:HI 69)))

Should be matched to do:
orb a, %al

And:
Trying 8, 7 -> 9:
Failed to match this instruction:
(parallel [
        (set (reg:HI 68)
            (ior:HI (and:HI (subreg:HI (reg:SI 67 [ b ]) 0)
                    (const_int -256 [0xffffffffffffff00]))
                (zero_extend:HI (mem/c/i:QI (symbol_ref:DI ("a")  <var_decl 0x7fd37fc73140 a>) [0 a+0 S1 A8]))))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:HI 68)
    (ior:HI (and:HI (subreg:HI (reg:SI 67 [ b ]) 0)
            (const_int -256 [0xffffffffffffff00]))
        (zero_extend:HI (mem/c/i:QI (symbol_ref:DI ("a")  <var_decl 0x7fd37fc73140 a>) [0 a+0 S1 A8]))))

Should be matched to do:
movb a, %al

So this is a target issue.
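
As an editorial sketch (not part of the original comment), the second pattern above, (b & -256) | zero_extend(a) in 16 bits, can be checked against a single byte move into the low half of the register. The struct reg16 model and all function names below are hypothetical, chosen only for illustration:

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit register whose low byte can be
   written on its own (like %ax with %al). */
struct reg16 {
    uint8_t lo;  /* models %al */
    uint8_t hi;  /* models %ah */
};

/* The pattern combine built: (b & -256) | zero_extend(a), in 16 bits. */
uint16_t merge_by_mask(uint16_t b, uint8_t a)
{
    return (uint16_t)((b & 0xFF00u) | a);
}

/* The effect of "movb a, %al" on a register that already holds b. */
uint16_t merge_by_byte_move(uint16_t b, uint8_t a)
{
    struct reg16 ax = { (uint8_t)(b & 0xFFu), (uint8_t)(b >> 8) };
    ax.lo = a;  /* only the low byte is overwritten */
    return (uint16_t)(((unsigned)ax.hi << 8) | ax.lo);
}

/* Returns 1 if both forms agree for every 16-bit b and 8-bit a. */
int merge_forms_agree(void)
{
    for (unsigned b = 0; b <= 0xFFFFu; b++)
        for (unsigned a = 0; a <= 0xFFu; a++)
            if (merge_by_mask((uint16_t)b, (uint8_t)a) !=
                merge_by_byte_move((uint16_t)b, (uint8_t)a))
                return 0;
    return 1;
}
```

The exhaustive loop only confirms that the two forms compute the same value; it says nothing about which instruction sequence is faster on a given CPU.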



* [Bug target/15533] Missed move to partial register
From: ubizjak at gmail dot com @ 2012-01-08 19:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15533

--- Comment #3 from Uros Bizjak <ubizjak at gmail dot com> 2012-01-08 19:55:00 UTC ---
We do in fact have a comment in i386.md:

;; Logical inclusive and exclusive OR instructions

;; %%% This used to optimize known byte-wide and operations to memory.
;; If this is considered useful, it should be done with splitters.



* [Bug target/15533] Missed move to partial register
From: pinskia at gcc dot gnu.org @ 2021-07-26 19:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15533

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This looks improved in GCC 4.4.7 and above:
fn(unsigned short):
        movzbl  a, %edx
        xorb    %al, %al
        orl     %edx, %eax
        ret
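
A rough C model of this three-instruction sequence, added as an editorial sketch rather than compiler output. It assumes %eax holds b on entry (as regparm(3) in the original test case implies); the function and parameter names are made up for illustration:

```c
#include <stdint.h>

/* Models the listing above, one statement per instruction.
   eax:    register contents on entry (b in the low 16 bits)
   a_byte: current value of the global byte "a" */
uint32_t model_gcc_sequence(uint32_t eax, uint8_t a_byte)
{
    uint32_t edx = a_byte;   /* movzbl a, %edx : zero-extending load  */
    eax &= 0xFFFFFF00u;      /* xorb %al, %al  : clears only the low byte */
    eax |= edx;              /* orl %edx, %eax : merge the byte back in */
    return eax;
}
```

For example, model_gcc_sequence(0xDEADBEEF, 0x42) leaves the upper 24 bits alone and yields 0xDEADBE42.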


* [Bug target/15533] Missed move to partial register
From: peter at cordes dot ca @ 2021-08-22 22:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15533

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #5 from Peter Cordes <peter at cordes dot ca> ---
The new asm is less bad, but still not good.  PR53133 is closed, but this code-gen
is a new instance of partial-register writing with xor al,al.  Also related:
PR82940 re: identifying bitfield insert patterns in the middle-end; hopefully
Andrew Pinski's planned set of patches to improve that can help back-ends do a
better job?

If we're going to read a 32-bit reg after writing an 8-bit reg anyway (causing a
partial-register stall on Nehalem and earlier), we should just be doing

  mov  a, %al       # merge into the low byte of RAX
  ret

Haswell and newer Intel don't rename the low byte partial register separately
from the full register, so they behave like AMD and other non-P6 /
non-Sandybridge CPUs: a dependency on the full register.  That's good for this
code; in this case the merging is necessary and we don't want the CPU to guess
that it won't be needed later.  The load+ALU-merge uops can micro-fuse into a
single uop for the front end.

 xor %al,%al still has a false dependency on the old value of RAX because it's
not a zeroing idiom; IIRC in my testing it's at least as good to do  mov $0,
%al.  Both instructions are 2 bytes long.

* https://stackoverflow.com/questions/41573502/why-doesnt-gcc-use-partial-registers
  - survey of the ways partial regs are handled on Intel P6 family vs. Intel
  Sandybridge vs. Haswell and later vs. non-Intel and Intel Silvermont etc.
* https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to
  - details of my testing on Haswell / Skylake.

----

*If* we still care about  -mtune=nehalem  and other increasingly less relevant
CPUs, we should be avoiding a partial register stall for those tuning options
with something like

   movzbl   a, %edx
   and      $-256, %eax
   or       %edx, %eax

i.e. what we're already doing, but spending a 5-byte AND-immediate instead of a
2-byte xor %al,%al or mov $0, %al.

(That's what clang always does, so it's missing the code-size optimization.
https://godbolt.org/z/jsE57EKcb shows a similar case of return (a&0xFFFFFF00u)
| (b&0xFFu); with two register args)
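
For reference, the merge computed by the quoted expression can be written as a standalone C function. This is an editorial sketch: the function name is hypothetical and the exact source behind the Godbolt link is not reproduced here; only the return expression comes from the text above.

```c
#include <stdint.h>

/* Keep the upper 24 bits of a and take the low byte from b. */
uint32_t merge_low_byte(uint32_t a, uint32_t b)
{
    return (a & 0xFFFFFF00u) | (b & 0xFFu);
}
```

Both the and/or sequence and a byte move into the low register compute this same value; they differ only in code size and in how each microarchitecture handles the partial-register write.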

-----

The penalty on Pentium-M through Nehalem is to stall for 2-3 cycles while a
merging uop is inserted.  The penalty on earlier P6 (PPro / Pentium III) is to
stall for 5-6 cycles until the partial-register write retires.

The penalty on Sandybridge (and maybe Ivy Bridge if it renames AL) is no stall,
just an inserted merging uop.

On later Intel, and AMD, and Silvermont-family Intel, writing AL has a
dependency on the old RAX; it's a merge on the spot.

BTW, modern Intel does still rename AH separately, and merging does require the
front-end to issue a merging uop in a cycle by itself.  So writing AH instead
of AL would be different.


* [Bug target/15533] Missed move to partial register
From: ubizjak at gmail dot com @ 2021-10-11  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15533

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Cesar Eduardo Barros from comment #0)
> When compiling:
> #include <stdint.h>
> #define regparm __attribute__((regparm(3)))
> uint8_t a;
> uint16_t regparm fn(uint16_t b)
> { return (b & ~0xFF) | a; }

This should probably be optimized in match.pd to BIT_INSERT_EXPR.
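
As an editorial sketch of what recognizing a bit insert would mean here: BIT_INSERT_EXPR replaces part of a container value, starting at a given bit position, with a replacement value. The C helper below is a hypothetical model of that operation, not GCC code, and its name and signature are assumptions:

```c
#include <stdint.h>

/* The expression from comment #0, with a passed as a parameter. */
uint16_t fn_mask_form(uint16_t b, uint8_t a)
{
    return (uint16_t)((b & ~0xFF) | a);
}

/* Hypothetical model of a bit insert: replace `width` bits of
   `container`, starting at bit `pos`, with the low bits of `value`.
   Requires width >= 1 and pos + width <= 16. */
uint16_t bit_insert16(uint16_t container, uint16_t value,
                      unsigned pos, unsigned width)
{
    uint16_t field = (uint16_t)(((1u << width) - 1u) << pos);
    return (uint16_t)((container & (uint16_t)~field) |
                      (((uint32_t)value << pos) & field));
}
```

With this model, fn_mask_form(b, a) is the same as bit_insert16(b, a, 0, 8): an 8-bit insert at bit position 0.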


* [Bug tree-optimization/15533] Missed move to partial register
From: pinskia at gcc dot gnu.org @ 2021-10-11  8:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15533

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |pinskia at gcc dot gnu.org

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Mine for gcc 13.
