public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/82261] x86: missing peephole for SHLD / SHRD
       [not found] <bug-82261-4@http.gcc.gnu.org/bugzilla/>
@ 2020-05-18 20:39 ` michaeljclark at mac dot com
  2020-05-19 10:45 ` ubizjak at gmail dot com
  2022-04-09  7:00 ` peter at cordes dot ca
  2 siblings, 0 replies; 3+ messages in thread
From: michaeljclark at mac dot com @ 2020-05-18 20:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

Michael Clark <michaeljclark at mac dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michaeljclark at mac dot com

--- Comment #2 from Michael Clark <michaeljclark at mac dot com> ---
Just refreshing this issue. I found it while testing some code-gen on Godbolt:

- https://godbolt.org/z/uXGxZ9

I noticed that Haswell code-gen uses SHRX/SHLX, but I think -Os and pre-Haswell
would benefit from this peephole if it is not complex to add. Noting that Clang
prefers SHLD / SHRD over the SHRX+SHLX pair no matter the -march flavor.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug target/82261] x86: missing peephole for SHLD / SHRD
       [not found] <bug-82261-4@http.gcc.gnu.org/bugzilla/>
  2020-05-18 20:39 ` [Bug target/82261] x86: missing peephole for SHLD / SHRD michaeljclark at mac dot com
@ 2020-05-19 10:45 ` ubizjak at gmail dot com
  2022-04-09  7:00 ` peter at cordes dot ca
  2 siblings, 0 replies; 3+ messages in thread
From: ubizjak at gmail dot com @ 2020-05-19 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Michael Clark from comment #2)
> Just refreshing this issue. I found it while testing some code-gen on
> Godbolt:

The combiner creates:

Failed to match this instruction:
(parallel [
        (set (reg:SI 89)
            (ior:SI (ashift:SI (reg:SI 94)
                    (subreg:QI (reg/v:SI 88 [ n ]) 0))
                (lshiftrt:SI (reg:SI 95)
                    (minus:QI (subreg:QI (reg:SI 91) 0)
                        (subreg:QI (reg/v:SI 88 [ n ]) 0)))))
        (clobber (reg:CC 17 flags))
    ])

This is *almost* matched by:

(define_insn "x86_shld"
  [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
        (ior:SI (ashift:SI (match_dup 0)
                  (match_operand:QI 2 "nonmemory_operand" "Ic"))
                (lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
                  (minus:QI (const_int 32) (match_dup 2)))))
   (clobber (reg:CC FLAGS_REG))]

but RTL combiner doesn't propagate (const_int 32) into the pattern.

I wonder if tree combiner can help here.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug target/82261] x86: missing peephole for SHLD / SHRD
       [not found] <bug-82261-4@http.gcc.gnu.org/bugzilla/>
  2020-05-18 20:39 ` [Bug target/82261] x86: missing peephole for SHLD / SHRD michaeljclark at mac dot com
  2020-05-19 10:45 ` ubizjak at gmail dot com
@ 2022-04-09  7:00 ` peter at cordes dot ca
  2 siblings, 0 replies; 3+ messages in thread
From: peter at cordes dot ca @ 2022-04-09  7:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

--- Comment #4 from Peter Cordes <peter at cordes dot ca> ---
GCC will emit SHLD / SHRD as part of shifting an integer that's two registers
wide.
Hironori Bono proposed the following functions as a workaround for this missed
optimization (https://stackoverflow.com/a/71805063/224132)

#include <stdint.h>

#ifdef __SIZEOF_INT128__
uint64_t shldq_x64(uint64_t low, uint64_t high, uint64_t count) {
  return (uint64_t)(((((unsigned __int128)high << 64) | (unsigned __int128)low)
<< (count & 63)) >> 64);
}

uint64_t shrdq_x64(uint64_t low, uint64_t high, uint64_t count) {
  return (uint64_t)((((unsigned __int128)high << 64) | (unsigned __int128)low)
>> (count & 63));
}
#endif

uint32_t shld_x86(uint32_t low, uint32_t high, uint32_t count) {
  return (uint32_t)(((((uint64_t)high << 32) | (uint64_t)low) << (count & 31))
>> 32);
}

uint32_t shrd_x86(uint32_t low, uint32_t high, uint32_t count) {
  return (uint32_t)((((uint64_t)high << 32) | (uint64_t)low) >> (count & 31));
}

---

The uint64_t functions (using __int128) compile cleanly in 64-bit mode
(https://godbolt.org/z/1j94Gcb4o) using 64-bit operand-size shld/shrd

but the uint32_t functions compile to a total mess in 32-bit mode (GCC11.2 -O3
-m32 -mregparm=3) before eventually using shld, including a totally insane 
    or      dh, 0

GCC trunk with -O3 -mregparm=3 compiles them cleanly, but without regparm it's
also slightly different mess.

Ironically, the uint32_t functions compile to quite a few instructions in
64-bit mode, actually doing the operations as written with shifts and ORs, and
having to manually mask the shift count to &31 because it uses a 64-bit
operand-size shift which masks with &63.  32-bit operand-size SHLD would be a
win here, at least for -mtune=intel or a specific Intel uarch.

I haven't looked at whether they still compile ok after inlining into
surrounding code, or whether operations would tend to combine with other things
in preference to becoming an SHLD.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-04-09  7:00 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-82261-4@http.gcc.gnu.org/bugzilla/>
2020-05-18 20:39 ` [Bug target/82261] x86: missing peephole for SHLD / SHRD michaeljclark at mac dot com
2020-05-19 10:45 ` ubizjak at gmail dot com
2022-04-09  7:00 ` peter at cordes dot ca

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).