public inbox for gcc-bugs@sourceware.org
* [Bug c/106220] New: x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity
@ 2022-07-06 22:05 already5chosen at yahoo dot com
  2022-07-06 22:12 ` [Bug target/106220] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: already5chosen at yahoo dot com @ 2022-07-06 22:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220

            Bug ID: 106220
           Summary: x86-64 optimizer forgets about shrd peephole
                    optimization pattern when faced with more than one in
                    close proximity
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: already5chosen at yahoo dot com
  Target Milestone: ---

I am reporting this for right shifts, but left shifts have the same issue.

In theory, gcc knows how to compute the lower 64 bits of a right shift of a
128-bit number with a single instruction when the shift count is provably in
the range [0:63]. In practice, it does so only under very special conditions.
See here: https://godbolt.org/z/fhdo8xhxW
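
For readers without access to the Godbolt link, the basic pattern can be sketched as follows. This is a minimal illustration, not the reporter's test case: the name shr128_lo is hypothetical, and the code assumes the unsigned __int128 extension provided by GCC and Clang on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): returns the low 64 bits of
   the 128-bit value hi:lo shifted right by n % 64. Because of the modulo,
   the shift count is provably in [0:63], which is the condition the
   report describes for the single-instruction (shrd) form on x86-64. */
static uint64_t shr128_lo(uint64_t lo, uint64_t hi, unsigned n)
{
    unsigned __int128 x = ((unsigned __int128)hi << 64) | lo;
    return (uint64_t)(x >> (n % 64));
}
```

For example, shr128_lo(0x10, 0x1, 4) yields 0x1000000000000001: bit 64 of the input moves down to bit 60, and the low word's 0x10 becomes 0x1.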

foo1to1 is good.
foo2to1 is good.
foo1to2 starts well but is broken near the end by a hyperactive vectorizer.
But that's a separate issue, already reported in 105617.

foo2to2, foo2to3, foo3to4: it looks like the compiler forgot all it knew about
double-word right shifts or, more likely, forgot that (x % 64) is always in
range [0:63].
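
A sketch of the kind of code where several such shifts sit close together is given below. It is an assumption about the general shape only: the function name shr_3to2 and its signature are hypothetical and are not taken from the linked test case. Each output word is the low half of a 128-bit right shift, and both shifts share the same count, reduced modulo 64.

```c
#include <stdint.h>

/* Hypothetical multi-word example (not the reporter's code): shifts a
   192-bit value stored as three 64-bit words right by n % 64 and keeps
   the two low result words. Each output word is a candidate for one
   double-word shift, since the shared count c is provably in [0:63]. */
static void shr_3to2(uint64_t dst[2], const uint64_t src[3], unsigned n)
{
    unsigned c = n % 64;
    dst[0] = (uint64_t)((((unsigned __int128)src[1] << 64) | src[0]) >> c);
    dst[1] = (uint64_t)((((unsigned __int128)src[2] << 64) | src[1]) >> c);
}
```

Comparing the assembly for one output word against the assembly for two, at the options given later in this thread, is one way to check whether the double-word shift form survives.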

I am reporting it as a target issue despite being sure that the problem is not
in the x86-64 back end itself, but somewhere in the interaction between various
phases of the optimizer, as with 80+ percent of my reports.
However, it's your call, not mine. In practice, the impact is most visible on
x86-64 because, thanks to the shrd instruction, x86-64 is potentially
very good at this sort of task. On ARM64 or POWER64LE the relative slowdown
is lower, because the optimal code is not as fast.

P.S.
Bug 82261 sounds similar, but I am not sure it is related.


* [Bug target/106220] x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity
  2022-07-06 22:05 [Bug c/106220] New: x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity already5chosen at yahoo dot com
@ 2022-07-06 22:12 ` pinskia at gcc dot gnu.org
  2022-07-06 22:14 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-06 22:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 53265
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53265&action=edit
testcase


* [Bug target/106220] x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity
  2022-07-06 22:05 [Bug c/106220] New: x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity already5chosen at yahoo dot com
  2022-07-06 22:12 ` [Bug target/106220] " pinskia at gcc dot gnu.org
@ 2022-07-06 22:14 ` pinskia at gcc dot gnu.org
  2022-07-06 23:34 ` already5chosen at yahoo dot com
  2022-11-11  5:05 ` crazylht at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-06 22:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
-O2 -march=haswell


* [Bug target/106220] x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity
  2022-07-06 22:05 [Bug c/106220] New: x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity already5chosen at yahoo dot com
  2022-07-06 22:12 ` [Bug target/106220] " pinskia at gcc dot gnu.org
  2022-07-06 22:14 ` pinskia at gcc dot gnu.org
@ 2022-07-06 23:34 ` already5chosen at yahoo dot com
  2022-11-11  5:05 ` crazylht at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: already5chosen at yahoo dot com @ 2022-07-06 23:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220

--- Comment #3 from Michael_S <already5chosen at yahoo dot com> ---
-march=haswell is not very important.
I added it only because, in the absence of the BMI extension, the issue is
somewhat obscured by the need to keep the shift count in the CL register.

-O2 is also not important; -O3 gives the same result. And -O1, due to the
absence of if-conversion, demonstrates the same issue in a different form.
In practice, I'd guess the -O1 code would perform quite well, unlike -O2 and
-O3, but that does not make it any less ugly.


* [Bug target/106220] x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity
  2022-07-06 22:05 [Bug c/106220] New: x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity already5chosen at yahoo dot com
                   ` (2 preceding siblings ...)
  2022-07-06 23:34 ` already5chosen at yahoo dot com
@ 2022-11-11  5:05 ` crazylht at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: crazylht at gmail dot com @ 2022-11-11  5:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
I tried adding a combine splitter:

(define_insn_and_split "*x86_64_shrd_lshiftrtti"
  [(set (match_operand:DI 0 "nonimmediate_operand")
        (subreg:DI (lshiftrt:TI (match_operand:TI 1 "nonimmediate_operand")
                                (subreg:QI
                                  (and:SI
                                    (match_operand:SI 2 "register_operand")
                                    (const_int 63)) 0)) 0))
   (clobber (reg:CC FLAGS_REG))]

but it failed if there's more than 2 shrd insns, since the flags clobber in
the first shrd prevents the second shrd from being combined. By the time it's
split after reload, it's too late to optimize away the redundant cmovne.

