public inbox for gcc-bugs@sourceware.org
* [Bug c/106220] New: x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity
@ 2022-07-06 22:05 already5chosen at yahoo dot com
  2022-07-06 22:12 ` [Bug target/106220] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: already5chosen at yahoo dot com @ 2022-07-06 22:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220

            Bug ID: 106220
           Summary: x86-64 optimizer forgets about shrd peephole
                    optimization pattern when faced with more than one in
                    close proximity
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: already5chosen at yahoo dot com
  Target Milestone: ---

I am reporting a right-shift issue, but left shift has the same problem.

In theory, gcc knows how to calculate the lower 64 bits of a right shift of a
128-bit number with a single instruction (shrd) when it is provable that the
shift count is in the range [0:63]. In practice, it does so only under very
special conditions.
See here: https://godbolt.org/z/fhdo8xhxW
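
For readers without the link at hand, the pattern in question can be sketched
roughly as below. This is only an illustration under assumptions, not the code
behind the Godbolt link: the name shr128_lo is hypothetical, and it relies on
the unsigned __int128 extension that gcc provides on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): returns the low 64 bits of
   the 128-bit value hi:lo shifted right by (n % 64).  Because of the
   "% 64", the shift count is provably in [0:63], which is the condition
   under which a single shrd could be used on x86-64. */
uint64_t shr128_lo(uint64_t lo, uint64_t hi, unsigned n)
{
    unsigned __int128 x = ((unsigned __int128)hi << 64) | lo;
    return (uint64_t)(x >> (n % 64));
}
```

Whether a given gcc version actually emits one shrd for this exact function is
something to check in the compiler output, not something this sketch
guarantees.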

foo1to1 is good
foo2to1 is good
foo1to2 starts well but is broken near the end by a hyperactive vectorizer.
But that's a separate issue, already reported in 105617.

foo2to2, foo2to3, foo3to4 - it looks like the compiler forgot all it knew
about double-word right shifts, or, more likely, forgot that (x % 64) is
always in the range [0:63].
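
A minimal sketch of the "more than one in close proximity" shape, assuming a
multi-word shift that produces two output words from three input words. The
names shr_pair and shr_words are hypothetical and this is not the code from
the Godbolt link; it has not been verified here that this exact sketch
triggers the missed optimization.

```c
#include <stdint.h>

/* Hypothetical helper: low 64 bits of (hi:lo) >> (n % 64). */
static inline uint64_t shr_pair(uint64_t lo, uint64_t hi, unsigned n)
{
    unsigned __int128 x = ((unsigned __int128)hi << 64) | lo;
    return (uint64_t)(x >> (n % 64));
}

/* Hypothetical multi-word shift: two adjacent double-word shifts that
   share the same count.  Ideally each line would become one shrd. */
void shr_words(uint64_t dst[2], const uint64_t src[3], unsigned n)
{
    dst[0] = shr_pair(src[0], src[1], n);
    dst[1] = shr_pair(src[1], src[2], n);
}
```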

I am reporting it as a target issue despite being sure that the problem is not
in the x86-64 back end itself, but somewhere in the interaction between
various phases of the optimizer, as with 80+ percent of my reports.
However, it's your call, not mine. In practice, the impact is most visible on
x86-64 because, thanks to the shrd instruction, x86-64 is potentially very
good at this sort of task. On ARM64 or POWER64LE the relative slowdown is
lower, because the optimal code is not as fast.

P.S.
82261 sounds similar, but I am not sure it is related.


Thread overview: 5+ messages
2022-07-06 22:05 [Bug c/106220] New: x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity already5chosen at yahoo dot com
2022-07-06 22:12 ` [Bug target/106220] " pinskia at gcc dot gnu.org
2022-07-06 22:14 ` pinskia at gcc dot gnu.org
2022-07-06 23:34 ` already5chosen at yahoo dot com
2022-11-11  5:05 ` crazylht at gmail dot com
