[Bug tree-optimization/101434] New: vector-by-vector left shift expansion for char/short is not optimal

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/101434] New: vector-by-vector left shift expansion for char/short is not optimal
@ 2021-07-13 10:55 ubizjak at gmail dot com
  2021-07-13 12:15 ` [Bug tree-optimization/101434] " rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: ubizjak at gmail dot com @ 2021-07-13 10:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101434

            Bug ID: 101434
           Summary: vector-by-vector left shift expansion for char/short
                    is not optimal
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The following testcase:

--cut here--
short r[8], a[8], b[8];

void f1 (void)
{
  int i;

  for (i = 0; i < 8; i++)
    r[i] = a[i] << b[i];
}
--cut here--

compiles with -O2 -ftree-vectorize -mxop to:

        vmovdqa a(%rip), %xmm0
        vmovdqa b(%rip), %xmm1
        vpmovsxwd       %xmm0, %xmm2
        vpsrldq $8, %xmm0, %xmm0
        vpmovsxwd       %xmm1, %xmm3
        vpsrldq $8, %xmm1, %xmm1
        vpshad  %xmm3, %xmm2, %xmm2
        vpmovsxwd       %xmm0, %xmm0
        vpmovsxwd       %xmm1, %xmm1
        vpshad  %xmm1, %xmm0, %xmm0
        vpperm  .LC0(%rip), %xmm0, %xmm2, %xmm2
        vmovdqa %xmm2, r(%rip)
        ret

Both operands are sign-extended to SImode and shifted with vpshad, and the
two halves are then packed back together with vpperm. A single HImode vpshaw
should be emitted instead.
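
For reference, the intended operation can be written directly at HImode width
with GCC's generic vector extensions, where the shift is applied to each 16-bit
element. This is only a sketch: the v8hi typedef and the shl_v8hi name are made
up for illustration, shift counts are assumed to lie in 0..15, and which
instruction a given compiler and target selects for it is not shown here.

```c
/* Hypothetical names; a sketch using GCC/Clang vector extensions. */
typedef short v8hi __attribute__ ((vector_size (16)));

/* Element-wise left shift of eight 16-bit lanes.
   Assumes every count in b is in the range 0..15. */
static v8hi
shl_v8hi (v8hi a, v8hi b)
{
  return a << b;
}
```

Comparing the code generated for this function with that of f1 above is one
way to see how much of the difference comes from the int promotion in the
scalar source.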

A similar testcase, using a right shift:

--cut here--
short r[8], a[8], b[8];

void f2 (void)
{
  int i;

  for (i = 0; i < 8; i++)
    r[i] = a[i] >> b[i];
}
--cut here--

results in the expected HImode vector-by-vector shift insn:

        vpxor   %xmm0, %xmm0, %xmm0
        vpsubw  b(%rip), %xmm0, %xmm0
        vpshaw  %xmm0, a(%rip), %xmm0
        vmovdqa %xmm0, r(%rip)
        ret

(Ignore the vpxor and vpsubw; they are just an XOP peculiarity: vpshaw shifts
right when the count is negative, so the count has to be negated first.)


* [Bug tree-optimization/101434] vector-by-vector left shift expansion for char/short is not optimal
From: rguenth at gcc dot gnu.org @ 2021-07-13 12:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101434

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-07-13
             Target|                            |x86_64-*-* i?86-*-*
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Probably low priority if not doable nicely w/o XOP.

Note this is mainly due to integer promotion rules (we see shifts of int by
int) and fear of introducing undefined behavior (the int by int shift has a
larger valid range for the RHS than a truncated one).
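
The promotion can be checked in plain C. The sketch below assumes a 32-bit
int; the IS_INT macro and the function names are made up for illustration.
It shows that the shift in the testcase has type int, and that a count of 16,
which would be out of range for a 16-bit shift, is valid for it.

```c
/* 1 if the expression has type int, 0 otherwise (C11 _Generic). */
#define IS_INT(e) _Generic ((e), int: 1, default: 0)

/* Both operands are short, but the shift itself is computed in int. */
static int
shift_is_done_in_int (void)
{
  short a = 1, b = 1;
  return IS_INT (a << b);
}

/* A count of 16 is in range for the promoted shift (assuming 32-bit int).
   The caller is expected to pass a small non-negative value. */
static int
shift_by_16 (short a)
{
  short b = 16;
  return a << b;
}
```

Here shift_by_16 (1) yields 65536, whose low 16 bits are all zero, so storing
it back into a short element loses the shifted-in bit.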

There must be a duplicate bug report.

IMHO we might consider making shifts of smaller-than-int types with
out-of-bounds shift amounts well-defined.  I think there's no way to
rewrite types to avoid the undefined behavior like we can do with
signed arithmetic -> unsigned arithmetic (besides division by -1, where
the sign matters).

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/101434] vector-by-vector left shift expansion for char/short is not optimal
From: rguenth at gcc dot gnu.org @ 2021-07-13 12:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101434

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So technically 

  (int)short-var << a

->  short-var << (min (a, 15))

We know a is <= 31 (and >= 0) because of the int shift, but we cannot simply
emit short-var << a: how the target behaves for out-of-range counts is not
well-defined (SHIFT_COUNT_TRUNCATED), whereas the behavior is well-defined for
the int << int shift.  Pattern recog has code to deal with this in theory, but
it gives up here and does not bother to emit a min ().
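
A scalar model can help when checking such a narrowing. The sketch below is
not GCC code: all names are made up, and unsigned 16-bit lanes are used so
that every shift is well-defined C. It compares the int-width shift truncated
to 16 bits against two narrow candidates, one with the count clamped by
min (a, 15) and one that selects zero for counts of 16 and above. In this
model the clamp agrees with the wide shift for counts up to 15, while for
larger counts the wide shift leaves zero in the low 16 bits, so whether a
plain min () is sufficient for left shifts looks worth verifying.

```c
#include <stdint.h>

/* Reference: shift at 32-bit width, then truncate to 16 bits.
   Valid for 0 <= n <= 31. */
static uint16_t
shl_wide (uint16_t x, unsigned n)
{
  return (uint16_t) ((uint32_t) x << n);
}

/* Narrow candidate 1: count clamped to 15, as in min (a, 15). */
static uint16_t
shl_narrow_min (uint16_t x, unsigned n)
{
  unsigned c = n < 15 ? n : 15;
  return (uint16_t) ((uint32_t) x << c);
}

/* Narrow candidate 2: counts of 16 and above produce zero. */
static uint16_t
shl_narrow_sel (uint16_t x, unsigned n)
{
  return n < 16 ? (uint16_t) ((uint32_t) x << n) : 0;
}
```

For example, with x = 1 and n = 16 the reference and the select variant both
give 0, while the clamped variant gives 0x8000.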


* [Bug tree-optimization/101434] vector-by-vector left shift expansion for char/short is not optimal
From: ubizjak at gmail dot com @ 2021-07-13 12:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101434

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> Probably low priority if not doable nicely w/o XOP.

-mxop can be substituted with -mavx512bw -mavx512vl for the same effect.


* [Bug tree-optimization/101434] vector-by-vector left shift expansion for char/short is not optimal
From: pinskia at gcc dot gnu.org @ 2021-08-25  3:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101434

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Richard Biener from comment #1)
> > Probably low priority if not doable nicely w/o XOP.
> 
> -mxop can be substituted with -mavx512bw -mavx512vl for the same effect.

or -mavx2.

