[Bug target/95125] New: Unoptimal code for vectorized conversions

public inbox for gcc-bugs@sourceware.org

* [Bug target/95125] New: Unoptimal code for vectorized conversions
@ 2020-05-14 10:04 ubizjak at gmail dot com
  2020-05-14 11:55 ` [Bug target/95125] " rguenth at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-05-14 10:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

            Bug ID: 95125
           Summary: Unoptimal code for vectorized conversions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Following testcase

--cut here--
float f[4];
double d[4];
int i[4];

void
float_truncate (void)
{
  for (int n = 0; n < 4; n++)
    f[n] = d[n];
}

void
float_extend (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = f[n];
}

void
float_float (void)
{
  for (int n = 0; n < 4; n++)
    f[n] = i[n];
}

void
fix_float (void)
{
  for (int n = 0; n < 4; n++)
    i[n] = f[n];
}

void
float_double (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = i[n];
}

void
fix_double (void)
{
  for (int n = 0; n < 4; n++)
    i[n] = d[n];
}
--cut here--

when compiled with "-O3 -mavx" should result in a single conversion
instruction per function.  Instead, GCC generates:

float_truncate:
        vxorps  %xmm0, %xmm0, %xmm0
        vcvtsd2ss       d+8(%rip), %xmm0, %xmm2
        vmovaps %xmm2, %xmm3
        vcvtsd2ss       d(%rip), %xmm0, %xmm1
        vcvtsd2ss       d+16(%rip), %xmm0, %xmm2
        vcvtsd2ss       d+24(%rip), %xmm0, %xmm0
        vunpcklps       %xmm0, %xmm2, %xmm2
        vunpcklps       %xmm3, %xmm1, %xmm0
        vmovlhps        %xmm2, %xmm0, %xmm0
        vmovaps %xmm0, f(%rip)
        ret

float_extend:
        vcvtps2pd       f(%rip), %xmm0
        vmovapd %xmm0, d(%rip)
        vxorps  %xmm0, %xmm0, %xmm0
        vmovlps f+8(%rip), %xmm0, %xmm0
        vcvtps2pd       %xmm0, %xmm0
        vmovapd %xmm0, d+16(%rip)
        ret

float_float:
        vcvtdq2ps       i(%rip), %xmm0
        vmovaps %xmm0, f(%rip)
        ret

fix_float:
        vcvttps2dq      f(%rip), %xmm0
        vmovdqa %xmm0, i(%rip)
        ret

float_double:
        vcvtdq2pd       i(%rip), %xmm0
        vmovapd %xmm0, d(%rip)
        vpshufd $238, i(%rip), %xmm0
        vcvtdq2pd       %xmm0, %xmm0
        vmovapd %xmm0, d+16(%rip)
        ret

fix_double:
        pushq   %rbp
        vmovapd d(%rip), %xmm1
        vinsertf128     $0x1, d+16(%rip), %ymm1, %ymm0
        movq    %rsp, %rbp
        vcvttpd2dqy     %ymm0, %xmm0
        vmovdqa %xmm0, i(%rip)
        vzeroupper
        popq    %rbp
        ret

Clang manages to emit optimal code.
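
For reference, the single-conversion form asked for above can also be requested explicitly instead of being left to the loop vectorizer. The following is a minimal sketch and not part of the original report. It assumes a compiler that provides __builtin_convertvector (GCC 9 or later, or Clang). The typedef names and the _vec function names are hypothetical. Whether each function actually compiles to one conversion instruction still depends on the target flags and the compiler version.

```c
#include <string.h>

/* Generic vector types (GCC/Clang extension).  The names are
   illustrative only and do not come from the bug report.  */
typedef float  v4sf __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));
typedef int    v4si __attribute__ ((vector_size (16)));

float f[4];
double d[4];
int i[4];

/* double[4] -> float[4]: same element-wise conversion as
   float_truncate in the testcase, expressed on whole vectors.  */
void
float_truncate_vec (void)
{
  v4df in;
  memcpy (&in, d, sizeof in);   /* load without alignment assumptions */
  v4sf out = __builtin_convertvector (in, v4sf);
  memcpy (f, &out, sizeof out);
}

/* int[4] -> double[4]: same element-wise conversion as float_double
   in the testcase.  The source vector is 16 bytes and the
   destination is 32 bytes.  */
void
float_double_vec (void)
{
  v4si in;
  memcpy (&in, i, sizeof in);
  v4df out = __builtin_convertvector (in, v4df);
  memcpy (d, &out, sizeof out);
}
```

Compiling this with "-O3 -mavx" and comparing the assembly against the loop versions is one way to check whether a given compiler has the needed conversion patterns.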

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
@ 2020-05-14 11:55 ` rguenth at gcc dot gnu.org
  2020-05-14 12:40 ` ubizjak at gmail dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-14 11:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |11.0
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-05-14
             Target|                            |x86_64-*-* i?86-*-*
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
ISTR I filed a duplicate 10 years ago or so.  The issue is the vectorizer
could not handle V4DFmode -> V4SFmode conversions.

Could, because for SVE we added the capability but this requires
additional instruction patterns (IIRC I filed a bug about this last
year).  Yep.  PR92658 it is.

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
  2020-05-14 11:55 ` [Bug target/95125] " rguenth at gcc dot gnu.org
@ 2020-05-14 12:40 ` ubizjak at gmail dot com
  2020-05-14 12:43 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-05-14 12:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> ISTR I filed a duplicate 10 years ago or so.  The issue is the vectorizer
> could not handle V4DFmode -> V4SFmode conversions.
> 
> Could, because for SVE we added the capability but this requires
> additional instruction patterns (IIRC I filed a bug about this last
> year).  Yep.  PR92658 it is.

Oh... yes. And it is even assigned to me. And there is a patch... ;)

Anyway, I was surprised, since my soon-to-be committed v2sf-v2df conversion
patch was able to fully vectorize a similar testcase involving double[2] and
float[2], while code involving [4] compiled to the mess shown in the
original report.
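
The two-element testcase is not included in the thread. A plausible reconstruction, mirroring float_truncate and float_extend from the original report, is sketched here. The names f2, d2, float_truncate2 and float_extend2 are hypothetical.

```c
/* Hypothetical two-element variant of the reported testcase.  */
float f2[2];
double d2[2];

/* double[2] -> float[2] */
void
float_truncate2 (void)
{
  for (int n = 0; n < 2; n++)
    f2[n] = d2[n];
}

/* float[2] -> double[2] */
void
float_extend2 (void)
{
  for (int n = 0; n < 2; n++)
    d2[n] = f2[n];
}
```

Here both arrays fit in a single 128-bit register or less, so no 256-bit mode is involved, unlike the [4] case where double[4] needs 256 bits.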

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
  2020-05-14 11:55 ` [Bug target/95125] " rguenth at gcc dot gnu.org
  2020-05-14 12:40 ` ubizjak at gmail dot com
@ 2020-05-14 12:43 ` ubizjak at gmail dot com
  2020-05-21  7:23 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-05-14 12:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
It turns out that a bunch of patterns have to be renamed (and testcases added).

Easyhack, waiting for someone to show some love to conversion patterns in
sse.md.

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
                   ` (2 preceding siblings ...)
  2020-05-14 12:43 ` ubizjak at gmail dot com
@ 2020-05-21  7:23 ` crazylht at gmail dot com
  2020-05-22  7:46 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2020-05-21  7:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #3)
> It turns out that a bunch of patterns have to be renamed (and testcases
> added).
> 
> Easyhack, waiting for someone to show some love to conversion patterns in
> sse.md.

I'll take a look.

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
                   ` (3 preceding siblings ...)
  2020-05-21  7:23 ` crazylht at gmail dot com
@ 2020-05-22  7:46 ` crazylht at gmail dot com
  2020-05-22  8:00 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2020-05-22  7:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #3)
> It turns out that a bunch of patterns have to be renamed (and testcases
> added).
> 
> Easyhack, waiting for someone to show some love to conversion patterns in
> sse.md.

Expanders for floatv4siv4df2 and fix_truncv4dfv4si2 already exist.

If float_double and fix_double are changed to
---
void
float_double (void)
{
    d[0] = i[0];
    d[1] = i[1];
    d[2] = i[2];
    d[3] = i[3];
}

void
fix_double (void)
{
    i[0] = d[0];
    i[1] = d[1];
    i[2] = d[2];
    i[3] = d[3];
}
----

GCC successfully generates

---
float_double():
        vcvtdq2pd       i(%rip), %ymm0
        vmovapd %ymm0, d(%rip)
        vzeroupper
        ret
fix_double():
        vcvttpd2dqy     d(%rip), %xmm0
        vmovdqa %xmm0, i(%rip)
        ret
-----

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
                   ` (4 preceding siblings ...)
  2020-05-22  7:46 ` crazylht at gmail dot com
@ 2020-05-22  8:00 ` ubizjak at gmail dot com
  2020-05-22  8:56 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-05-22  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Uroš Bizjak from comment #3)
> > It turns out that a bunch of patterns have to be renamed (and testcases
> > added).
> > 
> > Easyhack, waiting for someone to show some love to conversion patterns in
> > sse.md.
> 
> expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> 
> if change **float_double fix_double** to
> ---
> void
> float_double (void)
> {
>     d[0] = i[0];
>     d[1] = i[1];
>     d[2] = i[2];
>     d[3] = i[3];
> }

Hm, the above is vectorized, but the equivalent:

void
float_double (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = i[n];
}

is not?

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
                   ` (5 preceding siblings ...)
  2020-05-22  8:00 ` ubizjak at gmail dot com
@ 2020-05-22  8:56 ` rguenth at gcc dot gnu.org
  2020-05-22  9:18 ` rsandifo at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-22  8:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Uroš Bizjak from comment #3)
> > > It turns out that a bunch of patterns have to be renamed (and testcases
> > > added).
> > > 
> > > Easyhack, waiting for someone to show some love to conversion patterns in
> > > sse.md.
> > 
> > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> > 
> > if change **float_double fix_double** to
> > ---
> > void
> > float_double (void)
> > {
> >     d[0] = i[0];
> >     d[1] = i[1];
> >     d[2] = i[2];
> >     d[3] = i[3];
> > }
> 
> Hm, the above is vectorized, but the equivalent:
> 
> void
> float_double (void)
> {
>   for (int n = 0; n < 4; n++)
>     d[n] = i[n];
> }
> 
> is not?

Yes, we're committing to a too high VF here, likely because we pick the
"wrong" vector mode too early.  We could eventually fix this up in
the early vectype analysis.
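
One way to read the "too high VF" remark is with a simplified model of a loop vectorizer that must use a single vector size for the whole loop. The sketch below is an illustration only, not GCC's actual implementation, and the function names are hypothetical.

```c
/* Simplified model of a loop vectorizer restricted to one vector
   size.  This is an illustration, not GCC's implementation; all
   names are hypothetical.  */

/* Vectorization factor: the number of scalar iterations covered by
   one vector iteration.  It is set by the smallest element type
   used in the loop.  */
unsigned
model_vf (unsigned vector_bytes, unsigned smallest_elem_bytes)
{
  return vector_bytes / smallest_elem_bytes;
}

/* Number of vectors of VECTOR_BYTES needed to hold VF elements of
   ELEM_BYTES each.  */
unsigned
model_vectors_needed (unsigned vf, unsigned elem_bytes,
                      unsigned vector_bytes)
{
  return vf * elem_bytes / vector_bytes;
}
```

For d[n] = i[n] over 4 iterations, 32-byte vectors give a VF of 8 from the 4-byte int, which is more than the 4 iterations available. 16-byte vectors give a VF of 4, with one int vector but two double vectors, which matches the two vcvtdq2pd instructions in the report. The single-instruction form needs a 16-byte source and a 32-byte destination, that is, two vector sizes in one loop.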

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
                   ` (6 preceding siblings ...)
  2020-05-22  8:56 ` rguenth at gcc dot gnu.org
@ 2020-05-22  9:18 ` rsandifo at gcc dot gnu.org
  2020-05-25  1:58 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2020-05-22  9:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #8 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> (In reply to Uroš Bizjak from comment #6)
> > (In reply to Hongtao.liu from comment #5)
> > > (In reply to Uroš Bizjak from comment #3)
> > > > It turns out that a bunch of patterns have to be renamed (and testcases
> > > > added).
> > > > 
> > > > Easyhack, waiting for someone to show some love to conversion patterns in
> > > > sse.md.
> > > 
> > > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> > > 
> > > if change **float_double fix_double** to
> > > ---
> > > void
> > > float_double (void)
> > > {
> > >     d[0] = i[0];
> > >     d[1] = i[1];
> > >     d[2] = i[2];
> > >     d[3] = i[3];
> > > }
> > 
> > Hm, the above is vectorized, but the equivalent:
> > 
> > void
> > float_double (void)
> > {
> >   for (int n = 0; n < 4; n++)
> >     d[n] = i[n];
> > }
> > 
> > is not?
> 
> Yes, we're committing to a too high VF here, likely because we pick the
> "wrong" vector mode too early.  We could eventually fix this up in
> the early vectype analysis.

It might be worth investigating VECT_COMPARE_COSTS, which weighs
the cost of different VFs against each other and is how SVE copes
with this.  I guess the danger is that it might interfere with
-mprefer-* options (although the first VF listed by
autovectorize_vector_modes wins in a tie).

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
                   ` (7 preceding siblings ...)
  2020-05-22  9:18 ` rsandifo at gcc dot gnu.org
@ 2020-05-25  1:58 ` cvs-commit at gcc dot gnu.org
  2021-08-03  3:28 ` pinskia at gcc dot gnu.org
  2021-08-03  3:35 ` crazylht at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-05-25  1:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:94c0409717bf8bf783963c1d50bb8f4a4732dce7

commit r11-596-g94c0409717bf8bf783963c1d50bb8f4a4732dce7
Author: liuhongt <hongtao.liu@intel.com>
Date:   Sat May 23 15:30:58 2020 +0800

    Add missing expander for vector float_extend and float_truncate.

    2020-05-25  Hongtao Liu  <hongtao.liu@intel.com>

    gcc/ChangeLog
            PR target/95125
            * config/i386/sse.md (sf2dfmode_lower): New mode attribute.
            (trunc<mode><sf2dfmode_lower>2): New expander.
            (extend<sf2dfmode_lower><mode>2): Ditto.

    gcc/testsuite/ChangeLog
            * gcc.target/i386/pr95125-avx.c: New test.
            * gcc.target/i386/pr95125-avx512f.c: Ditto.

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
                   ` (8 preceding siblings ...)
  2020-05-25  1:58 ` cvs-commit at gcc dot gnu.org
@ 2021-08-03  3:28 ` pinskia at gcc dot gnu.org
  2021-08-03  3:35 ` crazylht at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-03  3:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
float_double and fix_double don't produce the best code yet.

* [Bug target/95125] Unoptimal code for vectorized conversions
  2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
                   ` (9 preceding siblings ...)
  2021-08-03  3:28 ` pinskia at gcc dot gnu.org
@ 2021-08-03  3:35 ` crazylht at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2021-08-03  3:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Andrew Pinski from comment #10)
> float_double and fix_double don't produce the best code yet.

This is because the loop vectorizer can only use one vector size.  BB
vectorization supports different vector sizes in the same instance, so
compiling with "-O2 -ftree-slp-vectorize -march=skylake-avx512
-funroll-loops" produces optimal code.  This is related to PR101097.
