public inbox for gcc-bugs@sourceware.org
* [Bug target/95125] New: Unoptimal code for vectorized conversions
@ 2020-05-14 10:04 ubizjak at gmail dot com
2020-05-14 11:55 ` [Bug target/95125] " rguenth at gcc dot gnu.org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2020-05-14 10:04 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
Bug ID: 95125
Summary: Unoptimal code for vectorized conversions
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
The following testcase
--cut here--
float f[4];
double d[4];
int i[4];

void
float_truncate (void)
{
  for (int n = 0; n < 4; n++)
    f[n] = d[n];
}

void
float_extend (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = f[n];
}

void
float_float (void)
{
  for (int n = 0; n < 4; n++)
    f[n] = i[n];
}

void
fix_float (void)
{
  for (int n = 0; n < 4; n++)
    i[n] = f[n];
}

void
float_double (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = i[n];
}

void
fix_double (void)
{
  for (int n = 0; n < 4; n++)
    i[n] = d[n];
}
--cut here--
when compiled with "-O3 -mavx", should result in a single conversion
instruction per function. Instead, GCC currently generates:
float_truncate:
        vxorps %xmm0, %xmm0, %xmm0
        vcvtsd2ss d+8(%rip), %xmm0, %xmm2
        vmovaps %xmm2, %xmm3
        vcvtsd2ss d(%rip), %xmm0, %xmm1
        vcvtsd2ss d+16(%rip), %xmm0, %xmm2
        vcvtsd2ss d+24(%rip), %xmm0, %xmm0
        vunpcklps %xmm0, %xmm2, %xmm2
        vunpcklps %xmm3, %xmm1, %xmm0
        vmovlhps %xmm2, %xmm0, %xmm0
        vmovaps %xmm0, f(%rip)
        ret
float_extend:
        vcvtps2pd f(%rip), %xmm0
        vmovapd %xmm0, d(%rip)
        vxorps %xmm0, %xmm0, %xmm0
        vmovlps f+8(%rip), %xmm0, %xmm0
        vcvtps2pd %xmm0, %xmm0
        vmovapd %xmm0, d+16(%rip)
        ret
float_float:
        vcvtdq2ps i(%rip), %xmm0
        vmovaps %xmm0, f(%rip)
        ret
fix_float:
        vcvttps2dq f(%rip), %xmm0
        vmovdqa %xmm0, i(%rip)
        ret
float_double:
        vcvtdq2pd i(%rip), %xmm0
        vmovapd %xmm0, d(%rip)
        vpshufd $238, i(%rip), %xmm0
        vcvtdq2pd %xmm0, %xmm0
        vmovapd %xmm0, d+16(%rip)
        ret
fix_double:
        pushq %rbp
        vmovapd d(%rip), %xmm1
        vinsertf128 $0x1, d+16(%rip), %ymm1, %ymm0
        movq %rsp, %rbp
        vcvttpd2dqy %ymm0, %xmm0
        vmovdqa %xmm0, i(%rip)
        vzeroupper
        popq %rbp
        ret
Clang manages to emit optimal code.
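For illustration, the "single conversion" shape requested for float_truncate
can be written out by hand with the GNU vector extensions. This is only a
sketch: the typedef names v4df/v4sf and the function name
float_truncate_explicit are made up for this example, and
__builtin_convertvector needs a reasonably recent GCC (9 or later) or Clang.
With AVX enabled, a compiler would typically lower the conversion to one
packed instruction, but the exact output depends on the compiler and flags.

```c
#include <string.h>

/* Hypothetical typedefs: four doubles (256 bits) and four floats (128 bits). */
typedef double v4df __attribute__ ((vector_size (32)));
typedef float v4sf __attribute__ ((vector_size (16)));

float f[4];
double d[4];

/* Same semantics as float_truncate from the report, but expressed as one
   whole-vector V4DF -> V4SF conversion instead of a scalar loop.  */
void
float_truncate_explicit (void)
{
  v4df in;
  v4sf out;

  memcpy (&in, d, sizeof in);                  /* load four doubles */
  out = __builtin_convertvector (in, v4sf);    /* element-wise double -> float */
  memcpy (f, &out, sizeof out);                /* store four floats */
}
```

The memcpy calls are used for the loads and stores so that the sketch makes
no assumption about the alignment of f and d.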
* [Bug target/95125] Unoptimal code for vectorized conversions
From: rguenth at gcc dot gnu.org @ 2020-05-14 11:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|unknown |11.0
Ever confirmed|0 |1
Last reconfirmed| |2020-05-14
Target| |x86_64-*-* i?86-*-*
Keywords| |missed-optimization
Status|UNCONFIRMED |NEW
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
ISTR I filed a duplicate 10 years ago or so. The issue is that the vectorizer
could not handle V4DFmode -> V4SFmode conversions.
"Could", because for SVE we added the capability, but this requires
additional instruction patterns (IIRC I filed a bug about this last
year). Yep, PR92658 it is.
* [Bug target/95125] Unoptimal code for vectorized conversions
From: ubizjak at gmail dot com @ 2020-05-14 12:40 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> ISTR I filed a duplicate 10 years ago or so. The issue is the vectorizer
> could not handle V4DFmode -> V4SFmode conversions.
>
> Could, because for SVE we added the capability but this requires
> additional instruction patterns (IIRC I filed a bug about this last
> year). Yep. PR92658 it is.
Oh... yes. And it is even assigned to me. And there is a patch... ;)
Anyway, I was surprised, since my soon-to-be-committed v2sf-v2df conversion
patch was able to fully vectorize a similar testcase involving double[2] and
float[2], while the code involving [4] compiled to the mess shown in the report.
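The two-element testcase itself is not included in this thread. A plausible
reconstruction, assuming it simply mirrors the report with the array length
reduced to 2, might look like the following; the names f2, d2,
float_truncate_2 and float_extend_2 are hypothetical.

```c
/* Hypothetical two-element variant of the testcase from the report.
   The real testcase used with the v2sf-v2df patch is not shown here.  */
float f2[2];
double d2[2];

void
float_truncate_2 (void)
{
  for (int n = 0; n < 2; n++)
    f2[n] = d2[n];    /* double -> float, two elements */
}

void
float_extend_2 (void)
{
  for (int n = 0; n < 2; n++)
    d2[n] = f2[n];    /* float -> double, two elements */
}
```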
* [Bug target/95125] Unoptimal code for vectorized conversions
From: ubizjak at gmail dot com @ 2020-05-14 12:43 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
It turns out that a bunch of patterns have to be renamed (and testcases added).
Easyhack, waiting for someone to show some love to conversion patterns in
sse.md.
* [Bug target/95125] Unoptimal code for vectorized conversions
From: crazylht at gmail dot com @ 2020-05-21 7:23 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #3)
> It turns out that a bunch of patterns have to be renamed (and testcases
> added).
>
> Easyhack, waiting for someone to show some love to conversion patterns in
> sse.md.
I'll take a look.
* [Bug target/95125] Unoptimal code for vectorized conversions
From: crazylht at gmail dot com @ 2020-05-22 7:46 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #3)
> It turns out that a bunch of patterns have to be renamed (and testcases
> added).
>
> Easyhack, waiting for someone to show some love to conversion patterns in
> sse.md.
Expanders for floatv4siv4df2 and fix_truncv4dfv4si2 already exist.
If float_double and fix_double are changed to
---
void
float_double (void)
{
  d[0] = i[0];
  d[1] = i[1];
  d[2] = i[2];
  d[3] = i[3];
}
void
fix_double (void)
{
  i[0] = d[0];
  i[1] = d[1];
  i[2] = d[2];
  i[3] = d[3];
}
---
GCC successfully generates
---
float_double():
        vcvtdq2pd i(%rip), %ymm0
        vmovapd %ymm0, d(%rip)
        vzeroupper
        ret
fix_double():
        vcvttpd2dqy d(%rip), %xmm0
        vmovdqa %xmm0, i(%rip)
        ret
---
* [Bug target/95125] Unoptimal code for vectorized conversions
From: ubizjak at gmail dot com @ 2020-05-22 8:00 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
Uroš Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Uroš Bizjak from comment #3)
> > It turns out that a bunch of patterns have to be renamed (and testcases
> > added).
> >
> > Easyhack, waiting for someone to show some love to conversion patterns in
> > sse.md.
>
> expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
>
> if change **float_double fix_double** to
> ---
> void
> float_double (void)
> {
> d[0] = i[0];
> d[1] = i[1];
> d[2] = i[2];
> d[3] = i[3];
> }
Hm, the above is vectorized, but the equivalent:
void
float_double (void)
{
  for (int n = 0; n < 4; n++)
    d[n] = i[n];
}
is not?
* [Bug target/95125] Unoptimal code for vectorized conversions
From: rguenth at gcc dot gnu.org @ 2020-05-22 8:56 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rsandifo at gcc dot gnu.org
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Uroš Bizjak from comment #3)
> > > It turns out that a bunch of patterns have to be renamed (and testcases
> > > added).
> > >
> > > Easyhack, waiting for someone to show some love to conversion patterns in
> > > sse.md.
> >
> > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> >
> > if change **float_double fix_double** to
> > ---
> > void
> > float_double (void)
> > {
> > d[0] = i[0];
> > d[1] = i[1];
> > d[2] = i[2];
> > d[3] = i[3];
> > }
>
> Hm, the above is vectorized, but the equivalent:
>
> void
> float_double (void)
> {
> for (int n = 0; n < 4; n++)
> d[n] = i[n];
> }
>
> is not?
Yes, we're committing to too high a VF (vectorization factor) here, likely
because we pick the "wrong" vector mode too early. We could eventually fix
this up in the early vectype analysis.
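One way to picture the two shapes involved, as an illustration only and not
a description of the vectorizer's internals: if every vector must be 128
bits wide, the four ints have to be widened in two halves, which matches
the two vcvtdq2pd instructions in the report. If the input and output may
use different vector sizes, a single V4SI -> V4DF conversion is enough. The
typedefs and function names below are hypothetical, and
__builtin_convertvector requires GCC 9 or later, or Clang.

```c
#include <string.h>

/* Hypothetical typedefs for this sketch.  */
typedef int v2si __attribute__ ((vector_size (8)));
typedef int v4si __attribute__ ((vector_size (16)));
typedef double v2df __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));

int i[4];
double d[4];

/* 128-bit result vectors only: each V2DF holds two doubles, so the
   four-element conversion is split into two half conversions.  */
void
float_double_one_size (void)
{
  for (int half = 0; half < 2; half++)
    {
      v2si in;
      v2df out;

      memcpy (&in, &i[2 * half], sizeof in);
      out = __builtin_convertvector (in, v2df);
      memcpy (&d[2 * half], &out, sizeof out);
    }
}

/* Mixed sizes: a 128-bit V4SI input is widened to a 256-bit V4DF result
   in one conversion.  */
void
float_double_mixed_sizes (void)
{
  v4si in;
  v4df out;

  memcpy (&in, i, sizeof in);
  out = __builtin_convertvector (in, v4df);
  memcpy (d, &out, sizeof out);
}
```

Both functions compute the same result as float_double from the report;
they differ only in how the work is divided into vectors.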
* [Bug target/95125] Unoptimal code for vectorized conversions
From: rsandifo at gcc dot gnu.org @ 2020-05-22 9:18 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
--- Comment #8 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> (In reply to Uroš Bizjak from comment #6)
> > (In reply to Hongtao.liu from comment #5)
> > > (In reply to Uroš Bizjak from comment #3)
> > > > It turns out that a bunch of patterns have to be renamed (and testcases
> > > > added).
> > > >
> > > > Easyhack, waiting for someone to show some love to conversion patterns in
> > > > sse.md.
> > >
> > > expander for floatv4siv4df2, fix_truncv4dfv4si2 already exists.
> > >
> > > if change **float_double fix_double** to
> > > ---
> > > void
> > > float_double (void)
> > > {
> > > d[0] = i[0];
> > > d[1] = i[1];
> > > d[2] = i[2];
> > > d[3] = i[3];
> > > }
> >
> > Hm, the above is vectorized, but the equivalent:
> >
> > void
> > float_double (void)
> > {
> > for (int n = 0; n < 4; n++)
> > d[n] = i[n];
> > }
> >
> > is not?
>
> Yes, we're committing to a too high VF here, likely because we pick the
> "wrong" vector mode too early. We could eventually fix this up in
> the early vectype analysis.
It might be worth investigating VECT_COMPARE_COSTS, which weighs
the cost of different VFs against each other and is how SVE copes
with this. I guess the danger is that it might interfere with
-mprefer-* options (although the first VF listed by
autovectorize_vector_modes wins in a tie).
* [Bug target/95125] Unoptimal code for vectorized conversions
From: cvs-commit at gcc dot gnu.org @ 2020-05-25 1:58 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:94c0409717bf8bf783963c1d50bb8f4a4732dce7
commit r11-596-g94c0409717bf8bf783963c1d50bb8f4a4732dce7
Author: liuhongt <hongtao.liu@intel.com>
Date: Sat May 23 15:30:58 2020 +0800
Add missing expander for vector float_extend and float_truncate.
2020-05-25 Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
PR target/95125
* config/i386/sse.md (sf2dfmode_lower): New mode attribute.
(trunc<mode><sf2dfmode_lower>2): New expander.
(extend<sf2dfmode_lower><mode>2): Ditto.
gcc/testsuite/ChangeLog
* gcc.target/i386/pr95125-avx.c: New test.
* gcc.target/i386/pr95125-avx512f.c: Ditto.
* [Bug target/95125] Unoptimal code for vectorized conversions
From: pinskia at gcc dot gnu.org @ 2021-08-03 3:28 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
float_double and fix_double don't produce the best code yet.
* [Bug target/95125] Unoptimal code for vectorized conversions
From: crazylht at gmail dot com @ 2021-08-03 3:35 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Andrew Pinski from comment #10)
> float_double and fix_double don't produce the best code yet.
That's because the loop vectorizer can only use one vector size. Since BB
vectorization supports different vector sizes in the same instance, compiling
with "-O2 -ftree-slp-vectorize -march=skylake-avx512 -funroll-loops" produces
optimal code. This is related to PR101097.
Thread overview: 12+ messages
2020-05-14 10:04 [Bug target/95125] New: Unoptimal code for vectorized conversions ubizjak at gmail dot com
2020-05-14 11:55 ` [Bug target/95125] " rguenth at gcc dot gnu.org
2020-05-14 12:40 ` ubizjak at gmail dot com
2020-05-14 12:43 ` ubizjak at gmail dot com
2020-05-21 7:23 ` crazylht at gmail dot com
2020-05-22 7:46 ` crazylht at gmail dot com
2020-05-22 8:00 ` ubizjak at gmail dot com
2020-05-22 8:56 ` rguenth at gcc dot gnu.org
2020-05-22 9:18 ` rsandifo at gcc dot gnu.org
2020-05-25 1:58 ` cvs-commit at gcc dot gnu.org
2021-08-03 3:28 ` pinskia at gcc dot gnu.org
2021-08-03 3:35 ` crazylht at gmail dot com