* [Bug target/115517] New: Fix regression after dropping uses of vcond{,u,eq}_optab
@ 2024-06-17 8:05 liuhongt at gcc dot gnu.org
2024-06-18 6:20 ` [Bug target/115517] Fix x86 regressions " rguenth at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-06-17 8:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
Bug ID: 115517
Summary: Fix regression after dropping uses of
vcond{,u,eq}_optab
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Depends on: 114189
Target Milestone: ---
Target: x86_64-*-* i?86-*-*
> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> > I know riscv doesn't implement any of the legacy optabs. But less
> > maintained vector targets might need adjustments.
> >
> At GCC14, I tried to remove these expanders in the x86 backend, and it
> regressed some testcases, mainly because of the optimizations we did
> in ix86_expand_{int,fp}_vcond.
> I've started testing your patch; it's possible that we still need to
> move the ix86_expand_{int,fp}_vcond optimizations to the
> middle-end (isel or match.pd) or add extra patterns to handle it in the
> RTL pass_combine.
These are the new failures I got:
g++: g++.target/i386/avx-pr54700-1.C scan-assembler-not vpcmpgt[bdq]
g++: g++.target/i386/avx-pr54700-1.C scan-assembler-times vblendvpd 4
g++: g++.target/i386/avx-pr54700-1.C scan-assembler-times vblendvps 4
g++: g++.target/i386/avx-pr54700-1.C scan-assembler-times vpblendvb 2
g++: g++.target/i386/avx2-pr54700-1.C scan-assembler-not vpcmpgt[bdq]
g++: g++.target/i386/avx2-pr54700-1.C scan-assembler-times vblendvpd 4
g++: g++.target/i386/avx2-pr54700-1.C scan-assembler-times vblendvps 4
g++: g++.target/i386/avx2-pr54700-1.C scan-assembler-times vpblendvb 2
g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++14 scan-assembler-times vmaxph 3
g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++14 scan-assembler-times vminph 3
g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++17 scan-assembler-times vmaxph 3
g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++17 scan-assembler-times vminph 3
g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++20 scan-assembler-times vmaxph 3
g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++20 scan-assembler-times vminph 3
g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++98 scan-assembler-times vmaxph 3
g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++98 scan-assembler-times vminph 3
g++: g++.target/i386/pr100637-1b.C -std=gnu++14 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C -std=gnu++17 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C -std=gnu++20 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1b.C -std=gnu++98 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr100637-1w.C -std=gnu++14 scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C -std=gnu++17 scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C -std=gnu++20 scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100637-1w.C -std=gnu++98 scan-assembler-times pcmpeqw 2
g++: g++.target/i386/pr100738-1.C -std=gnu++14 scan-assembler-not vpcmpeqd[ \\t]
g++: g++.target/i386/pr100738-1.C -std=gnu++14 scan-assembler-not vpxor[ \\t]
g++: g++.target/i386/pr100738-1.C -std=gnu++14 scan-assembler-times vblendvps[ \\t] 2
g++: g++.target/i386/pr100738-1.C -std=gnu++17 scan-assembler-not vpcmpeqd[ \\t]
g++: g++.target/i386/pr100738-1.C -std=gnu++17 scan-assembler-not vpxor[ \\t]
g++: g++.target/i386/pr100738-1.C -std=gnu++17 scan-assembler-times vblendvps[ \\t] 2
g++: g++.target/i386/pr100738-1.C -std=gnu++20 scan-assembler-not vpcmpeqd[ \\t]
g++: g++.target/i386/pr100738-1.C -std=gnu++20 scan-assembler-not vpxor[ \\t]
g++: g++.target/i386/pr100738-1.C -std=gnu++20 scan-assembler-times vblendvps[ \\t] 2
g++: g++.target/i386/pr100738-1.C -std=gnu++98 scan-assembler-not vpcmpeqd[ \\t]
g++: g++.target/i386/pr100738-1.C -std=gnu++98 scan-assembler-not vpxor[ \\t]
g++: g++.target/i386/pr100738-1.C -std=gnu++98 scan-assembler-times vblendvps[ \\t] 2
g++: g++.target/i386/pr103861-1.C -std=gnu++14 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C -std=gnu++17 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C -std=gnu++20 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr103861-1.C -std=gnu++98 scan-assembler-times pcmpeqb 2
g++: g++.target/i386/pr61747.C -std=gnu++14 scan-assembler-times max 4
g++: g++.target/i386/pr61747.C -std=gnu++14 scan-assembler-times min 4
g++: g++.target/i386/pr61747.C -std=gnu++17 scan-assembler-times max 4
g++: g++.target/i386/pr61747.C -std=gnu++17 scan-assembler-times min 4
g++: g++.target/i386/pr61747.C -std=gnu++20 scan-assembler-times max 4
g++: g++.target/i386/pr61747.C -std=gnu++20 scan-assembler-times min 4
g++: g++.target/i386/sse4_1-pr54700-1.C scan-assembler-not pcmpgt[bdq]
g++: g++.target/i386/sse4_1-pr54700-1.C scan-assembler-times blendvpd 4
g++: g++.target/i386/sse4_1-pr54700-1.C scan-assembler-times blendvps 4
g++: g++.target/i386/sse4_1-pr54700-1.C scan-assembler-times pblendvb 2
gcc: gcc.target/i386/avx2-pr99908.c scan-assembler-not \tvpcmpeq
gcc: gcc.target/i386/avx512bw-pr96891-1.c scan-assembler-not %k[0-7]
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-not %k[0-9]
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsb[\t ] 2
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsd[\t ] 2
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsq[\t ] 2
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsw[\t ] 2
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminub[\t ] 2
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminud[\t ] 2
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminuq[\t ] 2
gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminuw[\t ] 2
gcc: gcc.target/i386/blendv-3.c scan-assembler-not vpcmp
gcc: gcc.target/i386/pr88540.c scan-assembler minpd
gcc: gcc.target/i386/sse4_1-pr99908.c scan-assembler-not \tpcmpeq
unix/-m32: g++: g++.target/i386/avx-pr54700-1.C scan-assembler-not vpcmpgt[bdq]
unix/-m32: g++: g++.target/i386/avx-pr54700-1.C scan-assembler-times vblendvpd 4
unix/-m32: g++: g++.target/i386/avx-pr54700-1.C scan-assembler-times vblendvps 4
unix/-m32: g++: g++.target/i386/avx-pr54700-1.C scan-assembler-times vpblendvb 2
unix/-m32: g++: g++.target/i386/avx2-pr54700-1.C scan-assembler-not vpcmpgt[bdq]
unix/-m32: g++: g++.target/i386/avx2-pr54700-1.C scan-assembler-times vblendvpd 4
unix/-m32: g++: g++.target/i386/avx2-pr54700-1.C scan-assembler-times vblendvps 4
unix/-m32: g++: g++.target/i386/avx2-pr54700-1.C scan-assembler-times vpblendvb 2
unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++14 scan-assembler-times vmaxph 3
unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++14 scan-assembler-times vminph 3
unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++17 scan-assembler-times vmaxph 3
unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++17 scan-assembler-times vminph 3
unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++20 scan-assembler-times vmaxph 3
unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++20 scan-assembler-times vminph 3
unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++98 scan-assembler-times vmaxph 3
unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C -std=gnu++98 scan-assembler-times vminph 3
unix/-m32: g++: g++.target/i386/pr100637-1b.C -std=gnu++14 scan-assembler-times pcmpeqb 2
unix/-m32: g++: g++.target/i386/pr100637-1b.C -std=gnu++17 scan-assembler-times pcmpeqb 2
unix/-m32: g++: g++.target/i386/pr100637-1b.C -std=gnu++20 scan-assembler-times pcmpeqb 2
unix/-m32: g++: g++.target/i386/pr100637-1b.C -std=gnu++98 scan-assembler-times pcmpeqb 2
unix/-m32: g++: g++.target/i386/pr100637-1w.C -std=gnu++14 scan-assembler-times pcmpeqw 2
unix/-m32: g++: g++.target/i386/pr100637-1w.C -std=gnu++17 scan-assembler-times pcmpeqw 2
unix/-m32: g++: g++.target/i386/pr100637-1w.C -std=gnu++20 scan-assembler-times pcmpeqw 2
unix/-m32: g++: g++.target/i386/pr100637-1w.C -std=gnu++98 scan-assembler-times pcmpeqw 2
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++14 scan-assembler-not vpcmpeqd[ \\t]
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++14 scan-assembler-not vpxor[ \\t]
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++14 scan-assembler-times vblendvps[ \\t] 2
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++17 scan-assembler-not vpcmpeqd[ \\t]
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++17 scan-assembler-not vpxor[ \\t]
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++17 scan-assembler-times vblendvps[ \\t] 2
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++20 scan-assembler-not vpcmpeqd[ \\t]
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++20 scan-assembler-not vpxor[ \\t]
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++20 scan-assembler-times vblendvps[ \\t] 2
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++98 scan-assembler-not vpcmpeqd[ \\t]
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++98 scan-assembler-not vpxor[ \\t]
unix/-m32: g++: g++.target/i386/pr100738-1.C -std=gnu++98 scan-assembler-times vblendvps[ \\t] 2
unix/-m32: g++: g++.target/i386/pr103861-1.C -std=gnu++14 scan-assembler-times pcmpeqb 2
unix/-m32: g++: g++.target/i386/pr103861-1.C -std=gnu++17 scan-assembler-times pcmpeqb 2
unix/-m32: g++: g++.target/i386/pr103861-1.C -std=gnu++20 scan-assembler-times pcmpeqb 2
unix/-m32: g++: g++.target/i386/pr103861-1.C -std=gnu++98 scan-assembler-times pcmpeqb 2
unix/-m32: g++: g++.target/i386/pr61747.C -std=gnu++14 scan-assembler-times max 4
unix/-m32: g++: g++.target/i386/pr61747.C -std=gnu++14 scan-assembler-times min 4
unix/-m32: g++: g++.target/i386/pr61747.C -std=gnu++17 scan-assembler-times max 4
unix/-m32: g++: g++.target/i386/pr61747.C -std=gnu++17 scan-assembler-times min 4
unix/-m32: g++: g++.target/i386/pr61747.C -std=gnu++20 scan-assembler-times max 4
unix/-m32: g++: g++.target/i386/pr61747.C -std=gnu++20 scan-assembler-times min 4
unix/-m32: g++: g++.target/i386/sse4_1-pr54700-1.C scan-assembler-not pcmpgt[bdq]
unix/-m32: g++: g++.target/i386/sse4_1-pr54700-1.C scan-assembler-times blendvpd 4
unix/-m32: g++: g++.target/i386/sse4_1-pr54700-1.C scan-assembler-times blendvps 4
unix/-m32: g++: g++.target/i386/sse4_1-pr54700-1.C scan-assembler-times pblendvb 2
unix/-m32: gcc: gcc.target/i386/avx2-pr99908.c scan-assembler-not \tvpcmpeq
unix/-m32: gcc: gcc.target/i386/avx512bw-pr96891-1.c scan-assembler-not %k[0-7]
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-not %k[0-9]
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsb[\t ] 2
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsd[\t ] 2
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsq[\t ] 2
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsw[\t ] 2
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminub[\t ] 2
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminud[\t ] 2
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminuq[\t ] 2
unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminuw[\t ] 2
unix/-m32: gcc: gcc.target/i386/blendv-3.c scan-assembler-not vpcmp
unix/-m32: gcc: gcc.target/i386/pr88540.c scan-assembler minpd
unix/-m32: gcc: gcc.target/i386/sse4_1-pr99908.c scan-assembler-not \tpcmpeq
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114189
[Bug 114189] Target implements obsolete vcond{,u,eq} expanders
* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: rguenth at gcc dot gnu.org @ 2024-06-18 6:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, I had already opened PR115490 with my results for this. Some mitigation
should come from optimizing ISEL expansion to vcond_mask, and I'd start by
looking at some of the fallout from that side (note that this might require
the backend to reject vec_cmp comparisons it does not implement natively via
its operand 1 predicate).
* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: liuhongt at gcc dot gnu.org @ 2024-06-18 8:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
--- Comment #2 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Btw, I had opened PR115490 with my results for this already. Some mitigation
> should be from optimizing ISEL expansion to vcond_mask and I'd start with
> looking at some of the fallout from that side (note that might require
> the backend reject not natively implemented vec_cmp via its operand 1
> predicate)
Without AVX512, vector integer comparison only supports EQ/GT natively; other
comparison rtx codes are transformed into those (e.g. GTU is emulated with
us_minus + eq + negating the vector mask).
If we restrict the predicate of operand 1, would the middle-end reject
vectorization (or lower it to a scalar version)?
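For reference, the GTU emulation mentioned above can be modelled per lane in
plain C. This is only a sketch under stated assumptions: the helper names
(us_minus_u8, eq_u8, gtu_u8, check_gtu_u8) and the 16-lane byte layout are made
up for illustration and are not the x86 backend's code. It relies on the
identity a >u b <=> sat(a - b) != 0.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* one 128-bit vector of bytes (assumed layout) */

/* us_minus: unsigned saturating subtraction per lane. */
static void us_minus_u8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (int i = 0; i < LANES; i++) {
        int d = a[i] - b[i];
        out[i] = (uint8_t)(d < 0 ? 0 : d);
    }
}

/* eq: all-ones mask where lanes are equal, all-zeros elsewhere. */
static void eq_u8(const uint8_t *a, const uint8_t *b, uint8_t *mask)
{
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] == b[i]) ? 0xFF : 0x00;
}

/* GTU: a >u b  <=>  sat(a - b) != 0  <=>  NOT (sat(a - b) == 0). */
void gtu_u8(const uint8_t *a, const uint8_t *b, uint8_t *mask)
{
    uint8_t diff[LANES];
    uint8_t zero[LANES] = { 0 };

    us_minus_u8(a, b, diff);
    eq_u8(diff, zero, mask);
    for (int i = 0; i < LANES; i++)
        mask[i] = (uint8_t)~mask[i]; /* negate the vector mask */
}

/* Exhaustive check against the scalar unsigned comparison.
   Returns the number of lane comparisons verified. */
long check_gtu_u8(void)
{
    uint8_t a[LANES], b[LANES], mask[LANES];
    long checked = 0;

    for (int x = 0; x < 256; x++) {
        for (int base = 0; base < 256; base += LANES) {
            for (int i = 0; i < LANES; i++) {
                a[i] = (uint8_t)x;
                b[i] = (uint8_t)(base + i);
            }
            gtu_u8(a, b, mask);
            for (int i = 0; i < LANES; i++) {
                assert(mask[i] == (a[i] > b[i] ? 0xFF : 0x00));
                checked++;
            }
        }
    }
    return checked;
}
```

Calling check_gtu_u8() walks all 256 x 256 byte pairs, so the three-step
sequence can be compared against a plain unsigned > for every input.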
* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: rguenther at suse dot de @ 2024-06-18 10:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 18 Jun 2024, liuhongt at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
>
> --- Comment #2 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
> > Btw, I had opened PR115490 with my results for this already. Some mitigation
> > should be from optimizing ISEL expansion to vcond_mask and I'd start with
> > looking at some of the fallout from that side (note that might require
> > the backend reject not natively implemented vec_cmp via its operand 1
> > predicate)
>
> w/o AVX512, vector integer comparison only supports EQ/GT, others comparison
> rtx_cost is transformed to that. (.i.e GTU is emulated with us_minus + eq +
> negative the vector mask)
> If we restrict the predicate of operand 1, would middle-end reject
> vectorization (or lower it to scalar version)?
Richard suggests that we implement the "obvious" transforms like
inversion in the middle-end but if for example unsigned compares
are not supported the us_minus + eq + negative trick isn't on
that list.
The main reason to restrict vec_cmp would be to avoid
a <= b ? c : d going with an unsupported vec_cmp but instead
do a > b ? d : c - the alternative is trying to fix this
on the RTL side via combine. I understand the non-native
compares are already expanded to supported form and we
don't use a split after combine to make combinations to
a supported form easier?
I don't have a good feeling for which approach is going to be more
maintainable here. But, for example, even for the unsigned compare
"lowering" the middle-end would have range info while RTL does
not (to some extent it's available at RTL expansion time).
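To make the a <= b ? c : d versus a > b ? d : c point concrete, here is a small
scalar sketch. Everything in it is hypothetical: native_cmp_p stands in for a
restricted vec_cmp operand 1 predicate that accepts only EQ and GT, and
canonicalize models the "obvious" inversion with swapped select arms. It is not
the middle-end's or the backend's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cmp_code { CMP_EQ, CMP_NE, CMP_GT, CMP_LE };

/* Hypothetical predicate: only EQ and GT are "native". */
static bool native_cmp_p(enum cmp_code code)
{
    return code == CMP_EQ || code == CMP_GT;
}

struct select_form {
    enum cmp_code code; /* comparison actually emitted */
    bool swap_arms;     /* true if the two select arms are exchanged */
};

/* Rewrite a non-native comparison as its inverse with swapped arms:
   a <= b ? c : d  ->  a > b ? d : c,   a != b ? c : d  ->  a == b ? d : c. */
static struct select_form canonicalize(enum cmp_code code)
{
    struct select_form s = { code, false };

    if (!native_cmp_p(code)) {
        s.code = (code == CMP_LE) ? CMP_GT : CMP_EQ;
        s.swap_arms = true;
    }
    return s;
}

static bool eval_cmp(enum cmp_code code, int32_t a, int32_t b)
{
    switch (code) {
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    case CMP_GT: return a > b;
    case CMP_LE: return a <= b;
    }
    return false;
}

/* One lane of "a CODE b ? c : d", evaluated through the canonical form. */
int32_t select_lane(enum cmp_code code, int32_t a, int32_t b,
                    int32_t c, int32_t d)
{
    struct select_form s = canonicalize(code);
    assert(native_cmp_p(s.code)); /* only native compares remain */

    bool m = eval_cmp(s.code, a, b);
    int32_t if_true = s.swap_arms ? d : c;
    int32_t if_false = s.swap_arms ? c : d;
    return m ? if_true : if_false;
}
```

With this shape no extra mask negation is needed for LE or NE, which is the
benefit of doing the swap before the comparison is expanded. Unsigned compares
are deliberately left out, since the us_minus trick is not one of the
"obvious" transforms.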
* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
2024-06-17 8:05 [Bug target/115517] New: Fix regression after dropping uses of vcond{,u,eq}_optab liuhongt at gcc dot gnu.org
` (2 preceding siblings ...)
2024-06-18 10:49 ` rguenther at suse dot de
@ 2024-06-18 11:08 ` liuhongt at gcc dot gnu.org
2024-06-18 11:17 ` rguenther at suse dot de
2024-06-18 11:29 ` liuhongt at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-06-18 11:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #3)
> On Tue, 18 Jun 2024, liuhongt at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
> >
> > --- Comment #2 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #1)
> > > Btw, I had opened PR115490 with my results for this already. Some mitigation
> > > should be from optimizing ISEL expansion to vcond_mask and I'd start with
> > > looking at some of the fallout from that side (note that might require
> > > the backend reject not natively implemented vec_cmp via its operand 1
> > > predicate)
> >
> > w/o AVX512, vector integer comparison only supports EQ/GT, others comparison
> > rtx_cost is transformed to that. (.i.e GTU is emulated with us_minus + eq +
> > negative the vector mask)
> > If we restrict the predicate of operand 1, would middle-end reject
> > vectorization (or lower it to scalar version)?
>
> Richard suggests that we implement the "obvious" transforms like
> inversion in the middle-end but if for example unsigned compares
> are not supported the us_minus + eq + negative trick isn't on
> that list.
>
> The main reason to restrict vec_cmp would be to avoid
> a <= b ? c : d going with an unsupported vec_cmp but instead
> do a > b ? d : c - the alternative is trying to fix this
> on the RTL side via combine. I understand the non-native
Yes, I have a patch which can fix most regressions via pattern matching in
combine.
Still, there is one situation that is difficult to deal with, mainly the
optimizations without SSE4.1. Because pblendvb/blendvps/blendvpd only exist
with SSE4.1, without it it takes 3 instructions (pand, pandn, por) to simulate
vcond_mask, and combine matches at most 4 instructions, which makes it
currently impossible to use combine to recover the optimizations that were done
in vcond{,u,eq}, e.g. min/max.
With SSE4.1 and above, there is basically no regression anymore.
The remaining regressed testcases without SSE4.1:
FAIL: g++.target/i386/pr100637-1b.C -std=gnu++14 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C -std=gnu++17 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C -std=gnu++20 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C -std=gnu++98 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1w.C -std=gnu++14 scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C -std=gnu++17 scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C -std=gnu++20 scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C -std=gnu++98 scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr103861-1.C -std=gnu++14 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C -std=gnu++17 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C -std=gnu++20 scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C -std=gnu++98 scan-assembler-times pcmpeqb 2
FAIL: gcc.target/i386/pr88540.c scan-assembler minpd
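As an illustration of the pre-SSE4.1 case, the following scalar model shows one
16-bit lane of a signed min written as a compare followed by the
pand/pandn/por blend. It is a sketch only: the function names are made up, the
mapping of each line to an instruction is given in comments as an assumption,
and it says nothing about what combine actually sees in RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Compare step: all-ones (-1) when a > b (signed), else zero. */
static int16_t cmpgt_mask(int16_t a, int16_t b)
{
    return a > b ? -1 : 0;
}

/* vcond_mask without a blend instruction: (mask & t) | (~mask & f). */
static int16_t blend_and_andn_or(int16_t mask, int16_t t, int16_t f)
{
    int16_t taken = (int16_t)(mask & t);      /* pand  */
    int16_t not_taken = (int16_t)(~mask & f); /* pandn */
    return (int16_t)(taken | not_taken);      /* por   */
}

/* a > b ? b : a, i.e. signed min, built from compare + blend.
   A single min instruction per vector would compute the same value. */
int16_t min_via_blend(int16_t a, int16_t b)
{
    int16_t m = cmpgt_mask(a, b);
    return blend_and_andn_or(m, b, a);
}

/* Check the sequence against the plain conditional on a few edge values.
   Returns the number of pairs verified. */
int check_min_via_blend(void)
{
    static const int16_t v[] = { INT16_MIN, -7, -1, 0, 1, 3, INT16_MAX };
    const int n = (int)(sizeof v / sizeof v[0]);
    int checked = 0;

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int16_t expect = v[i] < v[j] ? v[i] : v[j];
            assert(min_via_blend(v[i], v[j]) == expect);
            checked++;
        }
    }
    return checked;
}
```

The compare plus the three blend operations is the kind of multi-instruction
sequence that would have to be matched as a whole to get back to a single min
or max.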
* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: rguenther at suse dot de @ 2024-06-18 11:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 18 Jun 2024, liuhongt at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
>
> --- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #3)
> > On Tue, 18 Jun 2024, liuhongt at gcc dot gnu.org wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
> > >
> > > --- Comment #2 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > > (In reply to Richard Biener from comment #1)
> > > > Btw, I had opened PR115490 with my results for this already. Some mitigation
> > > > should be from optimizing ISEL expansion to vcond_mask and I'd start with
> > > > looking at some of the fallout from that side (note that might require
> > > > the backend reject not natively implemented vec_cmp via its operand 1
> > > > predicate)
> > >
> > > w/o AVX512, vector integer comparison only supports EQ/GT, others comparison
> > > rtx_cost is transformed to that. (.i.e GTU is emulated with us_minus + eq +
> > > negative the vector mask)
> > > If we restrict the predicate of operand 1, would middle-end reject
> > > vectorization (or lower it to scalar version)?
> >
> > Richard suggests that we implement the "obvious" transforms like
> > inversion in the middle-end but if for example unsigned compares
> > are not supported the us_minus + eq + negative trick isn't on
> > that list.
> >
> > The main reason to restrict vec_cmp would be to avoid
> > a <= b ? c : d going with an unsupported vec_cmp but instead
> > do a > b ? d : c - the alternative is trying to fix this
> > on the RTL side via combine. I understand the non-native
>
> Yes, I have a patch which can fix most regressions via pattern match in
> combine.
> Still there is a situation that is difficult to deal with, mainly the
> optimization w/o sse4.1 . Because pblendvb/blendvps/blendvpd only exists under
> sse4.1, w/o sse4.1, it takes 3 instructions (pand,pandn,por) to simulate the
> vcond_mask, and the combine matches up to 4 instructions, which makes it
> currently impossible to use the combine to recover those optimizations in the
> vcond{,u,eq}.i.e min/max.
> In the case of sse 4.1 and above, there is basically no regression anymore.
Maybe it's possible to use a define_insn_and_split for blends w/o SSE 4.1?
That would allow combine matching the high-level blend operation and
we'd only lower it afterwards? The question is what we lose in
combinations of/into the lowered pand/pandn/por of course.
Maybe it's possible to catch the higher-level optimization (min/max)
on the GIMPLE level instead?
* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: liuhongt at gcc dot gnu.org @ 2024-06-18 11:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #5)
> On Tue, 18 Jun 2024, liuhongt at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
> >
> > --- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > (In reply to rguenther@suse.de from comment #3)
> > > On Tue, 18 Jun 2024, liuhongt at gcc dot gnu.org wrote:
> > >
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517
> > > >
> > > > --- Comment #2 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > > > (In reply to Richard Biener from comment #1)
> > > > > Btw, I had opened PR115490 with my results for this already. Some mitigation
> > > > > should be from optimizing ISEL expansion to vcond_mask and I'd start with
> > > > > looking at some of the fallout from that side (note that might require
> > > > > the backend reject not natively implemented vec_cmp via its operand 1
> > > > > predicate)
> > > >
> > > > w/o AVX512, vector integer comparison only supports EQ/GT, others comparison
> > > > rtx_cost is transformed to that. (.i.e GTU is emulated with us_minus + eq +
> > > > negative the vector mask)
> > > > If we restrict the predicate of operand 1, would middle-end reject
> > > > vectorization (or lower it to scalar version)?
> > >
> > > Richard suggests that we implement the "obvious" transforms like
> > > inversion in the middle-end but if for example unsigned compares
> > > are not supported the us_minus + eq + negative trick isn't on
> > > that list.
> > >
> > > The main reason to restrict vec_cmp would be to avoid
> > > a <= b ? c : d going with an unsupported vec_cmp but instead
> > > do a > b ? d : c - the alternative is trying to fix this
> > > on the RTL side via combine. I understand the non-native
> >
> > Yes, I have a patch which can fix most regressions via pattern match in
> > combine.
> > Still there is a situation that is difficult to deal with, mainly the
> > optimization w/o sse4.1 . Because pblendvb/blendvps/blendvpd only exists under
> > sse4.1, w/o sse4.1, it takes 3 instructions (pand,pandn,por) to simulate the
> > vcond_mask, and the combine matches up to 4 instructions, which makes it
> > currently impossible to use the combine to recover those optimizations in the
> > vcond{,u,eq}.i.e min/max.
> > In the case of sse 4.1 and above, there is basically no regression anymore.
>
> Maybe it's possible to use a define_insn_and_split for blends w/o SSE 4.1?
> That would allow combine matching the high-level blend operation and
> we'd only lower it afterwards? The question is what we lose in
> combinations of/into the loweredn pand/pandn/por of course.
I'd rather live with those regressions since they only exist below SSE4.1.
>
> Maybe it's possible to catch the higher-level optimization (min/max)
> on the GIMPLE level instead?
For the integral part, I believe the optimization is already there at the
GIMPLE level.
For the floating-point part, x86 {max,min}{ps,pd} is not IEEE-conformant; it is
an exact match of the cond_expr a < b ? a : b (including its behaviour for
-0.0 and NaN).
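A per-lane model of that cond_expr makes the -0.0 and NaN behaviour visible.
This is a sketch only: x86_min_lane and the two helpers are names invented
here, and the model just evaluates a < b ? a : b on scalars. By contrast, C's
fminf(1.0f, NAN) returns 1.0f, which this form does not.

```c
#include <assert.h>
#include <math.h>   /* NAN macro only */
#include <stdint.h>
#include <string.h>

/* Per-lane model of the min form discussed above: exactly a < b ? a : b. */
float x86_min_lane(float a, float b)
{
    return a < b ? a : b;
}

/* NaN is the only value that compares unequal to itself. */
int is_nan(float x)
{
    return x != x;
}

/* Read the IEEE sign bit so +0.0 and -0.0 can be told apart. */
int sign_bit(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return (int)(u >> 31);
}

void check_x86_min_lane(void)
{
    /* Ordinary values behave like min. */
    assert(x86_min_lane(1.0f, 2.0f) == 1.0f);
    assert(x86_min_lane(2.0f, 1.0f) == 1.0f);

    /* A NaN operand makes a < b false, so the second operand is returned:
       the result depends on which side the NaN is on. */
    assert(x86_min_lane(NAN, 1.0f) == 1.0f);
    assert(is_nan(x86_min_lane(1.0f, NAN)));

    /* -0.0 < +0.0 is false, so signed zeros also yield the second operand. */
    assert(sign_bit(x86_min_lane(-0.0f, 0.0f)) == 0);
    assert(sign_bit(x86_min_lane(0.0f, -0.0f)) == 1);
}
```

Because the result is not symmetric in its operands, swapping a and b or
inverting the comparison changes the answer for NaN and signed-zero inputs,
so a floating-point min/max match has to preserve the exact operand order.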