[Bug target/115517] New: Fix regression after dropping uses of vcond{,u,eq}

public inbox for gcc-bugs@sourceware.org

* [Bug target/115517] New: Fix regression after dropping uses of vcond{,u,eq}_optab
@ 2024-06-17  8:05 liuhongt at gcc dot gnu.org
  2024-06-18  6:20 ` [Bug target/115517] Fix x86 regressions " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-06-17  8:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517

            Bug ID: 115517
           Summary: Fix regression after dropping uses of
                    vcond{,u,eq}_optab
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
        Depends on: 114189
  Target Milestone: ---
            Target: x86_64-*-* i?86-*-*

> I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> > I know riscv doesn't implement any of the legacy optabs.  But less
> > maintained vector targets might need adjustments.
> >
> At GCC14, I tried to remove these expanders in the x86 backend, and it
> regressed some testcases, mainly because of the optimizations we did
> in ix86_expand_{int,fp}_vcond.
> I've started testing your patch; it's possible that we still need to
> move the ix86_expand_{int,fp}_vcond optimizations to the
> middle-end (isel or match.pd) or add extra patterns to handle it in the
> RTL pass_combine.
These are the new failures I got:

g++: g++.target/i386/avx-pr54700-1.C   scan-assembler-not vpcmpgt[bdq]

g++: g++.target/i386/avx-pr54700-1.C   scan-assembler-times vblendvpd 4

g++: g++.target/i386/avx-pr54700-1.C   scan-assembler-times vblendvps 4

g++: g++.target/i386/avx-pr54700-1.C   scan-assembler-times vpblendvb 2

g++: g++.target/i386/avx2-pr54700-1.C   scan-assembler-not vpcmpgt[bdq]

g++: g++.target/i386/avx2-pr54700-1.C   scan-assembler-times vblendvpd 4

g++: g++.target/i386/avx2-pr54700-1.C   scan-assembler-times vblendvps 4

g++: g++.target/i386/avx2-pr54700-1.C   scan-assembler-times vpblendvb 2

g++: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++14  scan-assembler-times vmaxph 3

g++: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++14  scan-assembler-times vminph 3

g++: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++17  scan-assembler-times vmaxph 3

g++: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++17  scan-assembler-times vminph 3

g++: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++20  scan-assembler-times vmaxph 3

g++: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++20  scan-assembler-times vminph 3

g++: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++98  scan-assembler-times vmaxph 3

g++: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++98  scan-assembler-times vminph 3

g++: g++.target/i386/pr100637-1b.C  -std=gnu++14  scan-assembler-times pcmpeqb 2

g++: g++.target/i386/pr100637-1b.C  -std=gnu++17  scan-assembler-times pcmpeqb 2

g++: g++.target/i386/pr100637-1b.C  -std=gnu++20  scan-assembler-times pcmpeqb 2

g++: g++.target/i386/pr100637-1b.C  -std=gnu++98  scan-assembler-times pcmpeqb 2

g++: g++.target/i386/pr100637-1w.C  -std=gnu++14  scan-assembler-times pcmpeqw 2

g++: g++.target/i386/pr100637-1w.C  -std=gnu++17  scan-assembler-times pcmpeqw 2

g++: g++.target/i386/pr100637-1w.C  -std=gnu++20  scan-assembler-times pcmpeqw 2

g++: g++.target/i386/pr100637-1w.C  -std=gnu++98  scan-assembler-times pcmpeqw 2

g++: g++.target/i386/pr100738-1.C  -std=gnu++14  scan-assembler-not vpcmpeqd[ \\t]

g++: g++.target/i386/pr100738-1.C  -std=gnu++14  scan-assembler-not vpxor[ \\t]

g++: g++.target/i386/pr100738-1.C  -std=gnu++14  scan-assembler-times vblendvps[ \\t] 2

g++: g++.target/i386/pr100738-1.C  -std=gnu++17  scan-assembler-not vpcmpeqd[ \\t]

g++: g++.target/i386/pr100738-1.C  -std=gnu++17  scan-assembler-not vpxor[ \\t]

g++: g++.target/i386/pr100738-1.C  -std=gnu++17  scan-assembler-times vblendvps[ \\t] 2

g++: g++.target/i386/pr100738-1.C  -std=gnu++20  scan-assembler-not vpcmpeqd[ \\t]

g++: g++.target/i386/pr100738-1.C  -std=gnu++20  scan-assembler-not vpxor[ \\t]

g++: g++.target/i386/pr100738-1.C  -std=gnu++20  scan-assembler-times vblendvps[ \\t] 2

g++: g++.target/i386/pr100738-1.C  -std=gnu++98  scan-assembler-not vpcmpeqd[ \\t]

g++: g++.target/i386/pr100738-1.C  -std=gnu++98  scan-assembler-not vpxor[ \\t]

g++: g++.target/i386/pr100738-1.C  -std=gnu++98  scan-assembler-times vblendvps[ \\t] 2

g++: g++.target/i386/pr103861-1.C  -std=gnu++14  scan-assembler-times pcmpeqb 2

g++: g++.target/i386/pr103861-1.C  -std=gnu++17  scan-assembler-times pcmpeqb 2

g++: g++.target/i386/pr103861-1.C  -std=gnu++20  scan-assembler-times pcmpeqb 2

g++: g++.target/i386/pr103861-1.C  -std=gnu++98  scan-assembler-times pcmpeqb 2

g++: g++.target/i386/pr61747.C  -std=gnu++14  scan-assembler-times max 4

g++: g++.target/i386/pr61747.C  -std=gnu++14  scan-assembler-times min 4

g++: g++.target/i386/pr61747.C  -std=gnu++17  scan-assembler-times max 4

g++: g++.target/i386/pr61747.C  -std=gnu++17  scan-assembler-times min 4

g++: g++.target/i386/pr61747.C  -std=gnu++20  scan-assembler-times max 4

g++: g++.target/i386/pr61747.C  -std=gnu++20  scan-assembler-times min 4

g++: g++.target/i386/sse4_1-pr54700-1.C   scan-assembler-not pcmpgt[bdq]

g++: g++.target/i386/sse4_1-pr54700-1.C   scan-assembler-times blendvpd 4

g++: g++.target/i386/sse4_1-pr54700-1.C   scan-assembler-times blendvps 4

g++: g++.target/i386/sse4_1-pr54700-1.C   scan-assembler-times pblendvb 2

gcc: gcc.target/i386/avx2-pr99908.c scan-assembler-not \tvpcmpeq

gcc: gcc.target/i386/avx512bw-pr96891-1.c scan-assembler-not %k[0-7]

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-not %k[0-9]

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsb[\t ] 2

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsd[\t ] 2

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsq[\t ] 2

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsw[\t ] 2

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminub[\t ] 2

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminud[\t ] 2

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminuq[\t ] 2

gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminuw[\t ] 2

gcc: gcc.target/i386/blendv-3.c scan-assembler-not vpcmp

gcc: gcc.target/i386/pr88540.c scan-assembler minpd

gcc: gcc.target/i386/sse4_1-pr99908.c scan-assembler-not \tpcmpeq

unix/-m32: g++: g++.target/i386/avx-pr54700-1.C   scan-assembler-not
vpcmpgt[bdq]

unix/-m32: g++: g++.target/i386/avx-pr54700-1.C   scan-assembler-times
vblendvpd 4

unix/-m32: g++: g++.target/i386/avx-pr54700-1.C   scan-assembler-times
vblendvps 4

unix/-m32: g++: g++.target/i386/avx-pr54700-1.C   scan-assembler-times
vpblendvb 2

unix/-m32: g++: g++.target/i386/avx2-pr54700-1.C   scan-assembler-not
vpcmpgt[bdq]

unix/-m32: g++: g++.target/i386/avx2-pr54700-1.C
scan-assembler-times vblendvpd 4

unix/-m32: g++: g++.target/i386/avx2-pr54700-1.C
scan-assembler-times vblendvps 4

unix/-m32: g++: g++.target/i386/avx2-pr54700-1.C
scan-assembler-times vpblendvb 2

unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C
-std=gnu++14  scan-assembler-times vmaxph 3

unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C
-std=gnu++14  scan-assembler-times vminph 3

unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C
-std=gnu++17  scan-assembler-times vmaxph 3

unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C
-std=gnu++17  scan-assembler-times vminph 3

unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C
-std=gnu++20  scan-assembler-times vmaxph 3

unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C
-std=gnu++20  scan-assembler-times vminph 3

unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C
-std=gnu++98  scan-assembler-times vmaxph 3

unix/-m32: g++: g++.target/i386/avx512fp16-vcondmn-minmax.C
-std=gnu++98  scan-assembler-times vminph 3

unix/-m32: g++: g++.target/i386/pr100637-1b.C  -std=gnu++14
scan-assembler-times pcmpeqb 2

unix/-m32: g++: g++.target/i386/pr100637-1b.C  -std=gnu++17
scan-assembler-times pcmpeqb 2

unix/-m32: g++: g++.target/i386/pr100637-1b.C  -std=gnu++20
scan-assembler-times pcmpeqb 2

unix/-m32: g++: g++.target/i386/pr100637-1b.C  -std=gnu++98
scan-assembler-times pcmpeqb 2

unix/-m32: g++: g++.target/i386/pr100637-1w.C  -std=gnu++14
scan-assembler-times pcmpeqw 2

unix/-m32: g++: g++.target/i386/pr100637-1w.C  -std=gnu++17
scan-assembler-times pcmpeqw 2

unix/-m32: g++: g++.target/i386/pr100637-1w.C  -std=gnu++20
scan-assembler-times pcmpeqw 2

unix/-m32: g++: g++.target/i386/pr100637-1w.C  -std=gnu++98
scan-assembler-times pcmpeqw 2

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++14
scan-assembler-not vpcmpeqd[ \\t]

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++14
scan-assembler-not vpxor[ \\t]

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++14
scan-assembler-times vblendvps[ \\t] 2

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++17
scan-assembler-not vpcmpeqd[ \\t]

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++17
scan-assembler-not vpxor[ \\t]

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++17
scan-assembler-times vblendvps[ \\t] 2

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++20
scan-assembler-not vpcmpeqd[ \\t]

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++20
scan-assembler-not vpxor[ \\t]

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++20
scan-assembler-times vblendvps[ \\t] 2

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++98
scan-assembler-not vpcmpeqd[ \\t]

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++98
scan-assembler-not vpxor[ \\t]

unix/-m32: g++: g++.target/i386/pr100738-1.C  -std=gnu++98
scan-assembler-times vblendvps[ \\t] 2

unix/-m32: g++: g++.target/i386/pr103861-1.C  -std=gnu++14
scan-assembler-times pcmpeqb 2

unix/-m32: g++: g++.target/i386/pr103861-1.C  -std=gnu++17
scan-assembler-times pcmpeqb 2

unix/-m32: g++: g++.target/i386/pr103861-1.C  -std=gnu++20
scan-assembler-times pcmpeqb 2

unix/-m32: g++: g++.target/i386/pr103861-1.C  -std=gnu++98
scan-assembler-times pcmpeqb 2

unix/-m32: g++: g++.target/i386/pr61747.C  -std=gnu++14
scan-assembler-times max 4

unix/-m32: g++: g++.target/i386/pr61747.C  -std=gnu++14
scan-assembler-times min 4

unix/-m32: g++: g++.target/i386/pr61747.C  -std=gnu++17
scan-assembler-times max 4

unix/-m32: g++: g++.target/i386/pr61747.C  -std=gnu++17
scan-assembler-times min 4

unix/-m32: g++: g++.target/i386/pr61747.C  -std=gnu++20
scan-assembler-times max 4

unix/-m32: g++: g++.target/i386/pr61747.C  -std=gnu++20
scan-assembler-times min 4

unix/-m32: g++: g++.target/i386/sse4_1-pr54700-1.C
scan-assembler-not pcmpgt[bdq]

unix/-m32: g++: g++.target/i386/sse4_1-pr54700-1.C
scan-assembler-times blendvpd 4

unix/-m32: g++: g++.target/i386/sse4_1-pr54700-1.C
scan-assembler-times blendvps 4

unix/-m32: g++: g++.target/i386/sse4_1-pr54700-1.C
scan-assembler-times pblendvb 2

unix/-m32: gcc: gcc.target/i386/avx2-pr99908.c scan-assembler-not \tvpcmpeq

unix/-m32: gcc: gcc.target/i386/avx512bw-pr96891-1.c scan-assembler-not %k[0-7]

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-not %k[0-9]

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c
scan-assembler-times vpminsb[\t ] 2

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c
scan-assembler-times vpminsd[\t ] 2

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c
scan-assembler-times vpminsq[\t ] 2

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c
scan-assembler-times vpminsw[\t ] 2

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c
scan-assembler-times vpminub[\t ] 2

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c
scan-assembler-times vpminud[\t ] 2

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c
scan-assembler-times vpminuq[\t ] 2

unix/-m32: gcc: gcc.target/i386/avx512vl-pr88547-1.c
scan-assembler-times vpminuw[\t ] 2

unix/-m32: gcc: gcc.target/i386/blendv-3.c scan-assembler-not vpcmp

unix/-m32: gcc: gcc.target/i386/pr88540.c scan-assembler minpd

unix/-m32: gcc: gcc.target/i386/sse4_1-pr99908.c scan-assembler-not \tpcmpeq


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114189
[Bug 114189] Target implements obsolete vcond{,u,eq} expanders


* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: rguenth at gcc dot gnu.org @ 2024-06-18  6:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, I had opened PR115490 with my results for this already.  Some mitigation
should be from optimizing ISEL expansion to vcond_mask and I'd start with
looking at some of the fallout from that side (note that might require
the backend reject not natively implemented vec_cmp via its operand 1
predicate)


* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: liuhongt at gcc dot gnu.org @ 2024-06-18  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517

--- Comment #2 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Btw, I had opened PR115490 with my results for this already.  Some mitigation
> should be from optimizing ISEL expansion to vcond_mask and I'd start with
> looking at some of the fallout from that side (note that might require
> the backend reject not natively implemented vec_cmp via its operand 1
> predicate)

w/o AVX512, vector integer comparison only supports EQ/GT natively; other
comparison codes are transformed into those (e.g. GTU is emulated with
us_minus + eq + negation of the vector mask).
If we restrict the predicate of operand 1, would the middle-end reject
vectorization (or lower it to a scalar version)?
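
To illustrate the GTU emulation mentioned above, here is a minimal scalar sketch in C that models one 8-bit lane. The function names are hypothetical and chosen for this example only; this is a model of the idea (saturating subtract, compare against zero, invert the mask), not the actual i386 backend code.

```c
#include <stdint.h>

/* Unsigned saturating subtract on one 8-bit lane.  This models what a
   us_minus operation computes; the scalar '>' here only describes the
   saturation, it is not part of the emulated sequence.  */
static uint8_t us_minus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Lane mask for unsigned a > b, built only from us_minus, a
   compare-equal against zero, and a final inversion of the mask.  */
static uint8_t gtu_mask_u8(uint8_t a, uint8_t b)
{
    /* a - b saturates to 0 exactly when a <= b.  */
    uint8_t le_mask = us_minus_u8(a, b) == 0 ? 0xFF : 0x00;
    /* Inverting the "a <= b" mask yields the "a > b" mask.  */
    return (uint8_t)~le_mask;
}

/* Exhaustively compare the emulation with the plain unsigned compare
   and return the number of lanes that disagree.  */
static int count_gtu_mismatches(void)
{
    int mismatches = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint8_t expect = a > b ? 0xFF : 0x00;
            if (gtu_mask_u8((uint8_t)a, (uint8_t)b) != expect)
                mismatches++;
        }
    return mismatches;
}
```

The extra inversion at the end is the part that costs an additional instruction compared with a natively supported compare.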


* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: rguenther at suse dot de @ 2024-06-18 10:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 18 Jun 2024, liuhongt at gcc dot gnu.org wrote:

> w/o AVX512, vector integer comparison only supports EQ/GT natively; other
> comparison codes are transformed into those (e.g. GTU is emulated with
> us_minus + eq + negation of the vector mask).
> If we restrict the predicate of operand 1, would the middle-end reject
> vectorization (or lower it to a scalar version)?

Richard suggests that we implement the "obvious" transforms like
inversion in the middle-end but if for example unsigned compares
are not supported the us_minus + eq + negative trick isn't on
that list.

The main reason to restrict vec_cmp would be to avoid
a <= b ? c : d going with an unsupported vec_cmp but instead
do a > b ? d : c - the alternative is trying to fix this
on the RTL side via combine.  I understand the non-native
compares are already expanded to supported form and we
don't use a split after combine to make combinations to
a supported form easier?
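
As a sketch of the operand swap described above, the following C models four 32-bit lanes and computes a <= b ? c : d using only a GT compare plus a select with swapped arms. The helper names and the LANES constant are hypothetical; this only illustrates the equivalence, it is not middle-end or backend code.

```c
#include <stdint.h>

#define LANES 4

/* Lane mask: all-ones where a > b (assumed to be the natively
   supported signed compare), zero elsewhere.  */
static void cmp_gt(const int32_t *a, const int32_t *b, int32_t *mask)
{
    for (int i = 0; i < LANES; i++)
        mask[i] = a[i] > b[i] ? -1 : 0;
}

/* vcond_mask-style select: out = mask ? x : y, lane by lane.  */
static void select_lanes(const int32_t *mask, const int32_t *x,
                         const int32_t *y, int32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = mask[i] ? x[i] : y[i];
}

/* a <= b ? c : d, computed as a > b ? d : c so that no LE compare
   (and no mask inversion) is needed.  */
static void select_le(const int32_t *a, const int32_t *b,
                      const int32_t *c, const int32_t *d, int32_t *out)
{
    int32_t mask[LANES];
    cmp_gt(a, b, mask);
    select_lanes(mask, d, c, out);   /* note the swapped arms */
}
```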

I don't have a good feeling for which approach is going to be more
maintainable here.  But, for example, even for the unsigned compare
"lowering" the middle-end would have range info while RTL does
not (to some extent it's available at RTL expansion time).


* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: liuhongt at gcc dot gnu.org @ 2024-06-18 11:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517

--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #3)
> Richard suggests that we implement the "obvious" transforms like
> inversion in the middle-end but if for example unsigned compares
> are not supported the us_minus + eq + negative trick isn't on
> that list.
> 
> The main reason to restrict vec_cmp would be to avoid
> a <= b ? c : d going with an unsupported vec_cmp but instead
> do a > b ? d : c - the alternative is trying to fix this
> on the RTL side via combine.  I understand the non-native

Yes, I have a patch which can fix most regressions via pattern matching in
combine.
Still, there is a situation that is difficult to deal with, mainly the
optimization w/o sse4.1. Because pblendvb/blendvps/blendvpd only exist under
sse4.1, w/o sse4.1 it takes 3 instructions (pand, pandn, por) to simulate the
vcond_mask, and combine matches up to 4 instructions, which makes it
currently impossible to use combine to recover those optimizations in the
vcond{,u,eq} expanders, i.e. min/max.
In the case of sse4.1 and above, there is basically no regression anymore.
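
For reference, the three-operation blend mentioned above can be sketched in C on a single 32-bit lane as follows. The function name is hypothetical; the comments name the instructions from the message only as the intended correspondence, under the assumption that each lane of the mask is all-ones or all-zeros.

```c
#include <stdint.h>

/* Select a where mask bits are set and b where they are clear:
   (mask & a) | (~mask & b).  */
static uint32_t blend_and_andn_or(uint32_t mask, uint32_t a, uint32_t b)
{
    uint32_t from_a = mask & a;    /* corresponds to pand  */
    uint32_t from_b = ~mask & b;   /* corresponds to pandn */
    return from_a | from_b;        /* corresponds to por   */
}
```

With a blend instruction the same select is a single operation, which is why the sequence feeding it is so much easier for a pattern matcher to recognize as a whole.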


The regression testcases w/o sse4.1:

FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++14  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++17  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++20  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++98  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++14  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++17  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++20  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++98  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++14  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++17  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++20  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++98  scan-assembler-times pcmpeqb 2
FAIL: gcc.target/i386/pr88540.c scan-assembler minpd


* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: rguenther at suse dot de @ 2024-06-18 11:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 18 Jun 2024, liuhongt at gcc dot gnu.org wrote:

> Yes, I have a patch which can fix most regressions via pattern matching in
> combine.
> Still, there is a situation that is difficult to deal with, mainly the
> optimization w/o sse4.1. Because pblendvb/blendvps/blendvpd only exist under
> sse4.1, w/o sse4.1 it takes 3 instructions (pand, pandn, por) to simulate the
> vcond_mask, and combine matches up to 4 instructions, which makes it
> currently impossible to use combine to recover those optimizations in the
> vcond{,u,eq} expanders, i.e. min/max.
> In the case of sse4.1 and above, there is basically no regression anymore.

Maybe it's possible to use a define_insn_and_split for blends w/o SSE 4.1?
That would allow combine to match the high-level blend operation and
we'd only lower it afterwards.  The question is what we lose in
combinations of/into the lowered pand/pandn/por, of course.

Maybe it's possible to catch the higher-level optimization (min/max)
on the GIMPLE level instead?


* [Bug target/115517] Fix x86 regressions after dropping uses of vcond{,u,eq}_optab
From: liuhongt at gcc dot gnu.org @ 2024-06-18 11:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115517

--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #5)
> Maybe it's possible to use a define_insn_and_split for blends w/o SSE 4.1?
> That would allow combine to match the high-level blend operation and
> we'd only lower it afterwards.  The question is what we lose in
> combinations of/into the lowered pand/pandn/por, of course.
I'd rather live with those regressions since they only exist below sse4.1.
> 
> Maybe it's possible to catch the higher-level optimization (min/max)
> on the GIMPLE level instead?
For the integral part, I believe the optimization is already there at the
GIMPLE level.
For the floating-point part, x86 {max,min}{ps,pd} is not IEEE-conformant; it's
an exact match of cond_expr a < b ? a : b (w/ consideration of -0.0 and NaN).
