* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
@ 2021-05-21 6:42 ` rguenth at gcc dot gnu.org
2021-05-21 10:36 ` segher at gcc dot gnu.org
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-05-21 6:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |segher at gcc dot gnu.org
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
7: r82:SI=~r89:SI
REG_DEAD r89:SI
8: r88:V4SI=vec_duplicate(r82:SI)
REG_DEAD r82:SI
9: r87:V4SI=r88:V4SI&r90:V4SI
REG_DEAD r90:V4SI
REG_DEAD r88:V4SI
I suppose we're confused about the vec_duplicate. Would generally swapping
the duplicate and the bit_not be profitable? Eventually it's a simplification
combine could try - I belive it has some cases where it tries variants of the
original instructions when combining. Adding a combine helper pattern
looks like putting too much burden on the backend IMHO.
We don't have a generic nand optab so handling this in ISEL on gimple
isn't straight-forward.
But combine and/or forwprop could do this.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
2021-05-21 6:42 ` [Bug target/100711] " rguenth at gcc dot gnu.org
@ 2021-05-21 10:36 ` segher at gcc dot gnu.org
2021-05-25 5:40 ` crazylht at gmail dot com
` (12 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: segher at gcc dot gnu.org @ 2021-05-21 10:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
Segher Boessenkool <segher at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2021-05-21
--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> I suppose we're confused about the vec_duplicate. Would generally swapping
> the duplicate and the bit_not be profitable? Eventually it's a
> simplification
> combine could try - I belive it has some cases where it tries variants of the
> original instructions when combining. Adding a combine helper pattern
> looks like putting too much burden on the backend IMHO.
>
> We don't have a generic nand optab so handling this in ISEL on gimple
> isn't straight-forward.
>
> But combine and/or forwprop could do this.
Combine never tries anything. Combine makes *one* result; if that does not
work,
it does not do the combination. (This is not completely true, but in essence
that is how it works, and it has to to not have exponential complexity).
It would be good to define a canonical form for anything vec_duplicate. It
probably is a good idea to pull the vec_duplicate as far outside as possible?
Canonical forms hugely reduce the amount of work needed. Compare to how "andc"
is represented (canonically with the inverted input first), or how "nand" is
(we
write that as an "orcc", an "or" with both inputs inverted, in canonical RTL).
Because only one form is allowed, we only have to check for that one form
everywhere.
Confirmed.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
2021-05-21 6:42 ` [Bug target/100711] " rguenth at gcc dot gnu.org
2021-05-21 10:36 ` segher at gcc dot gnu.org
@ 2021-05-25 5:40 ` crazylht at gmail dot com
2021-05-25 6:52 ` crazylht at gmail dot com
` (11 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2021-05-25 5:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Segher Boessenkool from comment #2)
> (In reply to Richard Biener from comment #1)
> > I suppose we're confused about the vec_duplicate. Would generally swapping
> > the duplicate and the bit_not be profitable? Eventually it's a
> > simplification
> > combine could try - I belive it has some cases where it tries variants of the
> > original instructions when combining. Adding a combine helper pattern
> > looks like putting too much burden on the backend IMHO.
> >
> > We don't have a generic nand optab so handling this in ISEL on gimple
> > isn't straight-forward.
> >
> > But combine and/or forwprop could do this.
>
> Combine never tries anything. Combine makes *one* result; if that does not
> work,
> it does not do the combination. (This is not completely true, but in essence
> that is how it works, and it has to to not have exponential complexity).
>
> It would be good to define a canonical form for anything vec_duplicate. It
> probably is a good idea to pull the vec_duplicate as far outside as possible?
>
> Canonical forms hugely reduce the amount of work needed. Compare to how
> "andc"
> is represented (canonically with the inverted input first), or how "nand" is
> (we
> write that as an "orcc", an "or" with both inputs inverted, in canonical
> RTL).
> Because only one form is allowed, we only have to check for that one form
> everywhere.
>
> Confirmed.
Even w/ canonical RTL, i think a combine splitter is also needed here, the
canonical RTL only helps combine/forwprop to match more possibility but won't
split patterns by itselies.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (2 preceding siblings ...)
2021-05-25 5:40 ` crazylht at gmail dot com
@ 2021-05-25 6:52 ` crazylht at gmail dot com
2021-05-25 10:09 ` segher at gcc dot gnu.org
` (10 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2021-05-25 6:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
>
> (In reply to Segher Boessenkool from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > I suppose we're confused about the vec_duplicate. Would generally
> Even w/ canonical RTL, i think a combine splitter is also needed here, the
> canonical RTL only helps combine/forwprop to match more possibility but
> won't split patterns by itselies.
I was wrong, i thought combine only support n->1 combining, but actually
pass_combine also support 3->2 combining which means a define_split is not
needed here.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (3 preceding siblings ...)
2021-05-25 6:52 ` crazylht at gmail dot com
@ 2021-05-25 10:09 ` segher at gcc dot gnu.org
2021-11-30 8:37 ` cvs-commit at gcc dot gnu.org
` (9 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: segher at gcc dot gnu.org @ 2021-05-25 10:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #4)
> > Even w/ canonical RTL, i think a combine splitter is also needed here, the
> > canonical RTL only helps combine/forwprop to match more possibility but
> > won't split patterns by itselies.
>
> I was wrong, i thought combine only support n->1 combining, but actually
> pass_combine also support 3->2 combining which means a define_split is not
> needed here.
<anything>->1 and <anything>->2, yes. But note that combine can often split
RTL
without having an explicit define_split; and also note the opposite, combine
does
not always pick the best spot to split, "manual" help (a define_split) can be
needed for good results.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (4 preceding siblings ...)
2021-05-25 10:09 ` segher at gcc dot gnu.org
@ 2021-11-30 8:37 ` cvs-commit at gcc dot gnu.org
2021-11-30 8:46 ` crazylht at gmail dot com
` (8 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-30 8:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:c39d77f252e895306ef88c1efb3eff04e4232554
commit r12-5595-gc39d77f252e895306ef88c1efb3eff04e4232554
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Tue Nov 30 08:35:39 2021 +0000
x86_64: PR target/100711: Splitters for pandn
This patch addresses PR target/100711 by introducing define_split
patterns so that not/broadcast/pand may be simplified (by combine)
to broadcast/pandn. This introduces two splitters one for optimizing
pandn on TARGET_SSE for V4SI and V2DI, and another for vpandn on
TARGET_AVX2 for V16QI, V8HI, V32QI, V16HI and V8SI. Each splitter
has its own new testcase.
I've also confirmed that not/broadcast/pandn is already getting
simplified to broadcast/pand by the middle-end optimizers.
2021-11-30 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/100711
* config/i386/sse.md (define_split): New splitters to simplify
not;vec_duplicate;and as vec_duplicate;andn.
gcc/testsuite/ChangeLog
PR target/100711
* gcc.target/i386/pr100711-1.c: New test case.
* gcc.target/i386/pr100711-2.c: New test case.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (5 preceding siblings ...)
2021-11-30 8:37 ` cvs-commit at gcc dot gnu.org
@ 2021-11-30 8:46 ` crazylht at gmail dot com
2023-05-24 8:42 ` jbeulich at suse dot com
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2021-11-30 8:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC12.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (6 preceding siblings ...)
2021-11-30 8:46 ` crazylht at gmail dot com
@ 2023-05-24 8:42 ` jbeulich at suse dot com
2023-05-25 6:52 ` crazylht at gmail dot com
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jbeulich at suse dot com @ 2023-05-24 8:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
jbeulich at suse dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jbeulich at suse dot com
--- Comment #8 from jbeulich at suse dot com ---
Since the commit doesn't really explain it (maybe it's obvious to others, but
it isn't to me), may I ask why two splitters were introduced, yet then still
not covering all possible modes? VI48_128 only covers two of the four possible
SSE2 modes, while VI124_AVX2 leaves out all DI-element-size ones as well as all
512-bit ones. Shouldn't both be folded, using VI_AVX2 as the mode iterator?
As an aside, it is also interesting that the 1st splitter uses TARGET_SSE
without the corresponding testcase limiting itself to just SSE. When building
that testcase with SSE2 turned off, foo() uses shufps and andnps as expected,
but the splitter doesn't appear to come into play at all for bar(), when really
it is only the broadcast that needs synthesizing, while andnps can be used
regardless of mode.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (7 preceding siblings ...)
2023-05-24 8:42 ` jbeulich at suse dot com
@ 2023-05-25 6:52 ` crazylht at gmail dot com
2023-05-25 8:36 ` jbeulich at suse dot com
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2023-05-25 6:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to jbeulich from comment #8)
> Since the commit doesn't really explain it (maybe it's obvious to others,
> but it isn't to me), may I ask why two splitters were introduced, yet then
> still not covering all possible modes? VI48_128 only covers two of the four
> possible SSE2 modes, while VI124_AVX2 leaves out all DI-element-size ones as
> well as all 512-bit ones. Shouldn't both be folded, using VI_AVX2 as the
> mode iterator?
We don't have single instruction for V8HI/V16QImode broadcast without AVX2,
that's why the first splitter only have VI48_128.
And yes, for the second splitter, I think we should use VI_AVX2 to cover all
modes.
>
> As an aside, it is also interesting that the 1st splitter uses TARGET_SSE
> without the corresponding testcase limiting itself to just SSE. When
> building that testcase with SSE2 turned off, foo() uses shufps and andnps as
> expected, but the splitter doesn't appear to come into play at all for
> bar(), when really it is only the broadcast that needs synthesizing, while
> andnps can be used regardless of mode.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (8 preceding siblings ...)
2023-05-25 6:52 ` crazylht at gmail dot com
@ 2023-05-25 8:36 ` jbeulich at suse dot com
2023-05-25 8:50 ` crazylht at gmail dot com
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: jbeulich at suse dot com @ 2023-05-25 8:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #10 from jbeulich at suse dot com ---
(In reply to Hongtao.liu from comment #9)
> We don't have single instruction for V8HI/V16QImode broadcast without AVX2,
> that's why the first splitter only have VI48_128.
Does this matter? The splitters are about subsuming the "not". How the
"vec_duplicate" is carried out isn't really relevant here, is it?
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (9 preceding siblings ...)
2023-05-25 8:36 ` jbeulich at suse dot com
@ 2023-05-25 8:50 ` crazylht at gmail dot com
2023-05-27 9:28 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2023-05-25 8:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to jbeulich from comment #10)
> (In reply to Hongtao.liu from comment #9)
> > We don't have single instruction for V8HI/V16QImode broadcast without AVX2,
> > that's why the first splitter only have VI48_128.
>
> Does this matter? The splitters are about subsuming the "not". How the
> "vec_duplicate" is carried out isn't really relevant here, is it?
It's splitted to 2 patterns, but there's no V8HI/V16QImode define_insn for the
first pattern w/o AVX2, there will be ICE of unrecognisable insn.
17110 [(set (match_dup 3)
17111 (vec_duplicate:VI48_128 (match_dup 1)))
17112 (set (match_dup 0)
17113 (and:VI48_128 (not:VI48_128 (match_dup 3))
17114 (match_dup 2)))]
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (10 preceding siblings ...)
2023-05-25 8:50 ` crazylht at gmail dot com
@ 2023-05-27 9:28 ` cvs-commit at gcc dot gnu.org
2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-05-27 9:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:ed6a9a35799c9298321d1589533767c2bb6f8d42
commit r14-1307-ged6a9a35799c9298321d1589533767c2bb6f8d42
Author: liuhongt <hongtao.liu@intel.com>
Date: Thu May 25 16:14:14 2023 +0800
Split notl + pbraodcast + pand to pbroadcast + pandn more modes.
r12-5595-gc39d77f252e895306ef88c1efb3eff04e4232554 adds 2 splitter to
transform notl + pbroadcast + pand to pbroadcast + pandn for
VI124_AVX2 which leaves out all DI-element-size ones as
well as all 512-bit ones.
This patch extend the splitter to VI_AVX2 which will handle DImode for
AVX2, and V64QImode,V32HImode,V16SImode,V8DImode for AVX512.
gcc/ChangeLog:
PR target/100711
* config/i386/sse.md (*andnot<mode>3): Extend below splitter
to VI_AVX2 to cover more modes.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr100711-2.c: Add v4di/v2di testcases.
* gcc.target/i386/pr100711-3.c: New test.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (11 preceding siblings ...)
2023-05-27 9:28 ` cvs-commit at gcc dot gnu.org
@ 2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-05 7:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Beulich <jbeulich@gcc.gnu.org>:
https://gcc.gnu.org/g:3186ef0cb9e2d25e8455f9990e50187e3d1eee19
commit r14-2312-g3186ef0cb9e2d25e8455f9990e50187e3d1eee19
Author: Jan Beulich <jbeulich@suse.com>
Date: Wed Jul 5 09:48:19 2023 +0200
x86: allow memory operand for AVX2 splitter for PR target/100711
The intended broadcast (with AVX512) can very well be done right from
memory.
gcc/
PR target/100711
* config/i386/sse.md: Permit non-immediate operand 1 in AVX2
form of splitter for PR target/100711.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (12 preceding siblings ...)
2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
@ 2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-05 7:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Beulich <jbeulich@gcc.gnu.org>:
https://gcc.gnu.org/g:fa58c2871a1235cb5ba475303a2bd11ae90416d5
commit r14-2313-gfa58c2871a1235cb5ba475303a2bd11ae90416d5
Author: Jan Beulich <jbeulich@suse.com>
Date: Wed Jul 5 09:48:47 2023 +0200
x86: further PR target/100711-like splitting
With respective two-operand bitwise operations now expressable by a
single VPTERNLOG, add splitters to also deal with ior and xor
counterparts of the original and-only case. Note that the splitters need
to be separate, as the placement of "not" differs in the final insns
(*iornot<mode>3, *xnor<mode>3) which are intended to pick up one half of
the result.
gcc/
PR target/100711
* config/i386/sse.md: New splitters to simplify
not;vec_duplicate;{ior,xor} as vec_duplicate;{iornot,xnor}.
gcc/testsuite/
PR target/100711
* gcc.target/i386/pr100711-4.c: New test.
* gcc.target/i386/pr100711-5.c: New test.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/100711] Miss optimization for pandn
2021-05-21 2:22 [Bug target/100711] New: Miss optimization for pandn crazylht at gmail dot com
` (13 preceding siblings ...)
2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
@ 2023-07-05 7:49 ` cvs-commit at gcc dot gnu.org
14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-05 7:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100711
--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Beulich <jbeulich@gcc.gnu.org>:
https://gcc.gnu.org/g:e007369c8b67bcabd57c4fed8cff2a6db82e78e6
commit r14-2314-ge007369c8b67bcabd57c4fed8cff2a6db82e78e6
Author: Jan Beulich <jbeulich@suse.com>
Date: Wed Jul 5 09:49:16 2023 +0200
x86: yet more PR target/100711-like splitting
Following two-operand bitwise operations, add another splitter to also
deal with not followed by broadcast all on its own, which can be
expressed as simple embedded broadcast instead once a broadcast operand
is actually permitted in the respective insn. While there also permit
a broadcast operand in the corresponding expander.
gcc/
PR target/100711
* config/i386/sse.md: New splitters to simplify
not;vec_duplicate as a singular vpternlog.
(one_cmpl<mode>2): Allow broadcast for operand 1.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Likewise.
gcc/testsuite/
PR target/100711
* gcc.target/i386/pr100711-6.c: New test.
^ permalink raw reply [flat|nested] 16+ messages in thread