* [PATCH 1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it
@ 2024-08-16 21:35 Andrew Pinski
2024-08-16 21:35 ` [PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042] Andrew Pinski
2024-08-20 16:45 ` [PATCH 1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it Richard Sandiford
0 siblings, 2 replies; 7+ messages in thread
From: Andrew Pinski @ 2024-08-16 21:35 UTC
To: gcc-patches; +Cc: Andrew Pinski
On aarch64 without CSSC instructions, popcount is implemented using the SIMD instruction cnt.
For a 128-bit popcount it is better to use one 128-bit cnt (V16QI mode) followed by a single
reduction addition than two 64-bit cnt (V8QI mode) followed by two. Currently
fold_builtin_bit_query always expands without checking whether there is an optab for the type,
so this changes it to check the optab to decide whether to expand or let the backend handle it.
Bootstrapped and tested on x86_64-linux-gnu and built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* builtins.cc (fold_builtin_bit_query): Don't expand double
`unsigned long long` types if there is an optab entry for that
type.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
---
gcc/builtins.cc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0b902896ddd..b4d51eaeba5 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
tree call = NULL_TREE, tem;
if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
&& (TYPE_PRECISION (arg0_type)
- == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
+ == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
+ /* If the target supports the optab, then don't do the expansion. */
+ && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
{
/* __int128 expansions using up to 2 long long builtins. */
arg0 = save_expr (arg0);
--
2.43.0
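For illustration, the generic expansion that this patch skips when the target has an optab is roughly equivalent to the C below. This is a minimal sketch, not the code GCC generates: the helper name popcount128_split is hypothetical, and it assumes a compiler that provides unsigned __int128 and __builtin_popcountll (GCC or Clang on a 64-bit target).

```c
/* Hypothetical helper: models the "__int128 expansions using up to 2
   long long builtins" path by counting each 64-bit half separately.  */
int
popcount128_split (unsigned __int128 x)
{
  unsigned long long lo = (unsigned long long) x;          /* Low 64 bits.  */
  unsigned long long hi = (unsigned long long) (x >> 64);  /* High 64 bits.  */
  return __builtin_popcountll (hi) + __builtin_popcountll (lo);
}
```

With the new check, a target that supports the 128-bit optab directly keeps a single call instead of being split into these two halves.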
* [PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]
From: Andrew Pinski @ 2024-08-16 21:35 UTC
To: gcc-patches; +Cc: Andrew Pinski
When CSSC is not enabled, a 128-bit popcount can be implemented
with the vector (V16QI) cnt instruction followed by a reduction,
as the 64-bit one currently is, instead of being split into
two 64-bit popcounts.
Built and tested for aarch64-linux-gnu.
PR target/113042
gcc/ChangeLog:
* config/aarch64/aarch64.md (popcountti2): New define_expand.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/popcnt10.c: New test.
* gcc.target/aarch64/popcnt9.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
---
gcc/config/aarch64/aarch64.md | 16 +++++++++++++
gcc/testsuite/gcc.target/aarch64/popcnt10.c | 25 +++++++++++++++++++++
gcc/testsuite/gcc.target/aarch64/popcnt9.c | 25 +++++++++++++++++++++
3 files changed, 66 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt10.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt9.c
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 12dcc16529a..73506e71f43 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5378,6 +5378,22 @@ (define_expand "popcount<mode>2"
}
})
+(define_expand "popcountti2"
+ [(set (match_operand:TI 0 "register_operand")
+ (popcount:TI (match_operand:TI 1 "register_operand")))]
+ "TARGET_SIMD && !TARGET_CSSC"
+{
+ rtx v = gen_reg_rtx (V16QImode);
+ rtx v1 = gen_reg_rtx (V16QImode);
+ emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
+ emit_insn (gen_popcountv16qi2 (v1, v));
+ rtx out = gen_reg_rtx (DImode);
+ emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (out, v1));
+ out = convert_to_mode (TImode, out, true);
+ emit_move_insn (operands[0], out);
+ DONE;
+})
+
(define_insn "clrsb<mode>2"
[(set (match_operand:GPI 0 "register_operand" "=r")
(clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt10.c b/gcc/testsuite/gcc.target/aarch64/popcnt10.c
new file mode 100644
index 00000000000..4d01fc67022
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt10.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+cssc"
+
+/*
+** h128:
+** ldp x([0-9]+), x([0-9]+), \[x0\]
+** cnt x([0-9]+), x([0-9]+)
+** cnt x([0-9]+), x([0-9]+)
+** add w0, w([0-9]+), w([0-9]+)
+** ret
+*/
+
+
+unsigned h128 (const unsigned __int128 *a) {
+ return __builtin_popcountg (a[0]);
+}
+
+/* popcount with CSSC should be split into 2 sections. */
+/* { dg-final { scan-tree-dump-not "POPCOUNT " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " __builtin_popcount" 2 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt9.c b/gcc/testsuite/gcc.target/aarch64/popcnt9.c
new file mode 100644
index 00000000000..c778fc7f420
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt9.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h128:
+** ldr q([0-9]+), \[x0\]
+** cnt v([0-9]+).16b, v\1.16b
+** addv b([0-9]+), v\2.16b
+** fmov w0, s\3
+** ret
+*/
+
+
+unsigned h128 (const unsigned __int128 *a) {
+ return __builtin_popcountg (a[0]);
+}
+
+/* There should be only one POPCOUNT. */
+/* { dg-final { scan-tree-dump-times "POPCOUNT " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " __builtin_popcount" "optimized" } } */
+
--
2.43.0
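As a rough model of what the new expander computes, the C below counts bits per byte lane and then sums the 16 lanes. It is a portable scalar sketch under stated assumptions, not the aarch64 implementation: the name popcount128_bytes is hypothetical, and it assumes unsigned __int128 and __builtin_popcount are available (GCC or Clang).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the V16QI sequence: a per-byte population count
   (what "cnt v.16b" computes lane by lane) followed by a horizontal sum
   of the 16 byte lanes (what "addv b, v.16b" computes).  */
unsigned
popcount128_bytes (unsigned __int128 x)
{
  uint8_t lanes[16];
  memcpy (lanes, &x, sizeof lanes);   /* View the value as 16 bytes.  */

  uint8_t sum = 0;                    /* 8-bit accumulator, like the b register.  */
  for (int i = 0; i < 16; i++)
    sum += (uint8_t) __builtin_popcount (lanes[i]);   /* At most 8 per lane.  */

  return sum;                         /* Zero-extended result, at most 128.  */
}
```

Sixteen lanes of at most 8 set bits each sum to at most 128, so the 8-bit reduction cannot overflow; a single byte-wide addv is therefore enough in the expected assembly of popcnt9.c.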
* Re: [PATCH 1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it
From: Richard Sandiford @ 2024-08-20 16:45 UTC
To: Andrew Pinski; +Cc: gcc-patches
Andrew Pinski <quic_apinski@quicinc.com> writes:
> On aarch64 without CSSC instructions, popcount is implemented using the SIMD instruction cnt.
> For a 128-bit popcount it is better to use one 128-bit cnt (V16QI mode) followed by a single
> reduction addition than two 64-bit cnt (V8QI mode) followed by two. Currently
> fold_builtin_bit_query always expands without checking whether there is an optab for the type,
> so this changes it to check the optab to decide whether to expand or let the backend handle it.
>
> Bootstrapped and tested on x86_64-linux-gnu and built and tested for aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
> * builtins.cc (fold_builtin_bit_query): Don't expand double
> `unsigned long long` types if there is an optab entry for that
> type.
OK. The logic in the function seems a bit twisty (the same condition
is checked later), but all my attempts to improve it only made it worse.
Thanks,
Richard
* Re: [PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]
From: Richard Sandiford @ 2024-08-20 16:51 UTC
To: Andrew Pinski; +Cc: gcc-patches
Andrew Pinski <quic_apinski@quicinc.com> writes:
> When CSSC is not enabled, a 128-bit popcount can be implemented
> with the vector (V16QI) cnt instruction followed by a reduction,
> as the 64-bit one currently is, instead of being split into
> two 64-bit popcounts.
>
> Built and tested for aarch64-linux-gnu.
>
> PR target/113042
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (popcountti2): New define_expand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/popcnt10.c: New test.
> * gcc.target/aarch64/popcnt9.c: New test.
OK if there are no other comments in the next 24 hours.
Thanks,
Richard
* Re: [PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]
From: Richard Sandiford @ 2024-08-20 18:17 UTC
To: Andrew Pinski; +Cc: gcc-patches
Richard Sandiford <richard.sandiford@arm.com> writes:
> Andrew Pinski <quic_apinski@quicinc.com> writes:
>> When CSSC is not enabled, a 128-bit popcount can be implemented
>> with the vector (V16QI) cnt instruction followed by a reduction,
>> as the 64-bit one currently is, instead of being split into
>> two 64-bit popcounts.
>>
>> Built and tested for aarch64-linux-gnu.
>>
>> PR target/113042
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64.md (popcountti2): New define_expand.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/popcnt10.c: New test.
>> * gcc.target/aarch64/popcnt9.c: New test.
>
> OK if there are no other comments in the next 24 hours.
Sorry, only thought about it later, but:
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 12dcc16529a..73506e71f43 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -5378,6 +5378,22 @@ (define_expand "popcount<mode>2"
>> }
>> })
>>
>> +(define_expand "popcountti2"
>> + [(set (match_operand:TI 0 "register_operand")
>> + (popcount:TI (match_operand:TI 1 "register_operand")))]
Could you try making the output :DI instead of :TI? I'd expect
internal-fn.cc to handle that correctly and extend the result to
128 bits where needed.
That would make the dummy popcount rtx malformed, so I suppose
the pattern should just be:
[(match_operand:DI 0 "register_operand")
(match_operand:TI 1 "register_operand")]
>> + "TARGET_SIMD && !TARGET_CSSC"
>> +{
>> + rtx v = gen_reg_rtx (V16QImode);
>> + rtx v1 = gen_reg_rtx (V16QImode);
>> + emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
>> + emit_insn (gen_popcountv16qi2 (v1, v));
>> + rtx out = gen_reg_rtx (DImode);
>> + emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (out, v1));
We could then use operands[0] directly as the output here.
Thanks,
Richard
>> + out = convert_to_mode (TImode, out, true);
>> + emit_move_insn (operands[0], out);
>> + DONE;
>> +})
* Re: [PATCH 1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it
From: Andrew Pinski @ 2024-08-21 0:22 UTC
To: Andrew Pinski, gcc-patches, richard.sandiford
On Tue, Aug 20, 2024 at 9:46 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Andrew Pinski <quic_apinski@quicinc.com> writes:
> > On aarch64 without CSSC instructions, popcount is implemented using the SIMD instruction cnt.
> > For a 128-bit popcount it is better to use one 128-bit cnt (V16QI mode) followed by a single
> > reduction addition than two 64-bit cnt (V8QI mode) followed by two. Currently
> > fold_builtin_bit_query always expands without checking whether there is an optab for the type,
> > so this changes it to check the optab to decide whether to expand or let the backend handle it.
> >
> > Bootstrapped and tested on x86_64-linux-gnu and built and tested for aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * builtins.cc (fold_builtin_bit_query): Don't expand double
> > `unsigned long long` types if there is an optab entry for that
> > type.
>
> OK. The logic in the function seems a bit twisty (the same condition
> is checked later), but all my attempts to improve it only made it worse.
I also looked for a good refactoring here, but I didn't
see one either.
Anyway, I have now pushed it as
r15-3056-g50b5000a5e430aaf99a5e00465cc9e25563d908b.
Thanks,
Andrew
* Re: [PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]
From: Andrew Pinski @ 2024-08-21 0:24 UTC
To: Andrew Pinski, gcc-patches, richard.sandiford
On Tue, Aug 20, 2024 at 11:18 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > Andrew Pinski <quic_apinski@quicinc.com> writes:
> >> When CSSC is not enabled, a 128-bit popcount can be implemented
> >> with the vector (V16QI) cnt instruction followed by a reduction,
> >> as the 64-bit one currently is, instead of being split into
> >> two 64-bit popcounts.
> >>
> >> Built and tested for aarch64-linux-gnu.
> >>
> >> PR target/113042
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/aarch64/aarch64.md (popcountti2): New define_expand.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/aarch64/popcnt10.c: New test.
> >> * gcc.target/aarch64/popcnt9.c: New test.
> >
> > OK if there are no other comments in the next 24 hours.
>
> Sorry, only thought about it later, but:
Yes, that is a good idea, since it would be the same code in the end
anyway and it is slightly cleaner.
I was not 100% sure whether you had withdrawn your approval or approved
it with the changes, so I submitted a new patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660960.html
Thanks,
Andrew Pinski