* [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
@ 2023-12-15 2:57 Juzhe-Zhong
2023-12-15 11:14 ` Robin Dapp
0 siblings, 1 reply; 9+ messages in thread
From: Juzhe-Zhong @ 2023-12-15 2:57 UTC (permalink / raw)
To: gcc-patches; +Cc: kito.cheng, kito.cheng, jeffreyalaw, rdapp.gcc, Juzhe-Zhong
This patch fixes the following FAILs in "full coverage" testing:
Running target riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/vect-strided-mult-char-ls.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-strided-mult-char-ls.c execution test
FAIL: gcc.dg/vect/vect-strided-u8-i2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-strided-u8-i2.c execution test
The root cause is the vmerge optimization applied to the following IR:
_45 = VEC_PERM_EXPR <vect__3.13_47, vect__4.14_46, { 0, 257, 2, 259, 4, 261, 6, 263, 8, 265, 10, 267, 12, 269, 14, 271, 16, 273, 18, 275, 20, 277, 22, 279, 24, 281, 26, 283, 28, 285, 30, 287, 32, 289, 34, 291, 36, 293, 38, 295, 40, 297, 42, 299, 44, 301, 46, 303, 48, 305, 50, 307, 52, 309, 54, 311, 56, 313, 58, 315, 60, 317, 62, 319, 64, 321, 66, 323, 68, 325, 70, 327, 72, 329, 74, 331, 76, 333, 78, 335, 80, 337, 82, 339, 84, 341, 86, 343, 88, 345, 90, 347, 92, 349, 94, 351, 96, 353, 98, 355, 100, 357, 102, 359, 104, 361, 106, 363, 108, 365, 110, 367, 112, 369, 114, 371, 116, 373, 118, 375, 120, 377, 122, 379, 124, 381, 126, 383, 128, 385, 130, 387, 132, 389, 134, 391, 136, 393, 138, 395, 140, 397, 142, 399, 144, 401, 146, 403, 148, 405, 150, 407, 152, 409, 154, 411, 156, 413, 158, 415, 160, 417, 162, 419, 164, 421, 166, 423, 168, 425, 170, 427, 172, 429, 174, 431, 176, 433, 178, 435, 180, 437, 182, 439, 184, 441, 186, 443, 188, 445, 190, 447, 192, 449, 194, 451, 196, 453, 198, 455, 200, 457, 202, 459, 204, 461, 206, 463, 208, 465, 210, 467, 212, 469, 214, 471, 216, 473, 218, 475, 220, 477, 222, 479, 224, 481, 226, 483, 228, 485, 230, 487, 232, 489, 234, 491, 236, 493, 238, 495, 240, 497, 242, 499, 244, 501, 246, 503, 248, 505, 250, 507, 252, 509, 254, 511 }>;
Clearly, many of the shuffle indices are > 255. The vmerge optimization is applicable here, but the code generated for it is incorrect, which causes the execution failures.
The buggy codegen:
vsetvli zero,a4,e8,m8,ta,ma
vmsltu.vi v0,v0,0 -> it should be 256 instead of 0, but since this is an EEW8 vector, 256 is not a value that 8 bits can hold.
vmerge.vvm v8,v8,v16,v0
After this patch:
vmv.v.x v0,a6
vmerge.vvm v8,v8,v16,v0
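To illustrate the wraparound, here is a minimal standalone sketch. It is not GCC code: the function names are made up for this example, and it only models an unsigned "selector < bound" compare in which both sides are truncated to 8 bits, as happens for an EEW8 selector.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model (not GCC code): build the merge mask the way an unsigned
// "selector < bound" compare would, with both sides truncated to 8 bits.
std::vector<bool> mask_via_8bit_compare(const std::vector<int>& selector, int bound) {
    const std::uint8_t bound8 = static_cast<std::uint8_t>(bound);  // 256 wraps to 0
    std::vector<bool> mask;
    for (int index : selector) {
        const std::uint8_t index8 = static_cast<std::uint8_t>(index);  // 257 wraps to 1
        mask.push_back(index8 < bound8);
    }
    return mask;
}

// Reference mask computed with full-width integers.
std::vector<bool> mask_reference(const std::vector<int>& selector, int bound) {
    std::vector<bool> mask;
    for (int index : selector)
        mask.push_back(index < bound);
    return mask;
}
```

With a bound of 256 the truncated compare tests "index < 0", which no unsigned value satisfies, so every mask bit is clear and the merge picks the wrong operand for the even lanes.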
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_merge_patterns): Fix bug.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/bug-1.c: New test.
---
gcc/config/riscv/riscv-v.cc | 49 ++++++++++++++++---
.../gcc.target/riscv/rvv/autovec/bug-1.c | 39 +++++++++++++++
2 files changed, 81 insertions(+), 7 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-1.c
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 680e2a0e03a..b8ba597682b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2987,20 +2987,55 @@ shuffle_merge_patterns (struct expand_vec_perm_d *d)
&& !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
return false;
+ /* We need to use precomputed mask for such situation and such mask
+ can only be computed in compile-time known size modes. */
+ if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8 && maybe_ge (vec_len, 256)
+ && !vec_len.is_constant ())
+ return false;
+
if (d->testing_p)
return true;
machine_mode mask_mode = get_mask_mode (vmode);
rtx mask = gen_reg_rtx (mask_mode);
- rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
-
/* MASK = SELECTOR < NUNTIS ? 1 : 0. */
- rtx x = gen_int_mode (vec_len, GET_MODE_INNER (sel_mode));
- insn_code icode = code_for_pred_cmp_scalar (sel_mode);
- rtx cmp = gen_rtx_fmt_ee (LTU, mask_mode, sel, x);
- rtx ops[] = {mask, cmp, sel, x};
- emit_vlmax_insn (icode, COMPARE_OP, ops);
+ if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 256))
+ {
+ rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+ rtx x = gen_int_mode (vec_len, GET_MODE_INNER (sel_mode));
+ insn_code icode = code_for_pred_cmp_scalar (sel_mode);
+ rtx cmp = gen_rtx_fmt_ee (LTU, mask_mode, sel, x);
+ rtx ops[] = {mask, cmp, sel, x};
+ emit_vlmax_insn (icode, COMPARE_OP, ops);
+ }
+ else
+ {
+ /* For EEW8 and NUNITS may be larger than 255, we can't use vmsltu
+ directly to generate the selector mask, instead, we can only use
+ precomputed mask.
+
+ E.g. selector = <0, 257, 2, 259> for EEW8 vector with NUNITS = 256, we
+ don't have a QImode scalar register to hold larger than 255. */
+ gcc_assert (vec_len.is_constant ());
+ int size = CEIL (GET_MODE_NUNITS (mask_mode).to_constant (), 8);
+ machine_mode mode = get_vector_mode (QImode, size).require ();
+ rtx tmp = gen_reg_rtx (mode);
+ rvv_builder v (mode, 1, size);
+ for (int i = 0; i < vec_len.to_constant () / 8; i++)
+ {
+ uint8_t value = 0;
+ for (int j = 0; j < 8; j++)
+ {
+ int index = i * 8 + j;
+ if (known_lt (d->perm[index], 256))
+ value |= 1 << j;
+ }
+ v.quick_push (gen_int_mode (value, QImode));
+ }
+ emit_move_insn (tmp, v.build ());
+ emit_move_insn (mask, gen_lowpart (mask_mode, tmp));
+ }
/* TARGET = MASK ? OP0 : OP1. */
/* swap op0 and op1 since the order is opposite to pred_merge. */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-1.c
new file mode 100644
index 00000000000..88059971503
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d --param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model -O3 -fdump-tree-optimized" } */
+
+#include <assert.h>
+
+#define N 64
+
+typedef struct
+{
+ unsigned char a;
+ unsigned char b;
+} s;
+
+int
+main1 (s *arr)
+{
+ s *ptr = arr;
+ s res[N];
+ int i;
+
+ for (i = 0; i < N; i++)
+ {
+ res[i].a = ptr->b - ptr->a;
+ res[i].b = ptr->b + ptr->a;
+ ptr++;
+ }
+ /* check results: */
+#pragma GCC novector
+ for (i = 0; i < N; i++)
+ {
+ if (res[i].a != arr[i].b - arr[i].a || res[i].b != arr[i].a + arr[i].b)
+ assert (0);
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 3 "optimized" } } */
+/* { dg-final { scan-assembler-not {vmsltu\.vi} } } */
--
2.36.3
* Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
2023-12-15 2:57 [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization Juzhe-Zhong
@ 2023-12-15 11:14 ` Robin Dapp
2023-12-15 12:16 ` juzhe.zhong
0 siblings, 1 reply; 9+ messages in thread
From: Robin Dapp @ 2023-12-15 11:14 UTC (permalink / raw)
To: Juzhe-Zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, jeffreyalaw
Hi Juzhe,
in general looks OK.
> + /* We need to use precomputed mask for such situation and such mask
> + can only be computed in compile-time known size modes. */
> + if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8 && maybe_ge (vec_len, 256)
> + && !vec_len.is_constant ())
> + return false;
> +
We could make this a separate variable like:
bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
Also add a comment that the non-constant case is handled by
shuffle_decompress_patterns in case we have a HImode vector twice the
size that can hold our indices.
> /* MASK = SELECTOR < NUNTIS ? 1 : 0. */
Comment should go inside the if branch.
> + if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 256))
> + }
> + else
> + {
> + /* For EEW8 and NUNITS may be larger than 255, we can't use vmsltu
> + directly to generate the selector mask, instead, we can only use
> + precomputed mask.
I find that comment a bit misleading as it's not the vmsltu itself but
rather that the indices cannot be held.
> +
> + E.g. selector = <0, 257, 2, 259> for EEW8 vector with NUNITS = 256, we
> + don't have a QImode scalar register to hold larger than 255. */
We also cannot hold that in a vector QImode register, and, since there
is no larger HI mode vector we cannot create a larger selector.
Also add this as comment:
As the mask is a simple {0, 1, ...} pattern and the length is known we can
store it in a scalar register and broadcast it to a mask register.
> + gcc_assert (vec_len.is_constant ());
> + int size = CEIL (GET_MODE_NUNITS (mask_mode).to_constant (), 8);
> + machine_mode mode = get_vector_mode (QImode, size).require ();
> + rtx tmp = gen_reg_rtx (mode);
> + rvv_builder v (mode, 1, size);
> + for (int i = 0; i < vec_len.to_constant () / 8; i++)
> + {
> + uint8_t value = 0;
> + for (int j = 0; j < 8; j++)
> + {
> + int index = i * 8 + j;
> + if (known_lt (d->perm[index], 256))
> + value |= 1 << j;
> + }
I would have hoped that a simple
v.quick_push (gen_int_mode (0b01010101, QImode));
suffices but that will probably clash if there are more than
two npatterns.
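
For the selector in the failing test, the packing loop can be modelled outside GCC as in the sketch below. The helper names are hypothetical and plain integers stand in for the poly_int selector elements; the bound is passed in as a parameter here rather than hard-coded.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not GCC code): pack "index < bound" for each selector
// element into bytes, least significant bit first, mirroring the loop in the
// patch. Assumes selector.size() is a multiple of 8.
std::vector<std::uint8_t> pack_merge_mask(const std::vector<int>& selector, int bound) {
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < selector.size() / 8; ++i) {
        std::uint8_t value = 0;
        for (int j = 0; j < 8; ++j)
            if (selector[i * 8 + j] < bound)
                value |= static_cast<std::uint8_t>(1u << j);
        bytes.push_back(value);
    }
    return bytes;
}

// The selector from the failing test: even lanes keep their own index, odd
// lanes take index + nunits, i.e. { 0, 257, 2, 259, ... } for nunits = 256.
std::vector<int> alternating_selector(int nunits) {
    std::vector<int> selector;
    for (int i = 0; i < nunits; ++i)
        selector.push_back(i % 2 == 0 ? i : i + nunits);
    return selector;
}
```

For nunits = 256 this produces 32 bytes, each equal to 0b01010101, which is the constant suggested above. Other selectors give other bytes.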
Regards
Robin
* Re: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
2023-12-15 11:14 ` Robin Dapp
@ 2023-12-15 12:16 ` juzhe.zhong
2023-12-15 12:25 ` Robin Dapp
0 siblings, 1 reply; 9+ messages in thread
From: juzhe.zhong @ 2023-12-15 12:16 UTC (permalink / raw)
To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw
>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
No, I think it will make us miss some optimizations.
For example, with the poly value [16,16] we get maybe_ge ([16,16], 65536), which makes us miss the merge optimization even though
we can definitely do it.
>> Also add a comment that the non-constant case is handled by
>> shuffle_decompress_patterns in case we have a HImode vector twice the
>> size that can hold our indices.
Ok.
>> Comment should go inside the if branch.
Ok.
>>Also add this as comment:
>>As the mask is a simple {0, 1, ...} pattern and the length is known we can
>>store it in a scalar register and broadcast it to a mask register.
Ok.
>>I would have hoped that a simple
>> v.quick_push (gen_int_mode (0b01010101, QImode));
>>suffices but that will probably clash if there are more than
>>two npatterns.
No, we definitely cannot use this. For more details, see the existing vmerge-*.c tests.
We have various patterns, e.g.:
0, nunits + 1, nunits + 2, ... gives 011
nunits, 1, 2, ... gives 100
....
Many different kinds of patterns can use the vmerge optimization.
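
A small standalone sketch of the notation used in the examples above. The helper name is hypothetical and this is not GCC code. Here '1' marks an element whose index is >= nunits and '0' an element whose index is below it; the mask built in the patch stores the inverse bit, since it sets a bit when the index is below the bound.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not GCC code): describe a selector as a string with
// one character per element, '1' if index >= nunits and '0' otherwise.
std::string merge_pattern(const std::vector<int>& selector, int nunits) {
    std::string bits;
    for (int index : selector)
        bits += (index >= nunits) ? '1' : '0';
    return bits;
}
```

With nunits = 8, the selector {0, 9, 10} gives "011", {8, 1, 2} gives "100", and the alternating selector {0, 9, 2, 11} gives "0101", so no single constant byte covers all of them.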
juzhe.zhong@rivai.ai
* Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
2023-12-15 12:16 ` juzhe.zhong
@ 2023-12-15 12:25 ` Robin Dapp
2023-12-15 12:28 ` juzhe.zhong
2023-12-15 12:32 ` juzhe.zhong
0 siblings, 2 replies; 9+ messages in thread
From: Robin Dapp @ 2023-12-15 12:25 UTC (permalink / raw)
To: juzhe.zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, Kito.cheng, jeffreyalaw
On 12/15/23 13:16, juzhe.zhong@rivai.ai wrote:
>
>>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
> No, I think it will make us miss some optimizations.
>
> For example, with the poly value [16,16] we get maybe_ge ([16,16], 65536), which makes us miss the merge optimization even though
> we can definitely do it.
I didn't mean to skip the && !vec_len.is_constant (), that should
stay. Just the first part of condition that can be re-used in the
if as well (inverted).
Regards
Robin
* Re: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
2023-12-15 12:25 ` Robin Dapp
@ 2023-12-15 12:28 ` juzhe.zhong
2023-12-15 12:32 ` juzhe.zhong
1 sibling, 0 replies; 9+ messages in thread
From: juzhe.zhong @ 2023-12-15 12:28 UTC (permalink / raw)
To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw
Do you mean like this?
/* We need to use precomputed mask for such situation and such mask
can only be computed in compile-time known size modes. */
bool indices_fit_selector_p
= maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
&& indices_fit_selector_p
&& !vec_len.is_constant ())
return false;
juzhe.zhong@rivai.ai
* Re: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
2023-12-15 12:25 ` Robin Dapp
2023-12-15 12:28 ` juzhe.zhong
@ 2023-12-15 12:32 ` juzhe.zhong
2023-12-15 12:44 ` Robin Dapp
1 sibling, 1 reply; 9+ messages in thread
From: juzhe.zhong @ 2023-12-15 12:32 UTC (permalink / raw)
To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw
Oh, I think it should be renamed to not_fit.
Does the following make sense to you?
/* We need to use precomputed mask for such situation and such mask
can only be computed in compile-time known size modes. */
bool indices_not_fit_selector_p
= maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
&& indices_not_fit_selector_p
&& !vec_len.is_constant ())
return false;
juzhe.zhong@rivai.ai
* Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
2023-12-15 12:32 ` juzhe.zhong
@ 2023-12-15 12:44 ` Robin Dapp
2023-12-15 12:52 ` juzhe.zhong
0 siblings, 1 reply; 9+ messages in thread
From: Robin Dapp @ 2023-12-15 12:44 UTC (permalink / raw)
To: juzhe.zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, Kito.cheng, jeffreyalaw
> Oh, I think it should be renamed to not_fit.
>
> Does the following make sense to you?
>
> /* We need to use precomputed mask for such situation and such mask
> can only be computed in compile-time known size modes. */
> bool indices_not_fit_selector_p
> = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
> if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
> && indices_not_fit_selector_p
> && !vec_len.is_constant ())
> return false;
Mhm, right, I don't think this makes it nicer overall. Maybe just like
the following then:
bool ..._p = GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 256);
if (!..._p && !vec_len.is_constant ())
then later
if (..._p)
...
else
...
Regards
Robin
* Re: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
2023-12-15 12:44 ` Robin Dapp
@ 2023-12-15 12:52 ` juzhe.zhong
2023-12-15 12:54 ` Robin Dapp
0 siblings, 1 reply; 9+ messages in thread
From: juzhe.zhong @ 2023-12-15 12:52 UTC (permalink / raw)
To: Robin Dapp, gcc-patches; +Cc: Robin Dapp, kito.cheng, Kito.cheng, jeffreyalaw
Do you mean:
/* We need to use precomputed mask for such situation and such mask
can only be computed in compile-time known size modes. */
bool indices_fit_selector_p
= GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 256);
if (!indices_fit_selector_p && !vec_len.is_constant ())
return false;
juzhe.zhong@rivai.ai
* Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization
2023-12-15 12:52 ` juzhe.zhong
@ 2023-12-15 12:54 ` Robin Dapp
0 siblings, 0 replies; 9+ messages in thread
From: Robin Dapp @ 2023-12-15 12:54 UTC (permalink / raw)
To: juzhe.zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, Kito.cheng, jeffreyalaw
On 12/15/23 13:52, juzhe.zhong@rivai.ai wrote:
> Do you mean:
>
> /* We need to use precomputed mask for such situation and such mask
> can only be computed in compile-time known size modes. */
> bool indices_fit_selector_p
> = GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 256);
> if (!indices_fit_selector_p && !vec_len.is_constant ())
> return false;
Yes and then reuse this in the if.
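
The resulting control flow can be sketched with a toy model. Everything below is an assumption made for illustration: ToyPoly is a simplified stand-in for poly_int with a single indeterminate, the maybe_ge and known_lt functions only imitate the semantics of the GCC helpers of the same name, and choose_mask_strategy is a hypothetical name.

```cpp
// Toy stand-in for a GCC poly_int with one indeterminate: the value is
// c0 + c1 * x for some runtime x >= 0. Not the real poly_int API.
struct ToyPoly {
    long c0;
    long c1;
    bool is_constant() const { return c1 == 0; }
};

// True if p >= c for at least one x >= 0.
bool maybe_ge(ToyPoly p, long c) { return p.c0 >= c || p.c1 > 0; }

// True if p < c for every x >= 0.
bool known_lt(ToyPoly p, long c) { return !maybe_ge(p, c); }

enum class Strategy { Reject, CompareMask, PrecomputedMask };

// Compute the predicate once, bail out early for the unsupported case, then
// reuse the same predicate to pick between the two mask-building branches.
Strategy choose_mask_strategy(int elem_bits, ToyPoly vec_len) {
    const bool indices_fit_selector_p = elem_bits > 8 || known_lt(vec_len, 256);
    if (!indices_fit_selector_p && !vec_len.is_constant())
        return Strategy::Reject;
    return indices_fit_selector_p ? Strategy::CompareMask
                                  : Strategy::PrecomputedMask;
}
```

In this model an 8-bit element with a constant length of 256 selects the precomputed mask, a constant length of 64 keeps the compare, and a length of [16,16] with 8-bit elements is rejected, while the same length with 16-bit elements keeps the compare.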
Regards
Robin