* Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
2022-08-19 12:52 ` Richard Sandiford
@ 2022-08-19 14:10 ` 钟居哲
2022-08-19 14:34 ` 钟居哲
2022-08-20 1:23 ` 钟居哲
2 siblings, 0 replies; 14+ messages in thread
From: 钟居哲 @ 2022-08-19 14:10 UTC (permalink / raw)
To: richard.sandiford; +Cc: gcc-patches, rguenther, kito.cheng
As you mentioned. For poly_uint16 (1, 1), the pattern should be:
shape: nelts_per_pattern == 2, npatterns == 1
elements: a, b[, b, b, b, b, ...]
I tried to print out the rtx that tests create to test:
static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
unsigned int min_nunits = constant_lower_bound (nunits);
scalar_mode int_mode = GET_MODE_INNER (inner_mode);
unsigned int count = gcd (min_nunits, 4);
rtx_vector_builder builder (inner_mode, count, 2);
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
rtx x = builder.build ();
test_vector_subregs_modes (x);
print_rtl_single(stdout,x);
if (!nunits.is_constant ())
test_vector_subregs_modes (x, nunits - min_nunits, count);
}
the x value:
(const_vector:VNx1DI repeat [
(const_int 0 [0])
])
It seems that it doesn't match the pattern you said. Am I understanding correctly? And would you mind taking a look at the codes which generate x:
static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
unsigned int min_nunits = constant_lower_bound (nunits);
scalar_mode int_mode = GET_MODE_INNER (inner_mode);
unsigned int count = gcd (min_nunits, 4);
rtx_vector_builder builder (inner_mode, count, 2);
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
rtx x = builder.build ();
The nunits = (1,1), min_nunits = 1, count = 1. I print out these variable value may helpful for you to check. Thanks!
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong? Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
> expected: (nil)
>
> actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
> ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.
Hmm, ok, so the subreg operation is unexpected succeeding.
> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element.
The stepped case is 3 elements per pattern rather than 2. In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.
The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.
Why is (1,1) different though? The test is constructing:
nunits: 1 + 1x
shape: nelts_per_pattern == 2, npatterns == 1
elements: a, b[, b, b, b, b, ...]
It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:
nunits: 2 + 2x
shape: nelts_per_pattern == 2, npatterns == 2
elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
and we should test subregs starting at 0 + 2x (so starting at the
first b1). The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.
> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.
(1,1) shouldn't (need to) be a special case though. Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n,n) with npatterns==n. E.g. if stepped vectors are problematic
for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
would be problematic for (2,2).
So yeah, preventing a mode being used for autovectorisation would allow
the target to have a bit more control over which constants are actually
generated. But it shouldn't be necessary to do that for correctness.
Thanks,
Richard
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2022-08-19 17:35
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Hi, Richard. I tried the codes:
>> if (!nunits.is_constant () && maybe_gt (nunits, 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>
> Ah, right, sorry for the bogus suggestion.
>
> In that case, what specifically goes wrong? Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Thanks,
> Richard
>
>> But I tried:
>> if (!nunits.is_constant () && known_gt (nunits, 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>
>> Finally, I tried:
>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It passed with no warning.
>>
>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>> Thanks!
>>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2022-08-19 16:03
>> To: juzhe.zhong
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> juzhe.zhong@rivai.ai writes:
>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>
>>> Hello. This patch is preparing for following RVV support.
>>>
>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>
>>> So I define the machine_mode as follows:
>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>
>>> The compilation is failed for the stepped vector test:
>>> (const_vector:VNx1DI repeat [
>>> (const_int 8 [0x8])
>>> (const_int 7 [0x7])
>>> ])
>>>
>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>
>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>
>>> ---
>>> gcc/simplify-rtx.cc | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index 7d09bf7103d..61e0dfa00d0 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>> rtx x = builder.build ();
>>>
>>> test_vector_subregs_modes (x);
>>> - if (!nunits.is_constant ())
>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>> the fore_back tests require vectors that have a minimum of 2 elements.
>> Something like poly_uint16 (1, 2) would have the same problem as
>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in
>> principle.)
>>
>> This corresponds to the minimum of 3 elements for the stepped tests:
>>
>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>> && maybe_gt (GET_MODE_NUNITS (mode), 2))
>> {
>> test_vector_ops_series (mode, scalar_reg);
>> test_vector_subregs (mode);
>> }
>>
>> Thanks,
>> Richard
>>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
2022-08-19 12:52 ` Richard Sandiford
2022-08-19 14:10 ` 钟居哲
@ 2022-08-19 14:34 ` 钟居哲
2022-08-20 1:23 ` 钟居哲
2 siblings, 0 replies; 14+ messages in thread
From: 钟居哲 @ 2022-08-19 14:34 UTC (permalink / raw)
To: richard.sandiford; +Cc: gcc-patches, rguenther, kito.cheng
I rewrite test_vector_subregs_fore_back as follows:
static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
unsigned int min_nunits = constant_lower_bound (nunits);
scalar_mode int_mode = GET_MODE_INNER (inner_mode);
unsigned int count = gcd (min_nunits, 4);
rtx_vector_builder builder (inner_mode, count, 2);
unsigned int nelts_per_pattern = count == 1 ? 2 : count;
for (unsigned int i = 0; i < nelts_per_pattern; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
rtx x = builder.build ();
test_vector_subregs_modes (x);
if (!nunits.is_constant ())
test_vector_subregs_modes (x, nunits - min_nunits, count);
}
I add the code: unsigned int nelts_per_pattern = count == 1 ? 2 : count;
then replace the "count" into "nelts_per_pattern " in the first loop.
It can pass now. And "x" value I print out seems to be correct:
(const_vector:VNx1DI [
(const_int 0 [0])
repeat [
(const_int 1 [0x1])
]
])
Is this correct solution ?
Thanks.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong? Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
> expected: (nil)
>
> actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
> ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.
Hmm, ok, so the subreg operation is unexpected succeeding.
> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element.
The stepped case is 3 elements per pattern rather than 2. In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.
The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.
Why is (1,1) different though? The test is constructing:
nunits: 1 + 1x
shape: nelts_per_pattern == 2, npatterns == 1
elements: a, b[, b, b, b, b, ...]
It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:
nunits: 2 + 2x
shape: nelts_per_pattern == 2, npatterns == 2
elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
and we should test subregs starting at 0 + 2x (so starting at the
first b1). The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.
> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.
(1,1) shouldn't (need to) be a special case though. Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n,n) with npatterns==n. E.g. if stepped vectors are problematic
for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
would be problematic for (2,2).
So yeah, preventing a mode being used for autovectorisation would allow
the target to have a bit more control over which constants are actually
generated. But it shouldn't be necessary to do that for correctness.
Thanks,
Richard
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2022-08-19 17:35
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Hi, Richard. I tried the codes:
>> if (!nunits.is_constant () && maybe_gt (nunits, 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>
> Ah, right, sorry for the bogus suggestion.
>
> In that case, what specifically goes wrong? Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Thanks,
> Richard
>
>> But I tried:
>> if (!nunits.is_constant () && known_gt (nunits, 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>
>> Finally, I tried:
>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It passed with no warning.
>>
>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>> Thanks!
>>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2022-08-19 16:03
>> To: juzhe.zhong
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> juzhe.zhong@rivai.ai writes:
>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>
>>> Hello. This patch is preparing for following RVV support.
>>>
>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>
>>> So I define the machine_mode as follows:
>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>
>>> The compilation is failed for the stepped vector test:
>>> (const_vector:VNx1DI repeat [
>>> (const_int 8 [0x8])
>>> (const_int 7 [0x7])
>>> ])
>>>
>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>
>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>
>>> ---
>>> gcc/simplify-rtx.cc | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index 7d09bf7103d..61e0dfa00d0 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>> rtx x = builder.build ();
>>>
>>> test_vector_subregs_modes (x);
>>> - if (!nunits.is_constant ())
>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>> the fore_back tests require vectors that have a minimum of 2 elements.
>> Something like poly_uint16 (1, 2) would have the same problem as
>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in
>> principle.)
>>
>> This corresponds to the minimum of 3 elements for the stepped tests:
>>
>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>> && maybe_gt (GET_MODE_NUNITS (mode), 2))
>> {
>> test_vector_ops_series (mode, scalar_reg);
>> test_vector_subregs (mode);
>> }
>>
>> Thanks,
>> Richard
>>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
2022-08-19 12:52 ` Richard Sandiford
2022-08-19 14:10 ` 钟居哲
2022-08-19 14:34 ` 钟居哲
@ 2022-08-20 1:23 ` 钟居哲
2022-08-22 8:31 ` Richard Sandiford
2 siblings, 1 reply; 14+ messages in thread
From: 钟居哲 @ 2022-08-20 1:23 UTC (permalink / raw)
To: richard.sandiford; +Cc: gcc-patches, rguenther, kito.cheng
After deep analysis and several tries. I have made a analysis and conclusion as follows.
I really appreciate if you could spend some time take a look at the following analysis that I made:
According to the codes in test_vector_subregs_fore_back:
~~~
poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
unsigned int min_nunits = constant_lower_bound (nunits);
scalar_mode int_mode = GET_MODE_INNER (inner_mode);
unsigned int count = gcd (min_nunits, 4);
rtx_vector_builder builder (inner_mode, count, 2);
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
rtx x = builder.build ();
~~~~
Analyze the codes and tried several patterns:
For poly_uint16 (2,2):
x = (const_vector:VNx1DI [
(const_int 0 [0])
(const_int 1 [0x1])
repeat [
(const_int 0 [0])
(const_int -1 [0xffffffffffffffff])
]
])
For poly_uint16 (4,4):
x = (const_vector:VNx1DI [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
repeat [
(const_int 0 [0])
(const_int -1 [0xffffffffffffffff])
(const_int -2 [0xfffffffffffffffe])
(const_int -3 [0xfffffffffffffffd])
]
])
So, I think I can conclude the pattern rule as follows:
poly_uint16 (n,n):
x = (const_vector:VNx1DI [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
..................
(const_int n [hex(n)])
repeat [
(const_int 0 [0])
(const_int -1 [0xffffffffffffffff])
(const_int -2 [0xfffffffffffffffe])
(const_int -3 [0xfffffffffffffffd])
.........
(const_int -n [hex(-n)])
]
])
Am I understanding right?
Let's first take a look at the codes you write:
rtx_vector_builder builder (inner_mode, count, 2);
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
rtx x = builder.build ();
So for poly_uint (1,1):
Ideally according to the codes, you want to generate the pattern like this:
x = (const_vector:VNx1DI [
(const_int 0 [0])
repeat [
(const_int 0 [0])
]
])
In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
In this case, i = -i = 0.
So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
x = (const_vector:VNx1DI repeat [
(const_int 0 [0])
])
This is a const_vector duplicate vector.
This is not the correct pattern you want so it fails in the following check.
I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
================
Original codes:
rtx_vector_builder builder (inner_mode, count, 2);
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
================
================
1st solution:
rtx_vector_builder builder (inner_mode, count, 2);
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
x = (const_vector:VNx1DI [
(const_int 0 [0])
repeat [
(const_int 1 [0x1])
]
])
================
================
2nd solution:
rtx_vector_builder builder (inner_mode, count, 2);
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));
x=
(const_vector:VNx1DI [
(const_int 0 [0])
repeat [
(const_int -1 [0xffffffffffffffff])
]
])
================
Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong? Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
> expected: (nil)
>
> actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
> ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.
Hmm, ok, so the subreg operation is unexpected succeeding.
> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element.
The stepped case is 3 elements per pattern rather than 2. In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.
The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.
Why is (1,1) different though? The test is constructing:
nunits: 1 + 1x
shape: nelts_per_pattern == 2, npatterns == 1
elements: a, b[, b, b, b, b, ...]
It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:
nunits: 2 + 2x
shape: nelts_per_pattern == 2, npatterns == 2
elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
and we should test subregs starting at 0 + 2x (so starting at the
first b1). The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.
> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.
(1,1) shouldn't (need to) be a special case though. Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n,n) with npatterns==n. E.g. if stepped vectors are problematic
for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
would be problematic for (2,2).
So yeah, preventing a mode being used for autovectorisation would allow
the target to have a bit more control over which constants are actually
generated. But it shouldn't be necessary to do that for correctness.
Thanks,
Richard
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2022-08-19 17:35
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>> Hi, Richard. I tried the codes:
>> if (!nunits.is_constant () && maybe_gt (nunits, 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>
> Ah, right, sorry for the bogus suggestion.
>
> In that case, what specifically goes wrong? Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Thanks,
> Richard
>
>> But I tried:
>> if (!nunits.is_constant () && known_gt (nunits, 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>
>> Finally, I tried:
>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1))
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It passed with no warning.
>>
>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>> Thanks!
>>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2022-08-19 16:03
>> To: juzhe.zhong
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> juzhe.zhong@rivai.ai writes:
>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>
>>> Hello. This patch is preparing for following RVV support.
>>>
>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>
>>> So I define the machine_mode as follows:
>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>
>>> The compilation is failed for the stepped vector test:
>>> (const_vector:VNx1DI repeat [
>>> (const_int 8 [0x8])
>>> (const_int 7 [0x7])
>>> ])
>>>
>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>
>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>
>>> ---
>>> gcc/simplify-rtx.cc | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>> index 7d09bf7103d..61e0dfa00d0 100644
>>> --- a/gcc/simplify-rtx.cc
>>> +++ b/gcc/simplify-rtx.cc
>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>> rtx x = builder.build ();
>>>
>>> test_vector_subregs_modes (x);
>>> - if (!nunits.is_constant ())
>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>> the fore_back tests require vectors that have a minimum of 2 elements.
>> Something like poly_uint16 (1, 2) would have the same problem as
>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in
>> principle.)
>>
>> This corresponds to the minimum of 3 elements for the stepped tests:
>>
>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>> && maybe_gt (GET_MODE_NUNITS (mode), 2))
>> {
>> test_vector_ops_series (mode, scalar_reg);
>> test_vector_subregs (mode);
>> }
>>
>> Thanks,
>> Richard
>>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
2022-08-20 1:23 ` 钟居哲
@ 2022-08-22 8:31 ` Richard Sandiford
2022-08-22 8:45 ` juzhe.zhong
0 siblings, 1 reply; 14+ messages in thread
From: Richard Sandiford @ 2022-08-22 8:31 UTC (permalink / raw)
To: 钟居哲; +Cc: rguenther, gcc-patches, kito.cheng
钟居哲 <juzhe.zhong@rivai.ai> writes:
> After deep analysis and several tries. I have made a analysis and conclusion as follows.
> I really appreciate if you could spend some time take a look at the following analysis that I made:
>
> According to the codes in test_vector_subregs_fore_back:
>
> ~~~
> poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
> unsigned int min_nunits = constant_lower_bound (nunits);
> scalar_mode int_mode = GET_MODE_INNER (inner_mode);
> unsigned int count = gcd (min_nunits, 4);
>
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (-(int) i, int_mode));
> rtx x = builder.build ();
> ~~~~
>
> Analyze the codes and tried several patterns:
>
> For poly_uint16 (2,2):
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> (const_int 1 [0x1])
> repeat [
> (const_int 0 [0])
> (const_int -1 [0xffffffffffffffff])
> ]
> ])
>
> For poly_uint16 (4,4):
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> repeat [
> (const_int 0 [0])
> (const_int -1 [0xffffffffffffffff])
> (const_int -2 [0xfffffffffffffffe])
> (const_int -3 [0xfffffffffffffffd])
> ]
> ])
>
> So, I think I can conclude the pattern rule as follows:
>
> poly_uint16 (n,n):
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> ..................
> (const_int n [hex(n)])
> repeat [
> (const_int 0 [0])
> (const_int -1 [0xffffffffffffffff])
> (const_int -2 [0xfffffffffffffffe])
> (const_int -3 [0xfffffffffffffffd])
> .........
> (const_int -n [hex(-n)])
> ]
> ])
> Am I understanding right?
>
> Let's first take a look at the codes you write:
>
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (-(int) i, int_mode));
> rtx x = builder.build ();
>
> So for poly_uint (1,1):
> Ideally according to the codes, you want to generate the pattern like this:
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> repeat [
> (const_int 0 [0])
> ]
> ])
Ah, right, the two subsequences are supposed to be different.
> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
> In this case, i = -i = 0.
>
> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
>
> x = (const_vector:VNx1DI repeat [
> (const_int 0 [0])
> ])
>
> This is a const_vector duplicate vector.
> This is not the correct pattern you want so it fails in the following check.
>
> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
>
> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
>
> ================
> Original codes:
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (-(int) i, int_mode));
> ================
>
> ================
> 1st solution:
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> repeat [
> (const_int 1 [0x1])
> ]
> ])
> ================
>
> ================
> 2nd solution:
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));
I don't think we need to single out count == 1. It should be OK to use
-1 - (int) i unconditionally.
Thanks,
Richard
> x=
> (const_vector:VNx1DI [
> (const_int 0 [0])
> repeat [
> (const_int -1 [0xffffffffffffffff])
> ]
> ])
> ================
>
> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2022-08-19 20:52
> To: juzhe.zhong\@rivai.ai
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> Ah, right, sorry for the bogus suggestion.
>>> In that case, what specifically goes wrong? Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>
>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
>> The fail ICE:
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>> expected: (nil)
>>
>> actual: (const_int 0 [0])
>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>> 0x1332504 test_vector_subregs_modes
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>> 0x1332988 test_vector_subregs_fore_back
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>> 0x1332ae7 test_vector_subregs
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>> 0x1332c57 test_vector_ops
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>> 0x21318fa selftest::run_tests()
>> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>> 0x1362a76 toplev::run_self_tests()
>> ../../../riscv-gcc/gcc/toplev.cc:2205
>>
>> I analyzed the codes:
>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>> So the assertion fails.
>
> Hmm, ok, so the subreg operation is unexpected succeeding.
>
>> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element.
>
> The stepped case is 3 elements per pattern rather than 2. In a stepped
> pattern: a, b, b+n are represented explicitly, then the rest are
> implicitly b+n*2, b+n*3, etc.
>
> The case being handled by this code is instead the 2-element case:
> a, b are represented explicitly, then the rest are implicitly all b.
>
> Why is (1,1) different though? The test is constructing:
>
> nunits: 1 + 1x
> shape: nelts_per_pattern == 2, npatterns == 1
> elements: a, b[, b, b, b, b, ...]
>
> It then tests subregs starting at 0 + 1x (so starting at the first b).
> But for (2,2) we should have:
>
> nunits: 2 + 2x
> shape: nelts_per_pattern == 2, npatterns == 2
> elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>
> and we should test subregs starting at 0 + 2x (so starting at the
> first b1). The two cases should be very similar, it's just that the
> (2,2) case doubles the number of patterns.
>
>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
>> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
>> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
>
> Following on from what I said above, it doesn't look like this particular
> case is related to stepped vectors.
>
> (1,1) shouldn't (need to) be a special case though. Any potentital
> problems that would occur for (1,1) with npatterns==1 would also occur
> for (n,n) with npatterns==n. E.g. if stepped vectors are problematic
> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
> would be problematic for (2,2).
>
> So yeah, preventing a mode being used for autovectorisation would allow
> the target to have a bit more control over which constants are actually
> generated. But it shouldn't be necessary to do that for correctness.
>
> Thanks,
> Richard
>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2022-08-19 17:35
>> To: juzhe.zhong\@rivai.ai
>> CC: rguenther; gcc-patches; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> Hi, Richard. I tried the codes:
>>> if (!nunits.is_constant () && maybe_gt (nunits, 1))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>
>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>>
>> Ah, right, sorry for the bogus suggestion.
>>
>> In that case, what specifically goes wrong? Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>>
>> Thanks,
>> Richard
>>
>>> But I tried:
>>> if (!nunits.is_constant () && known_gt (nunits, 1))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>
>>> Finally, I tried:
>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It passed with no warning.
>>>
>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>> Thanks!
>>>
>>>
>>> juzhe.zhong@rivai.ai
>>>
>>> From: Richard Sandiford
>>> Date: 2022-08-19 16:03
>>> To: juzhe.zhong
>>> CC: gcc-patches; rguenther; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> juzhe.zhong@rivai.ai writes:
>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>
>>>> Hello. This patch is preparing for following RVV support.
>>>>
>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>>
>>>> So I define the machine_mode as follows:
>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>
>>>> The compilation is failed for the stepped vector test:
>>>> (const_vector:VNx1DI repeat [
>>>> (const_int 8 [0x8])
>>>> (const_int 7 [0x7])
>>>> ])
>>>>
>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>>
>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>>
>>>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>>
>>>> ---
>>>> gcc/simplify-rtx.cc | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>> --- a/gcc/simplify-rtx.cc
>>>> +++ b/gcc/simplify-rtx.cc
>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>> rtx x = builder.build ();
>>>>
>>>> test_vector_subregs_modes (x);
>>>> - if (!nunits.is_constant ())
>>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>
>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>> Something like poly_uint16 (1, 2) would have the same problem as
>>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in
>>> principle.)
>>>
>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>
>>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>> && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>> {
>>> test_vector_ops_series (mode, scalar_reg);
>>> test_vector_subregs (mode);
>>> }
>>>
>>> Thanks,
>>> Richard
>>>
>>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
2022-08-22 8:31 ` Richard Sandiford
@ 2022-08-22 8:45 ` juzhe.zhong
2022-08-22 8:56 ` Richard Sandiford
0 siblings, 1 reply; 14+ messages in thread
From: juzhe.zhong @ 2022-08-22 8:45 UTC (permalink / raw)
To: richard.sandiford; +Cc: rguenther, gcc-patches, kito.cheng
Thanks for your reply. Your suggestion "-1 - (int) i" is better.
Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));"
into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me?
Or you change it yourself?
Thank you so much.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2022-08-22 16:31
To: 钟居哲
CC: rguenther; gcc-patches; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
钟居哲 <juzhe.zhong@rivai.ai> writes:
> After deep analysis and several tries. I have made a analysis and conclusion as follows.
> I really appreciate if you could spend some time take a look at the following analysis that I made:
>
> According to the codes in test_vector_subregs_fore_back:
>
> ~~~
> poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
> unsigned int min_nunits = constant_lower_bound (nunits);
> scalar_mode int_mode = GET_MODE_INNER (inner_mode);
> unsigned int count = gcd (min_nunits, 4);
>
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (-(int) i, int_mode));
> rtx x = builder.build ();
> ~~~~
>
> Analyze the codes and tried several patterns:
>
> For poly_uint16 (2,2):
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> (const_int 1 [0x1])
> repeat [
> (const_int 0 [0])
> (const_int -1 [0xffffffffffffffff])
> ]
> ])
>
> For poly_uint16 (4,4):
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> repeat [
> (const_int 0 [0])
> (const_int -1 [0xffffffffffffffff])
> (const_int -2 [0xfffffffffffffffe])
> (const_int -3 [0xfffffffffffffffd])
> ]
> ])
>
> So, I think I can conclude the pattern rule as follows:
>
> poly_uint16 (n,n):
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> ..................
> (const_int n [hex(n)])
> repeat [
> (const_int 0 [0])
> (const_int -1 [0xffffffffffffffff])
> (const_int -2 [0xfffffffffffffffe])
> (const_int -3 [0xfffffffffffffffd])
> .........
> (const_int -n [hex(-n)])
> ]
> ])
> Am I understanding right?
>
> Let's first take a look at the codes you write:
>
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (-(int) i, int_mode));
> rtx x = builder.build ();
>
> So for poly_uint (1,1):
> Ideally according to the codes, you want to generate the pattern like this:
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> repeat [
> (const_int 0 [0])
> ]
> ])
Ah, right, the two subsequences are supposed to be different.
> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
> In this case, i = -i = 0.
>
> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
>
> x = (const_vector:VNx1DI repeat [
> (const_int 0 [0])
> ])
>
> This is a const_vector duplicate vector.
> This is not the correct pattern you want so it fails in the following check.
>
> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
>
> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
>
> ================
> Original codes:
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (-(int) i, int_mode));
> ================
>
> ================
> 1st solution:
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>
> x = (const_vector:VNx1DI [
> (const_int 0 [0])
> repeat [
> (const_int 1 [0x1])
> ]
> ])
> ================
>
> ================
> 2nd solution:
> rtx_vector_builder builder (inner_mode, count, 2);
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (i, int_mode));
> for (unsigned int i = 0; i < count; ++i)
> builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));
I don't think we need to single out count == 1. It should be OK to use
-1 - (int) i unconditionally.
Thanks,
Richard
> x=
> (const_vector:VNx1DI [
> (const_int 0 [0])
> repeat [
> (const_int -1 [0xffffffffffffffff])
> ]
> ])
> ================
>
> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2022-08-19 20:52
> To: juzhe.zhong\@rivai.ai
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> Ah, right, sorry for the bogus suggestion.
>>> In that case, what specifically goes wrong? Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>
>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
>> The fail ICE:
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>> expected: (nil)
>>
>> actual: (const_int 0 [0])
>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>> 0x1332504 test_vector_subregs_modes
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>> 0x1332988 test_vector_subregs_fore_back
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>> 0x1332ae7 test_vector_subregs
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>> 0x1332c57 test_vector_ops
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>> 0x21318fa selftest::run_tests()
>> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>> 0x1362a76 toplev::run_self_tests()
>> ../../../riscv-gcc/gcc/toplev.cc:2205
>>
>> I analyzed the codes:
>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>> So the assertion fails.
>
> Hmm, ok, so the subreg operation is unexpected succeeding.
>
>> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element.
>
> The stepped case is 3 elements per pattern rather than 2. In a stepped
> pattern: a, b, b+n are represented explicitly, then the rest are
> implicitly b+n*2, b+n*3, etc.
>
> The case being handled by this code is instead the 2-element case:
> a, b are represented explicitly, then the rest are implicitly all b.
>
> Why is (1,1) different though? The test is constructing:
>
> nunits: 1 + 1x
> shape: nelts_per_pattern == 2, npatterns == 1
> elements: a, b[, b, b, b, b, ...]
>
> It then tests subregs starting at 0 + 1x (so starting at the first b).
> But for (2,2) we should have:
>
> nunits: 2 + 2x
> shape: nelts_per_pattern == 2, npatterns == 2
> elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>
> and we should test subregs starting at 0 + 2x (so starting at the
> first b1). The two cases should be very similar, it's just that the
> (2,2) case doubles the number of patterns.
>
>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
>> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
>> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
>
> Following on from what I said above, it doesn't look like this particular
> case is related to stepped vectors.
>
> (1,1) shouldn't (need to) be a special case though. Any potentital
> problems that would occur for (1,1) with npatterns==1 would also occur
> for (n,n) with npatterns==n. E.g. if stepped vectors are problematic
> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
> would be problematic for (2,2).
>
> So yeah, preventing a mode being used for autovectorisation would allow
> the target to have a bit more control over which constants are actually
> generated. But it shouldn't be necessary to do that for correctness.
>
> Thanks,
> Richard
>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2022-08-19 17:35
>> To: juzhe.zhong\@rivai.ai
>> CC: rguenther; gcc-patches; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>> Hi, Richard. I tried the codes:
>>> if (!nunits.is_constant () && maybe_gt (nunits, 1))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>
>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>>
>> Ah, right, sorry for the bogus suggestion.
>>
>> In that case, what specifically goes wrong? Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>>
>> Thanks,
>> Richard
>>
>>> But I tried:
>>> if (!nunits.is_constant () && known_gt (nunits, 1))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>
>>> Finally, I tried:
>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1))
>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>> It passed with no warning.
>>>
>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>> Thanks!
>>>
>>>
>>> juzhe.zhong@rivai.ai
>>>
>>> From: Richard Sandiford
>>> Date: 2022-08-19 16:03
>>> To: juzhe.zhong
>>> CC: gcc-patches; rguenther; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> juzhe.zhong@rivai.ai writes:
>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>
>>>> Hello. This patch is preparing for following RVV support.
>>>>
>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>>
>>>> So I define the machine_mode as follows:
>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>
>>>> The compilation is failed for the stepped vector test:
>>>> (const_vector:VNx1DI repeat [
>>>> (const_int 8 [0x8])
>>>> (const_int 7 [0x7])
>>>> ])
>>>>
>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>>
>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>>
>>>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>>
>>>> ---
>>>> gcc/simplify-rtx.cc | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>> --- a/gcc/simplify-rtx.cc
>>>> +++ b/gcc/simplify-rtx.cc
>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>> rtx x = builder.build ();
>>>>
>>>> test_vector_subregs_modes (x);
>>>> - if (!nunits.is_constant ())
>>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>
>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>> Something like poly_uint16 (1, 2) would have the same problem as
>>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in
>>> principle.)
>>>
>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>
>>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>> && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>> {
>>> test_vector_ops_series (mode, scalar_reg);
>>> test_vector_subregs (mode);
>>> }
>>>
>>> Thanks,
>>> Richard
>>>
>>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
2022-08-22 8:45 ` juzhe.zhong
@ 2022-08-22 8:56 ` Richard Sandiford
2022-08-22 9:12 ` juzhe.zhong
0 siblings, 1 reply; 14+ messages in thread
From: Richard Sandiford @ 2022-08-22 8:56 UTC (permalink / raw)
To: juzhe.zhong; +Cc: gcc-patches, rguenther, kito.cheng
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Thanks for your reply. Your suggestion "-1 - (int) i" is better.
>
> Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));"
> into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me?
Yeah, please send a patch, and I'll merge it like you say.
Thanks,
Richard
>
> Or you change it yourself?
>
> Thank you so much.
>
>
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2022-08-22 16:31
> To: 钟居哲
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> 钟居哲 <juzhe.zhong@rivai.ai> writes:
>> After deep analysis and several tries. I have made a analysis and conclusion as follows.
>> I really appreciate if you could spend some time take a look at the following analysis that I made:
>>
>> According to the codes in test_vector_subregs_fore_back:
>>
>> ~~~
>> poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
>> unsigned int min_nunits = constant_lower_bound (nunits);
>> scalar_mode int_mode = GET_MODE_INNER (inner_mode);
>> unsigned int count = gcd (min_nunits, 4);
>>
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> rtx x = builder.build ();
>> ~~~~
>>
>> Analyze the codes and tried several patterns:
>>
>> For poly_uint16 (2,2):
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> (const_int 1 [0x1])
>> repeat [
>> (const_int 0 [0])
>> (const_int -1 [0xffffffffffffffff])
>> ]
>> ])
>>
>> For poly_uint16 (4,4):
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> (const_int 1 [0x1])
>> (const_int 2 [0x2])
>> (const_int 3 [0x3])
>> repeat [
>> (const_int 0 [0])
>> (const_int -1 [0xffffffffffffffff])
>> (const_int -2 [0xfffffffffffffffe])
>> (const_int -3 [0xfffffffffffffffd])
>> ]
>> ])
>>
>> So, I think I can conclude the pattern rule as follows:
>>
>> poly_uint16 (n,n):
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> (const_int 1 [0x1])
>> (const_int 2 [0x2])
>> (const_int 3 [0x3])
>> ..................
>> (const_int n [hex(n)])
>> repeat [
>> (const_int 0 [0])
>> (const_int -1 [0xffffffffffffffff])
>> (const_int -2 [0xfffffffffffffffe])
>> (const_int -3 [0xfffffffffffffffd])
>> .........
>> (const_int -n [hex(-n)])
>> ]
>> ])
>> Am I understanding right?
>>
>> Let's first take a look at the codes you write:
>>
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> rtx x = builder.build ();
>>
>> So for poly_uint (1,1):
>> Ideally according to the codes, you want to generate the pattern like this:
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> repeat [
>> (const_int 0 [0])
>> ]
>> ])
>
> Ah, right, the two subsequences are supposed to be different.
>
>> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
>> In this case, i = -i = 0.
>>
>> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
>> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
>>
>> x = (const_vector:VNx1DI repeat [
>> (const_int 0 [0])
>> ])
>>
>> This is a const_vector duplicate vector.
>> This is not the correct pattern you want so it fails in the following check.
>>
>> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
>>
>> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
>> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
>>
>> ================
>> Original codes:
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> ================
>>
>> ================
>> 1st solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> repeat [
>> (const_int 1 [0x1])
>> ]
>> ])
>> ================
>>
>> ================
>> 2nd solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));
>
> I don't think we need to single out count == 1. It should be OK to use
> -1 - (int) i unconditionally.
>
> Thanks,
> Richard
>
>> x=
>> (const_vector:VNx1DI [
>> (const_int 0 [0])
>> repeat [
>> (const_int -1 [0xffffffffffffffff])
>> ]
>> ])
>> ================
>>
>> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
>>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2022-08-19 20:52
>> To: juzhe.zhong\@rivai.ai
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Ah, right, sorry for the bogus suggestion.
>>>> In that case, what specifically goes wrong? Which test in
>>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>
>>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
>>> The fail ICE:
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>>> expected: (nil)
>>>
>>> actual: (const_int 0 [0])
>>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>>> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>>> 0x1332504 test_vector_subregs_modes
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>>> 0x1332988 test_vector_subregs_fore_back
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>>> 0x1332ae7 test_vector_subregs
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>>> 0x1332c57 test_vector_ops
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>>> 0x21318fa selftest::run_tests()
>>> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>>> 0x1362a76 toplev::run_self_tests()
>>> ../../../riscv-gcc/gcc/toplev.cc:2205
>>>
>>> I analyzed the codes:
>>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>>> So the assertion fails.
>>
>> Hmm, ok, so the subreg operation is unexpected succeeding.
>>
>>> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element.
>>
>> The stepped case is 3 elements per pattern rather than 2. In a stepped
>> pattern: a, b, b+n are represented explicitly, then the rest are
>> implicitly b+n*2, b+n*3, etc.
>>
>> The case being handled by this code is instead the 2-element case:
>> a, b are represented explicitly, then the rest are implicitly all b.
>>
>> Why is (1,1) different though? The test is constructing:
>>
>> nunits: 1 + 1x
>> shape: nelts_per_pattern == 2, npatterns == 1
>> elements: a, b[, b, b, b, b, ...]
>>
>> It then tests subregs starting at 0 + 1x (so starting at the first b).
>> But for (2,2) we should have:
>>
>> nunits: 2 + 2x
>> shape: nelts_per_pattern == 2, npatterns == 2
>> elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>>
>> and we should test subregs starting at 0 + 2x (so starting at the
>> first b1). The two cases should be very similar, it's just that the
>> (2,2) case doubles the number of patterns.
>>
>>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
>>> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
>>> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
>>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
>>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
>>
>> Following on from what I said above, it doesn't look like this particular
>> case is related to stepped vectors.
>>
>> (1,1) shouldn't (need to) be a special case though. Any potentital
>> problems that would occur for (1,1) with npatterns==1 would also occur
>> for (n,n) with npatterns==n. E.g. if stepped vectors are problematic
>> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
>> would be problematic for (2,2).
>>
>> So yeah, preventing a mode being used for autovectorisation would allow
>> the target to have a bit more control over which constants are actually
>> generated. But it shouldn't be necessary to do that for correctness.
>>
>> Thanks,
>> Richard
>>
>>> juzhe.zhong@rivai.ai
>>>
>>> From: Richard Sandiford
>>> Date: 2022-08-19 17:35
>>> To: juzhe.zhong\@rivai.ai
>>> CC: rguenther; gcc-patches; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Hi, Richard. I tried the codes:
>>>> if (!nunits.is_constant () && maybe_gt (nunits, 1))
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>
>>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>>>
>>> Ah, right, sorry for the bogus suggestion.
>>>
>>> In that case, what specifically goes wrong? Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>
>>> Thanks,
>>> Richard
>>>
>>>> But I tried:
>>>> if (!nunits.is_constant () && known_gt (nunits, 1))
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>>
>>>> Finally, I tried:
>>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1))
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It passed with no warning.
>>>>
>>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>>> Thanks!
>>>>
>>>>
>>>> juzhe.zhong@rivai.ai
>>>>
>>>> From: Richard Sandiford
>>>> Date: 2022-08-19 16:03
>>>> To: juzhe.zhong
>>>> CC: gcc-patches; rguenther; kito.cheng
>>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>>> juzhe.zhong@rivai.ai writes:
>>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>>
>>>>> Hello. This patch is preparing for following RVV support.
>>>>>
>>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>>>
>>>>> So I define the machine_mode as follows:
>>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>>
>>>>> The compilation is failed for the stepped vector test:
>>>>> (const_vector:VNx1DI repeat [
>>>>> (const_int 8 [0x8])
>>>>> (const_int 7 [0x7])
>>>>> ])
>>>>>
>>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>>>
>>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>>>
>>>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>>>
>>>>> ---
>>>>> gcc/simplify-rtx.cc | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>>> --- a/gcc/simplify-rtx.cc
>>>>> +++ b/gcc/simplify-rtx.cc
>>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>>> rtx x = builder.build ();
>>>>>
>>>>> test_vector_subregs_modes (x);
>>>>> - if (!nunits.is_constant ())
>>>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>
>>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>>> Something like poly_uint16 (1, 2) would have the same problem as
>>>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in
>>>> principle.)
>>>>
>>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>>
>>>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>>> && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>>> {
>>>> test_vector_ops_series (mode, scalar_reg);
>>>> test_vector_subregs (mode);
>>>> }
>>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
2022-08-22 8:56 ` Richard Sandiford
@ 2022-08-22 9:12 ` juzhe.zhong
0 siblings, 0 replies; 14+ messages in thread
From: juzhe.zhong @ 2022-08-22 9:12 UTC (permalink / raw)
To: richard.sandiford; +Cc: gcc-patches, rguenther, kito.cheng
I sent another patch called "Fix issue of poly_uint16 (1,1) in self test" to fix it.
I didn't send V2 following this patch because this patch name is misleading.
Thank you so much. Richard.
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2022-08-22 16:56
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Thanks for your reply. Your suggestion "-1 - (int) i" is better.
>
> Can I send another patch that changes "builder.quick_push (gen_int_mode (-(int) i, int_mode));"
> into “builder.quick_push (gen_int_mode (-1 - (int) i, int_mode));" then you merge it for me?
Yeah, please send a patch, and I'll merge it like you say.
Thanks,
Richard
>
> Or you change it yourself?
>
> Thank you so much.
>
>
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Sandiford
> Date: 2022-08-22 16:31
> To: 钟居哲
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
> 钟居哲 <juzhe.zhong@rivai.ai> writes:
>> After deep analysis and several tries. I have made a analysis and conclusion as follows.
>> I really appreciate if you could spend some time take a look at the following analysis that I made:
>>
>> According to the codes in test_vector_subregs_fore_back:
>>
>> ~~~
>> poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
>> unsigned int min_nunits = constant_lower_bound (nunits);
>> scalar_mode int_mode = GET_MODE_INNER (inner_mode);
>> unsigned int count = gcd (min_nunits, 4);
>>
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> rtx x = builder.build ();
>> ~~~~
>>
>> Analyze the codes and tried several patterns:
>>
>> For poly_uint16 (2,2):
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> (const_int 1 [0x1])
>> repeat [
>> (const_int 0 [0])
>> (const_int -1 [0xffffffffffffffff])
>> ]
>> ])
>>
>> For poly_uint16 (4,4):
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> (const_int 1 [0x1])
>> (const_int 2 [0x2])
>> (const_int 3 [0x3])
>> repeat [
>> (const_int 0 [0])
>> (const_int -1 [0xffffffffffffffff])
>> (const_int -2 [0xfffffffffffffffe])
>> (const_int -3 [0xfffffffffffffffd])
>> ]
>> ])
>>
>> So, I think I can conclude the pattern rule as follows:
>>
>> poly_uint16 (n,n):
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> (const_int 1 [0x1])
>> (const_int 2 [0x2])
>> (const_int 3 [0x3])
>> ..................
>> (const_int n [hex(n)])
>> repeat [
>> (const_int 0 [0])
>> (const_int -1 [0xffffffffffffffff])
>> (const_int -2 [0xfffffffffffffffe])
>> (const_int -3 [0xfffffffffffffffd])
>> .........
>> (const_int -n [hex(-n)])
>> ]
>> ])
>> Am I understanding right?
>>
>> Let's first take a look at the codes you write:
>>
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> rtx x = builder.build ();
>>
>> So for poly_uint (1,1):
>> Ideally according to the codes, you want to generate the pattern like this:
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> repeat [
>> (const_int 0 [0])
>> ]
>> ])
>
> Ah, right, the two subsequences are supposed to be different.
>
>> In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration make i always is 0.
>> In this case, i = -i = 0.
>>
>> So we will end up with the first value = 0, and repeat value is also 0, in "rtx x = builder.build ();".
>> Because GCC thinks all the value are 0, so the pattern is changed as follows which is not a stepped vector:
>>
>> x = (const_vector:VNx1DI repeat [
>> (const_int 0 [0])
>> ])
>>
>> This is a const_vector duplicate vector.
>> This is not the correct pattern you want so it fails in the following check.
>>
>> I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the only corner case will go wrong.
>>
>> To fix this, we should adjust test pattern generator to specifically poly_uint16 (1,1). We should make first value
>> and repeat value different. I tried 2 solution and they both pass for poly_uint16 (1,1):
>>
>> ================
>> Original codes:
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>> ================
>>
>> ================
>> 1st solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (-(int) i, int_mode));
>>
>> x = (const_vector:VNx1DI [
>> (const_int 0 [0])
>> repeat [
>> (const_int 1 [0x1])
>> ]
>> ])
>> ================
>>
>> ================
>> 2nd solution:
>> rtx_vector_builder builder (inner_mode, count, 2);
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (i, int_mode));
>> for (unsigned int i = 0; i < count; ++i)
>> builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));
>
> I don't think we need to single out count == 1. It should be OK to use
> -1 - (int) i unconditionally.
>
> Thanks,
> Richard
>
>> x=
>> (const_vector:VNx1DI [
>> (const_int 0 [0])
>> repeat [
>> (const_int -1 [0xffffffffffffffff])
>> ]
>> ])
>> ================
>>
>> Do you agree with these solution? Or you could offer a better solution to fix this? Thank you so much.
>>
>>
>> juzhe.zhong@rivai.ai
>>
>> From: Richard Sandiford
>> Date: 2022-08-19 20:52
>> To: juzhe.zhong\@rivai.ai
>> CC: gcc-patches; rguenther; kito.cheng
>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Ah, right, sorry for the bogus suggestion.
>>>> In that case, what specifically goes wrong? Which test in
>>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>
>>> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute test_vector_subregs_modes.
>>> The fail ICE:
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>>> expected: (nil)
>>>
>>> actual: (const_int 0 [0])
>>> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
>>> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, rtx_def*, rtx_def*)
>>> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
>>> 0x1332504 test_vector_subregs_modes
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
>>> 0x1332988 test_vector_subregs_fore_back
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
>>> 0x1332ae7 test_vector_subregs
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
>>> 0x1332c57 test_vector_ops
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
>>> 0x1332c7b selftest::simplify_rtx_cc_tests()
>>> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
>>> 0x21318fa selftest::run_tests()
>>> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
>>> 0x1362a76 toplev::run_self_tests()
>>> ../../../riscv-gcc/gcc/toplev.cc:2205
>>>
>>> I analyzed the codes:
>>> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
>>> So the assertion fails.
>>
>> Hmm, ok, so the subreg operation is unexpected succeeding.
>>
>>> This is the test for stepped vector using 2 element per pattern. For poly_uint16 (1,1), it's true it is possible only has 1 element.
>>
>> The stepped case is 3 elements per pattern rather than 2. In a stepped
>> pattern: a, b, b+n are represented explicitly, then the rest are
>> implicitly b+n*2, b+n*3, etc.
>>
>> The case being handled by this code is instead the 2-element case:
>> a, b are represented explicitly, then the rest are implicitly all b.
>>
>> Why is (1,1) different though? The test is constructing:
>>
>> nunits: 1 + 1x
>> shape: nelts_per_pattern == 2, npatterns == 1
>> elements: a, b[, b, b, b, b, ...]
>>
>> It then tests subregs starting at 0 + 1x (so starting at the first b).
>> But for (2,2) we should have:
>>
>> nunits: 2 + 2x
>> shape: nelts_per_pattern == 2, npatterns == 2
>> elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
>>
>> and we should test subregs starting at 0 + 2x (so starting at the
>> first b1). The two cases should be very similar, it's just that the
>> (2,2) case doubles the number of patterns.
>>
>>> I think it makes sense to fail the test. However for poly (1,1) machine mode, can we have the chance that some target define this
>>> kind of machine mode only used for intrinsics? I already developed full RVV support in GCC (including intrinsc and auto-vectorization).
>>> I only enable auto-vectorization with mode larger than (2,2) and test it fully.
>>> From my experience, it seems the stepped vector only created during VLA auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics will
>>> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
>>
>> Following on from what I said above, it doesn't look like this particular
>> case is related to stepped vectors.
>>
>> (1,1) shouldn't (need to) be a special case though. Any potentital
>> problems that would occur for (1,1) with npatterns==1 would also occur
>> for (n,n) with npatterns==n. E.g. if stepped vectors are problematic
>> for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
>> would be problematic for (2,2).
>>
>> So yeah, preventing a mode being used for autovectorisation would allow
>> the target to have a bit more control over which constants are actually
>> generated. But it shouldn't be necessary to do that for correctness.
>>
>> Thanks,
>> Richard
>>
>>> juzhe.zhong@rivai.ai
>>>
>>> From: Richard Sandiford
>>> Date: 2022-08-19 17:35
>>> To: juzhe.zhong\@rivai.ai
>>> CC: rguenther; gcc-patches; kito.cheng
>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>> "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
>>>> Hi, Richard. I tried the codes:
>>>> if (!nunits.is_constant () && maybe_gt (nunits, 1))
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>
>>>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>>>
>>> Ah, right, sorry for the bogus suggestion.
>>>
>>> In that case, what specifically goes wrong? Which test in
>>> test_vector_subregs_modes fails, and what does the ICE look like?
>>>
>>> Thanks,
>>> Richard
>>>
>>>> But I tried:
>>>> if (!nunits.is_constant () && known_gt (nunits, 1))
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It pass. But it report a warning: "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]" during the compilation.
>>>>
>>>> Finally, I tried:
>>>> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1))
>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>> It passed with no warning.
>>>>
>>>> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
>>>> Thanks!
>>>>
>>>>
>>>> juzhe.zhong@rivai.ai
>>>>
>>>> From: Richard Sandiford
>>>> Date: 2022-08-19 16:03
>>>> To: juzhe.zhong
>>>> CC: gcc-patches; rguenther; kito.cheng
>>>> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)
>>>> juzhe.zhong@rivai.ai writes:
>>>>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>>>>>
>>>>> Hello. This patch is preparing for following RVV support.
>>>>>
>>>>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>>>>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of ARM SVE is always 128-bit blocks.
>>>>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' sub-extension and 64bit in 'Zve*' sub-extension.
>>>>>
>>>>> So I define the machine_mode as follows:
>>>>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>>>>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>>>>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>>>>
>>>>> The compilation is failed for the stepped vector test:
>>>>> (const_vector:VNx1DI repeat [
>>>>> (const_int 8 [0x8])
>>>>> (const_int 7 [0x7])
>>>>> ])
>>>>>
>>>>> I understand for stepped vector should always have aleast 2 elements and stepped vector initialization is common
>>>>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that report fail for stepped vector of poly_uint16 (1, 1).
>>>>>
>>>>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in intrinsics. And I would like to enable RVV auto-vectorization
>>>>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V backend. I think it will not create issue if we define
>>>>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or offer me some other better solutions. Thanks!
>>>>>
>>>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for poly_uint16 (1, 1).
>>>>>
>>>>> ---
>>>>> gcc/simplify-rtx.cc | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>>>>> index 7d09bf7103d..61e0dfa00d0 100644
>>>>> --- a/gcc/simplify-rtx.cc
>>>>> +++ b/gcc/simplify-rtx.cc
>>>>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>>>> rtx x = builder.build ();
>>>>>
>>>>> test_vector_subregs_modes (x);
>>>>> - if (!nunits.is_constant ())
>>>>> + if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>>>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>>>
>>>> I think instead we should use maybe_gt (nunits, 1), on the basis that
>>>> the fore_back tests require vectors that have a minimum of 2 elements.
>>>> Something like poly_uint16 (1, 2) would have the same problem as
>>>> poly_uint16 (1, 1). ({1, 2} is an unlikely value, but it's OK in
>>>> principle.)
>>>>
>>>> This corresponds to the minimum of 3 elements for the stepped tests:
>>>>
>>>> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>>>> && maybe_gt (GET_MODE_NUNITS (mode), 2))
>>>> {
>>>> test_vector_ops_series (mode, scalar_reg);
>>>> test_vector_subregs (mode);
>>>> }
>>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 14+ messages in thread