* Re: Re: Lots of FAILs in gcc.target/riscv/rvv/autovec/*
2023-11-08 4:31 ` Maxim Blinov
@ 2023-11-08 4:40 ` juzhe.zhong
2023-11-08 4:44 ` Andrew Pinski
2023-11-08 14:31 ` Jeff Law
2 siblings, 0 replies; 6+ messages in thread
From: juzhe.zhong @ 2023-11-08 4:40 UTC
To: maxim.a.blinov, jeffreyalaw; +Cc: gcc, kito.cheng
I am sure that master GCC has a much better VSETVL strategy than GCC-13.
However, a recent evaluation on our internal hardware shows that master GCC is overall worse than the previous RVV GCC that I open-sourced at:
https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next (rvv-next)
That is odd, since I think I have supported all the middle-end features of rvv-next.
We are analyzing this and trying to figure out why. We must recover the performance in GCC-14.
juzhe.zhong@rivai.ai
From: Maxim Blinov
Date: 2023-11-08 12:31
To: Jeff Law
CC: gcc; kito.cheng; juzhe.zhong
Subject: Re: Lots of FAILs in gcc.target/riscv/rvv/autovec/*
I see, thanks for clarifying, that makes sense.
In that case, what about doing the inverse? I mean, are there unique
patches in the vendor branch, and would it be useful to try to
upstream them into master? My motivation is to get the best
autovectorized code for RISC-V.
I had a go at building the TSVC benchmark (in the llvm-test-suite[1]
repository) both with the master and vendor branch gcc, and noticed
that the vendor branch gcc generally beats master in generating more
vector instructions.
If I simply count the number of instances of each vector instruction,
the average across all 36 test cases of vendor vs master gcc features
the following most prominent differences:
- vmv.x.s: 48 vs 0 (+ 48)
- vle32.v: 150 vs 50 (+ 100)
- vrgather.vv: 61 vs 0 (+ 61)
- vslidedown.vi: 61 vs 0 (+ 61)
- vse32.v: 472 vs 213 (+ 459)
- vmsgtu.vi: 30 vs 0 (+ 30)
- vadd.vi: 80 vs 30 (+ 50)
- vlm.v: 18 vs 0 (+ 18)
- vsm.v: 16 vs 0 (+ 16)
- vmv4r.v: 21 vs 7 (+ 14)
(For reference, the benchmarks are all between 20k-30k in code size.
Built with `-march=rv64imafdcv -O3`.)
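
The counting described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used for the numbers in this message. It assumes the input is text in the tab-separated layout that `objdump -d` typically prints (address, raw bytes, mnemonic, operands), and it assumes that any mnemonic starting with "v" is a vector instruction, which is a heuristic. The function names `count_vector_insns` and `diff_counts` are hypothetical.

```python
from collections import Counter


def count_vector_insns(disassembly: str) -> Counter:
    """Count mnemonics that look like RVV instructions in disassembly text.

    Assumption: each instruction line is tab-separated as
    "<address>:\t<raw bytes>\t<mnemonic>\t<operands>".
    Assumption: a mnemonic starting with "v" is a vector instruction.
    """
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # labels, blank lines, section headers
        mnemonic = fields[2].strip()
        if mnemonic.startswith("v"):
            counts[mnemonic] += 1
    return counts


def diff_counts(vendor: Counter, master: Counter) -> dict:
    """Return vendor-minus-master counts for every mnemonic that differs."""
    names = sorted(set(vendor) | set(master))
    return {m: vendor[m] - master[m] for m in names if vendor[m] != master[m]}
```

Feeding the disassembly of the same benchmark built by each compiler into `count_vector_insns` and passing both results to `diff_counts` gives a per-mnemonic delta like the list above. These are static counts only.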
Of course that doesn't say anything about performance, but would it be
possible/fair to say that the vendor branch may still be better than
master for generating vectorized code for RISC-V?
What's interesting is that there's very little "regression" - I saw
only very few cases where the vendor branch removed a vector
instruction as compared to master gcc (the most often removed
instruction by the vendor branch, as compared to master, is
vsetvl/vsetvli.)
BR,
Maxim
[1]: https://github.com/llvm/llvm-test-suite/tree/main/MultiSource/Benchmarks/TSVC
On Tue, 7 Nov 2023 at 15:53, Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 11/7/23 05:50, Maxim Blinov wrote:
> > Hi all,
> >
> > I can see about 500 failing tests on the
> > vendors/riscv/gcc-13-with-riscv-opts, a mostly-full list at the bottom
> > of this email. It's mostly test cases scraping for vector
> > instructions.
> Correct. There are generic vectorizer changes that would need to be
> ported over to that branch to make those tests pass. I looked at this a
> few times and ultimately gave up in the rat's nest of inter-dependent
> patches in the vectorizer.
>
>
> Given the lifetime of that branch is likely nearing its end, I don't
> think there's much value left in trying to port those changes over. Any
> such effort would likely be better spent nailing down issues on the trunk.
>
> jeff
* Re: Lots of FAILs in gcc.target/riscv/rvv/autovec/*
2023-11-08 4:31 ` Maxim Blinov
2023-11-08 4:40 ` juzhe.zhong
@ 2023-11-08 4:44 ` Andrew Pinski
2023-11-08 14:31 ` Jeff Law
2 siblings, 0 replies; 6+ messages in thread
From: Andrew Pinski @ 2023-11-08 4:44 UTC
To: Maxim Blinov; +Cc: Jeff Law, gcc, kito.cheng, juzhe.zhong
On Tue, Nov 7, 2023 at 8:33 PM Maxim Blinov via Gcc <gcc@gcc.gnu.org> wrote:
>
> I see, thanks for clarifying, that makes sense.
>
> In that case, what about doing the inverse? I mean, are there unique
> patches in the vendor branch, and would it be useful to try to
> upstream them into master? My motivation is to get the best
> autovectorized code for RISC-V.
>
> I had a go at building the TSVC benchmark (in the llvm-test-suite[1]
> repository) both with the master and vendor branch gcc, and noticed
> that the vendor branch gcc generally beats master in generating more
> vector instructions.
Note that the TSVC benchmark is part of the GCC testsuite too:
https://gcc.gnu.org/git/?p=gcc.git;a=tree;f=gcc/testsuite/gcc.dg/vect/tsvc/vect/tsvc;h=0a8f19a630bf39c28c6c6016bbc99a6421d83970;hb=HEAD
Thanks,
Andrew
* Re: Lots of FAILs in gcc.target/riscv/rvv/autovec/*
2023-11-08 4:31 ` Maxim Blinov
2023-11-08 4:40 ` juzhe.zhong
2023-11-08 4:44 ` Andrew Pinski
@ 2023-11-08 14:31 ` Jeff Law
2 siblings, 0 replies; 6+ messages in thread
From: Jeff Law @ 2023-11-08 14:31 UTC
To: Maxim Blinov; +Cc: gcc, kito.cheng, juzhe.zhong
On 11/7/23 21:31, Maxim Blinov wrote:
> I see, thanks for clarifying, that makes sense.
>
> In that case, what about doing the inverse? I mean, are there unique
> patches in the vendor branch, and would it be useful to try to
> upstream them into master? My motivation is to get the best
> autovectorized code for RISC-V.
There should be nothing on the vendor branch that is not already on the
trunk. If there is, something has gone horribly wrong.
The process we've used over there is pretty simple. Start with the
gcc-13 branch, then cherry pick risc-v backend & testsuite changes from
the trunk as well as limited target independent changes (primarily those
which the risc-v backend depends on, or which we know/expect are
important for risc-v for one reason or another).
>
> I had a go at building the TSVC benchmark (in the llvm-test-suite[1]
> repository) both with the master and vendor branch gcc, and noticed
> that the vendor branch gcc generally beats master in generating more
> vector instructions.
>
> If I simply count the number of instances of each vector instruction,
> the average across all 36 test cases of vendor vs master gcc features
> the following most prominent differences:
>
> - vmv.x.s: 48 vs 0 (+ 48)
> - vle32.v: 150 vs 50 (+ 100)
> - vrgather.vv: 61 vs 0 (+ 61)
> - vslidedown.vi: 61 vs 0 (+ 61)
> - vse32.v: 472 vs 213 (+ 459)
> - vmsgtu.vi: 30 vs 0 (+ 30)
> - vadd.vi: 80 vs 30 (+ 50)
> - vlm.v: 18 vs 0 (+ 18)
> - vsm.v: 16 vs 0 (+ 16)
> - vmv4r.v: 21 vs 7 (+ 14)
>
> (For reference, the benchmarks are all between 20k-30k in code size.
> Built with `-march=rv64imafdcv -O3`.)
>
> Of course that doesn't say anything about performance, but would it be
> possible/fair to say that the vendor branch may still be better than
> master for generating vectorized code for RISC-V?
>
> What's interesting is that there's very little "regression" - I saw
> only very few cases where the vendor branch removed a vector
> instruction as compared to master gcc (the most often removed
> instruction by the vendor branch, as compared to master, is
> vsetvl/vsetvli.)
If the vendor branch is generating better code than the trunk then
that's an indication that target independent changes on the trunk from
the gcc-14 development cycle need some work ;)
Just comparing the static number of instructions isn't useful at all
IMHO. Now you can get dynamic instructions from various QEMU plugins at
which point the data becomes much more interesting -- though you have to
be careful interpreting that as well.
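
To illustrate why dynamic counts tell a different story than static ones, here is a small sketch that computes the share of executed instructions that are vector instructions. It assumes a purely hypothetical profile format of one "<mnemonic> <execution count>" pair per line. Real QEMU plugin output looks different and would need its own parsing. As before, treating every mnemonic that starts with "v" as a vector instruction is a heuristic.

```python
def vector_fraction(profile: str) -> float:
    """Return the fraction of executed instructions that look like RVV.

    Assumption: `profile` has one "<mnemonic> <count>" pair per line
    (a made-up format for illustration, not any tool's real output).
    """
    total = 0
    vector = 0
    for line in profile.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip headers and malformed lines
        mnemonic, count = parts[0], int(parts[1])
        total += count
        if mnemonic.startswith("v"):
            vector += count
    return vector / total if total else 0.0
```

A binary with many vector instructions in cold code can still have a low dynamic fraction, which is one reason static counts alone can mislead.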
Jeff