From: "Kewen.Lin" <linkw@linux.ibm.com>
To: Carl Love <cel@us.ibm.com>
Cc: Peter Bergner <bergner@vnet.ibm.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>
Subject: Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.
Date: Fri, 30 Jun 2023 11:37:21 +0800 [thread overview]
Message-ID: <c4614546-8b2c-895e-472a-0cf6818079ec@linux.ibm.com> (raw)
In-Reply-To: <4d3135956e493fc12311af8fce18d043769ab7a4.camel@us.ibm.com>
Hi Carl,
on 2023/6/30 05:36, Carl Love wrote:
> Kewen:
>
> On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
>>> Yea, I was going with a runnable test and didn't include the
>>> instruction counts. Added back in. Rather then doing by processor
>>> version (P8, P9, P10) I was able to do it by BE/LE. The
>>> instruction
>>> counts were the same for LE accross processor versions but there
>>> are a
>>> few instruction counts that vary with BE and LE.
>>
>> But the original test case only checks for cpu-types (processor
>> version)
>> but not for endianness, it means for the bif usages, there should not
>> be
>> different for endianness. Why does this changes with your new test
>> case?
>> Could you have a further look and make it consistent with some
>> adjustment
>> if possible? As we know, checking insn counts sometimes are fragile,
>> so
>> I think we should try our best to make it as robust as possible in
>> the
>> first place.
>>
>> Besides, the original case also have some differences between p7/p8
>> and
>> p9.
>>
>
> There are differences on P8 LE versus BE. I did a diff between the P8
> and P9 tests:
>
> diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> 3,4c3,4
> < /* { dg-require-effective-target powerpc_p8vector_ok } */
> < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> ---
>> /* { dg-require-effective-target powerpc_p9vector_ok } */
>> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> 12c12
> < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> 23d22
> < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> 37c36
> < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
>
> So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
> xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different
> between the two architectures. I then wrote a script to compile the
> CPU specific test on Power 8, Power 9 and Power 10 architectures and
> then grep for the above list of instructions. If I run the scrip on P8
> BE and LE I get>
>
> Power 8 BE Power 8 LE Power 9 LE Power 9 BE Power 10 LE*
> (makalu-lp1) (genoa) (marlin) (nilram) (ltcd97-lp3)
> instruction count count count count count
> vperm 1 1 0 0 0
> vpermr 0 0 0 0 0
> xxpermr 0 0 1 0 1
> xvmsubadp 1 0 1 1 1
> xvmsubmdp 0 1 0 0 0
> xvsubdp 1 1 1 1 1
>
Thanks for looking into this and making this statistics.
Is there a typo for column nilram? Otherwise, the below insn check
/* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
would fail there.
>
> From the diff we see
>
> { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }
>
> This test picks up the correct subtraction instruction for LE versus BE
> so this "masks" the LE/BE difference. I changed the check in vsx-
> vector-6-func-3op.c to match. This eliminates the LE and BE checks and
> reduces the number of specific checks.
OK, nice.
>
> In vsx-vector-6-func-3op.c The new test checks the counts for
> xxpermdi, which the original test does not check. The check for
> xxpermdi are not needed. They are not directly related to the builtin
> tests. I removed them.
OK.
>
> Looking at the LE/BE checks in the other test file vsx-vector-6-func-
> 2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were not checked in
> the original test. The functions where these instructions are used get
> inlined. On LE, the binary instructions show up in the inlined code as
> well as what appears to be the binary for the original, non-inlined
> function. Best I can see, the binary for the original function is dead
> code. I don't see any calls to it. Seems like it shouldn't be there
> as it would make the binary smaller. On BE, I don't see the binary for
> the original non-inlined function.
>
> I had played with putting -Wno-inline on the command line but that
> didn't seem to make any difference. However, you suggestion of
> __attribute__ ((noipa)) does prevent the inlining and we don't get the
> second copy of the instructions showing up. The inlining eliminated the
> LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.
-Winline is a option for warning: "Warn if a function that is declared
as inline cannot be inlined.", I think what you wanted is -fno-inline,
and it's good to know noipa helps here.
>
> The instruction count test for xxlor in vsx-vector-6-func-2lop.c
> differs on LE and BE vsx-vector-6-func-2op.c. I believe the
> instruction is used with loads to reorder the data. I don't see anyway
> to get around the extra xxlor instructions and verify the vec_or
> builtin test generates the instruction.
>
OK, I'm still curious how the loads cause the difference.
> I was able to eliminate all of the LE/BE qualifiers in the instruction
> counts with the exception of xxlor. By using the same checks that look
> for multiple versions of xvmsumb*, as was done in the original test, we
> can also eliminate LE/BE specific tests and account for different
> instructions across CPU versions. We could go back to checking for
> specific instructions being generated on Power 8, Power 9, Power 10 if
> you prefer not using checks that cover multiple flavors of a given
> instruction across different CPU types.
>
> FYI, I eliminated the function call to do the various tests. Instead,
> I modified the macro to generate a function call to do the test and
> check the results.
Got it, I'll review the latest revision soon, thanks.
BR,
Kewen
next prev parent reply other threads:[~2023-06-30 3:37 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-30 20:46 Carl Love
2023-06-19 7:17 ` Kewen.Lin
2023-06-21 22:42 ` Carl Love
2023-06-28 8:35 ` Kewen.Lin
2023-06-29 21:36 ` Carl Love
2023-06-30 3:37 ` Kewen.Lin [this message]
2023-06-30 22:20 ` Carl Love
2023-06-30 23:50 ` Carl Love
2023-07-01 0:03 ` Peter Bergner
2023-06-30 23:59 ` Peter Bergner
2023-07-03 15:57 ` Carl Love
2023-07-04 2:08 ` Kewen.Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c4614546-8b2c-895e-472a-0cf6818079ec@linux.ibm.com \
--to=linkw@linux.ibm.com \
--cc=bergner@vnet.ibm.com \
--cc=cel@us.ibm.com \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).