From: Christophe Lyon <christophe.lyon.oss@gmail.com>
To: Jonathan Wright <Jonathan.Wright@arm.com>
Cc: Richard Sandiford <Richard.Sandiford@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
Date: Tue, 3 Aug 2021 11:36:00 +0200 [thread overview]
Message-ID: <CAKhMtSKXmW9q5Cr7OftprtgvHwEf-WABKrfuXUAWf3SzF-=t4A@mail.gmail.com> (raw)
In-Reply-To: <DB9PR08MB695963BBBA1C7673F35C8F49EB129@DB9PR08MB6959.eurprd08.prod.outlook.com>
Hi,
Since the arm-linux toolchain build has been fixed, I have noticed
additional failures on armeb:
gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times
vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times
vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times
vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times
vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times
vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times
vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times
vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times
vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
I don't see them mentioned in this thread though?
Can you check?
Thanks
Christophe
On Thu, Jul 15, 2021 at 3:07 PM Jonathan Wright <Jonathan.Wright@arm.com>
wrote:
> Ah, yes - those test results should have only been changed for little
> endian.
>
> I've submitted a patch to the list restoring the original expected results
> for big
> endian.
>
> Thanks,
> Jonathan
> ------------------------------
> *From:* Christophe Lyon <christophe.lyon.oss@gmail.com>
> *Sent:* 15 July 2021 10:09
> *To:* Richard Sandiford <Richard.Sandiford@arm.com>; Jonathan Wright <
> Jonathan.Wright@arm.com>; gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>;
> Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> *Subject:* Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
>
>
>
> On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
> Jonathan Wright <Jonathan.Wright@arm.com> writes:
> > Hi,
> >
> > Version 2 of this patch adds more code generation tests to show the
> > benefit of this RTL simplification as well as adding a new helper
> function
> > 'rtx_vec_series_p' to reduce code duplication.
> >
> > Patch tested as version 1 - ok for master?
>
> Sorry for the slow reply.
>
> > Regression tested and bootstrapped on aarch64-none-linux-gnu,
> > x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> > aarch64_be-none-linux-gnu - no issues.
>
> I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.
>
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index
> 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4
> 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode
> op0_mode, int in_dest,
> > - 1,
> > 0));
> > break;
> > + case VEC_SELECT:
> > + {
> > + rtx trueop0 = XEXP (x, 0);
> > + mode = GET_MODE (trueop0);
> > + rtx trueop1 = XEXP (x, 1);
> > + int nunits;
> > + /* If we select a low-part subreg, return that. */
> > + if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> > + && targetm.can_change_mode_class (mode, GET_MODE (x),
> ALL_REGS))
> > + {
> > + int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0)
> : 0;
> > +
> > + if (rtx_vec_series_p (trueop1, offset))
> > + {
> > + rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> > + if (new_rtx != NULL_RTX)
> > + return new_rtx;
> > + }
> > + }
> > + }
>
> Since this occurs three times, I think it would be worth having
> a new predicate:
>
> /* Return true if, for all OP of mode OP_MODE:
>
> (vec_select:RESULT_MODE OP SEL)
>
> is equivalent to the lowpart RESULT_MODE of OP. */
>
> bool
> vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx
> sel)
>
> containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
> and rtx_vec_series_p tests.
>
> I think the function belongs in rtlanal.[hc], even though subreg_lowpart_p
> is in emit-rtl.c.
>
> > diff --git a/gcc/config/aarch64/aarch64.md
> b/gcc/config/aarch64/aarch64.md
> > index
> aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105
> 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -1884,15 +1884,16 @@
> > )
> >
> > (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
> > - [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> > - (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand"
> "r,m,m")))]
> > + [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> > + (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand"
> "r,m,m,w")))]
> > ""
> > "@
> > and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
> > ldr<SHORT:size>\t%w0, %1
> > - ldr\t%<SHORT:size>0, %1"
> > - [(set_attr "type" "logic_imm,load_4,f_loads")
> > - (set_attr "arch" "*,*,fp")]
> > + ldr\t%<SHORT:size>0, %1
> > + umov\t%w0, %1.<SHORT:size>[0]"
> > + [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> > + (set_attr "arch" "*,*,fp,fp")]
>
> FTR (just to show I thought about it): I don't know whether the umov
> can really be considered an fp operation rather than a simd operation,
> but since we don't support fp without simd, this is already a distinction
> without a difference. So the pattern is IMO OK as-is.
>
> > diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> > index
> 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35
> 100644
> > --- a/gcc/config/arm/vfp.md
> > +++ b/gcc/config/arm/vfp.md
> > @@ -224,7 +224,7 @@
> > ;; problems because small constants get converted into adds.
> > (define_insn "*arm_movsi_vfp"
> > [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m
> ,*t,r,*t,*t, *Uv")
> > - (match_operand:SI 1 "general_operand" "rk,
> I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> > + (match_operand:SI 1 "general_operand" "rk,
> I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
> > "TARGET_ARM && TARGET_HARD_FLOAT
> > && ( s_register_operand (operands[0], SImode)
> > || s_register_operand (operands[1], SImode))"
>
> I'll assume that an Arm maintainer would have spoken up by now if
> they didn't want this for some reason.
>
> > diff --git a/gcc/rtl.c b/gcc/rtl.c
> > index
> aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa
> 100644
> > --- a/gcc/rtl.c
> > +++ b/gcc/rtl.c
> > @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
> > }
> > }
> >
> > +/* Return true if element-selection indices in VEC are in series. */
> > +
> > +bool
> > +rtx_vec_series_p (const_rtx vec, int start)
>
> I think rtvec_series_p would be better, for consistency with
> rtvec_all_equal_p. Also, let's generalise it to:
>
> /* Return true if VEC contains a linear series of integers
> { START, START+1, START+2, ... }. */
>
> bool
> rtvec_series_p (rtvec vec, int start)
> {
> }
>
> > +{
> > + for (int i = 0; i < XVECLEN (vec, 0); i++)
> > + {
> > + if (i + start != INTVAL (XVECEXP (vec, 0, i)))
> > + return false;
> > + }
> > + return true;
>
> With the general definition I think this should be:
>
> for (int i = 0; i < GET_NUM_ELEM (vec); i++)
> {
> rtx x = RTVEC_ELT (vec, i);
> if (!CONST_INT_P (x) || INTVAL (x) != i + start)
> return false;
> }
>
> Then pass XVEC (sel, 0) to the function, instead of just sel.
>
> OK with those changes, thanks.
>
>
> Hi,
>
> Some of the updated tests fail on aarch64_be:
> gcc.target/aarch64/sve/extract_1.c scan-assembler-times
> \\tfmov\\tw[0-9]+, s[0-9]\\n 2
> gcc.target/aarch64/sve/extract_1.c scan-assembler-times
> \\tfmov\\tx[0-9]+, d[0-9]\\n 2
> gcc.target/aarch64/sve/extract_2.c scan-assembler-times
> \\tfmov\\tw[0-9]+, s[0-9]\\n 2
> gcc.target/aarch64/sve/extract_2.c scan-assembler-times
> \\tfmov\\tx[0-9]+, d[0-9]\\n 2
> gcc.target/aarch64/sve/extract_3.c scan-assembler-times
> \\tfmov\\tw[0-9]+, s[0-9]\\n 5
> gcc.target/aarch64/sve/extract_3.c scan-assembler-times
> \\tfmov\\tx[0-9]+, d[0-9]\\n 5
> gcc.target/aarch64/sve/extract_4.c scan-assembler-times
> \\tfmov\\tw[0-9]+, s[0-9]\\n 6
> gcc.target/aarch64/sve/extract_4.c scan-assembler-times
> \\tfmov\\tx[0-9]+, d[0-9]\\n 6
>
> Can you check?
>
> Thanks,
>
> Christophe
>
>
>
>
> Richard
>
>
prev parent reply other threads:[~2021-08-03 9:36 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-02 9:53 [PATCH] " Jonathan Wright
2021-07-07 13:35 ` [PATCH V2] " Jonathan Wright
2021-07-12 15:30 ` Richard Sandiford
2021-07-15 9:09 ` Christophe Lyon
2021-07-15 13:06 ` Jonathan Wright
2021-08-03 9:36 ` Christophe Lyon [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKhMtSKXmW9q5Cr7OftprtgvHwEf-WABKrfuXUAWf3SzF-=t4A@mail.gmail.com' \
--to=christophe.lyon.oss@gmail.com \
--cc=Jonathan.Wright@arm.com \
--cc=Kyrylo.Tkachov@arm.com \
--cc=Richard.Sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).