From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Tamar Christina <tamar.christina@arm.com>,
nd@arm.com, rguenther@suse.de
Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Date: Tue, 01 Nov 2022 14:25:21 +0000 [thread overview]
Message-ID: <mptr0ym21m6.fsf@arm.com> (raw)
In-Reply-To: <Y1+4Nu1ryQIKoOQA@arm.com> (Tamar Christina via Gcc-patches's message of "Mon, 31 Oct 2022 11:57:42 +0000")
Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi All,
>
> The current vector extract pattern can only extract from a vector when the
> position to extract is a multiple of the vector bitsize as a whole.
>
> That means extracting something like a V2SI from a V4SI vector from position 32
> isn't possible, as 32 is not a multiple of 64. Ideally this optab would have
> worked on multiples of the element size, but too many targets rely on the
> current semantics now.
>
> So instead add a new case which allows any extraction as long as the bit
> position is a multiple of the element size. We use a VEC_PERM to shuffle the
> elements into the bottom part of the vector and then use a subreg to extract
> the values out. This now allows various vector operations that were previously
> decomposed into very inefficient scalar operations.
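[Editorial note: the VEC_PERM-then-subreg rewrite described above can be
illustrated with a standalone C sketch using GCC vector extensions.  This is
illustrative only, not the middle-end code; the function name is hypothetical.]

```c
typedef unsigned int v4si __attribute__((vector_size (16)));
typedef unsigned int v2si __attribute__((vector_size (8)));

/* Extracting a V2SI starting at element position 1 of a V4SI becomes:
   a VEC_PERM that rotates the wanted elements to the bottom of the
   vector, followed by taking the low half (the "subreg" step).  */
v2si
extract_via_perm (v4si x)
{
  /* Rotate so elements 1 and 2 land in lanes 0 and 1.  */
  v4si rot = __builtin_shuffle (x, (v4si) {1, 2, 3, 0});
  /* Take the low half of the permuted vector.  */
  v2si res = {rot[0], rot[1]};
  return res;
}
```

On AArch64 the rotate maps naturally onto a single EXT instruction, which is
what the testcase below checks for.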
>
> NOTE: I added 3 testcases, but only fixed the 3rd one.
>
> The 1st one is missed because we don't optimize VEC_PERM expressions into
> bitfield extracts. The 2nd one is missed because extract_bit_field only works
> on vector modes; in this case the intermediate extract is DImode.
>
> On targets where the scalar mode is tieable to vector modes the extract
> should work fine.
>
> However I ran out of time to fix the first two and so will do so in GCC 14.
> For now this catches the case that the new pattern introduces more easily.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
> with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * expmed.cc (extract_bit_field_1): Add support for vector element
> extracts.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/ext_1.c: New.
>
> --- inline copy of patch --
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index bab020c07222afa38305ef8d7333f271b1965b78..ffdf65210d17580a216477cfe4ac1598941ac9e4 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -1718,6 +1718,45 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
> return target;
> }
> }
> + else if (!known_eq (bitnum, 0U)
> + && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, &pos))
> + {
> + /* The encoding has a single stepped pattern. */
> + poly_uint64 nunits = GET_MODE_NUNITS (new_mode);
> + int nelts = nunits.to_constant ();
> + vec_perm_builder sel (nunits, nelts, 1);
> + int delta = -pos.to_constant ();
> + for (int i = 0; i < nelts; ++i)
> + sel.quick_push ((i - delta) % nelts);
> + vec_perm_indices indices (sel, 1, nunits);
Thanks for doing this, looks good. But I don't think the to_constant
calls are safe. new_mode and pos could in principle be nonconstant.
To build a stepped pattern, we just need:
vec_perm_builder sel (nunits, 1, 3);
and then push pos, pos + 1, and pos + 2 to it. There's no need to
clamp the position to nelts, it happens automatically.
> +
> + if (can_vec_perm_const_p (new_mode, new_mode, indices, false))
> + {
> + class expand_operand ops[4];
> + machine_mode outermode = new_mode;
> + machine_mode innermode = tmode;
> + enum insn_code icode
> + = direct_optab_handler (vec_perm_optab, outermode);
> + target = gen_reg_rtx (outermode);
> + if (icode != CODE_FOR_nothing)
> + {
> + rtx sel = vec_perm_indices_to_rtx (outermode, indices);
> + create_output_operand (&ops[0], target, outermode);
> + ops[0].target = 1;
> + create_input_operand (&ops[1], op0, outermode);
> + create_input_operand (&ops[2], op0, outermode);
> + create_input_operand (&ops[3], sel, outermode);
I think this should be GET_MODE (sel). Looks like the current
version would ICE for float vectors. That said...
> + if (maybe_expand_insn (icode, 4, ops))
> + return simplify_gen_subreg (innermode, target, outermode, 0);
> + }
> + else if (targetm.vectorize.vec_perm_const != NULL)
> + {
> + if (targetm.vectorize.vec_perm_const (outermode, outermode,
> + target, op0, op0, indices))
> + return simplify_gen_subreg (innermode, target, outermode, 0);
> + }
...can we use expand_vec_perm_const here? It will try the constant
expansion first, which is the preferred order. It also has a few
variations up its sleeve.
Thanks,
Richard
> + }
> + }
> }
>
> /* See if we can get a better vector mode before extracting. */
> diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e571b3bc2ddf887a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <string.h>
> +
> +typedef unsigned int v4si __attribute__((vector_size (16)));
> +typedef unsigned int v2si __attribute__((vector_size (8)));
> +
> +/*
> +** extract: { xfail *-*-* }
> +** ext v0.16b, v0.16b, v0.16b, #4
> +** ret
> +*/
> +v2si extract (v4si x)
> +{
> + v2si res = {x[1], x[2]};
> + return res;
> +}
> +
> +/*
> +** extract1: { xfail *-*-* }
> +** ext v0.16b, v0.16b, v0.16b, #4
> +** ret
> +*/
> +v2si extract1 (v4si x)
> +{
> + v2si res;
> + memcpy (&res, ((int*)&x)+1, sizeof(res));
> + return res;
> +}
> +
> +typedef struct cast {
> + int a;
> + v2si b __attribute__((packed));
> +} cast_t;
> +
> +typedef union Data {
> + v4si x;
> + cast_t y;
> +} data;
> +
> +/*
> +** extract2:
> +** ext v0.16b, v0.16b, v0.16b, #4
> +** ret
> +*/
> +v2si extract2 (v4si x)
> +{
> + data d;
> + d.x = x;
> + return d.y.b;
> +}
> +