public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Tamar Christina <tamar.christina@arm.com>,
	 nd@arm.com,  rguenther@suse.de
Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Date: Tue, 01 Nov 2022 14:25:21 +0000	[thread overview]
Message-ID: <mptr0ym21m6.fsf@arm.com> (raw)
In-Reply-To: <Y1+4Nu1ryQIKoOQA@arm.com> (Tamar Christina via Gcc-patches's message of "Mon, 31 Oct 2022 11:57:42 +0000")

Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi All,
>
> The current vector extract pattern can only extract from a vector when the
> position to extract is a multiple of the vector bitsize as a whole.
>
> That means extract something like a V2SI from a V4SI vector from position 32
> isn't possible as 32 is not a multiple of 64.  Ideally this optab should have
> worked on multiple of the element size, but too many targets rely on this
> semantic now.
>
> So instead add a new case which allows any extraction as long as the bit pos
> is a multiple of the element size.  We use a VEC_PERM to shuffle the elements
> into the bottom parts of the vector and then use a subreg to extract the values
> out.  This now allows various vector operations that before were being
> decomposed into very inefficient scalar operations.
>
> NOTE: I added 3 testcases, I only fixed the 3rd one.
>
> The 1st one missed because we don't optimize VEC_PERM expressions into
> bitfields.  The 2nd one is missed because extract_bit_field only works on
> vector modes.  In this case the intermediate extract is DImode.
>
> On targets where the scalar mode is tiable to vector modes the extract should
> work fine.
>
> However I ran out of time to fix the first two and so will do so in GCC 14.
> For now this catches the case that my pattern now introduces more easily.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* expmed.cc (extract_bit_field_1): Add support for vector element
> 	extracts.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/ext_1.c: New.
>
> --- inline copy of patch -- 
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index bab020c07222afa38305ef8d7333f271b1965b78..ffdf65210d17580a216477cfe4ac1598941ac9e4 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -1718,6 +1718,45 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
>  	      return target;
>  	    }
>  	}
> +      else if (!known_eq (bitnum, 0U)
> +	       && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, &pos))
> +	{
> +	  /* The encoding has a single stepped pattern.  */
> +	  poly_uint64 nunits = GET_MODE_NUNITS (new_mode);
> +	  int nelts = nunits.to_constant ();
> +	  vec_perm_builder sel (nunits, nelts, 1);
> +	  int delta = -pos.to_constant ();
> +	  for (int i = 0; i < nelts; ++i)
> +	    sel.quick_push ((i - delta) % nelts);
> +	  vec_perm_indices indices (sel, 1, nunits);

Thanks for doing this, looks good.  But I don't think the to_constant
calls are safe.  new_mode and pos could in principle be nonconstant.

To build a stepped pattern, we just need:

    vec_perm_builder sel (nunits, 1, 3);

and then push pos, pos + 1, and pos + 2 to it.  There's no need to
clamp the position to nelts, it happens automatically.

> +
> +	  if (can_vec_perm_const_p (new_mode, new_mode, indices, false))
> +	    {
> +	      class expand_operand ops[4];
> +	      machine_mode outermode = new_mode;
> +	      machine_mode innermode = tmode;
> +	      enum insn_code icode
> +		= direct_optab_handler (vec_perm_optab, outermode);
> +	      target = gen_reg_rtx (outermode);
> +	      if (icode != CODE_FOR_nothing)
> +		{
> +		  rtx sel = vec_perm_indices_to_rtx (outermode, indices);
> +		  create_output_operand (&ops[0], target, outermode);
> +		  ops[0].target = 1;
> +		  create_input_operand (&ops[1], op0, outermode);
> +		  create_input_operand (&ops[2], op0, outermode);
> +		  create_input_operand (&ops[3], sel, outermode);

I think this should be GET_MODE (sel).  Looks like the current
version would ICE for float vectors.  That said...

> +		  if (maybe_expand_insn (icode, 4, ops))
> +		    return simplify_gen_subreg (innermode, target, outermode, 0);
> +		}
> +	      else if (targetm.vectorize.vec_perm_const != NULL)
> +		{
> +		  if (targetm.vectorize.vec_perm_const (outermode, outermode,
> +							target, op0, op0, indices))
> +		    return simplify_gen_subreg (innermode, target, outermode, 0);
> +		}

...can we use expand_vec_perm_const here?  It will try the constant
expansion first, which is the preferred order.  It also has a few
variations up its sleeve.

Thanks,
Richard


> +	    }
> +	}
>      }
>  
>    /* See if we can get a better vector mode before extracting.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e571b3bc2ddf887a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <string.h>
> +
> +typedef unsigned int v4si __attribute__((vector_size (16)));
> +typedef unsigned int v2si __attribute__((vector_size (8)));
> +
> +/*
> +** extract: { xfail *-*-* }
> +**	ext	v0.16b, v0.16b, v0.16b, #4
> +**	ret
> +*/
> +v2si extract (v4si x)
> +{
> +    v2si res = {x[1], x[2]};
> +    return res;
> +}
> +
> +/*
> +** extract1: { xfail *-*-* }
> +**	ext	v0.16b, v0.16b, v0.16b, #4
> +**	ret
> +*/
> +v2si extract1 (v4si x)
> +{
> +    v2si res;
> +    memcpy (&res, ((int*)&x)+1, sizeof(res));
> +    return res;
> +}
> +
> +typedef struct cast {
> +  int a;
> +  v2si b __attribute__((packed));
> +} cast_t;
> +
> +typedef union Data {
> +   v4si x;
> +   cast_t y;
> +} data;  
> +
> +/*
> +** extract2:
> +**	ext	v0.16b, v0.16b, v0.16b, #4
> +**	ret
> +*/
> +v2si extract2 (v4si x)
> +{
> +    data d;
> +    d.x = x;
> +    return d.y.b;
> +}
> +

  parent reply	other threads:[~2022-11-01 14:25 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31 11:56 [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs Tamar Christina
2022-10-31 11:57 ` [PATCH 2/8]middle-end: Recognize scalar widening reductions Tamar Christina
2022-10-31 21:42   ` Jeff Law
2022-11-07 13:21   ` Richard Biener
2022-10-31 11:57 ` [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector Tamar Christina
2022-10-31 21:44   ` Jeff Law
2022-11-01 14:25   ` Richard Sandiford [this message]
2022-11-11 14:33     ` Tamar Christina
2022-11-15  8:35       ` Hongtao Liu
2022-11-15  8:51         ` Tamar Christina
2022-11-15  9:37           ` Hongtao Liu
2022-11-15 10:00             ` Tamar Christina
2022-11-15 17:39               ` Richard Sandiford
2022-11-17  8:04                 ` Hongtao Liu
2022-11-17  9:39                   ` Richard Sandiford
2022-11-17 10:20                     ` Hongtao Liu
2022-11-17 13:59                       ` Richard Sandiford
2022-11-18  2:31                         ` Hongtao Liu
2022-11-18  9:16                           ` Richard Sandiford
2022-10-31 11:58 ` [PATCH 4/8]AArch64 aarch64: Implement widening reduction patterns Tamar Christina
2022-11-01 14:41   ` Richard Sandiford
2022-10-31 11:58 ` [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable Tamar Christina
2022-11-01 14:58   ` Richard Sandiford
2022-11-01 15:11     ` Tamar Christina
2022-11-11 14:39     ` Tamar Christina
2022-11-22 16:01       ` Tamar Christina
2022-11-30  4:26         ` Tamar Christina
2022-12-06 10:28       ` Richard Sandiford
2022-12-06 10:58         ` Tamar Christina
2022-12-06 11:05           ` Richard Sandiford
2022-10-31 11:59 ` [PATCH 6/8]AArch64: Add peephole and scheduling logic for pairwise operations that appear late in RTL Tamar Christina
2022-10-31 11:59 ` [PATCH 7/8]AArch64: Consolidate zero and sign extension patterns and add missing ones Tamar Christina
2022-11-30  4:28   ` Tamar Christina
2022-12-06 15:59   ` Richard Sandiford
2022-10-31 12:00 ` [PATCH 8/8]AArch64: Have reload not choose to do add on the scalar side if both values exist on the SIMD side Tamar Christina
2022-11-01 15:04   ` Richard Sandiford
2022-11-01 15:20     ` Tamar Christina
2022-10-31 21:41 ` [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs Jeff Law
2022-11-05 11:32 ` Richard Biener
2022-11-07  7:16   ` Tamar Christina
2022-11-07 10:17     ` Richard Biener
2022-11-07 11:00       ` Tamar Christina
2022-11-07 11:22         ` Richard Biener
2022-11-07 11:56           ` Tamar Christina
2022-11-22 10:36             ` Richard Sandiford
2022-11-22 10:58               ` Richard Biener
2022-11-22 11:02                 ` Tamar Christina
2022-11-22 11:06                   ` Richard Sandiford
2022-11-22 11:08                     ` Richard Biener
2022-11-22 14:33                       ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptr0ym21m6.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    --cc=tamar.christina@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).