[PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector


From: Tamar Christina <tamar.christina@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jeffreyalaw@gmail.com
Subject: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Date: Mon, 31 Oct 2022 11:57:42 +0000
Message-ID: <Y1+4Nu1ryQIKoOQA@arm.com>
In-Reply-To: <patch-16240-tamar@arm.com>

Hi All,

The current vector extract pattern can only extract a subvector when the bit
position to extract from is a multiple of the bitsize of the extracted vector
as a whole.

That means extracting something like a V2SI from a V4SI vector at position 32
isn't possible, as 32 is not a multiple of 64.  Ideally this optab would have
worked on multiples of the element size, but too many targets rely on the
current semantics now.
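
To make the restriction concrete, here is a small C sketch contrasting the two
alignment conditions discussed above.  The helper names old_rule_ok and
element_rule_ok are hypothetical and only model the arithmetic; they are not
the checks as written in expmed.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the existing restriction: the bit position must be
   a multiple of the size of the whole vector being extracted.  */
static bool
old_rule_ok (unsigned bitpos, unsigned extract_bits)
{
  assert (extract_bits != 0);
  return bitpos % extract_bits == 0;
}

/* Hypothetical model of the element-based rule: the bit position only needs
   to be a multiple of the element size.  */
static bool
element_rule_ok (unsigned bitpos, unsigned elt_bits)
{
  assert (elt_bits != 0);
  return bitpos % elt_bits == 0;
}
```

For a V2SI (64 bits, 32-bit elements) taken from position 32,
old_rule_ok (32, 64) is false while element_rule_ok (32, 32) is true.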

So instead add a new case which allows any extraction as long as the bit
position is a multiple of the element size.  We use a VEC_PERM to shuffle the
elements into the bottom part of the vector and then use a subreg to extract
the values.  This now allows various vector operations that were previously
decomposed into very inefficient scalar operations.
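
As an illustration only, the shuffle-then-take-the-low-part idea can be
modelled in plain C on arrays.  The function extract_subvector and its 16-lane
scratch limit are hypothetical, and the sketch assumes that lane 0 is the
"bottom" of the vector; it is not the RTL expansion itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: rotate the lanes of SRC so that lane POS lands in
   lane 0, then copy the low OUT_NELTS lanes into OUT.  The final copy
   stands in for taking the low-part subreg of the permuted vector.  */
static void
extract_subvector (const unsigned *src, size_t nelts, size_t pos,
                   unsigned *out, size_t out_nelts)
{
  assert (nelts != 0 && nelts <= 16 && out_nelts <= nelts && pos < nelts);
  unsigned shuffled[16];

  /* Same selector shape as in the patch: lane i reads lane (i + pos) % nelts.  */
  for (size_t i = 0; i < nelts; ++i)
    shuffled[i] = src[(i + pos) % nelts];

  for (size_t i = 0; i < out_nelts; ++i)
    out[i] = shuffled[i];
}
```

Calling extract_subvector on the four lanes {10, 20, 30, 40} with pos = 1 and
out_nelts = 2 yields {20, 30}, which corresponds to extracting a V2SI from a
V4SI at bit position 32.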

NOTE: I added 3 testcases, but only fixed the 3rd one.

The 1st one is missed because we don't optimize VEC_PERM expressions into
bitfields.  The 2nd one is missed because extract_bit_field only works on
vector modes, and in this case the intermediate extract is DImode.

On targets where the scalar mode is tieable to vector modes the extract should
work fine.

However, I ran out of time to fix the first two, so I will do that in GCC 14.
For now this catches the case that my pattern now introduces more easily.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* expmed.cc (extract_bit_field_1): Add support for vector element
	extracts.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ext_1.c: New.

--- inline copy of patch -- 
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index bab020c07222afa38305ef8d7333f271b1965b78..ffdf65210d17580a216477cfe4ac1598941ac9e4 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1718,6 +1718,45 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 	      return target;
 	    }
 	}
+      else if (!known_eq (bitnum, 0U)
+	       && multiple_p (bitnum, GET_MODE_UNIT_BITSIZE (tmode), &pos))
+	{
+	  /* The encoding has a single stepped pattern.  */
+	  poly_uint64 nunits = GET_MODE_NUNITS (new_mode);
+	  int nelts = nunits.to_constant ();
+	  vec_perm_builder sel (nunits, nelts, 1);
+	  int delta = -pos.to_constant ();
+	  for (int i = 0; i < nelts; ++i)
+	    sel.quick_push ((i - delta) % nelts);
+	  vec_perm_indices indices (sel, 1, nunits);
+
+	  if (can_vec_perm_const_p (new_mode, new_mode, indices, false))
+	    {
+	      class expand_operand ops[4];
+	      machine_mode outermode = new_mode;
+	      machine_mode innermode = tmode;
+	      enum insn_code icode
+		= direct_optab_handler (vec_perm_optab, outermode);
+	      target = gen_reg_rtx (outermode);
+	      if (icode != CODE_FOR_nothing)
+		{
+		  rtx sel = vec_perm_indices_to_rtx (outermode, indices);
+		  create_output_operand (&ops[0], target, outermode);
+		  ops[0].target = 1;
+		  create_input_operand (&ops[1], op0, outermode);
+		  create_input_operand (&ops[2], op0, outermode);
+		  create_input_operand (&ops[3], sel, outermode);
+		  if (maybe_expand_insn (icode, 4, ops))
+		    return simplify_gen_subreg (innermode, target, outermode, 0);
+		}
+	      else if (targetm.vectorize.vec_perm_const != NULL)
+		{
+		  if (targetm.vectorize.vec_perm_const (outermode, outermode,
+							target, op0, op0, indices))
+		    return simplify_gen_subreg (innermode, target, outermode, 0);
+		}
+	    }
+	}
     }
 
   /* See if we can get a better vector mode before extracting.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e571b3bc2ddf887a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <string.h>
+
+typedef unsigned int v4si __attribute__((vector_size (16)));
+typedef unsigned int v2si __attribute__((vector_size (8)));
+
+/*
+** extract: { xfail *-*-* }
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract (v4si x)
+{
+    v2si res = {x[1], x[2]};
+    return res;
+}
+
+/*
+** extract1: { xfail *-*-* }
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract1 (v4si x)
+{
+    v2si res;
+    memcpy (&res, ((int*)&x)+1, sizeof(res));
+    return res;
+}
+
+typedef struct cast {
+  int a;
+  v2si b __attribute__((packed));
+} cast_t;
+
+typedef union Data {
+   v4si x;
+   cast_t y;
+} data;  
+
+/*
+** extract2:
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract2 (v4si x)
+{
+    data d;
+    d.x = x;
+    return d.y.b;
+}
+