public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Sandiford <Richard.Sandiford@arm.com>,
	Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: nd <nd@arm.com>, "rguenther@suse.de" <rguenther@suse.de>
Subject: RE: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
Date: Fri, 11 Nov 2022 14:33:54 +0000	[thread overview]
Message-ID: <VI1PR08MB532500008BED94FC1630BE2DFF009@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <mptr0ym21m6.fsf@arm.com>

[-- Attachment #1: Type: text/plain, Size: 4249 bytes --]

Hi,

> 
> ...can we use expand_vec_perm_const here?  It will try the constant
> expansion first, which is the preferred order.  It also has a few variations up
> its sleeve.
> 

We can, however this function seems to be incorrectly assuming it can always
convert the input mode to a QI vector mode.  When I started using it we got a number
of miscompilations in the AArch64 codegen.  This had the knock-on effect of uncovering
bugs in both the AArch64 backend and i386.  I'll send patches out for those separately.

For now here's the new patch using that hook and updating the permute expansion code:

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* expmed.cc (extract_bit_field_1): Add support for vector element
	extracts.
	* optabs.cc (expand_vec_perm_const): Add checks before converting
	permute to QImode fallback.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ext_1.c: New.

--- inline copy of patch ---

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index bab020c07222afa38305ef8d7333f271b1965b78..7d38045ae525c8a4665a0c1384fc515e4de88c67 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1718,6 +1718,21 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 	      return target;
 	    }
 	}
+      else if (!known_eq (bitnum, 0U)
+	       && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, &pos))
+	{
+	  /* The encoding has a single stepped pattern.  */
+	  poly_uint64 nunits = GET_MODE_NUNITS (new_mode);
+	  vec_perm_builder sel (nunits, 1, 3);
+	  sel.quick_push (pos);
+	  sel.quick_push (pos + 1);
+	  sel.quick_push (pos + 2);
+
+	  rtx res
+	    = expand_vec_perm_const (new_mode, op0, op0, sel, new_mode, NULL);
+	  if (res)
+	    return simplify_gen_subreg (tmode, res, new_mode, 0);
+	}
     }
 
   /* See if we can get a better vector mode before extracting.  */
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b6896160090a453cc6a28d9 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
       v0_qi = gen_lowpart (qimode, v0);
       v1_qi = gen_lowpart (qimode, v1);
       if (targetm.vectorize.vec_perm_const != NULL
+	  && targetm.can_change_mode_class (mode, qimode, ALL_REGS)
 	  && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi, v0_qi,
 					       v1_qi, qimode_indices))
 	return gen_lowpart (mode, target_qi);
@@ -6311,7 +6312,8 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
     }
 
   if (qimode != VOIDmode
-      && selector_fits_mode_p (qimode, qimode_indices))
+      && selector_fits_mode_p (qimode, qimode_indices)
+      && targetm.can_change_mode_class (mode, qimode, ALL_REGS))
     {
       icode = direct_optab_handler (vec_perm_optab, qimode);
       if (icode != CODE_FOR_nothing)
diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e571b3bc2ddf887a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <string.h>
+
+typedef unsigned int v4si __attribute__((vector_size (16)));
+typedef unsigned int v2si __attribute__((vector_size (8)));
+
+/*
+** extract: { xfail *-*-* }
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract (v4si x)
+{
+    v2si res = {x[1], x[2]};
+    return res;
+}
+
+/*
+** extract1: { xfail *-*-* }
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract1 (v4si x)
+{
+    v2si res;
+    memcpy (&res, ((int*)&x)+1, sizeof(res));
+    return res;
+}
+
+typedef struct cast {
+  int a;
+  v2si b __attribute__((packed));
+} cast_t;
+
+typedef union Data {
+   v4si x;
+   cast_t y;
+} data;  
+
+/*
+** extract2:
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract2 (v4si x)
+{
+    data d;
+    d.x = x;
+    return d.y.b;
+}
+

[-- Attachment #2: rb16242.patch --]
[-- Type: application/octet-stream, Size: 3098 bytes --]

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index bab020c07222afa38305ef8d7333f271b1965b78..7d38045ae525c8a4665a0c1384fc515e4de88c67 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -1718,6 +1718,21 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 	      return target;
 	    }
 	}
+      else if (!known_eq (bitnum, 0U)
+	       && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, &pos))
+	{
+	  /* The encoding has a single stepped pattern.  */
+	  poly_uint64 nunits = GET_MODE_NUNITS (new_mode);
+	  vec_perm_builder sel (nunits, 1, 3);
+	  sel.quick_push (pos);
+	  sel.quick_push (pos + 1);
+	  sel.quick_push (pos + 2);
+
+	  rtx res
+	    = expand_vec_perm_const (new_mode, op0, op0, sel, new_mode, NULL);
+	  if (res)
+	    return simplify_gen_subreg (tmode, res, new_mode, 0);
+	}
     }
 
   /* See if we can get a better vector mode before extracting.  */
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b6896160090a453cc6a28d9 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
       v0_qi = gen_lowpart (qimode, v0);
       v1_qi = gen_lowpart (qimode, v1);
       if (targetm.vectorize.vec_perm_const != NULL
+	  && targetm.can_change_mode_class (mode, qimode, ALL_REGS)
 	  && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi, v0_qi,
 					       v1_qi, qimode_indices))
 	return gen_lowpart (mode, target_qi);
@@ -6311,7 +6312,8 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
     }
 
   if (qimode != VOIDmode
-      && selector_fits_mode_p (qimode, qimode_indices))
+      && selector_fits_mode_p (qimode, qimode_indices)
+      && targetm.can_change_mode_class (mode, qimode, ALL_REGS))
     {
       icode = direct_optab_handler (vec_perm_optab, qimode);
       if (icode != CODE_FOR_nothing)
diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e571b3bc2ddf887a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <string.h>
+
+typedef unsigned int v4si __attribute__((vector_size (16)));
+typedef unsigned int v2si __attribute__((vector_size (8)));
+
+/*
+** extract: { xfail *-*-* }
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract (v4si x)
+{
+    v2si res = {x[1], x[2]};
+    return res;
+}
+
+/*
+** extract1: { xfail *-*-* }
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract1 (v4si x)
+{
+    v2si res;
+    memcpy (&res, ((int*)&x)+1, sizeof(res));
+    return res;
+}
+
+typedef struct cast {
+  int a;
+  v2si b __attribute__((packed));
+} cast_t;
+
+typedef union Data {
+   v4si x;
+   cast_t y;
+} data;  
+
+/*
+** extract2:
+**	ext	v0.16b, v0.16b, v0.16b, #4
+**	ret
+*/
+v2si extract2 (v4si x)
+{
+    data d;
+    d.x = x;
+    return d.y.b;
+}
+

  reply	other threads:[~2022-11-11 14:34 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31 11:56 [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs Tamar Christina
2022-10-31 11:57 ` [PATCH 2/8]middle-end: Recognize scalar widening reductions Tamar Christina
2022-10-31 21:42   ` Jeff Law
2022-11-07 13:21   ` Richard Biener
2022-10-31 11:57 ` [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector Tamar Christina
2022-10-31 21:44   ` Jeff Law
2022-11-01 14:25   ` Richard Sandiford
2022-11-11 14:33     ` Tamar Christina [this message]
2022-11-15  8:35       ` Hongtao Liu
2022-11-15  8:51         ` Tamar Christina
2022-11-15  9:37           ` Hongtao Liu
2022-11-15 10:00             ` Tamar Christina
2022-11-15 17:39               ` Richard Sandiford
2022-11-17  8:04                 ` Hongtao Liu
2022-11-17  9:39                   ` Richard Sandiford
2022-11-17 10:20                     ` Hongtao Liu
2022-11-17 13:59                       ` Richard Sandiford
2022-11-18  2:31                         ` Hongtao Liu
2022-11-18  9:16                           ` Richard Sandiford
2022-10-31 11:58 ` [PATCH 4/8]AArch64 aarch64: Implement widening reduction patterns Tamar Christina
2022-11-01 14:41   ` Richard Sandiford
2022-10-31 11:58 ` [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable Tamar Christina
2022-11-01 14:58   ` Richard Sandiford
2022-11-01 15:11     ` Tamar Christina
2022-11-11 14:39     ` Tamar Christina
2022-11-22 16:01       ` Tamar Christina
2022-11-30  4:26         ` Tamar Christina
2022-12-06 10:28       ` Richard Sandiford
2022-12-06 10:58         ` Tamar Christina
2022-12-06 11:05           ` Richard Sandiford
2022-10-31 11:59 ` [PATCH 6/8]AArch64: Add peephole and scheduling logic for pairwise operations that appear late in RTL Tamar Christina
2022-10-31 11:59 ` [PATCH 7/8]AArch64: Consolidate zero and sign extension patterns and add missing ones Tamar Christina
2022-11-30  4:28   ` Tamar Christina
2022-12-06 15:59   ` Richard Sandiford
2022-10-31 12:00 ` [PATCH 8/8]AArch64: Have reload not choose to do add on the scalar side if both values exist on the SIMD side Tamar Christina
2022-11-01 15:04   ` Richard Sandiford
2022-11-01 15:20     ` Tamar Christina
2022-10-31 21:41 ` [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs Jeff Law
2022-11-05 11:32 ` Richard Biener
2022-11-07  7:16   ` Tamar Christina
2022-11-07 10:17     ` Richard Biener
2022-11-07 11:00       ` Tamar Christina
2022-11-07 11:22         ` Richard Biener
2022-11-07 11:56           ` Tamar Christina
2022-11-22 10:36             ` Richard Sandiford
2022-11-22 10:58               ` Richard Biener
2022-11-22 11:02                 ` Tamar Christina
2022-11-22 11:06                   ` Richard Sandiford
2022-11-22 11:08                     ` Richard Biener
2022-11-22 14:33                       ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR08MB532500008BED94FC1630BE2DFF009@VI1PR08MB5325.eurprd08.prod.outlook.com \
    --to=tamar.christina@arm.com \
    --cc=Richard.Sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).