From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Jakub Jelinek <jakub@redhat.com>,
	Richard Sandiford <richard.sandiford@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
Date: Mon, 8 Aug 2022 15:06:56 +0100	[thread overview]
Message-ID: <8f805fb1-d4ae-b0e3-ff26-57fd2c1fc1f7@arm.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2208011313430.4208@jbgna.fhfr.qr>


Hi,

So I've changed the approach from the RFC as suggested, moving the 
bitfield lowering to the if-convert pass.

To reiterate: ifcvt will lower COMPONENT_REFs of DECL_BIT_FIELD fields
to either a BIT_FIELD_REF, if they are reads, or a BIT_INSERT_EXPR, if
they are writes, using loads and stores of 'representatives' that are
big enough to contain the bitfield value.
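
As an illustrative sketch (the exact temporaries and the
representative's type depend on the struct layout), a bitfield read
such as:

   _1 = struct.bf;   /* COMPONENT_REF of a DECL_BIT_FIELD field */

is lowered to a load of the representative plus an extraction:

   __ifc_1 = struct.<representative>;
   _1 = BIT_FIELD_REF (__ifc_1, bitsize, bitpos);

and a write 'struct.bf = _1;' likewise becomes a representative load,
a BIT_INSERT_EXPR of _1 into the loaded value, and a store of the
result back to the representative.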

In vect_recog I added two patterns that replace these BIT_FIELD_REFs
and BIT_INSERT_EXPRs with shifts and masks as appropriate.
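
Sketching the read side (this mirrors the comment on the new
vect_recog_bitfield_ref_pattern below), a lowered sequence such as:

   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
   _3 = (type) _2;

is rewritten by the pattern into:

   patt1 = (type) _1;
   patt2 = patt1 >> bitpos;
   _3 = patt2 & ((1 << bitsize) - 1);

The write pattern analogously masks the value to be written, shifts it
into place, clears the destination bits in the loaded representative
and ORs the two together.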

I'd like to see whether it is possible to remove the 'load' part of a
BIT_INSERT_EXPR when the representative write does not change any
relevant bits.  For example, given:

struct s
{
  int dont_care;
  char a : 3;
};

a store like 'x.a = <value>;' (for some struct s x) should not require
a load & write cycle; in fact it would not require any masking either.
To achieve this, though, we would need to make sure the representative
does not overlap with any other field.  Suggestions on how to do that
would be great, though I don't think we need to wait for it, as it is
merely a nice-to-have optimization; see the sketch below.
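
(Purely hypothetical sketch: if the representative of 'a' were known
to overlap no other field, the lowered write could become a plain
store of the converted value,

   x.<representative> = (<representative type>) <value>;

with no preceding representative load and no masking.)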

I am not sure where I should document this change of behaviour in
ifcvt.  Should we perhaps also rename the pass, since it now does more
than if-conversion?

Bootstrapped and regression-tested this patch on aarch64-none-linux-gnu.

gcc/ChangeLog:
2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* tree-if-conv.cc (includes): Add expr.h and langhooks.h to the
	list of includes.
	(need_to_lower_bitfields): New static bool.
	(need_to_ifcvt): Likewise.
	(version_loop_for_if_conversion): Adapt to work for the
	bitfield-lowering-only path.
	(bitfield_data_t): New typedef.
	(get_bitfield_data): New function.
	(lower_bitfield): New function.
	(bitfields_to_lower_p): New function.
	(tree_if_conversion): Change to lower bitfields too.
	* tree-vect-data-refs.cc (vect_find_stmt_data_reference): Modify
	dump message to be more accurate.
	* tree-vect-patterns.cc (includes): Add gimplify-me.h include.
	(vect_recog_bitfield_ref_pattern): New function.
	(vect_recog_bit_insert_pattern): New function.
	(vect_vect_recog_func_ptrs): Add the two new patterns.

gcc/testsuite/ChangeLog:
2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* gcc.dg/vect/vect-bitfield-read-1.c: New test.
	* gcc.dg/vect/vect-bitfield-read-2.c: New test.
	* gcc.dg/vect/vect-bitfield-read-3.c: New test.
	* gcc.dg/vect/vect-bitfield-read-4.c: New test.
	* gcc.dg/vect/vect-bitfield-write-1.c: New test.
	* gcc.dg/vect/vect-bitfield-write-2.c: New test.
	* gcc.dg/vect/vect-bitfield-write-3.c: New test.

Kind regards,
Andre

[-- Attachment: vect_bitfield.patch --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..4070fa2f45970e564f13de794707613356cb5045 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -2898,18 +2908,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate profile consistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2935,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3274,225 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+typedef struct
+{
+  scalar_int_mode best_mode;
+  tree struct_expr;
+  tree bf_type;
+  tree offset;
+  poly_int64 bitpos;
+  bool write;
+  gassign *stmt;
+} bitfield_data_t;
+
+/* Return TRUE if we can lower the bitfield in STMT.  Fill DATA with the
+   relevant information required to lower this bitfield.  */
+
+static bool
+get_bitfield_data (gassign *stmt, bool write, bitfield_data_t *data)
+{
+  poly_uint64 bitstart, bitend;
+  scalar_int_mode best_mode;
+  tree comp_ref = write ? gimple_get_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+  tree struct_expr = TREE_OPERAND (comp_ref, 0);
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree bf_type = TREE_TYPE (field_decl);
+  poly_int64 bitpos
+    = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl));
+  unsigned HOST_WIDE_INT bitsize = TYPE_PRECISION (bf_type);
+  tree offset = DECL_FIELD_OFFSET (field_decl);
+  /* BITSTART and BITEND describe the region we can safely load from inside the
+     structure.  BITPOS is the bit position of the value inside the
+     representative that we will end up loading OFFSET bytes from the start
+     of the struct.  BEST_MODE is the mode describing the optimal size of the
+     representative chunk we load.  If this is a write we will store the same
+     sized representative back, after we have changed the appropriate bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+		     TYPE_ALIGN (TREE_TYPE (struct_expr)),
+		     INT_MAX, false, &best_mode))
+    {
+      data->best_mode = best_mode;
+      data->struct_expr = struct_expr;
+      data->bf_type = bf_type;
+      data->offset = offset;
+      data->bitpos = bitpos;
+      data->write = write;
+      data->stmt = stmt;
+      return true;
+    }
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\t\tCannot lower bitfield, could not determine"
+			  " best mode.\n");
+    }
+  return false;
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (bitfield_data_t *data)
+{
+  scalar_int_mode best_mode = data->best_mode;
+  tree struct_expr = data->struct_expr;
+  tree bf_type = data->bf_type;
+  tree offset = data->offset;
+  poly_int64 bitpos = data->bitpos;
+  bool write = data->write;
+  gassign *stmt = data->stmt;
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  /* Type of the representative.  */
+  tree rep_type
+    = lang_hooks.types.type_for_mode (best_mode, TYPE_UNSIGNED (bf_type));
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			      NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();
+  extra_offset /= TYPE_ALIGN (bf_type);
+  extra_offset *= TYPE_ALIGN (bf_type);
+  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset,
+			build_int_cst (TREE_TYPE (offset),
+				       extra_offset / BITS_PER_UNIT));
+  /* Adapt the BITPOS to reflect the number of bits between the start of the
+     load and the start of the bitfield value.  */
+  bitpos -= extra_offset;
+  DECL_FIELD_BIT_OFFSET (rep_decl) = build_zero_cst (bitsizetype);
+  DECL_FIELD_OFFSET (rep_decl) = offset;
+  DECL_SIZE (rep_decl) = TYPE_SIZE (rep_type);
+  DECL_CONTEXT (rep_decl) = TREE_TYPE (struct_expr);
+  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos_tree), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos_tree);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_get_lhs (stmt),
+						     new_val));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop, auto_vec <bitfield_data_t *, 4> *to_lower)
+{
+  basic_block *bbs = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_get_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      bitfield_data_t *data = new bitfield_data_t ();
+	      if (get_bitfield_data (stmt, write, data))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\tBitfield OK to lower.\n");
+		  to_lower->safe_push (data);
+		}
+	      else
+		{
+		  delete data;
+		  return false;
+		}
+	    }
+	}
+    }
+  return !to_lower->is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3503,15 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <bitfield_data_t *, 4> bitfields_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,11 +3527,17 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
+    goto cleanup;
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   if ((need_to_predicate || any_complicated_phi)
@@ -3310,7 +3553,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3594,32 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!bitfields_to_lower.is_empty ())
+	{
+	  bitfield_data_t *data = bitfields_to_lower.pop ();
+	  lower_bitfield (data);
+	  delete data;
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3395,6 +3661,11 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       loop = rloop;
       goto again;
     }
+  while (!bitfields_to_lower.is_empty ())
+    {
+      bitfield_data_t *data = bitfields_to_lower.pop ();
+      delete data;
+    }
 
   return todo;
 }
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..435b75f860784a929041d5214d39c876c5ba790a 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1829,204 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type) _2;
+
+   where type is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type) - (TYPE_UNSIGNED (type) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = (type) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 << bitsize) - 1);
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR
+      || TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME)
+    return NULL;
+
+  gassign *bf_stmt
+    = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt)));
+
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);
+
+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;
+
+
+  gimple *pattern_stmt;
+  tree lhs = load;
+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			       RSHIFT_EXPR, lhs, offset);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT mask_i = tree_to_uhwi (size);
+  tree mask = build_int_cst (TREE_TYPE (lhs), (1ULL << mask_i) - 1);
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			   BIT_AND_EXPR, lhs, mask);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_field_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = _2 & mask;		    // Clear the non-relevant bits of the
+				    // value to be written.
+   patt2 = patt1 << bitpos;	    // Shift the cleaned value into place.
+   patt3 = _1 & ~(mask << bitpos);  // Clear the bits we are about to
+				    // overwrite in the loaded value.
+   _3 = patt3 | patt2;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  /* Bail out if the load is already of vector type.  */
+  if (VECTOR_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  if (CONSTANT_CLASS_P (value))
+    value = fold_build1 (NOP_EXPR, load_type, value);
+  else
+    {
+      if (TREE_CODE (value) != SSA_NAME)
+	return NULL;
+      gassign *nop_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (value));
+      if (!nop_stmt || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR)
+	return NULL;
+      if (!useless_type_conversion_p (TREE_TYPE (value), load_type))
+	{
+	  value = fold_build1 (NOP_EXPR, load_type, gimple_assign_rhs1 (nop_stmt));
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+				   value);
+	  value = gimple_get_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+	}
+    }
+
+  unsigned HOST_WIDE_INT mask_i = (1ULL << TYPE_PRECISION (bf_type)) - 1;
+  tree mask_t = build_int_cst (load_type, mask_i);
+  /* Clear the bits we don't want to write back from VALUE and shift it into place.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   fold_build2 (BIT_AND_EXPR, load_type, value,
+					mask_t));
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+  /* Mask off the bits in the loaded value.  */
+  mask_i <<= shift_n;
+  mask_i = ~mask_i;
+  mask_t = build_int_cst (load_type, mask_i);
+
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5822,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to precede the
    less complex ones (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */

Thread overview: 25+ messages
2022-07-26 10:00 [RFC] Teach vectorizer to deal with bitfield reads Andre Vieira (lists)
2022-07-27 11:37 ` Richard Biener
2022-07-29  8:57   ` Andre Vieira (lists)
2022-07-29  9:11     ` Richard Biener
2022-07-29 10:31     ` Jakub Jelinek
2022-07-29 10:52       ` Richard Biener
2022-08-01 10:21         ` Andre Vieira (lists)
2022-08-01 13:16           ` Richard Biener
2022-08-08 14:06             ` Andre Vieira (lists) [this message]
2022-08-09 14:34               ` [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads) Richard Biener
2022-08-16 10:24                 ` Andre Vieira (lists)
2022-08-17 12:49                   ` Richard Biener
2022-08-25  9:09                     ` Andre Vieira (lists)
2022-09-08  9:07                       ` Andre Vieira (lists)
2022-09-08 11:51                       ` Richard Biener
2022-09-26 15:23                         ` Andre Vieira (lists)
2022-09-27 12:34                           ` Richard Biener
2022-09-28  9:43                             ` Andre Vieira (lists)
2022-09-28 17:31                               ` Andre Vieira (lists)
2022-09-29  7:54                                 ` Richard Biener
2022-10-07 14:20                                   ` Andre Vieira (lists)
2022-10-12  1:55                                     ` Hongtao Liu
2022-10-12  2:11                                       ` Hongtao Liu
2022-08-01 10:13       ` [RFC] Teach vectorizer to deal with bitfield reads Andre Vieira (lists)
2022-10-12  9:02 ` Eric Botcazou
