public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC] Teach vectorizer to deal with bitfield reads
@ 2022-07-26 10:00 Andre Vieira (lists)
  2022-07-27 11:37 ` Richard Biener
  2022-10-12  9:02 ` Eric Botcazou
  0 siblings, 2 replies; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-07-26 10:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Richard Sandiford, pinskia

[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]

Hi,

This is an RFC for my prototype for bit-field read vectorization. This
patch enables bit-field read vectorization by removing the rejection of
bit-field reads during DR analysis and by adding two vect patterns. The
first one transforms TREE_COMPONENT's with BIT_FIELD_DECL's into
BIT_FIELD_REF's; this is a temporary pattern, as I believe there are
plans to do this lowering earlier in the compiler. To avoid having to
wait for that to happen we decided to implement this temporary vect
pattern.
The second one looks for conversions of the result of BIT_FIELD_REF's
from a 'partial' type to a 'full' type and transforms them into a
'full'-type load followed by the necessary shifting and masking.
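
As a rough illustration of what the second pattern generates per
element, here is a scalar equivalent for the 4-bit field 'a' from
bitfield-read-4.c (a sketch only; the byte offset and shift amount shown
are illustrative and assume the usual little-endian layout, the real
values come from the BIT_FIELD_REF):

  /* res += ptr[i].a; with 'unsigned i : 31; char x : 2; char a : 4;'
     becomes, per element, roughly:  */
  unsigned char chunk = *((unsigned char *) &ptr[i] + 4); /* 'full' load  */
  int a = (chunk >> 2) & 0xf;                             /* shift, mask  */
  res += a;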

The patch is not perfect; one thing I'd like to change, for instance, is
the way the 'full'-type load is represented. I currently abuse the fact
that the vectorizer transforms the original TREE_COMPONENT with a
BIT_FIELD_DECL into a full-type vector load, because it uses the
smallest mode necessary for that precision. The reason I do this is that
I was struggling to construct a MEM_REF that the vectorizer would
accept, and this for some reason seemed to work ... I'd appreciate some
pointers on how to do this properly :)

Another aspect that I haven't paid much attention to yet is costing;
I've noticed some testcases fail to vectorize due to costing decisions
that I think may be wrong, but like I said, I haven't looked into it
closely yet.

Finally, another aspect I'd eventually like to tackle is sinking the
masking when possible; for instance, in bitfield-read-3.c the 'masking'
does not need to be inside the loop because we are only doing bitwise
operations. Though I suspect we are better off looking at things like
this elsewhere, maybe where we do codegen for the reduction... I haven't
looked at this at all yet.
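
For the OR-reduction in bitfield-read-3.c the idea, in scalar terms,
would be something like this sketch (REP_BYTE_OFFSET is a made-up name
for wherever the 1-bit field's byte ends up):

  unsigned acc = 0;
  for (unsigned i = 0; i < N; i++)
    acc |= *((unsigned char *) &vect_true[i] + REP_BYTE_OFFSET);
  ret |= acc & 1;   /* mask once, outside the loop */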

Let me know whether you believe this is a good approach. I've run
regression tests and this hasn't broken anything so far...

Kind regards,
Andre

PS: Once we do have lowering of BIT_FIELD_DECL's to BIT_FIELD_REF's 
earlier in the compiler I suspect we will require some further changes 
to the DR analysis part, but that's difficult to test right now.

[-- Attachment #2: vect_bitfieldread_rfc.patch --]
[-- Type: text/plain, Size: 14446 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..c30aad365c40474109748bd03c3a5ca1d10723ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bitfield-read-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..ab82ff347c55e78d098d194d739bcd9d7737f777
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bitfield-read-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..fa566b4411e0da16f617f092eb49cceccbe7ca90
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bitfield-read-4.c
@@ -0,0 +1,44 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index ff9327f6deb2bb85abbd3853dca9c666699e7a37..a27ec37a456b0c726221767a4b5e52a74057ae23 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -1137,7 +1137,22 @@ dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
   gcc_assert (base != NULL_TREE);
 
   poly_int64 pbytepos;
-  if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos))
+  /* If we are dealing with a bit-field reference then PBITPOS may not be
+     a multiple of BITS_PER_UNIT.  Set PBYTEPOS to 0 if PBITPOS is smaller
+     than a byte, otherwise to PBITPOS / BITS_PER_UNIT truncated towards
+     zero, i.e. the largest number of whole bytes not exceeding PBITPOS.  */
+  if (TREE_CODE (ref) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
+    {
+      if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos))
+	{
+	  if (known_lt (pbitpos, BITS_PER_UNIT))
+	   pbytepos = 0;
+	  else
+	   can_div_trunc_p (pbitpos, BITS_PER_UNIT, &pbytepos);
+	}
+    }
+  else if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos))
     return opt_result::failure_at (stmt,
 				   "failed: bit offset alignment.\n");
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..cb610db9ca57e5179825e67be5aeb6af98d01aa0 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4016,7 +4016,18 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   if (reversep)
     return false;
 
-  poly_int64 pbytepos = exact_div (pbitpos, BITS_PER_UNIT);
+  /* If we are dealing with a bit-field reference then PBITPOS may not be
+     a multiple of BITS_PER_UNIT.  Set PBYTEPOS to 0 if PBITPOS is smaller
+     than a byte, otherwise to PBITPOS / BITS_PER_UNIT truncated towards
+     zero, i.e. the largest number of whole bytes not exceeding PBITPOS.  */
+  poly_int64 pbytepos;
+  if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos))
+    {
+      if (known_lt (pbitpos, BITS_PER_UNIT))
+	pbytepos = 0;
+      else
+	can_div_trunc_p (pbitpos, BITS_PER_UNIT, &pbytepos);
+    }
 
   if (TREE_CODE (base) == MEM_REF)
     {
@@ -4296,7 +4307,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       }
 
   if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
-      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
+      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1))
+      && !DR_IS_READ (dr))
     {
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..3f64b23888086f61e5ebf928a7ee0c6ed78bde15 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1829,157 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+static gimple *
+vect_recog_bit_field_decl (vec_info* /* vinfo*/, stmt_vec_info last_stmt_info,
+			 tree *type_out)
+{
+  gassign *decl_stmt = dyn_cast <gassign *> (last_stmt_info->stmt);
+
+  if (!decl_stmt)
+    return NULL;
+
+  data_reference *dr = STMT_VINFO_DATA_REF (last_stmt_info);
+  if (!dr)
+    return NULL;
+
+  if (TREE_CODE (DR_REF (dr)) != COMPONENT_REF
+      || !DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1))
+      || !DR_IS_READ (dr))
+    return NULL;
+
+  tree type = TREE_TYPE (gimple_get_lhs (decl_stmt));
+  tree record = TREE_OPERAND (DR_REF (dr), 0);
+  tree decl_field = TREE_OPERAND (DR_REF (dr), 1);
+  tree offset = fold_build2 (PLUS_EXPR, sizetype,
+			     DECL_FIELD_OFFSET (decl_field),
+			     DECL_FIELD_BIT_OFFSET (decl_field));
+  tree bf_ref = fold_build3 (BIT_FIELD_REF, type,
+			     record,
+			     build_int_cst (sizetype,
+			     TYPE_PRECISION (type)),
+			     offset);
+  gimple *pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (type, NULL), bf_ref);
+
+  *type_out = STMT_VINFO_VECTYPE (last_stmt_info);
+
+  vect_pattern_detected ("bit_field_decl pattern", last_stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+static gimple *
+vect_recog_bit_field_ref (vec_info *vinfo, stmt_vec_info last_stmt_info,
+			  tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (last_stmt_info->stmt);
+  if (!nop_stmt)
+    return NULL;
+
+  if (gimple_assign_rhs_code (nop_stmt) != NOP_EXPR)
+    return NULL;
+
+  if (TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME)
+    return NULL;
+
+  gassign *bf_stmt
+    = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt)));
+
+  if (!bf_stmt)
+    return NULL;
+
+  stmt_vec_info bf_stmt_info = vinfo->lookup_stmt (bf_stmt);
+  if (gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF
+      && (!bf_stmt_info || !STMT_VINFO_IN_PATTERN_P (bf_stmt_info)))
+    return NULL;
+
+  if (gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    {
+      if (!STMT_VINFO_RELATED_STMT (bf_stmt_info))
+	return NULL;
+      bf_stmt
+	= dyn_cast <gassign *> (STMT_VINFO_RELATED_STMT (bf_stmt_info)->stmt);
+    }
+
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    return NULL;
+
+  /* This is weird, why is rhs1 still a BIT_FIELD_REF here?  */
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree record = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);
+
+  tree bf_type = TREE_TYPE (bf_ref);
+  unsigned HOST_WIDE_INT bf_precision = TYPE_PRECISION (bf_type);
+  unsigned HOST_WIDE_INT load_size
+    = CEIL (bf_precision, BITS_PER_UNIT) * BITS_PER_UNIT;
+
+  if (bf_precision == load_size)
+    return NULL;
+
+  tree addr = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (record)),
+		      record);
+
+  addr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (addr), addr,
+		      fold_convert (sizetype, offset));
+
+  tree load_type = build_nonstandard_integer_type (load_size, 1);
+  tree vectype = get_vectype_for_scalar_type (vinfo, load_type);
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+
+  data_reference *dr = STMT_VINFO_DATA_REF (bf_stmt_info);
+  /* TODO: Fix this; rather than using the DR_REF here I'd like to
+     reconstruct the desired load instead of relying on the 'misguided?'
+     behaviour of the vectorizer to vectorize these as normal loads.
+     However, when I tried that it led the vectorizer to think it needed
+     to vectorize the address computation too.  */
+  gimple *pattern_stmt = gimple_build_assign (lhs, DR_REF (dr));
+  gimple *load_stmt = pattern_stmt;
+
+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      append_pattern_def_seq (vinfo, bf_stmt_info, pattern_stmt,
+			      vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      vectype = STMT_VINFO_VECTYPE (last_stmt_info);
+    }
+  vect_mark_pattern_stmts (vinfo, bf_stmt_info, pattern_stmt, vectype);
+
+  stmt_vec_info load_stmt_info = vinfo->lookup_stmt (load_stmt);
+  vinfo->move_dr (load_stmt_info, bf_stmt_info);
+
+  unsigned HOST_WIDE_INT offset_i = tree_to_uhwi (offset);
+  unsigned HOST_WIDE_INT shift_n = offset_i % load_size;
+
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			       RSHIFT_EXPR, lhs,
+			       build_int_cst (integer_type_node, shift_n));
+      append_pattern_def_seq (vinfo, last_stmt_info, pattern_stmt);
+      lhs = gimple_get_lhs (pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT mask_i = tree_to_uhwi (size);
+  tree mask = build_int_cst (TREE_TYPE (lhs), (1ULL << mask_i) - 1);
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			   BIT_AND_EXPR, lhs, mask);
+
+  *type_out = STMT_VINFO_VECTYPE (last_stmt_info);
+  vect_pattern_detected ("bit_field_ref pattern", last_stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5775,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bit_field_decl, "bit_field_decl" },
+  { vect_recog_bit_field_ref, "bitfield_ref" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f582d238984fbd083650a45d87997f72b6cd3839..c06e96ba2973d3048a754b1c15cfae917a35e271 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4933,19 +4933,6 @@ vectorizable_conversion (vec_info *vinfo,
 	       && SCALAR_FLOAT_TYPE_P (rhs_type))))
     return false;
 
-  if (!VECTOR_BOOLEAN_TYPE_P (vectype_out)
-      && ((INTEGRAL_TYPE_P (lhs_type)
-	   && !type_has_mode_precision_p (lhs_type))
-	  || (INTEGRAL_TYPE_P (rhs_type)
-	      && !type_has_mode_precision_p (rhs_type))))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "type conversion to/from bit-precision unsupported."
-                         "\n");
-      return false;
-    }
-
   if (op_type == binary_op)
     {
       gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-07-26 10:00 [RFC] Teach vectorizer to deal with bitfield reads Andre Vieira (lists)
@ 2022-07-27 11:37 ` Richard Biener
  2022-07-29  8:57   ` Andre Vieira (lists)
  2022-10-12  9:02 ` Eric Botcazou
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Biener @ 2022-07-27 11:37 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, pinskia

On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> This is a RFC for my prototype for bitfield read vectorization. This patch
> enables bit-field read vectorization by removing the rejection of bit-field
> read's during DR analysis and by adding two vect patterns. The first one
> transforms TREE_COMPONENT's with BIT_FIELD_DECL's into BIT_FIELD_REF's, this
> is a temporary one as I believe there are plans to do this lowering earlier in
> the compiler. To avoid having to wait for that to happen we decided to
> implement this temporary vect pattern.
> The second one looks for conversions of the result of BIT_FIELD_REF's from a
> 'partial' type to a 'full-type' and transforms it into a 'full-type' load
> followed by the necessary shifting and masking.
> 
> The patch is not perfect, one thing I'd like to change for instance is the way
> the 'full-type' load is represented. I currently abuse the fact that the
> vectorizer transforms the original TREE_COMPONENT with a BIT_FIELD_DECL into a
> full-type vector load, because it uses the smallest mode necessary for that
> precision. The reason I do this is because I was struggling to construct a
> MEM_REF that the vectorizer would accept and this for some reason seemed to
> work ... I'd appreciate some pointers on how to do this properly :)
> 
> Another aspect that I haven't paid much attention to yet is costing, I've
> noticed some testcases fail to vectorize due to costing where I think it might
> be wrong, but like I said, I haven't paid much attention to it.
> 
> Finally another aspect I'd eventually like to tackle is the sinking of the
> masking when possible, for instance in bit-field-read-3.c the 'masking' does
> not need to be inside the loop because we are doing bitwise operations. Though
> I suspect we are better off looking at things like this elsewhere, maybe where
> we do codegen for the reduction... Haven't looked at this at all yet.
> 
> Let me know if you believe this is a good approach? I've ran regression tests
> and this hasn't broken anything so far...

I don't think this is a good approach for what you gain and how 
necessarily limited it will be.  Similar to the recent experiment with
handling _Complex loads/stores this is much better tackled by lowering
things earlier (it will be lowered at RTL expansion time).

One place to do this experimentation would be to piggy-back on the
if-conversion pass so the lowering would happen only on the
vectorized code path.  Note that the desired lowering would look like
the following for reads:

  _1 = a.b;

to

  _2 = a.<representative for b>;
  _1 = BIT_FIELD_REF <2, ...>; // extract bits

and for writes:

  a.b = _1;

to

  _2 = a.<representative for b>;
  _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
  a.<representative for b> = _3;

so you trade loads/stores that are handled now for currently unhandled
BIT_FIELD_REF / BIT_INSERT_EXPR, which you would then need to
pattern-match to shifts and logical ops in the vectorizer.

There's a separate thing of actually promoting all uses, for
example

struct { long long x : 33; } a;

 a.x = a.x + 1;

will get you 33bit precision adds (for bit fields less than 32bits
they get promoted to int but not for larger bit fields).  RTL
expansion again will rewrite this into larger ops plus masking.

So I think the time is better spent working on the lowering of
bitfield accesses; if sufficiently separated it could be used
from if-conversion by working on loop SEME regions.  The patches
doing previous implementations are probably not too useful anymore
(I found one from 2011 and one from 2016, both pre-dating
BIT_INSERT_EXPR).

Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-07-27 11:37 ` Richard Biener
@ 2022-07-29  8:57   ` Andre Vieira (lists)
  2022-07-29  9:11     ` Richard Biener
  2022-07-29 10:31     ` Jakub Jelinek
  0 siblings, 2 replies; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-07-29  8:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford, pinskia

Hi Richard,

Thanks for the review; I don't completely understand all of the below,
so I added some extra questions to help me understand :)

On 27/07/2022 12:37, Richard Biener wrote:
> On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:
>
> I don't think this is a good approach for what you gain and how
> necessarily limited it will be.  Similar to the recent experiment with
> handling _Complex loads/stores this is much better tackled by lowering
> things earlier (it will be lowered at RTL expansion time).
I assume the approach you are referring to here is the lowering of the
BIT_FIELD_DECL to BIT_FIELD_REF in the vect_recog part of the
vectorizer. I am all for lowering earlier; the reason I did it there was
as a 'temporary' approach until we have that earlier lowering.
>
> One place to do this experimentation would be to piggy-back on the
> if-conversion pass so the lowering would happen only on the
> vectorized code path.
This was one of my initial thoughts, though the if-conversion changes
are a bit more intrusive for a temporary approach and not that much
earlier. It does however have the added benefit of not having to make
any changes to the vectorizer itself later if we do do the earlier
lowering, assuming that lowering results in the same IL.

The 'only on the vectorized code path' remains the same though, as
vect_recog also only happens on the vectorized code path, right?
>   Note that the desired lowering would look like
> the following for reads:
>
>    _1 = a.b;
>
> to
>
>    _2 = a.<representative for b>;
>    _1 = BIT_FIELD_REF <2, ...>; // extract bits
I don't yet have a well-formed idea of what '<representative for b>' is
supposed to look like in terms of tree expressions. I understand what
it's supposed to be representing, the 'larger than bit-field' load. But
is it going to be a COMPONENT_REF with a fake 'FIELD_DECL' of the
larger size? Like I said on IRC, the description of BIT_FIELD_REF makes
it sound like this isn't how we are supposed to use it; are we intending
to make a change to that here?

> and for writes:
>
>    a.b = _1;
>
> to
>
>    _2 = a.<representative for b>;
>    _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
>    a.<representative for b> = _3;
I was going to avoid writes for now because they are somewhat more
complicated, but maybe it's not that bad; I'll add them too.
> so you trade now handled loads/stores with not handled
> BIT_FIELD_REF / BIT_INSERT_EXPR which you would then need to
> pattern match to shifts and logical ops in the vectorizer.
Yeah that vect_recog pattern already exists in my RFC patch, though I 
can probably simplify it by moving the bit-field-ref stuff to ifcvt.
>
> There's a separate thing of actually promoting all uses, for
> example
>
> struct { long long x : 33; } a;
>
>   a.a = a.a + 1;
>
> will get you 33bit precision adds (for bit fields less than 32bits
> they get promoted to int but not for larger bit fields).  RTL
> expansion again will rewrite this into larger ops plus masking.
Not sure I understand why this is relevant here. The current way I am
doing this would likely lower a bit-field like that to a 64-bit load
followed by the masking away of the top 31 bits; the same would happen
with an ifcvt-lowering approach.
>
> So I think the time is better spent in working on the lowering of
> bitfield accesses, if sufficiently separated it could be used
> from if-conversion by working on loop SEME regions.
I will start to look at modifying ifcvt to add the lowering there. It
will likely require two passes though, because we can no longer look at
the number of BBs to determine whether ifcvt is even needed; we will
first need to look for bit-field decls, then version the loops and then
look for them again for the transformation, but I guess that's fine?
> The patches
> doing previous implementations are probably not too useful anymore
> (I find one from 2011 and one from 2016, both pre-dating BIT_INSERT_EXPR)
>
> Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-07-29  8:57   ` Andre Vieira (lists)
@ 2022-07-29  9:11     ` Richard Biener
  2022-07-29 10:31     ` Jakub Jelinek
  1 sibling, 0 replies; 25+ messages in thread
From: Richard Biener @ 2022-07-29  9:11 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, pinskia

On Fri, 29 Jul 2022, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> Thanks for the review, I don't completely understand all of the below, so I
> added some extra questions to help me understand :)
> 
> On 27/07/2022 12:37, Richard Biener wrote:
> > On Tue, 26 Jul 2022, Andre Vieira (lists) wrote:
> >
> > I don't think this is a good approach for what you gain and how
> > necessarily limited it will be.  Similar to the recent experiment with
> > handling _Complex loads/stores this is much better tackled by lowering
> > things earlier (it will be lowered at RTL expansion time).
> I assume the approach you are referring to here is the lowering of the
> BIT_FIELD_DECL to BIT_FIELD_REF in the vect_recog part of the vectorizer. I am
> all for lowering earlier, the reason I did it there was as a 'temporary'
> approach until we have that earlier loading.

I understood, but "temporary" things in GCC tend to be still around
10 years later, so ...

> >
> > One place to do this experimentation would be to piggy-back on the
> > if-conversion pass so the lowering would happen only on the
> > vectorized code path.
> This was one of my initial thoughts, though the if-conversion changes are a
> bit more intrusive for a temporary approach and not that much earlier. It does
> however have the added benefit of not having to make any changes to the
> vectorizer itself later if we do do the earlier lowering, assuming the
> lowering results in the same.
> 
> The 'only on the vectorized code path' remains the same though as vect_recog
> also only happens on the vectorized code path right?
> >   Note that the desired lowering would look like
> > the following for reads:
> >
> >    _1 = a.b;
> >
> > to
> >
> >    _2 = a.<representative for b>;
> >    _1 = BIT_FIELD_REF <2, ...>; // extract bits
> I don't yet have a well formed idea of what '<representative for b>' is
> supposed to look like in terms of tree expressions. I understand what it's
> supposed to be representing, the 'larger than bit-field'-load. But is it going
> to be a COMPONENT_REF with a fake 'FIELD_DECL' with the larger size? Like I
> said on IRC, the description of BIT_FIELD_REF makes it sound like this isn't
> how we are supposed to use it, are we intending to make a change to that here?

<representative for b> is what DECL_BIT_FIELD_REPRESENTATIVE 
(FIELD_DECL-for-b) gives you; it is a "fake" FIELD_DECL for the underlying
storage, conveniently made available to you by the middle-end.  For
your 31bit field it would be simply 'int' typed.

The BIT_FIELD_REF then extracts the actual bitfield from the underlying
storage, but it's now no longer operating on memory but on the register
holding the underlying data.  To the vectorizer we'd probably have to
pattern-match this to shifts & masks and hope for the conversion to
combine with a later one.
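
Concretely, for the 31-bit field from your testcases the lowered form
would look roughly like this (a sketch; the exact IL spelling may
differ):

  struct s { int i : 31; };

  _1 = p_2->i;

becomes

  _2 = p_2-><representative for i>;   /* plain 'int' load     */
  _1 = BIT_FIELD_REF <_2, 31, 0>;     /* extract bits 0..30   */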

> > and for writes:
> >
> >    a.b = _1;
> >
> > to
> >
> >    _2 = a.<representative for b>;
> >    _3 = BIT_INSERT_EXPR <_2, _1, ...>; // insert bits
> >    a.<representative for b> = _3;
> I was going to avoid writes for now because they are somewhat more
> complicated, but maybe it's not that bad, I'll add them too.

Only handling loads at the start is probably fine as an experiment, but
handling stores should be straightforward - of course the
BIT_INSERT_EXPR lowering to shifts & masks will be more
complicated.
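
In scalar terms the shift & mask form of such a BIT_INSERT_EXPR would
be roughly the following (a sketch, not actual patch code; it assumes
the field is narrower than the representative):

  /* Insert the SZ low bits of VAL at bit POS of the loaded
     representative REP (SZ < 32 assumed).  */
  static unsigned
  insert_bits (unsigned rep, unsigned val, unsigned pos, unsigned sz)
  {
    unsigned mask = ((1u << sz) - 1u) << pos;
    return (rep & ~mask) | ((val << pos) & mask);
  }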

> > so you trade now handled loads/stores with not handled
> > BIT_FIELD_REF / BIT_INSERT_EXPR which you would then need to
> > pattern match to shifts and logical ops in the vectorizer.
> Yeah that vect_recog pattern already exists in my RFC patch, though I can
> probably simplify it by moving the bit-field-ref stuff to ifcvt.
> >
> > There's a separate thing of actually promoting all uses, for
> > example
> >
> > struct { long long x : 33; } a;
> >
> >   a.a = a.a + 1;
> >
> > will get you 33bit precision adds (for bit fields less than 32bits
> > they get promoted to int but not for larger bit fields).  RTL
> > expansion again will rewrite this into larger ops plus masking.
> Not sure I understand why this is relevant here? The current way I am doing
> this would likely lower a  bit-field like that to a 64-bit load  followed by
> the masking away of the top 31 bits, same would happen with a ifcvt-lowering
> approach.

Yes, but if there's anything besides loading or storing you will have
a conversion from, say int:31 to int in the IL before any arithmetic.
I've not looked but your patch probably handles conversions to/from
bitfield types by masking / extending.  What I've mentioned with the
33bit example is that with that you can have arithmetic in 33 bits
_without_ intermediate conversions, so you'd have to properly truncate
after every such operation (or choose not to vectorize which I think
is what would happen now).
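
So in scalar terms every such operation would need an explicit
truncation, roughly like this sketch for the unsigned case
(load_x33/store_x33 are just placeholder helpers here; a signed field
would need sign-extension from bit 32 instead of the mask):

  unsigned long long v = load_x33 (&a);   /* 33-bit value held in 64 bits */
  v = (v + 1) & ((1ULL << 33) - 1);       /* truncate back to 33 bits     */
  store_x33 (&a, v);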

> >
> > So I think the time is better spent in working on the lowering of
> > bitfield accesses, if sufficiently separated it could be used
> > from if-conversion by working on loop SEME regions.
> I will start to look at modifying ifcvt to add the lowering there. Will likely
> require two pass though because we can no longer look at the number of BBs to
> determine whether ifcvt is even needed, so we will first need to look for
> bit-field-decls, then version the loops and then look for them again for
> transformation, but I guess that's fine?

Ah, yeah - I guess that's fine.  Just add a need_to_lower_bitfields
alongside need_to_predicate and friends.

Richard.

> > The patches
> > doing previous implementations are probably not too useful anymore
> > (I find one from 2011 and one from 2016, both pre-dating BIT_INSERT_EXPR)
> >
> > Richard.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-07-29  8:57   ` Andre Vieira (lists)
  2022-07-29  9:11     ` Richard Biener
@ 2022-07-29 10:31     ` Jakub Jelinek
  2022-07-29 10:52       ` Richard Biener
  2022-08-01 10:13       ` [RFC] Teach vectorizer to deal with bitfield reads Andre Vieira (lists)
  1 sibling, 2 replies; 25+ messages in thread
From: Jakub Jelinek @ 2022-07-29 10:31 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Richard Biener, Richard Sandiford, gcc-patches

On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via Gcc-patches wrote:
> The 'only on the vectorized code path' remains the same though as vect_recog
> also only happens on the vectorized code path right?

If-conversion (in some cases) duplicates a loop and guards one copy with
an ifn which resolves to true if that particular loop is vectorized and
false otherwise.  So changes that shouldn't be done in case of
vectorization failure can be done on the vectorizer-only copy of the
loop.
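
Roughly, the versioned structure in the dumps looks like (a sketch):

  if (.LOOP_VECTORIZED (1, 2))
    {
      /* loop copy only the vectorizer will see; lowering that should
         not affect the scalar code can be confined to this copy */
    }
  else
    {
      /* original scalar loop, used if vectorization fails */
    }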

	Jakub


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-07-29 10:31     ` Jakub Jelinek
@ 2022-07-29 10:52       ` Richard Biener
  2022-08-01 10:21         ` Andre Vieira (lists)
  2022-08-01 10:13       ` [RFC] Teach vectorizer to deal with bitfield reads Andre Vieira (lists)
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Biener @ 2022-07-29 10:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andre Vieira (lists), Richard Sandiford, gcc-patches

On Fri, 29 Jul 2022, Jakub Jelinek wrote:

> On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via Gcc-patches wrote:
> > The 'only on the vectorized code path' remains the same though as vect_recog
> > also only happens on the vectorized code path right?
> 
> if conversion (in some cases) duplicates a loop and guards one copy with
> an ifn which resolves to true if that particular loop is vectorized and
> false otherwise.  So, then changes that shouldn't be done in case of
> vectorization failure can be done on the for vectorizer only copy of the
> loop.

And just to mention, one issue with lowering of bitfield accesses
is bitfield inserts, which on some architectures (hello m68k) have
instructions operating on memory directly.  For those it's difficult
to not regress in code quality if a bitfield store becomes a
read-modify-write cycle.  That's one of the things holding this
back.  One idea would be to lower to .INSV directly for those targets
(but presence of insv isn't necessarily indicating support for
memory destinations).

Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-07-29 10:31     ` Jakub Jelinek
  2022-07-29 10:52       ` Richard Biener
@ 2022-08-01 10:13       ` Andre Vieira (lists)
  1 sibling, 0 replies; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-08-01 10:13 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Richard Sandiford, gcc-patches


On 29/07/2022 11:31, Jakub Jelinek wrote:
> On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via Gcc-patches wrote:
>> The 'only on the vectorized code path' remains the same though as vect_recog
>> also only happens on the vectorized code path right?
> if conversion (in some cases) duplicates a loop and guards one copy with
> an ifn which resolves to true if that particular loop is vectorized and
> false otherwise.  So, then changes that shouldn't be done in case of
> vectorization failure can be done on the for vectorizer only copy of the
> loop.
>
> 	Jakub
I'm pretty sure vect_recog patterns also have no effect on scalar
codegen if the vectorization fails. The patterns live as new
stmt_vec_info's and no changes are actually made to the scalar loop.
That was the point I was trying to make, but it doesn't matter that
much; as I said, I am happy to do this in if-convert.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-07-29 10:52       ` Richard Biener
@ 2022-08-01 10:21         ` Andre Vieira (lists)
  2022-08-01 13:16           ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-08-01 10:21 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek; +Cc: Richard Sandiford, gcc-patches


On 29/07/2022 11:52, Richard Biener wrote:
> On Fri, 29 Jul 2022, Jakub Jelinek wrote:
>
>> On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via Gcc-patches wrote:
>>> The 'only on the vectorized code path' remains the same though as vect_recog
>>> also only happens on the vectorized code path right?
>> if conversion (in some cases) duplicates a loop and guards one copy with
>> an ifn which resolves to true if that particular loop is vectorized and
>> false otherwise.  So, then changes that shouldn't be done in case of
>> vectorization failure can be done on the for vectorizer only copy of the
>> loop.
> And just to mention, one issue with lowering of bitfield accesses
> is bitfield inserts which, on some architectures (hello m68k) have
> instructions operating on memory directly.  For those it's difficult
> to not regress in code quality if a bitfield store becomes a
> read-modify-write cycle.  That's one of the things holding this
> back.  One idea would be to lower to .INSV directly for those targets
> (but presence of insv isn't necessarily indicating support for
> memory destinations).
>
> Richard.
Should I account for that when vectorizing though? From what I can tell
(no TARGET_VECTOR_* hooks implemented) m68k does not have vectorization
support. So the question is, are there currently any targets that
vectorize and have vector bitfield-insert/extract support? If they don't
exist I suggest we worry about it when it comes around, if only because
we wouldn't be able to test it right now.

If this is about not lowering on the non-vectorized path, see my
previous reply: I never intended to do that in the vectorizer. I just
thought it was the plan to do lowering eventually.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-08-01 10:21         ` Andre Vieira (lists)
@ 2022-08-01 13:16           ` Richard Biener
  2022-08-08 14:06             ` [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads) Andre Vieira (lists)
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2022-08-01 13:16 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

On Mon, 1 Aug 2022, Andre Vieira (lists) wrote:

> 
> On 29/07/2022 11:52, Richard Biener wrote:
> > On Fri, 29 Jul 2022, Jakub Jelinek wrote:
> >
> >> On Fri, Jul 29, 2022 at 09:57:29AM +0100, Andre Vieira (lists) via
> >> Gcc-patches wrote:
> >>> The 'only on the vectorized code path' remains the same though as
> >>> vect_recog
> >>> also only happens on the vectorized code path right?
> >> if conversion (in some cases) duplicates a loop and guards one copy with
> >> an ifn which resolves to true if that particular loop is vectorized and
> >> false otherwise.  So, then changes that shouldn't be done in case of
> >> vectorization failure can be done on the for vectorizer only copy of the
> >> loop.
> > And just to mention, one issue with lowering of bitfield accesses
> > is bitfield inserts which, on some architectures (hello m68k) have
> > instructions operating on memory directly.  For those it's difficult
> > to not regress in code quality if a bitfield store becomes a
> > read-modify-write cycle.  That's one of the things holding this
> > back.  One idea would be to lower to .INSV directly for those targets
> > (but presence of insv isn't necessarily indicating support for
> > memory destinations).
> >
> > Richard.
> Should I account for that when vectorizing though? From what I can tell (no
> TARGET_VECTOR_* hooks implemented) m68k does not have vectorization support.

No.

> So the question is, are there currently any targets that vectorize and have
> vector bitfield-insert/extract support? If they don't exist I suggest we worry
> about it when it comes around, if not just for the fact that we wouldn't be
> able to test it right now.
> 
> If this is about not lowering on the non-vectorized path, see my previous
> reply, I never intended to do that in the vectorizer. I just thought it was
> the plan to do lowering eventually.

Yes, for the vectorized path this all isn't an issue - and btw the
advantage with if-conversion is that you get VN of the result
"for free", the RMW cycle of bitfield stores likely have reads to
share (and also redundant stores in the end, but ...).

And yes, the plan was to do lowering generally.  Just the simplistic
approaches (my last one was a lowering pass somewhen after IPA, IIRC
combined with SRA) run into some issues, like that on m68k, but IIRC
also some others.  So I wouldn't hold my breath, but then just somebody
needs to do the work and think about how to deal with m68k and the
likes...

Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-08-01 13:16           ` Richard Biener
@ 2022-08-08 14:06             ` Andre Vieira (lists)
  2022-08-09 14:34               ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-08-08 14:06 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2906 bytes --]

Hi,

So I've changed the approach from the RFC as suggested, moving the 
bitfield lowering to the if-convert pass.

So to reiterate, ifcvt will lower COMPONENT_REF's with DECL_BIT_FIELD
fields to either BIT_FIELD_REF if they are reads or BIT_INSERT_EXPR if
they are writes, using loads and writes of 'representatives' that are
big enough to contain the bitfield value.

In vect_recog I added two patterns to replace these BIT_FIELD_REF and
BIT_INSERT_EXPR with shifts and masks as appropriate.

I'd like to see whether it is possible to remove the 'load' part of a
BIT_INSERT_EXPR if the representative write doesn't change any relevant
bits.  For example:

struct s {
  int dont_care;
  char a : 3;
};

s.a = <value>;

should not require a load & write cycle; in fact it wouldn't require any
masking either. Though to achieve this we'd need to make sure the
representative doesn't overlap with any other field. Any suggestions on
how to do this would be great, though I don't think we need to wait for
that, as it's merely a nice-to-have optimization I guess?
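
In scalar terms the hoped-for lowering would be something like the
sketch below (the byte offset is illustrative and assumes 'a' shares
its representative byte with nothing but padding bits):

  /* Sketch: with 'struct s { int dont_care; char a : 3; };' a store
     like p->a = v should not need a read-modify-write of the
     representative byte; offset 4 is illustrative.  */
  void
  set_a (struct s *p, int v)
  {
    *((unsigned char *) p + 4) = (unsigned char) v;  /* plain store */
  }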

I am not sure where I should 'document' this change of behavior in
ifcvt, and/or whether we should change the name of the pass, since it's
doing more than if-conversion now.

Bootstrapped and regression tested this patch on aarch64-none-linux-gnu.

gcc/ChangeLog:
2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * tree-if-conv.cc (includes): Add expr.h and langhooks.h to
         list of includes.
         (need_to_lower_bitfields): New static bool.
         (need_to_ifcvt): Likewise.
         (version_loop_for_if_conversion): Adapt to work for bitfield
         lowering-only path.
         (bitfield_data_t): New typedef.
         (get_bitfield_data): New function.
         (lower_bitfield): New function.
         (bitfields_to_lower_p): New function.
         (tree_if_conversion): Change to lower bitfields too.
         * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
         Modify dump message to be more accurate.
         * tree-vect-patterns.cc (includes): Add gimplify-me.h include.
         (vect_recog_bitfield_ref_pattern): New function.
         (vect_recog_bit_insert_pattern): New function.
         (vect_vect_recog_func_ptrs): Add two new patterns.

gcc/testsuite/ChangeLog:
2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * gcc.dg/vect/vect-bitfield-read-1.c: New test.
         * gcc.dg/vect/vect-bitfield-read-2.c: New test.
         * gcc.dg/vect/vect-bitfield-read-3.c: New test.
         * gcc.dg/vect/vect-bitfield-read-4.c: New test.
         * gcc.dg/vect/vect-bitfield-write-1.c: New test.
         * gcc.dg/vect/vect-bitfield-write-2.c: New test.
         * gcc.dg/vect/vect-bitfield-write-3.c: New test.

Kind regards,
Andre

[-- Attachment #2: vect_bitfield.patch --]
[-- Type: text/plain, Size: 29908 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..4070fa2f45970e564f13de794707613356cb5045 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -2898,18 +2908,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2935,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3274,225 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+typedef struct
+{
+  scalar_int_mode best_mode;
+  tree struct_expr;
+  tree bf_type;
+  tree offset;
+  poly_int64 bitpos;
+  bool write;
+  gassign *stmt;
+} bitfield_data_t;
+
+/* Return TRUE if we can lower the bitfield in STMT.  Fill DATA with the
+   relevant information required to lower this bitfield.  */
+
+static bool
+get_bitfield_data (gassign *stmt, bool write, bitfield_data_t *data)
+{
+  poly_uint64 bitstart, bitend;
+  scalar_int_mode best_mode;
+  tree comp_ref = write ? gimple_get_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+  tree struct_expr = TREE_OPERAND (comp_ref, 0);
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree bf_type = TREE_TYPE (field_decl);
+  poly_int64 bitpos
+    = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl));
+  unsigned HOST_WIDE_INT bitsize = TYPE_PRECISION (bf_type);
+  tree offset = DECL_FIELD_OFFSET (field_decl);
+  /* BITSTART and BITEND describe the region we can safely load from inside the
+     structure.  BITPOS is the bit position of the value inside the
+     representative that we will end up loading OFFSET bytes from the start
+     of the struct.  BEST_MODE is the mode describing the optimal size of the
+     representative chunk we load.  If this is a write we will store the same
+     sized representative back, after we have changed the appropriate bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+		     TYPE_ALIGN (TREE_TYPE (struct_expr)),
+		     INT_MAX, false, &best_mode))
+    {
+      data->best_mode = best_mode;
+      data->struct_expr = struct_expr;
+      data->bf_type = bf_type;
+      data->offset = offset;
+      data->bitpos = bitpos;
+      data->write = write;
+      data->stmt = stmt;
+      return true;
+    }
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\t\tCan not lower Bitfield, could not determine"
+			  " best mode.\n");
+    }
+  return false;
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (bitfield_data_t *data)
+{
+  scalar_int_mode best_mode = data->best_mode;
+  tree struct_expr = data->struct_expr;
+  tree bf_type = data->bf_type;
+  tree offset = data->offset;
+  poly_int64 bitpos = data->bitpos;
+  bool write = data->write;
+  gassign *stmt = data->stmt;
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  /* Type of the representative.  */
+  tree rep_type
+    = lang_hooks.types.type_for_mode (best_mode, TYPE_UNSIGNED (bf_type));
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			      NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();
+  extra_offset /= TYPE_ALIGN (bf_type);
+  extra_offset *= TYPE_ALIGN (bf_type);
+  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset,
+			build_int_cst (TREE_TYPE (offset),
+				       extra_offset / BITS_PER_UNIT));
+  /* Adapt the BITPOS to reflect the number of bits between the start of the
+     load and the start of the bitfield value.  */
+  bitpos -= extra_offset;
+  DECL_FIELD_BIT_OFFSET (rep_decl) = build_zero_cst (bitsizetype);
+  DECL_FIELD_OFFSET (rep_decl) = offset;
+  DECL_SIZE (rep_decl) = TYPE_SIZE (rep_type);
+  DECL_CONTEXT (rep_decl) = TREE_TYPE (struct_expr);
+  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos_tree), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos_tree);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_get_lhs (stmt),
+						     new_val));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop, auto_vec <bitfield_data_t *, 4> *to_lower)
+{
+  basic_block *bbs = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_get_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      bitfield_data_t *data = new bitfield_data_t ();
+	      if (get_bitfield_data (stmt, write, data))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\tBitfield OK to lower.\n");
+		  to_lower->safe_push (data);
+		}
+	      else
+		{
+		  delete data;
+		  return false;
+		}
+	    }
+	}
+    }
+  return !to_lower->is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3503,15 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <bitfield_data_t *, 4> bitfields_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,11 +3527,17 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
+    goto cleanup;
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   if ((need_to_predicate || any_complicated_phi)
@@ -3310,7 +3553,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3594,32 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!bitfields_to_lower.is_empty ())
+	{
+	  bitfield_data_t *data = bitfields_to_lower.pop ();
+	  lower_bitfield (data);
+	  delete data;
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3395,6 +3661,11 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       loop = rloop;
       goto again;
     }
+  while (!bitfields_to_lower.is_empty ())
+    {
+      bitfield_data_t *data = bitfields_to_lower.pop ();
+      delete data;
+    }
 
   return todo;
 }
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..435b75f860784a929041d5214d39c876c5ba790a 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1829,204 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type) _2;
+
+   where type is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type) - (TYPE_UNSIGNED (type) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = (type) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 << bitsize) - 1);
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR
+      || TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME)
+    return NULL;
+
+  gassign *bf_stmt
+    = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt)));
+
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);
+
+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;
+
+
+  gimple *pattern_stmt;
+  tree lhs = load;
+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			       RSHIFT_EXPR, lhs, offset);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT mask_i = tree_to_uhwi (size);
+  tree mask = build_int_cst (TREE_TYPE (lhs), (1ULL << mask_i) - 1);
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			   BIT_AND_EXPR, lhs, mask);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_field_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = _2 & mask;		    // Clearing of the non-relevant bits in the
+				    // 'to-write value'.
+   patt2 = patt1 << bitpos;	    // Shift the cleaned value in to place.
+   patt3 = _1 & ~(mask << bitpos);  // Clearing the bits we want to write to,
+				    // from the value we want to write to.
+   _3 = patt3 | patt2;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  /* Bail out if the load is already of vector type.  */
+  if (VECTOR_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  if (CONSTANT_CLASS_P (value))
+    value = fold_build1 (NOP_EXPR, load_type, value);
+  else
+    {
+      if (TREE_CODE (value) != SSA_NAME)
+	return NULL;
+      gassign *nop_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (value));
+      if (!nop_stmt || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR)
+	return NULL;
+      if (!useless_type_conversion_p (TREE_TYPE (value), load_type))
+	{
+	  value = fold_build1 (NOP_EXPR, load_type, gimple_assign_rhs1 (nop_stmt));
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+				   value);
+	  value = gimple_get_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+	}
+    }
+
+  unsigned HOST_WIDE_INT mask_i = (1ULL << TYPE_PRECISION (bf_type)) - 1;
+  tree mask_t = build_int_cst (load_type, mask_i);
+  /* Clear bits we don't want to write back from value and shift it in place.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   fold_build2 (BIT_AND_EXPR, load_type, value,
+					mask_t));
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+  /* Mask off the bits in the loaded value.  */
+  mask_i <<= shift_n;
+  mask_i = ~mask_i;
+  mask_t = build_int_cst (load_type, mask_i);
+
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5822,8 @@ struct vect_recog_func
   taken which means usually the more complex one needs to precede the
   less complex ones (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-08-08 14:06             ` [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads) Andre Vieira (lists)
@ 2022-08-09 14:34               ` Richard Biener
  2022-08-16 10:24                 ` Andre Vieira (lists)
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2022-08-09 14:34 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

On Mon, 8 Aug 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> So I've changed the approach from the RFC as suggested, moving the bitfield
> lowering to the if-convert pass.
> 
> So to reiterate, ifcvt will lower COMPONENT_REF's with DECL_BIT_FIELD field's
> to either BIT_FIELD_REF if they are reads or BIT_INSERT_EXPR if they are
> writes, using loads and writes of 'representatives' that are big enough to
> contain the bitfield value.
> 
> In vect_recog I added two patterns to replace these BIT_FIELD_REF and
> BIT_INSERT_EXPR with shift's and masks as appropriate.
> 
> I'd like to see if it was possible to remove the 'load' part of a
> BIT_INSERT_EXPR if the representative write didn't change any relevant bits. 
> For example:
> 
> struct s{
> int dont_care;
> char a : 3;
> };
> 
> s.a = <value>;
> 
> Should not require a load & write cycle, in fact it wouldn't even require any
> masking either. Though to achieve this we'd need to make sure the
> representative didn't overlap with any other field. Any suggestions on how to
> do this would be great, though I don't think we need to wait for that, as
> that's merely a nice-to-have optimization I guess?

Hmm.  I'm not sure the middle-end can simply ignore padding.  If
some language standard says that would be OK then I think we should
exploit this during lowering when the frontend is still around to
ask - which means sometime during early optimization.

> I am not sure where I should 'document' this change of behavior to ifcvt,
> and/or we should change the name of the pass, since it's doing more than
> if-conversion now?

It's preparation for vectorization anyway since it will emit
.MASK_LOAD/STORE and friends already.  So I don't think anything
needs to change there.


@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool 
aggressive_if_conv)
   auto_vec<edge> critical_edges;

   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
     return false;

   body = get_loop_body (loop);

this doesn't appear in the ChangeLog nor is it clear to me why it's
needed?  Likewise

-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  
*/
+      save_length = loop->inner ? loop->inner->num_nodes : 
loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+       saved_preds[i] = ifc_bbs[i]->aux;
+    }

is that just premature optimization?

+  /* BITSTART and BITEND describe the region we can safely load from 
inside the
+     structure.  BITPOS is the bit position of the value inside the
+     representative that we will end up loading OFFSET bytes from the 
start
+     of the struct.  BEST_MODE is the mode describing the optimal size of 
the
+     representative chunk we load.  If this is a write we will store the 
same
+     sized representative back, after we have changed the appropriate 
bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);

I think you need to give up when get_bit_range sets bitstart = bitend to 
zero.
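
A minimal sketch of what such a bail-out could look like in
get_bitfield_data, assuming the poly_uint64 BITSTART/BITEND from the
patch (my wording, not a tested change):

  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
  /* A zero range is how get_bit_range signals it could not compute
     safe bounds for this access, so give up on lowering here.  */
  if (known_eq (bitstart, 0U) && known_eq (bitend, 0U))
    return false;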

+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+                    TYPE_ALIGN (TREE_TYPE (struct_expr)),
+                    INT_MAX, false, &best_mode))

+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+                             NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();

you shouldn't build a new FIELD_DECL.  Either you use
DECL_BIT_FIELD_REPRESENTATIVE directly or you use a
BIT_FIELD_REF accessing the "representative".
DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain
a variable field offset, you can also subset that with an
intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is
too large for your taste.

I'm not sure all the offset calculation you do is correct, but
since you shouldn't invent a new FIELD_DECL it probably needs
to change anyway ...

Note that for optimization it will be important that all
accesses to the bitfield members of the same bitfield use the
same underlying area (CSE and store-forwarding will thank you).
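
For illustration, a rough sketch of what that means at the GIMPLE level
(hypothetical struct and names):

  struct s { unsigned a : 3; unsigned b : 5; } *p;  /* one 8-bit repr.  */

  /* If both lowered reads go through the same representative load,
     CSE can keep a single load of it:  */
  _1 = p->D;                         /* representative of a and b */
  _2 = BIT_FIELD_REF <_1, 3, 0>;     /* p->a */
  _3 = BIT_FIELD_REF <_1, 5, 3>;     /* p->b */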

+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, 
&bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;

so we lower bitfields even when we cannot split critical edges?
why?

+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;

likewise - if_convertible_loop_p performs other checks, the only
one we want to elide is the loop->num_nodes <= 2 check since
we want to lower bitfields in single-block loops as well.  That
means we only have to scan for bitfield accesses in the first
block "prematurely".  So I would interwind the need_to_lower_bitfields
into if_convertible_loop_p and if_convertible_loop_p_1 and
put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields.

+         tree op = gimple_get_lhs (stmt);
+         bool write = TREE_CODE (op) == COMPONENT_REF;
+
+         if (!write)
+           op = gimple_assign_rhs1 (stmt);
+
+         if (TREE_CODE (op) != COMPONENT_REF)
+           continue;
+
+         if (DECL_BIT_FIELD (TREE_OPERAND (op, 1)))

note the canonical test for a bitfield access is to check
DECL_BIT_FIELD_TYPE, not DECL_BIT_FIELD.  In particular for

struct { int a : 4; int b : 4; int c : 8; int d : 4; int e : 12; }

'c' will _not_ have DECL_BIT_FIELD set but you want to lower its
access since you otherwise likely will get conflicting accesses
for the other fields (store forwarding).
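
A minimal sketch of that canonical test in bitfields_to_lower_p,
assuming the same OP walk as in the patch:

	  tree field = TREE_OPERAND (op, 1);
	  /* DECL_BIT_FIELD_TYPE is set for every member declared with a
	     width, even when, like 'c' above, it happens to land on a
	     natural boundary, so check it rather than DECL_BIT_FIELD.  */
	  if (DECL_BIT_FIELD_TYPE (field))
	    /* ... queue this access for lowering ...  */;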

+static bool
+bitfields_to_lower_p (class loop *loop, auto_vec <bitfield_data_t *, 4> 
*to_lower)

don't pass auto_vec<> *, just pass vec<>&, auto_vec will properly
decay.

+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info 
stmt_info,
+                                tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR

CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (nop_stmt))

+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);

use bit_field_{size,offset}

+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;

I think you want a "positive" check, what kind of type you
handle for the load.  An (unsigned?) INTEGRAL_TYPE_P one I guess.

+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+

gimple_assign_lhs

+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+       = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+                              NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }

hm - so you have for example

 int _1 = MEM;
 int:3 _2 = BIT_FIELD_REF <_1, ...>
 type _3 = (type) _2;

and that _3 = (type) _2 is because of integer promotion and you
perform all the shifting in that type.  I suppose you should
verify that the cast is indeed promoting, not narrowing, since
otherwise you'll produce wrong code?  That said, shouldn't you
perform the shift / mask in the type of _1 instead?  (the hope
is, of course, that typeof (_1) == type in most cases)
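
For illustration, the narrowing case that would go wrong (made-up field
widths):

  long long _1 = MEM;                         /* 64-bit representative */
  long long:40 _2 = BIT_FIELD_REF <_1, 40, 8>;
  int _3 = (int) _2;                          /* narrowing, not promoting */

Doing the shift and mask in 'int' here would drop bits 32..39 of the
field, so the pattern has to either reject this case or do the
arithmetic in typeof (_1).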

Similar comments apply to vect_recog_bit_insert_pattern.

Overall it looks reasonable but it does still need some work.

Thanks,
Richard.



> Bootstrapped and regression tested this patch on aarch64-none-linux-gnu.
> 
> gcc/ChangeLog:
> 2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
>         * tree-if-conv.cc (includes): Add expr.h and langhooks.h to list of
> includes.
>         (need_to_lower_bitfields): New static bool.
>         (need_to_ifcvt): Likewise.
>         (version_loop_for_if_conversion): Adapt to work for bitfield 
> lowering-only path.
>         (bitfield_data_t): New typedef.
>         (get_bitfield_data): New function.
>         (lower_bitfield): New function.
>         (bitfields_to_lower_p): New function.
>         (tree_if_conversion): Change to lower-bitfields too.
>         * tree-vect-data-refs.cc (vect_find_stmt_data_reference): 
> Modify dump message to be more accurate.
>         * tree-vect-patterns.cc (includes): Add gimplify-me.h include.
>         (vect_recog_bitfield_ref_pattern): New function.
>         (vect_recog_bit_insert_pattern): New function.
>         (vect_vect_recog_func_ptrs): Add two new patterns.
> 
> gcc/testsuite/ChangeLog:
> 2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
>         * gcc.dg/vect/vect-bitfield-read-1.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-2.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-3.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-4.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-1.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-2.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-3.c: New test.
> 
> Kind regards,
> Andre

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-08-09 14:34               ` Richard Biener
@ 2022-08-16 10:24                 ` Andre Vieira (lists)
  2022-08-17 12:49                   ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-08-16 10:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7989 bytes --]

Hi,

New version of the patch attached, but I haven't recreated the ChangeLog 
yet; I'm just waiting to see if this is what you had in mind. See also 
some replies to your comments in-line below:

On 09/08/2022 15:34, Richard Biener wrote:

> @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool
> aggressive_if_conv)
>     auto_vec<edge> critical_edges;
>
>     /* Loop is not well formed.  */
> -  if (num <= 2 || loop->inner || !single_exit (loop))
> +  if (num <= 2 || loop->inner)
>       return false;
>
>     body = get_loop_body (loop);
>
> this doesn't appear in the ChangeLog nor is it clear to me why it's
> needed?  Likewise
So both these and...
>
> -  /* Save BB->aux around loop_version as that uses the same field.  */
> -  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
> -  void **saved_preds = XALLOCAVEC (void *, save_length);
> -  for (unsigned i = 0; i < save_length; i++)
> -    saved_preds[i] = ifc_bbs[i]->aux;
> +  void **saved_preds = NULL;
> +  if (any_complicated_phi || need_to_predicate)
> +    {
> +      /* Save BB->aux around loop_version as that uses the same field.
> */
> +      save_length = loop->inner ? loop->inner->num_nodes :
> loop->num_nodes;
> +      saved_preds = XALLOCAVEC (void *, save_length);
> +      for (unsigned i = 0; i < save_length; i++)
> +       saved_preds[i] = ifc_bbs[i]->aux;
> +    }
>
> is that just premature optimization?

.. these changes are to make sure we can still use the loop versioning 
code even for cases where there are bitfields to lower but no ifcvts 
(i.e. num of BBs <= 2).
I wasn't sure about the loop->inner condition, but for the small examples 
I tried it seemed to work, that is, loop versioning seems to be able to 
handle nested loops.

The single_exit condition is still required for both, because the code 
to create the loop versions depends on it. It does look like I missed 
this in the ChangeLog...

> +  /* BITSTART and BITEND describe the region we can safely load from
> inside the
> +     structure.  BITPOS is the bit position of the value inside the
> +     representative that we will end up loading OFFSET bytes from the
> start
> +     of the struct.  BEST_MODE is the mode describing the optimal size of
> the
> +     representative chunk we load.  If this is a write we will store the
> same
> +     sized representative back, after we have changed the appropriate
> bits.  */
> +  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
>
> I think you need to give up when get_bit_range sets bitstart = bitend to
> zero
>
> +  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
> +                    TYPE_ALIGN (TREE_TYPE (struct_expr)),
> +                    INT_MAX, false, &best_mode))
>
> +  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> +                             NULL_TREE, rep_type);
> +  /* Load from the start of 'offset + bitpos % alignment'.  */
> +  uint64_t extra_offset = bitpos.to_constant ();
>
> you shouldn't build a new FIELD_DECL.  Either you use
> DECL_BIT_FIELD_REPRESENTATIVE directly or you use a
> BIT_FIELD_REF accessing the "representative".
> DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain
> a variable field offset, you can also subset that with an
> intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is
> too large for your taste.
>
> I'm not sure all the offset calculation you do is correct, but
> since you shouldn't invent a new FIELD_DECL it probably needs
> to change anyway ...
I can use the DECL_BIT_FIELD_REPRESENTATIVE, but I'll still need some 
offset calculation/extraction. It's easier to explain with an example:

In vect-bitfield-read-3.c the struct:
typedef struct {
     int  c;
     int  b;
     bool a : 1;
} struct_t;

and field access 'vect_false[i].a' or 'vect_true[i].a' will lead to a 
DECL_BIT_FIELD_REPRESENTATIVE with a TYPE_SIZE of 8 (and TYPE_PRECISION 
is also 8, as expected). However, the DECL_FIELD_OFFSET of both the 
original field decl (the actual bitfield member) and the 
DECL_BIT_FIELD_REPRESENTATIVE is 0, and the DECL_FIELD_BIT_OFFSET is 64. 
These will lead to the correct load:
_1 = vect_false[i].D;

D here being the representative, this is an 8-bit load from 
vect_false[i] + 64 bits. So all good there. However, when we construct 
the BIT_FIELD_REF we can't simply use DECL_FIELD_BIT_OFFSET (field_decl) 
as the BIT_FIELD_REF's bitpos.  During `verify_gimple` it checks that 
bitpos + bitsize < TYPE_SIZE (TREE_TYPE (load)) for 
BIT_FIELD_REF (load, bitsize, bitpos).

So instead I change bitpos such that:
align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
bitpos -= bitpos.to_constant () / align_of_representative * 
align_of_representative;

I've now rewritten this to:
poly_int64 q, r;
if (can_div_trunc_p (bitpos, align_of_representative, &q, &r))
   bitpos = r;

It makes it slightly clearer, also because I no longer need the changes 
to the original tree offset as I'm just using D for the load.
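As a worked example with the vect-bitfield-read-3.c struct above (my own 
numbers, just for illustration):

  /* field 'a':
     bitpos                = 64  (DECL_FIELD_BIT_OFFSET of 'a')
     TYPE_ALIGN (rep_type) = 8   (8-bit representative)
     can_div_trunc_p (64, 8, &q, &r)  ->  q = 8, r = 0
     so the BIT_FIELD_REF reads from bit 0 of the representative load
     instead of bit 64 of the struct.  */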
> Note that for optimization it will be important that all
> accesses to the bitfield members of the same bitfield use the
> same underlying area (CSE and store-forwarding will thank you).
>
> +
> +  need_to_lower_bitfields = bitfields_to_lower_p (loop,
> &bitfields_to_lower);
> +  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
> +      && !need_to_lower_bitfields)
>       goto cleanup;
>
> so we lower bitfields even when we cannot split critical edges?
> why?
>
> +  need_to_ifcvt
> +    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
> +  if (!need_to_ifcvt && !need_to_lower_bitfields)
>       goto cleanup;
>
> likewise - if_convertible_loop_p performs other checks, the only
> one we want to elide is the loop->num_nodes <= 2 check since
> we want to lower bitfields in single-block loops as well.  That
> means we only have to scan for bitfield accesses in the first
> block "prematurely".  So I would interwind the need_to_lower_bitfields
> into if_convertible_loop_p and if_convertible_loop_p_1 and
> put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields.
I'm not sure I understood this. But I'd rather keep the 'need_to_ifcvt' 
(new) and 'need_to_lower_bitfields' separate. One thing I did change is 
that we no longer check for bitfields to lower if there are if-stmts 
that we can't convert, since we will not be vectorizing this loop anyway, 
so there is no point in wasting time lowering bitfields. At the same time 
though, I'd like to be able to lower bitfields if there are no ifcvts.
> +  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
> +    {
> +      pattern_stmt
> +       = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
> +                              NOP_EXPR, lhs);
> +      lhs = gimple_get_lhs (pattern_stmt);
> +      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
> +    }
>
> hm - so you have for example
>
>   int _1 = MEM;
>   int:3 _2 = BIT_FIELD_REF <_1, ...>
>   type _3 = (type) _2;
>
> and that _3 = (type) _2 is because of integer promotion and you
> perform all the shifting in that type.  I suppose you should
> verify that the cast is indeed promoting, not narrowing, since
> otherwise you'll produce wrong code?  That said, shouldn't you
> perform the shift / mask in the type of _1 instead?  (the hope
> is, of course, that typeof (_1) == type in most cases)
>
> Similar comments apply to vect_recog_bit_insert_pattern.
Good shout, I hadn't realized that yet because the testcases didn't 
exercise that problem, but when using the REPRESENTATIVE macro they do 
test it now. I don't think the bit_insert is a problem though. In 
bit_insert, 'value' always has the relevant bits starting at its LSB. So 
regardless of whether the load (and store) type is larger or smaller 
than the type of 'value', performing the shifts and masks in the load 
type should be OK, as you'll only be 'cutting off' the MSBs, which would 
be the ones that get truncated anyway? Or am I missing something here?
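
To make that concrete, a small worked example (my own numbers):

  /* 4-bit field at bitpos 8 in a 32-bit representative; 'value' is an
     8-bit char holding 0xf5, of which only the low 4 bits matter:
       patt1 = (unsigned int) 0xf5 & 0xf   = 0x5    ; mask in load type
       patt2 = 0x5 << 8                    = 0x500  ; shift into place
       patt3 = load & ~(0xf << 8)                   ; clear target bits
       _3    = patt3 | 0x500                        ; write bits
     The bits of 'value' above the field width are dropped by the mask,
     which is what storing into a 4-bit field would do anyway.  */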

[-- Attachment #2: vect_bitfield2.patch --]
[-- Type: text/plain, Size: 29509 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..f450dbb1922586b3d405281f605fb0d8a7fc8fc2 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -2898,18 +2908,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate profile consistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2935,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3274,196 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the DECL_BIT_FIELD_REPRESENTATIVE of the bitfield access in STMT,
+   or NULL_TREE if the representative's type mode is BLKmode.  If BITPOS is
+   not NULL it will hold the poly_int64 value of the DECL_FIELD_BIT_OFFSET of
+   the bitfield access and STRUCT_EXPR, if not NULL, will hold the tree
+   representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, poly_int64 *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_get_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  if (bitpos)
+    *bitpos = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl));
+
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  poly_int64 bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* BITPOS represents the position of the first bit of the bitfield we are
+     accessing.  However, sometimes it is relative to the start of the struct,
+     and sometimes to the start of the representative we are loading.  In the
+     former case the following code adapts BITPOS to be relative to the
+     representative, since that is the bit position BIT_FIELD_REF expects.
+     In the latter case this should have no effect.  */
+  HOST_WIDE_INT q;
+  poly_int64 r;
+  poly_int64 rep_align = TYPE_ALIGN (rep_type);
+  if (can_div_trunc_p (bitpos, rep_align, &q, &r))
+    bitpos = r;
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is its
+     defining SSA_NAME.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos_tree), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos_tree);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_get_lhs (stmt),
+						     new_val));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  basic_block *bbs = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_get_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3474,18 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  vec <gassign *> reads_to_lower;
+  vec <gassign *> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
+  reads_to_lower.create (4);
+  writes_to_lower.create (4);
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,16 +3501,30 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
-    goto cleanup;
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+      if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+	goto cleanup;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						  writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3310,7 +3535,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3576,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3380,6 +3627,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   todo |= TODO_cleanup_cfg;
 
  cleanup:
+  reads_to_lower.release ();
+  writes_to_lower.release ();
   if (ifc_bbs)
     {
       unsigned int i;
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..5486aa72a33274db954abf275c2c30dae3accc1c 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1829,206 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type) _2;
+
+   where type is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type) - (TYPE_UNSIGNED (type) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = (type) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 << bitsize) - 1);
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (nop_stmt))
+      || TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME)
+    return NULL;
+
+  gassign *bf_stmt
+    = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt)));
+
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree lhs = TREE_OPERAND (bf_ref, 0);
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    return NULL;
+
+  gimple *pattern_stmt;
+  tree ret_type = TREE_TYPE (gimple_assign_lhs (nop_stmt));
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (lhs)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_i = bit_field_size (bf_ref).to_constant ();
+  tree mask = build_int_cst (TREE_TYPE (lhs),
+			     ((1ULL << mask_i) - 1) << shift_n);
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			   BIT_AND_EXPR, lhs, mask);
+  lhs = gimple_get_lhs (pattern_stmt);
+  if (shift_n)
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			      get_vectype_for_scalar_type (vinfo,
+							   TREE_TYPE (lhs)));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			       RSHIFT_EXPR, lhs,
+			       build_int_cst (sizetype, shift_n));
+      lhs = gimple_get_lhs (pattern_stmt);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = _2 & mask;		    // Clearing of the non-relevant bits in the
+				    // 'to-write value'.
+   patt2 = patt1 << bitpos;	    // Shift the cleaned value in to place.
+   patt3 = _1 & ~(mask << bitpos);  // Clearing the bits we want to write to,
+				    // from the value we want to write to.
+   _3 = patt3 | patt2;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  if (!INTEGRAL_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  if (!useless_type_conversion_p (TREE_TYPE (value), load_type))
+    {
+      value = fold_build1 (NOP_EXPR, load_type, value);
+      if (!CONSTANT_CLASS_P (value))
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+				   value);
+	  value = gimple_get_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+	}
+    }
+
+  unsigned HOST_WIDE_INT mask_i = (1ULL << TYPE_PRECISION (bf_type)) - 1;
+  tree mask_t = build_int_cst (load_type, mask_i);
+  /* Clear bits we don't want to write back from value and shift it in place.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   fold_build2 (BIT_AND_EXPR, load_type, value,
+					mask_t));
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+  /* Mask off the bits in the loaded value.  */
+  mask_i <<= shift_n;
+  mask_i = ~mask_i;
+  mask_t = build_int_cst (load_type, mask_i);
+
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5824,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-08-16 10:24                 ` Andre Vieira (lists)
@ 2022-08-17 12:49                   ` Richard Biener
  2022-08-25  9:09                     ` Andre Vieira (lists)
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2022-08-17 12:49 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

On Tue, 16 Aug 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> New version of the patch attached, but haven't recreated the ChangeLog yet,
> just waiting to see if this is what you had in mind. See also some replies to
> your comments in-line below:
> 
> On 09/08/2022 15:34, Richard Biener wrote:
> 
> > @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool
> > aggressive_if_conv)
> >     auto_vec<edge> critical_edges;
> >
> >     /* Loop is not well formed.  */
> > -  if (num <= 2 || loop->inner || !single_exit (loop))
> > +  if (num <= 2 || loop->inner)
> >       return false;
> >
> >     body = get_loop_body (loop);
> >
> > this doesn't appear in the ChangeLog nor is it clear to me why it's
> > needed?  Likewise
> So both these and...
> >
> > -  /* Save BB->aux around loop_version as that uses the same field.  */
> > -  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
> > -  void **saved_preds = XALLOCAVEC (void *, save_length);
> > -  for (unsigned i = 0; i < save_length; i++)
> > -    saved_preds[i] = ifc_bbs[i]->aux;
> > +  void **saved_preds = NULL;
> > +  if (any_complicated_phi || need_to_predicate)
> > +    {
> > +      /* Save BB->aux around loop_version as that uses the same field.
> > */
> > +      save_length = loop->inner ? loop->inner->num_nodes :
> > loop->num_nodes;
> > +      saved_preds = XALLOCAVEC (void *, save_length);
> > +      for (unsigned i = 0; i < save_length; i++)
> > +       saved_preds[i] = ifc_bbs[i]->aux;
> > +    }
> >
> > is that just premature optimization?
> 
> .. these changes are to make sure we can still use the loop versioning code
> even for cases where there are bitfields to lower but no ifcvts (i.e. num of
> BBs <= 2).
> I wasn't sure about the loop-inner condition, but in the small examples I tried
> it seemed to work, that is, loop versioning seems to be able to handle nested loops.
> 
> The single_exit condition is still required for both, because the code to
> create the loop versions depends on it. It does look like I missed this in the
> ChangeLog...
> 
> > +  /* BITSTART and BITEND describe the region we can safely load from
> > inside the
> > +     structure.  BITPOS is the bit position of the value inside the
> > +     representative that we will end up loading OFFSET bytes from the
> > start
> > +     of the struct.  BEST_MODE is the mode describing the optimal size of
> > the
> > +     representative chunk we load.  If this is a write we will store the
> > same
> > +     sized representative back, after we have changed the appropriate
> > bits.  */
> > +  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
> >
> > I think you need to give up when get_bit_range sets bitstart = bitend to
> > zero
> >
> > +  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
> > +                    TYPE_ALIGN (TREE_TYPE (struct_expr)),
> > +                    INT_MAX, false, &best_mode))
> >
> > +  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> > +                             NULL_TREE, rep_type);
> > +  /* Load from the start of 'offset + bitpos % alignment'.  */
> > +  uint64_t extra_offset = bitpos.to_constant ();
> >
> > you shouldn't build a new FIELD_DECL.  Either you use
> > DECL_BIT_FIELD_REPRESENTATIVE directly or you use a
> > BIT_FIELD_REF accessing the "representative".
> > DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain
> > a variable field offset, you can also subset that with an
> > intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is
> > too large for your taste.
> >
> > I'm not sure all the offset calculation you do is correct, but
> > since you shouldn't invent a new FIELD_DECL it probably needs
> > to change anyway ...
> I can use the DECL_BIT_FIELD_REPRESENTATIVE, but I'll still need some offset
> calculation/extraction. It's easier to explain with an example:
> 
> In vect-bitfield-read-3.c the struct:
> typedef struct {
>     int  c;
>     int  b;
>     bool a : 1;
> } struct_t;
> 
> and field access 'vect_false[i].a' or 'vect_true[i].a' will lead to a
> DECL_BIT_FIELD_REPRESENTATIVE of TYPE_SIZE of 8 (and TYPE_PRECISION is also 8
> as expected). However, the DECL_FIELD_OFFSET of either the original field
> decl, the actual bitfield member, or the DECL_BIT_FIELD_REPRESENTATIVE is 0
> and the DECL_FIELD_BIT_OFFSET is 64. These will lead to the correct load:
> _1 = vect_false[i].D;
> 
> D here, being the representative, is an 8-bit load from vect_false[i] + 64 bits.
> So all good there. However, when we construct BIT_FIELD_REF we can't simply
> use DECL_FIELD_BIT_OFFSET (field_decl) as the BIT_FIELD_REF's bitpos.  During
> `verify_gimple` it checks that bitpos + bitsize < TYPE_SIZE (TREE_TYPE (load))
> where BIT_FIELD_REF (load, bitsize, bitpos).

Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
access - that's the offset within the representative (by construction
both fields share DECL_FIELD_OFFSET).
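
I.e. something along these lines (untested sketch; field_decl being the
FIELD_DECL of the bitfield access):

  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
  /* Offset of the bitfield within the representative, in bits.  */
  tree rel_bitpos
    = fold_build2 (MINUS_EXPR, bitsizetype,
                   DECL_FIELD_BIT_OFFSET (field_decl),
                   DECL_FIELD_BIT_OFFSET (rep_decl));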

> So instead I change bitpos such that:
> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
> bitpos -= bitpos.to_constant () / align_of_representative *
> align_of_representative;

?  Not sure why alignment comes into play here?

> I've now rewritten this to:
> poly_int64 q,r;
> if (can_trunc_div_p(bitpos, align_of_representative, &q, &r))
>   bitpos = r;
> 
> It makes it slightly clearer, also because I no longer need the changes to the
> original tree offset as I'm just using D for the load.
>
> > Note that for optimization it will be important that all
> > accesses to the bitfield members of the same bitfield use the
> > same underlying area (CSE and store-forwarding will thank you).
> >
> > +
> > +  need_to_lower_bitfields = bitfields_to_lower_p (loop,
> > &bitfields_to_lower);
> > +  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
> > +      && !need_to_lower_bitfields)
> >       goto cleanup;
> >
> > so we lower bitfields even when we cannot split critical edges?
> > why?
> >
> > +  need_to_ifcvt
> > +    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
> > +  if (!need_to_ifcvt && !need_to_lower_bitfields)
> >       goto cleanup;
> >
> > likewise - if_convertible_loop_p performs other checks, the only
> > one we want to elide is the loop->num_nodes <= 2 check since
> > we want to lower bitfields in single-block loops as well.  That
> > means we only have to scan for bitfield accesses in the first
> > block "prematurely".  So I would interwind the need_to_lower_bitfields
> > into if_convertible_loop_p and if_convertible_loop_p_1 and
> > put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields.
> I'm not sure I understood this. But I'd rather keep the 'need_to_ifcvt' (new)
> and 'need_to_lower_bitfields' separate. One thing I did change is that we no
> longer check for bitfields to lower if there are if-stmts that we can't lower,
> since we will not be vectorizing this loop anyway so no point in wasting time
> lowering bitfields. At the same time though, I'd like to be able to
> lower bitfields if there are no ifcvts.

Sure.

> > +  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
> > +    {
> > +      pattern_stmt
> > +       = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
> > +                              NOP_EXPR, lhs);
> > +      lhs = gimple_get_lhs (pattern_stmt);
> > +      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
> > +    }
> >
> > hm - so you have for example
> >
> >   int _1 = MEM;
> >   int:3 _2 = BIT_FIELD_REF <_1, ...>
> >   type _3 = (type) _2;
> >
> > and that _3 = (type) _2 is because of integer promotion and you
> > perform all the shifting in that type.  I suppose you should
> > verify that the cast is indeed promoting, not narrowing, since
> > otherwise you'll produce wrong code?  That said, shouldn't you
> > perform the shift / mask in the type of _1 instead?  (the hope
> > is, of course, that typeof (_1) == type in most cases)
> >
> > Similar comments apply to vect_recog_bit_insert_pattern.
> Good shout, hadn't realized that yet because the testcases didn't have
> that problem, but when using the REPRESENTATIVE macro they do test that now. I
> don't think the bit_insert is a problem though. In bit_insert, 'value' always
> has the relevant bits starting at its LSB. So regardless of whether the load
> (and store) type is larger or smaller than the type, performing the shifts and
> masks in this type should be OK as you'll only be 'cutting off' the MSB's
> which would be the ones that would get truncated anyway? Or am I missing
> something here?

Not sure what you are saying but "yes", all shifting and masking should
happen in the type of the representative.

+  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);

for your convenience there's bitsize_int (bitpos) you can use.

I don't think you are using the correct bitpos though, you fail to
adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.

+                        build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),

the size of the bitfield reference is DECL_SIZE of the original
FIELD_DECL - it might be bigger than the precision of its type.
You probably want to double-check it's equal to the precision
(because of the insert but also because of all the masking) and
refuse to lower if not.
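
Something like this (untested; field_decl again being the accessed FIELD_DECL):

  if (!tree_fits_uhwi_p (DECL_SIZE (field_decl))
      || tree_to_uhwi (DECL_SIZE (field_decl))
         != TYPE_PRECISION (TREE_TYPE (field_decl)))
    /* Not lowerable, give up on this access.  */
    return NULL_TREE;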

+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+                     vec <gassign *> &reads_to_lower,
+                     vec <gassign *> &writes_to_lower)
+{
+  basic_block *bbs = get_loop_body (loop);
+  gimple_stmt_iterator gsi;

as said I'd prefer to do this walk as part of the other walks we
already do - if only because get_loop_body () is a DFS
walk over the loop body (you should at least share that).

+         gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+         if (!stmt)
+           continue;
+
+         tree op = gimple_get_lhs (stmt);

gimple_assign_lhs (stmt)

+         bool write = TREE_CODE (op) == COMPONENT_REF;
+
+         if (!write)
+           op = gimple_assign_rhs1 (stmt);
+
+         if (TREE_CODE (op) != COMPONENT_REF)
+           continue;
+
+         if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+           {

rumors say that at least with Ada you can have non-integral, maybe
even aggregate "bitfields", so please add

  && INTEGRAL_TYPE_P (TREE_TYPE (op))

@@ -3269,12 +3474,18 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  vec <gassign *> reads_to_lower;
+  vec <gassign *> writes_to_lower;
   bitmap exit_bbs;

you should be able to use auto_vec<> here

  again:
+  reads_to_lower.create (4);
+  writes_to_lower.create (4);

I think repeated .create will not release what is there.  With
auto_vec<> above there's no need to .create, just do
truncate (0) here.
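
I.e. (sketch):

  auto_vec <gassign *, 4> reads_to_lower;   /* declared once, released automatically */
 again:
  reads_to_lower.truncate (0);              /* re-use on the retry path, no leak */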

+  tree mask = build_int_cst (TREE_TYPE (lhs),
+                            ((1ULL << mask_i) - 1) << shift_n);

please use wide_int_to_tree (TREE_TYPE (lhs),
                             wi::shifted_mask (shift_n, mask_i, false,
                                               TYPE_PRECISION (TREE_TYPE (lhs))));

1ULL would better be (unsigned HOST_WIDE_INT)1 or HOST_WIDE_INT_1U.
But note the representative could be __int128_t where uint64_t
mask operations fall apart...

Btw, instead of (val & mask) >> shift it might be better to use
(val >> shift) & mask since the resulting mask values are "smaller"
and maybe easier to code generate?
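
E.g., purely illustrative, extracting a 4-bit field at bit 8:

  x = (val & 0xf00) >> 8;   /* mask first: needs the wider constant 0xf00 */
  x = (val >> 8) & 0xf;     /* shift first: just 0xf */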

+   patt1 = _2 & mask;              // Clearing of the non-relevant bits in the
+                                   // 'to-write value'.
+   patt2 = patt1 << bitpos;        // Shift the cleaned value in to place.
+   patt3 = _1 & ~(mask << bitpos);  // Clearing the bits we want to write to,

same here, shifting patt1 first and then masking allows to just
invert the mask (or use andn), no need for two different (constant)
masks?
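
I.e., again purely illustrative, inserting a 4-bit value at bit 8:

  tmp  = (value << 8) & 0xf00;   /* shift, then the single shifted mask */
  dest = (load & ~0xf00) | tmp;  /* same mask, just inverted */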

+      value = fold_build1 (NOP_EXPR, load_type, value);

fold_convert (load_type, value)

+      if (!CONSTANT_CLASS_P (value))
+       {
+         pattern_stmt
+           = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+                                  value);
+         value = gimple_get_lhs (pattern_stmt);

there's in principle

     gimple_seq stmts = NULL;
     value = gimple_convert (&stmts, load_type, value);
     if (!gimple_seq_empty_p (stmts))
       {
         pattern_stmt = gimple_seq_first_stmt (stmts);
         append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
       }

though a append_pattern_def_seq helper to add a convenience sequence
would be nice to have here.

+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+                          fold_build2 (BIT_AND_EXPR, load_type, value,
+                                       mask_t));

please avoid building GENERIC and then gimple from it.  Either use

  gimple_build_assign (..., BIT_AND_EXPR, load_type, value, mask_t);

or, if you want to fold, use

  result_value = gimple_build (&stmts, BIT_AND_EXPR, load_type, value, mask_t);

as above with gimple_convert.  See my comment about the nice to have
helper so you can block-process the 'stmts' sequence as pattern
def sequence.

+  mask_i <<= shift_n;
+  mask_i = ~mask_i;

you have to use wide_ints again, a HOST_WIDE_INT might not be
large enough.

You probably want to double-check your lowering code by
bootstrapping / testing with -ftree-loop-if-convert.

Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-08-17 12:49                   ` Richard Biener
@ 2022-08-25  9:09                     ` Andre Vieira (lists)
  2022-09-08  9:07                       ` Andre Vieira (lists)
  2022-09-08 11:51                       ` Richard Biener
  0 siblings, 2 replies; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-08-25  9:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4101 bytes --]


On 17/08/2022 13:49, Richard Biener wrote:
> Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
> of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
> access - that's the offset within the representative (by construction
> both fields share DECL_FIELD_OFFSET).
Doh! That makes sense...
>> So instead I change bitpos such that:
>> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
>> bitpos -= bitpos.to_constant () / align_of_representative *
>> align_of_representative;
> ?  Not sure why alignment comes into play here?
Yeah just forget about this... it was my ill attempt at basically doing 
what you described above.
> Not sure what you are saying but "yes", all shifting and masking should
> happen in the type of the representative.
>
> +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
>
> for your convenience there's bitsize_int (bitpos) you can use.
>
> I don't think you are using the correct bitpos though, you fail to
> adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
Not sure I understand what you mean? I do adjust it, I've changed it now 
so it should hopefully be clearer.
>
> +                        build_int_cst (bitsizetype, TYPE_PRECISION
> (bf_type)),
>
> the size of the bitfield reference is DECL_SIZE of the original
> FIELD_DECL - it might be bigger than the precision of its type.
> You probably want to double-check it's equal to the precision
> (because of the insert but also because of all the masking) and
> refuse to lower if not.
I added a check for this but out of curiosity, how can the DECL_SIZE of 
a bitfield FIELD_DECL be different from its type's precision?
>
> +/* Return TRUE if there are bitfields to lower in this LOOP.  Fill
> TO_LOWER
> +   with data structures representing these bitfields.  */
> +
> +static bool
> +bitfields_to_lower_p (class loop *loop,
> +                     vec <gassign *> &reads_to_lower,
> +                     vec <gassign *> &writes_to_lower)
> +{
> +  basic_block *bbs = get_loop_body (loop);
> +  gimple_stmt_iterator gsi;
>
> as said I'd prefer to do this walk as part of the other walks we
> already do - if and if only because get_loop_body () is a DFS
> walk over the loop body (you should at least share that).
I'm now sharing the use of ifc_bbs. The reason why I'd rather not share the
walk over them is that it becomes quite complex to separate the two cases:
not lowering ifs because there are none, in which case we still want to
lower bitfields, versus not lowering ifs because they aren't lowerable, in
which case we forego lowering bitfields since we will not vectorize this
loop anyway.
>
> +      value = fold_build1 (NOP_EXPR, load_type, value);
>
> fold_convert (load_type, value)
>
> +      if (!CONSTANT_CLASS_P (value))
> +       {
> +         pattern_stmt
> +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type,
> NULL),
> +                                  value);
> +         value = gimple_get_lhs (pattern_stmt);
>
> there's in principle
>
>       gimple_seq stmts = NULL;
>       value = gimple_convert (&stmts, load_type, value);
>       if (!gimple_seq_empty_p (stmts))
>         {
>           pattern_stmt = gimple_seq_first_stmt (stmts);
>           append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
>         }
>
> though a append_pattern_def_seq helper to add a convenience sequence
> would be nice to have here.
Ended up using the existing 'vect_convert_input', seems to do nicely here.
> You probably want to double-check your lowering code by
> bootstrapping / testing with -ftree-loop-if-convert.
Done, this led me to find a new failure mode, where the type of the
first operand of BIT_FIELD_REF was an FP type (TF mode), which then led
to failures when constructing the masking and shifting. I ended up 
adding a nop-conversion to an INTEGER type of the same width first if 
necessary. Also did a follow-up bootstrap with the addition of 
`-ftree-vectorize` and `-fno-vect-cost-model` to further test the 
codegen. All seems to be working on aarch64-linux-gnu.

[-- Attachment #2: vect_bitfield3.patch --]
[-- Type: text/plain, Size: 33438 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..c5c6d937a645e9caa0092c941c52c5192363bbd7 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
-  /* Allow statements that can be handled during if-conversion.  */
-  ifc_bbs = get_loop_body_in_if_conv_order (loop);
-  if (!ifc_bbs)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Irreducible loop\n");
-      return false;
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3265,200 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Return the representative FIELD_DECL of the bitfield access in STMT, or
+   NULL_TREE if it cannot be lowered (e.g. its mode is BLKmode).  If BITPOS
+   is not NULL it will hold the tree representing the offset, in bits, of
+   the bitfield within the representative, and STRUCT_EXPR, if not NULL,
+   will hold the tree representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_assign_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT decl_size = tree_to_uhwi (DECL_SIZE (field_decl));
+  if (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))) != decl_size)
+    return NULL_TREE;
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  if (bitpos)
+    *bitpos
+      = fold_build2 (MINUS_EXPR, bitsizetype,
+		     DECL_FIELD_BIT_OFFSET (field_decl),
+		     DECL_FIELD_BIT_OFFSET (rep_decl));
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  tree bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is its
+     defining SSA_NAME.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt),
+						     new_val));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = ifc_bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_assign_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NO OK to lower,"
+					" field type is not Integral.\n");
+		  return false;
+		}
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3469,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <gassign *, 4> reads_to_lower;
+  auto_vec <gassign *, 4> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,16 +3494,40 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2
+      && !ifcvt_split_critical_edges (loop, aggressive_if_conv))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  ifc_bbs = get_loop_body_in_if_conv_order (loop);
+  if (!ifc_bbs)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Irreducible loop\n");
+      goto cleanup;
+    }
+
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						  writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3310,7 +3538,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3579,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3393,6 +3643,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   if (rloop != NULL)
     {
       loop = rloop;
+      reads_to_lower.truncate (0);
+      writes_to_lower.truncate (0);
       goto again;
     }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..731b7c2bc1962ff22288c4439679c0b11232cb4a 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1830,294 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type_out) _2;
+
+   where type_out is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type_out) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. If the precision of type_out is bigger
+   than the precision of the type of _1 we perform the widening before the shifting,
+   since the new precision will be large enough to shift the value and moving
+   widening operations up the statement chain enables the generation of
+   widening loads.  If we are widening and the operation after the pattern is
+   an addition then we mask first and shift later, to enable the generation of
+   shifting adds.  In the case of narrowing we will always mask first, shift
+   last and then perform a narrowing operation.  This will enable the
+   generation of narrowing shifts.
+
+   Widening with mask first, shift later:
+   patt1 = (type_out) _1;
+   patt2 = patt1 & (((1 << bitsize) - 1) << bitpos);
+   _3 = patt2 >> bitpos;
+
+   Widening with shift first, mask last:
+   patt1 = (type_out) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 << bitsize) - 1);
+
+   Narrowing:
+   patt1 = _1 & (((1 << bitsize) - 1) << bitpos);
+   patt2 = patt1 >> bitpos;
+   _3 = (type_out) patt2;
+
+   The shifting is optional and is only performed when bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+
+  if (!first_stmt)
+    return NULL;
+
+  gassign *bf_stmt;
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
+      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      if (!second_stmt || gimple_code (second_stmt) != GIMPLE_ASSIGN
+	  || gimple_assign_rhs_code (second_stmt) != BIT_FIELD_REF)
+	return NULL;
+      bf_stmt = static_cast <gassign *> (second_stmt);
+    }
+  else
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree lhs = TREE_OPERAND (bf_ref, 0);
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);
+  tree ret_type = TREE_TYPE (ret);
+  bool shift_first = true;
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (lhs)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+  else if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    /* If we are doing the conversion last then also delay the shift as we may
+       be able to combine the shift and conversion in certain cases.  */
+    shift_first = false;
+
+  tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    {
+      tree int_type
+	= build_nonstandard_integer_type (TYPE_PRECISION (TREE_TYPE (lhs)),
+					  true);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (int_type, NULL),
+			       NOP_EXPR, lhs);
+      vectype = get_vectype_for_scalar_type (vinfo, int_type);
+      lhs = gimple_assign_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+    }
+
+  /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
+     PLUS_EXPR then do the shift last as some targets can combine the shift and
+     add into a single instruction.  */
+  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+    {
+      if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
+	shift_first = false;
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned int prec = TYPE_PRECISION (TREE_TYPE (lhs));
+  if (shift_first)
+    {
+      if (shift_n)
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							    NULL),
+				   RSHIFT_EXPR, lhs,
+				   build_int_cst (sizetype, shift_n));
+	  lhs = gimple_assign_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	}
+
+      tree mask = wide_int_to_tree (TREE_TYPE (lhs),
+				    wi::mask (mask_width, false, prec));
+
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							NULL),
+			       BIT_AND_EXPR, lhs, mask);
+      lhs = gimple_assign_lhs (pattern_stmt);
+    }
+  else
+    {
+      tree mask = wide_int_to_tree (TREE_TYPE (lhs),
+				    wi::shifted_mask (shift_n, mask_width,
+						      false, prec));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							NULL),
+			       BIT_AND_EXPR, lhs, mask);
+      lhs = gimple_assign_lhs (pattern_stmt);
+      if (shift_n)
+	{
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							    NULL),
+				   RSHIFT_EXPR, lhs,
+				   build_int_cst (sizetype, shift_n));
+	  lhs = gimple_assign_lhs (pattern_stmt);
+	}
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   patt1 = _2 << bitpos;	      // Shift value into place
+   patt2 = patt1 & (mask << bitpos);  // Clearing of the non-relevant bits in the
+				      // 'to-write value'.
+   patt3 = _1 & ~(mask << bitpos);    // Clearing the bits we want to write to,
+				      // from the value we want to write to.
+   _3 = patt3 | patt2;		      // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  if (!INTEGRAL_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  vect_unpromoted_value unprom;
+  unprom.set_op (value, vect_internal_def);
+  value = vect_convert_input (vinfo, stmt_info, load_type, &unprom,
+			      get_vectype_for_scalar_type (vinfo, load_type));
+
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type);
+  unsigned int prec = TYPE_PRECISION (load_type);
+  tree mask_t
+    = wide_int_to_tree (load_type,
+			wi::shifted_mask (shift_n, mask_width, false, prec));
+
+  /* Clear bits we don't want to write back from value and shift it in place.  */
+  gimple_seq stmts = NULL;
+  value = gimple_build (&stmts, BIT_AND_EXPR, load_type, value, mask_t);
+  if (!gimple_seq_empty_p (stmts))
+    {
+      pattern_stmt = gimple_seq_first_stmt (stmts);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  /* Mask off the bits in the loaded value.  */
+  mask_t = wide_int_to_tree (load_type,
+			     wi::shifted_mask (shift_n, mask_width, true, prec));
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5913,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-08-25  9:09                     ` Andre Vieira (lists)
@ 2022-09-08  9:07                       ` Andre Vieira (lists)
  2022-09-08 11:51                       ` Richard Biener
  1 sibling, 0 replies; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-09-08  9:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

Ping.

On 25/08/2022 10:09, Andre Vieira (lists) via Gcc-patches wrote:
>
> On 17/08/2022 13:49, Richard Biener wrote:
>> Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
>> of the representative from DECL_FIELD_BIT_OFFSET of the original 
>> bitfield
>> access - that's the offset within the representative (by construction
>> both fields share DECL_FIELD_OFFSET).
> Doh! That makes sense...
>>> So instead I change bitpos such that:
>>> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
>>> bitpos -= bitpos.to_constant () / align_of_representative *
>>> align_of_representative;
>> ?  Not sure why alignment comes into play here?
> Yeah just forget about this... it was my ill attempt at basically 
> doing what you described above.
>> Not sure what you are saying but "yes", all shifting and masking should
>> happen in the type of the representative.
>>
>> +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
>>
>> for your convenience there's bitsize_int (bitpos) you can use.
>>
>> I don't think you are using the correct bitpos though, you fail to
>> adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
> Not sure I understand what you mean? I do adjust it, I've changed it 
> now so it should hopefully be clearer.
>>
>> +                        build_int_cst (bitsizetype, TYPE_PRECISION
>> (bf_type)),
>>
>> the size of the bitfield reference is DECL_SIZE of the original
>> FIELD_DECL - it might be bigger than the precision of its type.
>> You probably want to double-check it's equal to the precision
>> (because of the insert but also because of all the masking) and
>> refuse to lower if not.
> I added a check for this but out of curiosity, how can the DECL_SIZE 
> of a bitfield FIELD_DECL be different than its type's precision?
>>
>> +/* Return TRUE if there are bitfields to lower in this LOOP. Fill
>> TO_LOWER
>> +   with data structures representing these bitfields.  */
>> +
>> +static bool
>> +bitfields_to_lower_p (class loop *loop,
>> +                     vec <gassign *> &reads_to_lower,
>> +                     vec <gassign *> &writes_to_lower)
>> +{
>> +  basic_block *bbs = get_loop_body (loop);
>> +  gimple_stmt_iterator gsi;
>>
>> as said I'd prefer to do this walk as part of the other walks we
>> already do - if and if only because get_loop_body () is a DFS
>> walk over the loop body (you should at least share that).
> I'm now sharing the use of ifc_bbs. The reason I'd rather not share 
> the walk over them is that it becomes quite complex to separate the 
> two cases: when there are no if's to lower we still want to lower 
> bitfields, whereas when there are if's but they aren't lowerable we 
> forego lowering bitfields, since we will not vectorize this loop 
> anyway.
>>
>> +      value = fold_build1 (NOP_EXPR, load_type, value);
>>
>> fold_convert (load_type, value)
>>
>> +      if (!CONSTANT_CLASS_P (value))
>> +       {
>> +         pattern_stmt
>> +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type,
>> NULL),
>> +                                  value);
>> +         value = gimple_get_lhs (pattern_stmt);
>>
>> there's in principle
>>
>>       gimple_seq stmts = NULL;
>>       value = gimple_convert (&stmts, load_type, value);
>>       if (!gimple_seq_empty_p (stmts))
>>         {
>>           pattern_stmt = gimple_seq_first_stmt (stmts);
>>           append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
>>         }
>>
>> though a append_pattern_def_seq helper to add a convenience sequence
>> would be nice to have here.
> Ended up using the existing 'vect_convert_input', seems to do nicely 
> here.
>> You probably want to double-check your lowering code by
>> bootstrapping / testing with -ftree-loop-if-convert.
> Done, this led me to find a new failure mode, where the type of the 
> first operand of BIT_FIELD_REF was a FP type (TF mode), which then 
> led to failures when constructing the masking and shifting. I ended 
> up adding a nop-conversion to an INTEGER type of the same width first 
> if necessary. Also did a follow-up bootstrap with the addition of 
> `-ftree-vectorize` and `-fno-vect-cost-model` to further test the 
> codegen. All seems to be working on aarch64-linux-gnu.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-08-25  9:09                     ` Andre Vieira (lists)
  2022-09-08  9:07                       ` Andre Vieira (lists)
@ 2022-09-08 11:51                       ` Richard Biener
  2022-09-26 15:23                         ` Andre Vieira (lists)
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Biener @ 2022-09-08 11:51 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

On Thu, 25 Aug 2022, Andre Vieira (lists) wrote:

> 
> On 17/08/2022 13:49, Richard Biener wrote:
> > Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
> > of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
> > access - that's the offset within the representative (by construction
> > both fields share DECL_FIELD_OFFSET).
> Doh! That makes sense...
> >> So instead I change bitpos such that:
> >> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
> >> bitpos -= bitpos.to_constant () / align_of_representative *
> >> align_of_representative;
> > ?  Not sure why alignment comes into play here?
> Yeah just forget about this... it was my ill attempt at basically doing what
> you described above.
> > Not sure what you are saying but "yes", all shifting and masking should
> > happen in the type of the representative.
> >
> > +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
> >
> > for your convenience there's bitsize_int (bitpos) you can use.
> >
> > I don't think you are using the correct bitpos though, you fail to
> > adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
> Not sure I understand what you mean? I do adjust it, I've changed it now so it
> should hopefully be clearer.
> >
> > +                        build_int_cst (bitsizetype, TYPE_PRECISION
> > (bf_type)),
> >
> > the size of the bitfield reference is DECL_SIZE of the original
> > FIELD_DECL - it might be bigger than the precision of its type.
> > You probably want to double-check it's equal to the precision
> > (because of the insert but also because of all the masking) and
> > refuse to lower if not.
> I added a check for this but out of curiosity, how can the DECL_SIZE of a
> bitfield FIELD_DECL be different than its type's precision?

It's probably not possible to create a C testcase, but I don't see
what makes it impossible in general for a bitfield object to have
padding.

> >
> > +/* Return TRUE if there are bitfields to lower in this LOOP.  Fill
> > TO_LOWER
> > +   with data structures representing these bitfields.  */
> > +
> > +static bool
> > +bitfields_to_lower_p (class loop *loop,
> > +                     vec <gassign *> &reads_to_lower,
> > +                     vec <gassign *> &writes_to_lower)
> > +{
> > +  basic_block *bbs = get_loop_body (loop);
> > +  gimple_stmt_iterator gsi;
> >
> > as said I'd prefer to do this walk as part of the other walks we
> > already do - if and if only because get_loop_body () is a DFS
> > walk over the loop body (you should at least share that).
> I'm now sharing the use of ifc_bbs. The reason I'd rather not share the walk
> over them is that it becomes quite complex to separate the two cases: when
> there are no if's to lower we still want to lower bitfields, whereas when
> there are if's but they aren't lowerable we forego lowering bitfields, since
> we will not vectorize this loop anyway.
> >
> > +      value = fold_build1 (NOP_EXPR, load_type, value);
> >
> > fold_convert (load_type, value)
> >
> > +      if (!CONSTANT_CLASS_P (value))
> > +       {
> > +         pattern_stmt
> > +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type,
> > NULL),
> > +                                  value);
> > +         value = gimple_get_lhs (pattern_stmt);
> >
> > there's in principle
> >
> >       gimple_seq stmts = NULL;
> >       value = gimple_convert (&stmts, load_type, value);
> >       if (!gimple_seq_empty_p (stmts))
> >         {
> >           pattern_stmt = gimple_seq_first_stmt (stmts);
> >           append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
> >         }
> >
> > though a append_pattern_def_seq helper to add a convenience sequence
> > would be nice to have here.
> Ended up using the existing 'vect_convert_input', seems to do nicely here.
> > You probably want to double-check your lowering code by
> > bootstrapping / testing with -ftree-loop-if-convert.
> Done, this led me to find a new failure mode, where the type of the first
> operand of BIT_FIELD_REF was a FP type (TF mode), which then led to failures
> when constructing the masking and shifting. I ended up adding a nop-conversion
> to an INTEGER type of the same width first if necessary.

You want a VIEW_CONVERT (aka bit-cast) here.

> Also did a follow-up
> bootstrap with the addition of `-ftree-vectorize` and `-fno-vect-cost-model`
> to further test the codegen. All seems to be working on aarch64-linux-gnu.

+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+                 tree *struct_expr)
...
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT decl_size = tree_to_uhwi (DECL_SIZE (field_decl));
+  if (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))) != decl_size)
+    return NULL_TREE;

you can

use compare_tree_int (DECL_SIZE (field_decl), TYPE_PRECISION (...)) != 0

which avoids caring for the case the size isn't a uhwi ...
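
I.e., with the names from your patch, something like (sketch only):

  unsigned HOST_WIDE_INT bf_prec
    = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt)));
  if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0)
    return NULL_TREE;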

+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+                                             new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;

you can use gimple_move_vops (new_stmt, stmt); here

+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+                        build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+                        bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt),
+                                                    new_val));

I'm curious, why the push to redundant_ssa_names?  That could use
a comment ...

+  need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+                                                 writes_to_lower);

do we want to conditionalize this on flag_tree_loop_vectorize?  That is,
I think the lowering should for now happen only on the loop version
guarded by .IFN_VECTORIZED.  There's

  if ((need_to_predicate || any_complicated_phi)
      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
          || loop->dont_vectorize))
    goto cleanup;

for the cases that will force versioning, but I think we should
simply not lower bitfields in the

         ((!flag_tree_loop_vectorize && !loop->force_vectorize)
          || loop->dont_vectorize)

case?
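
So maybe only compute it under that guard, something like (sketch):

  if ((flag_tree_loop_vectorize || loop->force_vectorize)
      && !loop->dont_vectorize)
    need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
						    writes_to_lower);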

+      if (!second_stmt || gimple_code (second_stmt) != GIMPLE_ASSIGN
+         || gimple_assign_rhs_code (second_stmt) != BIT_FIELD_REF)
+       return NULL;

the first || goes to a new line

+      bf_stmt = static_cast <gassign *> (second_stmt);

"nicer" and shorter is

       bf_stmt = dyn_cast <gassign *> (second_stmt);
       if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
         return NULL;

+  tree lhs = TREE_OPERAND (bf_ref, 0);
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);

Just a readability note: generic variable names like 'lhs' are not
helpful (even less so when they are not an actual lhs ...).
You have nice docs on top of the function - when you use
actual names for the _2 = BIT_FIELD_REF (_1, ...) variables you can
even use them in the code so docs and code match up nicely.

+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    {
+      tree int_type
+       = build_nonstandard_integer_type (TYPE_PRECISION (TREE_TYPE (lhs)),
+                                         true);

so you probably run into this from code that's not lowered from
original bitfield reads?  Note you should use TYPE_SIZE here,
definitely not TYPE_PRECISION on arbitrary types (if it's a vector
type then that will yield the number of units for example).
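
Combined with the VIEW_CONVERT suggestion above that would make this
something like (untested sketch, using the names from your patch):

      tree int_type
	= build_nonstandard_integer_type (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs))),
					  true);
      pattern_stmt
	= gimple_build_assign (vect_recog_temp_ssa_var (int_type, NULL),
			       VIEW_CONVERT_EXPR, lhs);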

+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();

is there anything that prevents this from running on VLA vector extractions?
I think it would be nice to test constantness at the start of the
function.
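
E.g. something like (sketch):

  if (!bit_field_offset (bf_ref).is_constant ()
      || !bit_field_size (bf_ref).is_constant ())
    return NULL;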

+         pattern_stmt
+           = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+                                                           NULL),

eh, seeing that multiple times the vect_recog_temp_ssa_var needs
a defaulted NULL second argument ...

Note I fear we will have endianness issues when translating
bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules
for memory and register operations do not match up (IIRC, I repeatedly
run into issues here myself).  The testcases all look like they
won't catch this - I think an example would be sth like
struct X { unsigned a : 23; unsigned b : 9; }, can you see to do
testing on a big-endian target?

Otherwise the patch looks good, so there's only minor things to
fix up (in case the endianness issue turns out to be a non-issue).

Sorry for the delay in reviewing.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-09-08 11:51                       ` Richard Biener
@ 2022-09-26 15:23                         ` Andre Vieira (lists)
  2022-09-27 12:34                           ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-09-26 15:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1496 bytes --]


On 08/09/2022 12:51, Richard Biener wrote:
>
> I'm curious, why the push to redundant_ssa_names?  That could use
> a comment ...
So I purposefully left a #if 0 #else #endif in there so you can see the 
two options. But the reason I used redundant_ssa_names is because ifcvt 
seems to use that as a container for all pairs of (old, new) ssa names 
to replace later. So I just piggybacked on that. I don't know if 
there's a specific reason they do the replacement at the end? Maybe some 
ordering issue? Either way both adding it to redundant_ssa_names or 
doing the replacement inline work for the bitfield lowering (or work in 
my testing at least).
> Note I fear we will have endianness issues when translating
> bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules
> for memory and register operations do not match up (IIRC, I repeatedly
> run into issues here myself).  The testcases all look like they
> won't catch this - I think an example would be sth like
> struct X { unsigned a : 23; unsigned b : 9; }, can you see to do
> testing on a big-endian target?
I've done some testing and you were right, it did fall apart on 
big-endian. I fixed it by changing the way we compute the 'shift' value 
and added two extra testcases for read and write each.
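For reference, in the attached patch the big-endian adjustment boils 
down to (sketch, see the patch for the full context):

  if (BYTES_BIG_ENDIAN)
    shift_n = prec - shift_n - mask_width;
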
>
> Sorry for the delay in reviewing.
No worries, apologies from me as well for the delay in reworking this, had a 
little week holiday in between :)

I'll write the ChangeLogs once the patch has stabilized.

Thanks,
Andre

[-- Attachment #2: vect_bitfield4.patch --]
[-- Type: text/plain, Size: 39805 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 9;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 8;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 9;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 8;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..ee6226b7bee713598141468de00728abff675e52 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
-  /* Allow statements that can be handled during if-conversion.  */
-  ifc_bbs = get_loop_body_in_if_conv_order (loop);
-  if (!ifc_bbs)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Irreducible loop\n");
-      return false;
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3265,202 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the representative FIELD_DECL for the bitfield access in STMT, or
+   NULL_TREE if the access cannot be lowered (e.g. the representative's type
+   mode is BLKmode).  If BITPOS is not NULL it will hold the offset in bits of
+   the bitfield access relative to the representative, and STRUCT_EXPR, if not
+   NULL, will hold the tree representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_assign_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT bf_prec
+    = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt)));
+  if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0)
+    return NULL_TREE;
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  if (bitpos)
+    *bitpos
+      = fold_build2 (MINUS_EXPR, bitsizetype,
+		     DECL_FIELD_BIT_OFFSET (field_decl),
+		     DECL_FIELD_BIT_OFFSET (rep_decl));
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  tree bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is its
+     defining SSA_NAME.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+#if 0
+      redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt),
+						     new_val));
+#else
+    replace_uses_by (gimple_assign_lhs (stmt), new_val);
+#endif
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = ifc_bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_assign_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" field type is not Integral.\n");
+		  return false;
+		}
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3471,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <gassign *, 4> reads_to_lower;
+  auto_vec <gassign *, 4> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,16 +3496,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2
+      && !ifcvt_split_critical_edges (loop, aggressive_if_conv))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  ifc_bbs = get_loop_body_in_if_conv_order (loop);
+  if (!ifc_bbs)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Irreducible loop\n");
+      goto cleanup;
+    }
+
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
+
+  if ((flag_tree_loop_vectorize || loop->force_vectorize)
+      && !loop->dont_vectorize)
+    need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						    writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3310,7 +3542,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3583,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3393,6 +3647,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   if (rloop != NULL)
     {
       loop = rloop;
+      reads_to_lower.truncate (0);
+      writes_to_lower.truncate (0);
       goto again;
     }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..9042599f04399eca37fe9038d2bd5c9f78e3a9e4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
    is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */
 
 static tree
-vect_recog_temp_ssa_var (tree type, gimple *stmt)
+vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL)
 {
   return make_temp_ssa_name (type, stmt, "patt");
 }
@@ -1828,6 +1830,329 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
+   result = (type_out) bf_value;
+
+   where type_out is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   result = (type_out) bf_value;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. If the precision of type_out is bigger
+   than the precision of the type of bf_value, we widen before shifting,
+   since the new precision will be large enough to shift the value and moving
+   widening operations up the statement chain enables the generation of
+   widening loads.  If we are widening and the operation after the pattern is
+   an addition then we mask first and shift later, to enable the generation of
+   shifting adds.  In the case of narrowing we will always mask first, shift
+   last and then perform a narrowing operation.  This will enable the
+   generation of narrowing shifts.
+
+   Widening with mask first, shift later:
+   container = (type_out) container;
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+
+   Widening with shift first, mask last:
+   container = (type_out) container;
+   shifted = container >> bitpos;
+   result = shifted & ((1 << bitsize) - 1);
+
+   Narrowing:
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+   result = (type_out) result;
+
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+
+  if (!first_stmt)
+    return NULL;
+
+  gassign *bf_stmt;
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
+      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+      if (!bf_stmt
+	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+	return NULL;
+    }
+  else
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree container = TREE_OPERAND (bf_ref, 0);
+
+  if (!bit_field_offset (bf_ref).is_constant ()
+      || !bit_field_size (bf_ref).is_constant ()
+      || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container))))
+    return NULL;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);
+  tree ret_type = TREE_TYPE (ret);
+  bool shift_first = true;
+  tree vectype;
+
+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (container)))
+    {
+      unsigned HOST_WIDE_INT container_size =
+	tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container)));
+      tree int_type = build_nonstandard_integer_type (container_size, true);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (int_type),
+			       VIEW_CONVERT_EXPR, container);
+      vectype = get_vectype_for_scalar_type (vinfo, int_type);
+      container = gimple_assign_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+    }
+  else
+    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container));
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, container);
+      container = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+  else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    /* If we are doing the conversion last then also delay the shift as we may
+       be able to combine the shift and conversion in certain cases.  */
+    shift_first = false;
+
+  tree container_type = TREE_TYPE (container);
+
+  /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
+     PLUS_EXPR then do the shift last as some targets can combine the shift and
+     add into a single instruction.  */
+  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+    {
+      if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
+	shift_first = false;
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  if (BYTES_BIG_ENDIAN)
+    shift_n = prec - shift_n - mask_width;
+
+  /* If we don't have to shift we only generate the mask, so just fix the
+     code-path to shift_first.  */
+  if (shift_n == 0)
+    shift_first = true;
+
+  tree result;
+  if (shift_first)
+    {
+      tree shifted = container;
+      if (shift_n)
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+				   RSHIFT_EXPR, container,
+				   build_int_cst (sizetype, shift_n));
+	  shifted = gimple_assign_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	}
+
+      tree mask = wide_int_to_tree (container_type,
+				    wi::mask (mask_width, false, prec));
+
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, shifted, mask);
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+  else
+    {
+      tree mask = wide_int_to_tree (container_type,
+				    wi::shifted_mask (shift_n, mask_width,
+						      false, prec));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, container, mask);
+      tree masked = gimple_assign_lhs (pattern_stmt);
+
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       RSHIFT_EXPR, masked,
+			       build_int_cst (sizetype, shift_n));
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (result), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, result);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   written = BIT_INSERT_EXPR (container, value, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   value = (container_type) value;	    // Convert to the container's type.
+   shifted = value << bitpos;		    // Shift value into place
+   masked = shifted & (mask << bitpos);	    // Mask off the non-relevant bits in
+					    // the 'to-write value'.
+   cleared = container & ~(mask << bitpos); // Clear the bits in the container
+					    // that we are about to overwrite.
+   written = cleared | masked;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree container = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree shift = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree container_type = TREE_TYPE (container);
+
+  if (!INTEGRAL_TYPE_P (container_type)
+      || !tree_fits_uhwi_p (TYPE_SIZE (container_type)))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  vect_unpromoted_value unprom;
+  unprom.set_op (value, vect_internal_def);
+  value = vect_convert_input (vinfo, stmt_info, container_type, &unprom,
+			      get_vectype_for_scalar_type (vinfo,
+							   container_type));
+
+  unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type);
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift);
+  if (BYTES_BIG_ENDIAN)
+    {
+      shift_n = prec - shift_n - mask_width;
+      shift = build_int_cst (TREE_TYPE (shift), shift_n);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (value), container_type))
+    {
+      pattern_stmt =
+	gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			     NOP_EXPR, value);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+
+  /* Shift VALUE into place.  */
+  tree shifted = value;
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       LSHIFT_EXPR, value, shift);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      shifted = gimple_get_lhs (pattern_stmt);
+    }
+
+  tree mask_t
+    = wide_int_to_tree (container_type,
+			wi::shifted_mask (shift_n, mask_width, false, prec));
+
+  /* Clear bits we don't want to write back from SHIFTED.  */
+  gimple_seq stmts = NULL;
+  tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted,
+			      mask_t);
+  if (!gimple_seq_empty_p (stmts))
+    {
+      pattern_stmt = gimple_seq_first_stmt (stmts);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  /* Mask off the bits in the container that we are to write to.  */
+  mask_t = wide_int_to_tree (container_type,
+			     wi::shifted_mask (shift_n, mask_width, true, prec));
+  tree cleared = vect_recog_temp_ssa_var (container_type);
+  pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Write MASKED into CLEARED.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			   BIT_IOR_EXPR, cleared, masked);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5948,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-09-26 15:23                         ` Andre Vieira (lists)
@ 2022-09-27 12:34                           ` Richard Biener
  2022-09-28  9:43                             ` Andre Vieira (lists)
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2022-09-27 12:34 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:

> 
> On 08/09/2022 12:51, Richard Biener wrote:
> >
> > I'm curious, why the push to redundant_ssa_names?  That could use
> > a comment ...
> So I purposefully left a #if 0 #else #endif in there so you can see the two
> options. But the reason I used redundant_ssa_names is because ifcvt seems to
> use that as a container for all pairs of (old, new) ssa names to replace
> later. So I just piggy backed on that. I don't know if there's a specific
> reason they do the replacement at the end? Maybe some ordering issue? Either
> way both adding it to redundant_ssa_names or doing the replacement inline work
> for the bitfield lowering (or work in my testing at least).

Possibly because we (in the past?) inserted/copied stuff based on
predicates generated at analysis time after we decide to elide something
so we need to watch for later appearing uses.  But who knows ... my mind
fails me here.

If it works to replace uses immediately please do so.  But now
I wonder why we need this - the value shouldn't change so you
should get away with re-using the existing SSA name for the final value?

> > Note I fear we will have endianess issues when translating
> > bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules
> > for memory and register operations do not match up (IIRC, I repeatedly
> > run into issues here myself).  The testcases all look like they
> > won't catch this - I think an example would be sth like
> > struct X { unsigned a : 23; unsigned b : 9; }, can you see to do
> > testing on a big-endian target?
> I've done some testing and you were right, it did fall apart on big-endian. I
> fixed it by changing the way we compute the 'shift' value and added two extra
> testcases for read and write each.
> >
> > Sorry for the delay in reviewing.
> No worries, apologies myself for the delay in reworking this, had a nice
> little week holiday in between :)
> 
> I'll write the ChangeLogs once the patch has stabilized.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-09-27 12:34                           ` Richard Biener
@ 2022-09-28  9:43                             ` Andre Vieira (lists)
  2022-09-28 17:31                               ` Andre Vieira (lists)
  0 siblings, 1 reply; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-09-28  9:43 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches


On 27/09/2022 13:34, Richard Biener wrote:
> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
>
>> On 08/09/2022 12:51, Richard Biener wrote:
>>> I'm curious, why the push to redundant_ssa_names?  That could use
>>> a comment ...
>> So I purposefully left a #if 0 #else #endif in there so you can see the two
>> options. But the reason I used redundant_ssa_names is because ifcvt seems to
>> use that as a container for all pairs of (old, new) ssa names to replace
>> later. So I just piggy backed on that. I don't know if there's a specific
>> reason they do the replacement at the end? Maybe some ordering issue? Either
>> way both adding it to redundant_ssa_names or doing the replacement inline work
>> for the bitfield lowering (or work in my testing at least).
> Possibly because we (in the past?) inserted/copied stuff based on
> predicates generated at analysis time after we decide to elide something
> so we need to watch for later appearing uses.  But who knows ... my mind
> fails me here.
>
> If it works to replace uses immediately please do so.  But now
> I wonder why we need this - the value shouldn't change so you
> should get away with re-using the existing SSA name for the final value?

Yeah... good point. A quick change and minor testing seems to agree. I'm sure I had a good reason to do it initially ;)

I'll run a full-regression on this change to make sure I didn't miss anything.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-09-28  9:43                             ` Andre Vieira (lists)
@ 2022-09-28 17:31                               ` Andre Vieira (lists)
  2022-09-29  7:54                                 ` Richard Biener
  0 siblings, 1 reply; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-09-28 17:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3368 bytes --]

Made the change and also created the ChangeLogs.

gcc/ChangeLog:

         * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of
         loop bb's from here...
         (tree_if_conversion): ... to here.  Also call bitfield lowering
         when appropriate.
         (version_loop_for_if_conversion): Adapt to enable loop versioning
         when we only need to lower bitfields.
         (ifcvt_split_critical_edges): Relax condition of expected loop
         form as this is checked earlier.
         (get_bitfield_rep): New function.
         (lower_bitfield): Likewise.
         (bitfields_to_lower_p): Likewise.
         (need_to_lower_bitfields): New global boolean.
         (need_to_ifcvt): Likewise.
         * tree-vect-data-refs.cc (vect_find_stmt_data_reference): Improve
         diagnostic message.
         * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default
         value for last parameter.
         (vect_recog_bitfield_ref_pattern): New.
         (vect_recog_bit_insert_pattern): New.

gcc/testsuite/ChangeLog:

         * gcc.dg/vect/vect-bitfield-read-1.c: New test.
         * gcc.dg/vect/vect-bitfield-read-2.c: New test.
         * gcc.dg/vect/vect-bitfield-read-3.c: New test.
         * gcc.dg/vect/vect-bitfield-read-4.c: New test.
         * gcc.dg/vect/vect-bitfield-read-5.c: New test.
         * gcc.dg/vect/vect-bitfield-read-6.c: New test.
         * gcc.dg/vect/vect-bitfield-write-1.c: New test.
         * gcc.dg/vect/vect-bitfield-write-2.c: New test.
         * gcc.dg/vect/vect-bitfield-write-3.c: New test.
         * gcc.dg/vect/vect-bitfield-write-4.c: New test.
         * gcc.dg/vect/vect-bitfield-write-5.c: New test.

On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
>
> On 27/09/2022 13:34, Richard Biener wrote:
>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
>>
>>> On 08/09/2022 12:51, Richard Biener wrote:
>>>> I'm curious, why the push to redundant_ssa_names?  That could use
>>>> a comment ...
>>> So I purposefully left a #if 0 #else #endif in there so you can see 
>>> the two
>>> options. But the reason I used redundant_ssa_names is because ifcvt 
>>> seems to
>>> use that as a container for all pairs of (old, new) ssa names to 
>>> replace
>>> later. So I just piggy backed on that. I don't know if there's a 
>>> specific
>>> reason they do the replacement at the end? Maybe some ordering 
>>> issue? Either
>>> way both adding it to redundant_ssa_names or doing the replacement 
>>> inline work
>>> for the bitfield lowering (or work in my testing at least).
>> Possibly because we (in the past?) inserted/copied stuff based on
>> predicates generated at analysis time after we decide to elide something
>> so we need to watch for later appearing uses.  But who knows ... my mind
>> fails me here.
>>
>> If it works to replace uses immediately please do so.  But now
>> I wonder why we need this - the value shouldn't change so you
>> should get away with re-using the existing SSA name for the final value?
>
> Yeah... good point. A quick change and minor testing seems to agree. 
> I'm sure I had a good reason to do it initially ;)
>
> I'll run a full-regression on this change to make sure I didn't miss 
> anything.
>

[-- Attachment #2: vect_bitfield5.patch --]
[-- Type: text/plain, Size: 39800 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 9;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 8;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 9;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 8;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..d13b2fa6661d56e911bb9ec37cd3a9885fa653bb 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
-  /* Allow statements that can be handled during if-conversion.  */
-  ifc_bbs = get_loop_body_in_if_conv_order (loop);
-  if (!ifc_bbs)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Irreducible loop\n");
-      return false;
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3265,201 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the DECL_BIT_FIELD_REPRESENTATIVE of the bitfield access in STMT
+   iff the representative's type mode is not BLKmode.  If BITPOS is not NULL
+   it will hold the offset, in bits, of the bitfield relative to the
+   representative and STRUCT_EXPR, if not NULL, will hold the tree
+   representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_assign_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT bf_prec
+    = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt)));
+  if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0)
+    return NULL_TREE;
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  if (bitpos)
+    *bitpos
+      = fold_build2 (MINUS_EXPR, bitsizetype,
+		     DECL_FIELD_BIT_OFFSET (field_decl),
+		     DECL_FIELD_BIT_OFFSET (rep_decl));
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  tree bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is the
+     SSA_NAME holding its loaded value.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+
+      gimple *new_stmt = gimple_build_assign (gimple_assign_lhs (stmt),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = ifc_bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_assign_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" field type is not Integral.\n");
+		  return false;
+		}
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3470,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <gassign *, 4> reads_to_lower;
+  auto_vec <gassign *, 4> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,16 +3495,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2
+      && !ifcvt_split_critical_edges (loop, aggressive_if_conv))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  ifc_bbs = get_loop_body_in_if_conv_order (loop);
+  if (!ifc_bbs)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Irreducible loop\n");
+      goto cleanup;
+    }
+
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
+
+  if ((flag_tree_loop_vectorize || loop->force_vectorize)
+      && !loop->dont_vectorize)
+    need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						    writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3310,7 +3541,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3582,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3393,6 +3646,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   if (rloop != NULL)
     {
       loop = rloop;
+      reads_to_lower.truncate (0);
+      writes_to_lower.truncate (0);
       goto again;
     }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..9042599f04399eca37fe9038d2bd5c9f78e3a9e4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
    is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */
 
 static tree
-vect_recog_temp_ssa_var (tree type, gimple *stmt)
+vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL)
 {
   return make_temp_ssa_name (type, stmt, "patt");
 }
@@ -1828,6 +1830,329 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
+   result = (type_out) bf_value;
+
+   where type_out is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   result = (type_out) bf_value;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. If the precision of type_out is bigger
+   than the precision of the type of bf_value we perform the widening before the shifting,
+   since the new precision will be large enough to shift the value and moving
+   widening operations up the statement chain enables the generation of
+   widening loads.  If we are widening and the operation after the pattern is
+   an addition then we mask first and shift later, to enable the generation of
+   shifting adds.  In the case of narrowing we will always mask first, shift
+   last and then perform a narrowing operation.  This will enable the
+   generation of narrowing shifts.
+
+   Widening with mask first, shift later:
+   container = (type_out) container;
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+
+   Widening with shift first, mask last:
+   container = (type_out) container;
+   shifted = container >> bitpos;
+   result = shifted & ((1 << bitsize) - 1);
+
+   Narrowing:
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+   result = (type_out) result;
+
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+
+  if (!first_stmt)
+    return NULL;
+
+  gassign *bf_stmt;
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
+      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+      if (!bf_stmt
+	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+	return NULL;
+    }
+  else
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree container = TREE_OPERAND (bf_ref, 0);
+
+  if (!bit_field_offset (bf_ref).is_constant ()
+      || !bit_field_size (bf_ref).is_constant ()
+      || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container))))
+    return NULL;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);
+  tree ret_type = TREE_TYPE (ret);
+  bool shift_first = true;
+  tree vectype;
+
+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (container)))
+    {
+      unsigned HOST_WIDE_INT container_size =
+	tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container)));
+      tree int_type = build_nonstandard_integer_type (container_size, true);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (int_type),
+			       VIEW_CONVERT_EXPR, container);
+      vectype = get_vectype_for_scalar_type (vinfo, int_type);
+      container = gimple_assign_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+    }
+  else
+    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container));
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, container);
+      container = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+  else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    /* If we are doing the conversion last then also delay the shift as we may
+       be able to combine the shift and conversion in certain cases.  */
+    shift_first = false;
+
+  tree container_type = TREE_TYPE (container);
+
+  /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
+     PLUS_EXPR then do the shift last as some targets can combine the shift and
+     add into a single instruction.  */
+  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+    {
+      if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
+	shift_first = false;
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  if (BYTES_BIG_ENDIAN)
+    shift_n = prec - shift_n - mask_width;
+
+  /* If we don't have to shift we only generate the mask, so just fix the
+     code-path to shift_first.  */
+  if (shift_n == 0)
+    shift_first = true;
+
+  tree result;
+  if (shift_first)
+    {
+      tree shifted = container;
+      if (shift_n)
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+				   RSHIFT_EXPR, container,
+				   build_int_cst (sizetype, shift_n));
+	  shifted = gimple_assign_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	}
+
+      tree mask = wide_int_to_tree (container_type,
+				    wi::mask (mask_width, false, prec));
+
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, shifted, mask);
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+  else
+    {
+      tree mask = wide_int_to_tree (container_type,
+				    wi::shifted_mask (shift_n, mask_width,
+						      false, prec));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, container, mask);
+      tree masked = gimple_assign_lhs (pattern_stmt);
+
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       RSHIFT_EXPR, masked,
+			       build_int_cst (sizetype, shift_n));
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (result), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, result);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   written = BIT_INSERT_EXPR (container, value, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   value = (container_type) value;	    // Convert to the container's type.
+   shifted = value << bitpos;		    // Shift value into place
+   masked = shifted & (mask << bitpos);	    // Mask off the non-relevant bits in
+					    // the 'to-write value'.
+   cleared = container & ~(mask << bitpos); // Clear the bits in the container
+					    // that we are about to overwrite.
+   written = cleared | masked;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree container = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree shift = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree container_type = TREE_TYPE (container);
+
+  if (!INTEGRAL_TYPE_P (container_type)
+      || !tree_fits_uhwi_p (TYPE_SIZE (container_type)))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  vect_unpromoted_value unprom;
+  unprom.set_op (value, vect_internal_def);
+  value = vect_convert_input (vinfo, stmt_info, container_type, &unprom,
+			      get_vectype_for_scalar_type (vinfo,
+							   container_type));
+
+  unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type);
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift);
+  if (BYTES_BIG_ENDIAN)
+    {
+      shift_n = prec - shift_n - mask_width;
+      shift = build_int_cst (TREE_TYPE (shift), shift_n);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (value), container_type))
+    {
+      pattern_stmt =
+	gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			     NOP_EXPR, value);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+
+  /* Shift VALUE into place.  */
+  tree shifted = value;
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       LSHIFT_EXPR, value, shift);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      shifted = gimple_get_lhs (pattern_stmt);
+    }
+
+  tree mask_t
+    = wide_int_to_tree (container_type,
+			wi::shifted_mask (shift_n, mask_width, false, prec));
+
+  /* Clear bits we don't want to write back from SHIFTED.  */
+  gimple_seq stmts = NULL;
+  tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted,
+			      mask_t);
+  if (!gimple_seq_empty_p (stmts))
+    {
+      pattern_stmt = gimple_seq_first_stmt (stmts);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  /* Mask off the bits in the container that we are to write to.  */
+  mask_t = wide_int_to_tree (container_type,
+			     wi::shifted_mask (shift_n, mask_width, true, prec));
+  tree cleared = vect_recog_temp_ssa_var (container_type);
+  pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Write MASKED into CLEARED.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			   BIT_IOR_EXPR, cleared, masked);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5948,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-09-28 17:31                               ` Andre Vieira (lists)
@ 2022-09-29  7:54                                 ` Richard Biener
  2022-10-07 14:20                                   ` Andre Vieira (lists)
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Biener @ 2022-09-29  7:54 UTC (permalink / raw)
  To: Andre Vieira (lists)
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, gcc-patches

On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Made the change and also created the ChangeLogs.

OK if bootstrap / testing succeeds.

Thanks,
Richard.

> gcc/ChangeLog:
>
>          * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of
> loop bb's from here...
>          (tree_if_conversion): ... to here.  Also call bitfield lowering
> when appropriate.
>          (version_loop_for_if_conversion): Adapt to enable loop
> versioning when we only need
>          to lower bitfields.
>          (ifcvt_split_critical_edges): Relax condition of expected loop
> form as this is checked earlier.
>          (get_bitfield_rep): New function.
>          (lower_bitfield): Likewise.
>          (bitfields_to_lower_p): Likewise.
>          (need_to_lower_bitfields): New global boolean.
>          (need_to_ifcvt): Likewise.
>          * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
> Improve diagnostic message.
>          * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default
> value for last parameter.
>          (vect_recog_bitfield_ref_pattern): New.
>          (vect_recog_bit_insert_pattern): New.
>
> gcc/testsuite/ChangeLog:
>
>          * gcc.dg/vect/vect-bitfield-read-1.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-2.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-3.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-4.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-5.c: New test.
>          * gcc.dg/vect/vect-bitfield-read-6.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-1.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-2.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-3.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-4.c: New test.
>          * gcc.dg/vect/vect-bitfield-write-5.c: New test.
>
> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
> >
> > On 27/09/2022 13:34, Richard Biener wrote:
> >> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
> >>
> >>> On 08/09/2022 12:51, Richard Biener wrote:
> >>>> I'm curious, why the push to redundant_ssa_names?  That could use
> >>>> a comment ...
> >>> So I purposefully left a #if 0 #else #endif in there so you can see
> >>> the two
> >>> options. But the reason I used redundant_ssa_names is because ifcvt
> >>> seems to
> >>> use that as a container for all pairs of (old, new) ssa names to
> >>> replace
> >>> later. So I just piggy backed on that. I don't know if there's a
> >>> specific
> >>> reason they do the replacement at the end? Maybe some ordering
> >>> issue? Either
> >>> way both adding it to redundant_ssa_names or doing the replacement
> >>> inline work
> >>> for the bitfield lowering (or work in my testing at least).
> >> Possibly because we (in the past?) inserted/copied stuff based on
> >> predicates generated at analysis time after we decide to elide something
> >> so we need to watch for later appearing uses.  But who knows ... my mind
> >> fails me here.
> >>
> >> If it works to replace uses immediately please do so.  But now
> >> I wonder why we need this - the value shouldn't change so you
> >> should get away with re-using the existing SSA name for the final value?
> >
> > Yeah... good point. A quick change and minor testing seems to agree.
> > I'm sure I had a good reason to do it initially ;)
> >
> > I'll run a full-regression on this change to make sure I didn't miss
> > anything.
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-09-29  7:54                                 ` Richard Biener
@ 2022-10-07 14:20                                   ` Andre Vieira (lists)
  2022-10-12  1:55                                     ` Hongtao Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Andre Vieira (lists) @ 2022-10-07 14:20 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4592 bytes --]

Hi,

Whilst running a bootstrap with extra options to force bitfield 
vectorization '-O2 -ftree-vectorize -ftree-loop-if-convert 
-fno-vect-cost-model' I ran into an ICE in vect-patterns where a 
bit_field_ref had a container that wasn't INTEGRAL_TYPE and whose mode 
was E_BLKmode, which meant we failed to build an integer type of the same 
size. For that reason I added a check to bail out earlier if the 
TYPE_MODE of the container is indeed E_BLKmode. The pattern for the 
bitfield inserts required no change as we currently don't support 
containers that aren't integer typed.
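
To make that guard concrete, here is a minimal sketch of the check (the 
names match vect_recog_bitfield_ref_pattern in the attached patch; shown 
out of context purely for illustration):

    /* Give up if the extracted value is not of integral type, or if no
       integer mode covers the container (its TYPE_MODE is BLKmode), since
       the pattern rewrites the access as a same-sized integer load
       followed by shifts and masks.  */
    if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref))
        || TYPE_MODE (TREE_TYPE (container)) == E_BLKmode)
      return NULL;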

Also changed a testcase because on BIG-ENDIAN it was not vectorizing due 
to the container having a different size there, one that isn't supported.
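
For context, the pattern itself already mirrors the bit position on 
big-endian targets, roughly as in the attached patch:

    if (BYTES_BIG_ENDIAN)
      shift_n = prec - shift_n - mask_width;

so the testsuite adjustment is only about a container size the target 
cannot vectorize, not about the lowering itself.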

This passes the same bootstrap and regression testing on aarch64-none-linux, 
with no regressions on aarch64_be-none-elf either.

I assume you are OK with these changes, Richard, but I don't like to 
commit on Friday in case something breaks over the weekend, so I'll 
leave it until Monday.

Thanks,
Andre

On 29/09/2022 08:54, Richard Biener wrote:
> On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> Made the change and also created the ChangeLogs.
> OK if bootstrap / testing succeeds.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>           * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of
>> loop bb's from here...
>>           (tree_if_conversion): ... to here.  Also call bitfield lowering
>> when appropriate.
>>           (version_loop_for_if_conversion): Adapt to enable loop
>> versioning when we only need
>>           to lower bitfields.
>>           (ifcvt_split_critical_edges): Relax condition of expected loop
>> form as this is checked earlier.
>>           (get_bitfield_rep): New function.
>>           (lower_bitfield): Likewise.
>>           (bitfields_to_lower_p): Likewise.
>>           (need_to_lower_bitfields): New global boolean.
>>           (need_to_ifcvt): Likewise.
>>           * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
>> Improve diagnostic message.
>>           * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default
>> value for last parameter.
>>           (vect_recog_bitfield_ref_pattern): New.
>>           (vect_recog_bit_insert_pattern): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>>           * gcc.dg/vect/vect-bitfield-read-1.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-2.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-3.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-4.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-5.c: New test.
>>           * gcc.dg/vect/vect-bitfield-read-6.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-1.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-2.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-3.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-4.c: New test.
>>           * gcc.dg/vect/vect-bitfield-write-5.c: New test.
>>
>> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
>>> On 27/09/2022 13:34, Richard Biener wrote:
>>>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
>>>>
>>>>> On 08/09/2022 12:51, Richard Biener wrote:
>>>>>> I'm curious, why the push to redundant_ssa_names?  That could use
>>>>>> a comment ...
>>>>> So I purposefully left a #if 0 #else #endif in there so you can see
>>>>> the two
>>>>> options. But the reason I used redundant_ssa_names is because ifcvt
>>>>> seems to
>>>>> use that as a container for all pairs of (old, new) ssa names to
>>>>> replace
>>>>> later. So I just piggy backed on that. I don't know if there's a
>>>>> specific
>>>>> reason they do the replacement at the end? Maybe some ordering
>>>>> issue? Either
>>>>> way both adding it to redundant_ssa_names or doing the replacement
>>>>> inline work
>>>>> for the bitfield lowering (or work in my testing at least).
>>>> Possibly because we (in the past?) inserted/copied stuff based on
>>>> predicates generated at analysis time after we decide to elide something
>>>> so we need to watch for later appearing uses.  But who knows ... my mind
>>>> fails me here.
>>>>
>>>> If it works to replace uses immediately please do so.  But now
>>>> I wonder why we need this - the value shouldn't change so you
>>>> should get away with re-using the existing SSA name for the final value?
>>> Yeah... good point. A quick change and minor testing seems to agree.
>>> I'm sure I had a good reason to do it initially ;)
>>>
>>> I'll run a full-regression on this change to make sure I didn't miss
>>> anything.
>>>

[-- Attachment #2: vect_bitfield6.patch --]
[-- Type: text/plain, Size: 39899 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..849f4a017e1818eee4abd66385417a326c497696
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,44 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+    int  d : 31;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0, 0x7FFFFFFF }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1, 0x7FFFFFFF }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F  };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 9;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned a : 23; unsigned b : 8;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFUL, 0}
+#define ELT1 {0x7FFFFFUL, 1}
+#define ELT2 {0x7FFFFFUL, 2}
+#define ELT3 {0x7FFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].b;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 9;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned b : 23;
+    unsigned a : 8;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index bac29fb557462f5d3193481ef180f1412e8bc639..e468a4659fa28a3a31c3390cf19bee65f4590b80 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
-  /* Allow statements that can be handled during if-conversion.  */
-  ifc_bbs = get_loop_body_in_if_conv_order (loop);
-  if (!ifc_bbs)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Irreducible loop\n");
-      return false;
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -2899,18 +2900,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2922,8 +2927,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2999,7 +3005,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3260,6 +3266,201 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+/* Returns the representative decl (DECL_BIT_FIELD_REPRESENTATIVE) of the
+   bitfield access in STMT iff the representative's type mode is not BLKmode.
+   If BITPOS is not NULL it will hold the bit offset of the bitfield relative
+   to the start of the representative, and STRUCT_EXPR, if not NULL, will hold
+   the tree representing the base struct of this bitfield.  */
+
+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+		  tree *struct_expr)
+{
+  tree comp_ref = write ? gimple_assign_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl);
+
+  /* Bail out if the representative is BLKmode as we will not be able to
+     vectorize this.  */
+  if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode)
+    return NULL_TREE;
+
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT bf_prec
+    = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt)));
+  if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0)
+    return NULL_TREE;
+
+  if (struct_expr)
+    *struct_expr = TREE_OPERAND (comp_ref, 0);
+
+  if (bitpos)
+    *bitpos
+      = fold_build2 (MINUS_EXPR, bitsizetype,
+		     DECL_FIELD_BIT_OFFSET (field_decl),
+		     DECL_FIELD_BIT_OFFSET (rep_decl));
+
+  return rep_decl;
+
+}
+
+/* Lowers the bitfield access in STMT.
+   For a write like:
+
+   struct.bf = _1;
+
+   lower to:
+
+   __ifc_1 = struct.<representative>;
+   __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+   struct.<representative> = __ifc_2;
+
+   For a read:
+
+   _1 = struct.bf;
+
+    lower to:
+
+    __ifc_1 = struct.<representative>;
+    _1 =  BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+    where representative is a legal load that contains the bitfield value,
+    bitsize is the size of the bitfield and bitpos the offset to the start of
+    the bitfield within the representative.  */
+
+static void
+lower_bitfield (gassign *stmt, bool write)
+{
+  tree struct_expr;
+  tree bitpos;
+  tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr);
+  tree rep_type = TREE_TYPE (rep_decl);
+  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  NEW_VAL is its
+     defining SSA_NAME.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+			 bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+
+      gimple *new_stmt = gimple_build_assign (gimple_assign_lhs (stmt),
+					      new_val);
+      gimple_move_vops (new_stmt, stmt);
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill
+   READS_TO_LOWER and WRITES_TO_LOWER with the statements accessing them.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop,
+		      vec <gassign *> &reads_to_lower,
+		      vec <gassign *> &writes_to_lower)
+{
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = ifc_bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_assign_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      if (!INTEGRAL_TYPE_P (TREE_TYPE (op)))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NO OK to lower,"
+					" field type is not Integral.\n");
+		  return false;
+		}
+
+	      if (!get_bitfield_rep (stmt, write, NULL, NULL))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\t Bitfield NOT OK to lower,"
+					" representative is BLKmode.\n");
+		  return false;
+		}
+
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "\tBitfield OK to lower.\n");
+	      if (write)
+		writes_to_lower.safe_push (stmt);
+	      else
+		reads_to_lower.safe_push (stmt);
+	    }
+	}
+    }
+  return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3270,12 +3471,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <gassign *, 4> reads_to_lower;
+  auto_vec <gassign *, 4> writes_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3291,16 +3496,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  /* If there are more than two BBs in the loop then there is at least one if
+     to convert.  */
+  if (loop->num_nodes > 2
+      && !ifcvt_split_critical_edges (loop, aggressive_if_conv))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
-      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	  || loop->dont_vectorize))
+  ifc_bbs = get_loop_body_in_if_conv_order (loop);
+  if (!ifc_bbs)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Irreducible loop\n");
+      goto cleanup;
+    }
+
+  if (loop->num_nodes > 2)
+    {
+      need_to_ifcvt = true;
+
+      if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree))
+	goto cleanup;
+
+      if ((need_to_predicate || any_complicated_phi)
+	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+	      || loop->dont_vectorize))
+	goto cleanup;
+    }
+
+  if ((flag_tree_loop_vectorize || loop->force_vectorize)
+      && !loop->dont_vectorize)
+    need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+						    writes_to_lower);
+
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   /* The edge to insert invariant stmts on.  */
@@ -3311,7 +3542,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3351,10 +3583,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!reads_to_lower.is_empty ())
+	lower_bitfield (reads_to_lower.pop (), false);
+      while (!writes_to_lower.is_empty ())
+	lower_bitfield (writes_to_lower.pop (), true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3394,6 +3647,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   if (rloop != NULL)
     {
       loop = rloop;
+      reads_to_lower.truncate (0);
+      writes_to_lower.truncate (0);
       goto again;
     }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index e03b50498d164144da3220df8ee5bcf4248db821..4a23d6172aaa12ad7049dc626e5c4afbd5ca3f74 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4302,7 +4302,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
 				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d2bd15b5e9005bce2612f0b32c0acf6ffe776343..0cc315d312667c05a27df4cdf435f0d0e6fd4a52 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
    is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */
 
 static tree
-vect_recog_temp_ssa_var (tree type, gimple *stmt)
+vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL)
 {
   return make_temp_ssa_name (type, stmt, "patt");
 }
@@ -1829,6 +1831,330 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
+   result = (type_out) bf_value;
+
+   where type_out is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   result = (type_out) bf_value;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. If the precision of type_out is bigger
+   than the precision of bf_value's type we perform the widening before the shifting,
+   since the new precision will be large enough to shift the value and moving
+   widening operations up the statement chain enables the generation of
+   widening loads.  If we are widening and the operation after the pattern is
+   an addition then we mask first and shift later, to enable the generation of
+   shifting adds.  In the case of narrowing we will always mask first, shift
+   last and then perform a narrowing operation.  This will enable the
+   generation of narrowing shifts.
+
+   Widening with mask first, shift later:
+   container = (type_out) container;
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+
+   Widening with shift first, mask last:
+   container = (type_out) container;
+   shifted = container >> bitpos;
+   result = shifted & ((1 << bitsize) - 1);
+
+   Narrowing:
+   masked = container & (((1 << bitsize) - 1) << bitpos);
+   result = masked >> bitpos;
+   result = (type_out) result;
+
+   The shift step is only needed when bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+
+  if (!first_stmt)
+    return NULL;
+
+  gassign *bf_stmt;
+  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
+      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt
+	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+      if (!bf_stmt
+	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+	return NULL;
+    }
+  else
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+  tree container = TREE_OPERAND (bf_ref, 0);
+
+  if (!bit_field_offset (bf_ref).is_constant ()
+      || !bit_field_size (bf_ref).is_constant ()
+      || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container))))
+    return NULL;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref))
+      || TYPE_MODE (TREE_TYPE (container)) == E_BLKmode)
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);
+  tree ret_type = TREE_TYPE (ret);
+  bool shift_first = true;
+  tree vectype;
+
+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (container)))
+    {
+      unsigned HOST_WIDE_INT container_size =
+	tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container)));
+      tree int_type = build_nonstandard_integer_type (container_size, true);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (int_type),
+			       VIEW_CONVERT_EXPR, container);
+      vectype = get_vectype_for_scalar_type (vinfo, int_type);
+      container = gimple_assign_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+    }
+  else
+    vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container));
+
+  /* We move the conversion earlier if the loaded type is smaller than the
+     return type to enable the use of widening loads.  */
+  if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
+      && !useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, container);
+      container = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+  else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type))
+    /* If we are doing the conversion last then also delay the shift as we may
+       be able to combine the shift and conversion in certain cases.  */
+    shift_first = false;
+
+  tree container_type = TREE_TYPE (container);
+
+  /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
+     PLUS_EXPR then do the shift last as some targets can combine the shift and
+     add into a single instruction.  */
+  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+    {
+      if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
+	shift_first = false;
+    }
+
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  if (BYTES_BIG_ENDIAN)
+    shift_n = prec - shift_n - mask_width;
+
+  /* If we don't have to shift we only generate the mask, so just fix the
+     code-path to shift_first.  */
+  if (shift_n == 0)
+    shift_first = true;
+
+  tree result;
+  if (shift_first)
+    {
+      tree shifted = container;
+      if (shift_n)
+	{
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+				   RSHIFT_EXPR, container,
+				   build_int_cst (sizetype, shift_n));
+	  shifted = gimple_assign_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+	}
+
+      tree mask = wide_int_to_tree (container_type,
+				    wi::mask (mask_width, false, prec));
+
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, shifted, mask);
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+  else
+    {
+      tree mask = wide_int_to_tree (container_type,
+				    wi::shifted_mask (shift_n, mask_width,
+						      false, prec));
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       BIT_AND_EXPR, container, mask);
+      tree masked = gimple_assign_lhs (pattern_stmt);
+
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       RSHIFT_EXPR, masked,
+			       build_int_cst (sizetype, shift_n));
+      result = gimple_assign_lhs (pattern_stmt);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (result), ret_type))
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type),
+			       NOP_EXPR, result);
+    }
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   written = BIT_INSERT_EXPR (container, value, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern. In this case it will be:
+   value = (container_type) value;	    // Make sure value has the container's type.
+   shifted = value << bitpos;		    // Shift value into place
+   masked = shifted & (mask << bitpos);	    // Mask off the non-relevant bits in
+					    // the 'to-write value'.
+   cleared = container & ~(mask << bitpos); // Clear the bits in the
+					    // container that we are going
+					    // to overwrite.
+   written = cleared | masked;		    // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+   The shifting is always optional depending on whether bitpos != 0.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree container = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree shift = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree container_type = TREE_TYPE (container);
+
+  if (!INTEGRAL_TYPE_P (container_type)
+      || !tree_fits_uhwi_p (TYPE_SIZE (container_type)))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  vect_unpromoted_value unprom;
+  unprom.set_op (value, vect_internal_def);
+  value = vect_convert_input (vinfo, stmt_info, container_type, &unprom,
+			      get_vectype_for_scalar_type (vinfo,
+							   container_type));
+
+  unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type);
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift);
+  if (BYTES_BIG_ENDIAN)
+    {
+      shift_n = prec - shift_n - mask_width;
+      shift = build_int_cst (TREE_TYPE (shift), shift_n);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (value), container_type))
+    {
+      pattern_stmt =
+	gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			     NOP_EXPR, value);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+
+  /* Shift VALUE into place.  */
+  tree shifted = value;
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			       LSHIFT_EXPR, value, shift);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      shifted = gimple_get_lhs (pattern_stmt);
+    }
+
+  tree mask_t
+    = wide_int_to_tree (container_type,
+			wi::shifted_mask (shift_n, mask_width, false, prec));
+
+  /* Clear bits we don't want to write back from SHIFTED.  */
+  gimple_seq stmts = NULL;
+  tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted,
+			      mask_t);
+  if (!gimple_seq_empty_p (stmts))
+    {
+      pattern_stmt = gimple_seq_first_stmt (stmts);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  /* Mask off the bits in the container that we are to write to.  */
+  mask_t = wide_int_to_tree (container_type,
+			     wi::shifted_mask (shift_n, mask_width, true, prec));
+  tree cleared = vect_recog_temp_ssa_var (container_type);
+  pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Write MASKED into CLEARED.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
+			   BIT_IOR_EXPR, cleared, masked);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5622,6 +5948,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-10-07 14:20                                   ` Andre Vieira (lists)
@ 2022-10-12  1:55                                     ` Hongtao Liu
  2022-10-12  2:11                                       ` Hongtao Liu
  0 siblings, 1 reply; 25+ messages in thread
From: Hongtao Liu @ 2022-10-12  1:55 UTC (permalink / raw)
  To: Andre Vieira (lists)
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, Richard Biener,
	gcc-patches

This commit caused the following tests to FAIL:

FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2

On Fri, Oct 7, 2022 at 10:21 PM Andre Vieira (lists) via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> Whilst running a bootstrap with extra options to force bitfield
> vectorization '-O2 -ftree-vectorize -ftree-loop-if-convert
> -fno-vect-cost-model' I ran into an ICE in vect-patterns where a
> bit_field_ref had a container that wasn't INTEGRAL_TYPE and whose mode was
> E_BLKmode, which meant we failed to build an integer type with the same
> size. For that reason I added a check to bail out earlier if the
> TYPE_MODE of the container is indeed E_BLKmode. The pattern for the
> bitfield inserts required no change as we currently don't support
> containers that aren't integer typed.
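A rough sketch of the kind of early bail-out described above; the helper name
and its exact placement in vect_recog_bitfield_ref_pattern are illustrative
assumptions, not the actual committed change:

  /* Return true if the BIT_FIELD_REF's container can be re-loaded as an
     integer of the same size, i.e. the bit-field read pattern can proceed.  */
  static bool
  bitfield_container_supported_p (tree bf_ref)
  {
    tree container_type = TREE_TYPE (TREE_OPERAND (bf_ref, 0));
    /* A BLKmode container has no integer mode of matching size, so the
       full-container integer load cannot be built; decline to match.  */
    if (!INTEGRAL_TYPE_P (container_type)
        && TYPE_MODE (container_type) == E_BLKmode)
      return false;
    return true;
  }

With a check of this shape the pattern recognizer simply declines the match
and the scalar bit-field read is left alone, instead of ICEing later when no
same-sized integer type can be built.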
>
> Also changed a testcase because on big-endian it was not vectorizing due
> to an unsupported container size.
>
> This passes the same bootstrap and regression testing on aarch64-none-linux,
> with no regressions on aarch64_be-none-elf either.
>
> I assume you are OK with these changes, Richard, but I don't like to
> commit on Friday in case something breaks over the weekend, so I'll
> leave it until Monday.
>
> Thanks,
> Andre
>
> On 29/09/2022 08:54, Richard Biener wrote:
> > On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >> Made the change and also created the ChangeLogs.
> > OK if bootstrap / testing succeeds.
> >
> > Thanks,
> > Richard.
> >
> >> gcc/ChangeLog:
> >>
> >>           * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of
> >> loop bb's from here...
> >>           (tree_if_conversion): ... to here.  Also call bitfield lowering
> >> when appropriate.
> >>           (version_loop_for_if_conversion): Adapt to enable loop
> >> versioning when we only need
> >>           to lower bitfields.
> >>           (ifcvt_split_critical_edges): Relax condition of expected loop
> >> form as this is checked earlier.
> >>           (get_bitfield_rep): New function.
> >>           (lower_bitfield): Likewise.
> >>           (bitfields_to_lower_p): Likewise.
> >>           (need_to_lower_bitfields): New global boolean.
> >>           (need_to_ifcvt): Likewise.
> >>           * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
> >> Improve diagnostic message.
> >>           * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default
> >> value for last parameter.
> >>           (vect_recog_bitfield_ref_pattern): New.
> >>           (vect_recog_bit_insert_pattern): New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>           * gcc.dg/vect/vect-bitfield-read-1.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-2.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-3.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-4.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-5.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-read-6.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-1.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-2.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-3.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-4.c: New test.
> >>           * gcc.dg/vect/vect-bitfield-write-5.c: New test.
> >>
> >> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
> >>> On 27/09/2022 13:34, Richard Biener wrote:
> >>>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
> >>>>
> >>>>> On 08/09/2022 12:51, Richard Biener wrote:
> >>>>>> I'm curious, why the push to redundant_ssa_names?  That could use
> >>>>>> a comment ...
> >>>>> So I purposefully left a #if 0 #else #endif in there so you can see
> >>>>> the two
> >>>>> options. But the reason I used redundant_ssa_names is because ifcvt
> >>>>> seems to
> >>>>> use that as a container for all pairs of (old, new) ssa names to
> >>>>> replace
> >>>>> later. So I just piggy backed on that. I don't know if there's a
> >>>>> specific
> >>>>> reason they do the replacement at the end? Maybe some ordering
> >>>>> issue? Either
> >>>>> way both adding it to redundant_ssa_names or doing the replacement
> >>>>> inline work
> >>>>> for the bitfield lowering (or work in my testing at least).
> >>>> Possibly because we (in the past?) inserted/copied stuff based on
> >>>> predicates generated at analysis time after we decide to elide something
> >>>> so we need to watch for later appearing uses.  But who knows ... my mind
> >>>> fails me here.
> >>>>
> >>>> If it works to replace uses immediately please do so.  But now
> >>>> I wonder why we need this - the value shouldn't change so you
> >>>> should get away with re-using the existing SSA name for the final value?
> >>> Yeah... good point. A quick change and minor testing seems to agree.
> >>> I'm sure I had a good reason to do it initially ;)
> >>>
> >>> I'll run a full-regression on this change to make sure I didn't miss
> >>> anything.
> >>>
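To make the two options discussed above concrete, a fragment-level sketch;
stmt and final_stmt stand for the original bit-field read statement and the
last statement of the lowered sequence, and this is illustrative only, not
the actual patch:

  tree lhs = gimple_assign_lhs (stmt);  /* result of the original  _1 = x.b  */

  /* Option 1: give the lowered sequence a fresh result and record the pair
     so that ifcvt replaces uses of the old name in a later pass.  */
  tree new_lhs = make_ssa_name (TREE_TYPE (lhs));
  gimple_assign_set_lhs (final_stmt, new_lhs);
  redundant_ssa_names.safe_push (std::make_pair (lhs, new_lhs));

  /* Option 2 (the re-use Richard suggests): the lowered sequence computes
     the same value, so keep the existing SSA name as the LHS of the final
     statement and no use replacement is needed at all.  */
  gimple_assign_set_lhs (final_stmt, lhs);
  update_stmt (final_stmt);

Only one of the two would be used, of course; option 2 avoids both the extra
SSA name and the deferred replacement.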



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
  2022-10-12  1:55                                     ` Hongtao Liu
@ 2022-10-12  2:11                                       ` Hongtao Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Hongtao Liu @ 2022-10-12  2:11 UTC (permalink / raw)
  To: Andre Vieira (lists)
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, Richard Biener,
	gcc-patches

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107226

On Wed, Oct 12, 2022 at 9:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> This commit failed tests
>
> FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
> FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
> FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq
> FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
> FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
> FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
> FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
> FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
> FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
> FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
> FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2
>
> On Fri, Oct 7, 2022 at 10:21 PM Andre Vieira (lists) via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> >
> > Whilst running a bootstrap with extra options to force bitfield
> > vectorization '-O2 -ftree-vectorize -ftree-loop-if-convert
> > -fno-vect-cost-model' I ran into an ICE in vect-patterns where a
> > bit_field_ref had a container that wasn't INTEGRAL_TYPE and whose mode was
> > E_BLKmode, which meant we failed to build an integer type with the same
> > size. For that reason I added a check to bail out earlier if the
> > TYPE_MODE of the container is indeed E_BLKmode. The pattern for the
> > bitfield inserts required no change as we currently don't support
> > containers that aren't integer typed.
> >
> > Also changed a testcase because on big-endian it was not vectorizing due
> > to an unsupported container size.
> >
> > This passes the same bootstrap and regression testing on aarch64-none-linux,
> > with no regressions on aarch64_be-none-elf either.
> >
> > I assume you are OK with these changes, Richard, but I don't like to
> > commit on Friday in case something breaks over the weekend, so I'll
> > leave it until Monday.
> >
> > Thanks,
> > Andre
> >
> > On 29/09/2022 08:54, Richard Biener wrote:
> > > On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > >> Made the change and also created the ChangeLogs.
> > > OK if bootstrap / testing succeeds.
> > >
> > > Thanks,
> > > Richard.
> > >
> > >> gcc/ChangeLog:
> > >>
> > >>           * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of
> > >> loop bb's from here...
> > >>           (tree_if_conversion): ... to here.  Also call bitfield lowering
> > >> when appropriate.
> > >>           (version_loop_for_if_conversion): Adapt to enable loop
> > >> versioning when we only need
> > >>           to lower bitfields.
> > >>           (ifcvt_split_critical_edges): Relax condition of expected loop
> > >> form as this is checked earlier.
> > >>           (get_bitfield_rep): New function.
> > >>           (lower_bitfield): Likewise.
> > >>           (bitfields_to_lower_p): Likewise.
> > >>           (need_to_lower_bitfields): New global boolean.
> > >>           (need_to_ifcvt): Likewise.
> > >>           * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
> > >> Improve diagnostic message.
> > >>           * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default
> > >> value for last parameter.
> > >>           (vect_recog_bitfield_ref_pattern): New.
> > >>           (vect_recog_bit_insert_pattern): New.
> > >>
> > >> gcc/testsuite/ChangeLog:
> > >>
> > >>           * gcc.dg/vect/vect-bitfield-read-1.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-read-2.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-read-3.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-read-4.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-read-5.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-read-6.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-write-1.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-write-2.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-write-3.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-write-4.c: New test.
> > >>           * gcc.dg/vect/vect-bitfield-write-5.c: New test.
> > >>
> > >> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote:
> > >>> On 27/09/2022 13:34, Richard Biener wrote:
> > >>>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
> > >>>>
> > >>>>> On 08/09/2022 12:51, Richard Biener wrote:
> > >>>>>> I'm curious, why the push to redundant_ssa_names?  That could use
> > >>>>>> a comment ...
> > >>>>> So I purposefully left a #if 0 #else #endif in there so you can see
> > >>>>> the two
> > >>>>> options. But the reason I used redundant_ssa_names is because ifcvt
> > >>>>> seems to
> > >>>>> use that as a container for all pairs of (old, new) ssa names to
> > >>>>> replace
> > >>>>> later. So I just piggy backed on that. I don't know if there's a
> > >>>>> specific
> > >>>>> reason they do the replacement at the end? Maybe some ordering
> > >>>>> issue? Either
> > >>>>> way both adding it to redundant_ssa_names or doing the replacement
> > >>>>> inline work
> > >>>>> for the bitfield lowering (or work in my testing at least).
> > >>>> Possibly because we (in the past?) inserted/copied stuff based on
> > >>>> predicates generated at analysis time after we decide to elide something
> > >>>> so we need to watch for later appearing uses.  But who knows ... my mind
> > >>>> fails me here.
> > >>>>
> > >>>> If it works to replace uses immediately please do so.  But now
> > >>>> I wonder why we need this - the value shouldn't change so you
> > >>>> should get away with re-using the existing SSA name for the final value?
> > >>> Yeah... good point. A quick change and minor testing seems to agree.
> > >>> I'm sure I had a good reason to do it initially ;)
> > >>>
> > >>> I'll run a full-regression on this change to make sure I didn't miss
> > >>> anything.
> > >>>
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Teach vectorizer to deal with bitfield reads
  2022-07-26 10:00 [RFC] Teach vectorizer to deal with bitfield reads Andre Vieira (lists)
  2022-07-27 11:37 ` Richard Biener
@ 2022-10-12  9:02 ` Eric Botcazou
  1 sibling, 0 replies; 25+ messages in thread
From: Eric Botcazou @ 2022-10-12  9:02 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc-patches, Richard Sandiford, Richard Biener

> Let me know if you believe this is a good approach? I've ran regression
> tests and this hasn't broken anything so far...

Small regression in Ada though, probably a missing guard somewhere:

                === gnat tests ===


Running target unix
FAIL: gnat.dg/loop_optimization23.adb 3 blank line(s) in output
FAIL: gnat.dg/loop_optimization23.adb (test for excess errors)
UNRESOLVED: gnat.dg/loop_optimization23.adb compilation failed to produce executable
FAIL: gnat.dg/loop_optimization23_pkg.adb 3 blank line(s) in output
FAIL: gnat.dg/loop_optimization23_pkg.adb (test for excess errors)

In order to reproduce, configure the compiler with Ada enabled, build it, and 
copy $(srcdir)/gcc/testsuite/gnat.dg/loop_optimization23_pkg.ad[sb] into the
build directory, then just issue:

gcc/gnat1 -quiet loop_optimization23_pkg.adb -O2 -Igcc/ada/rts

eric@fomalhaut:~/build/gcc/native> gcc/gnat1 -quiet loop_optimization23_pkg.adb -O2 -Igcc/ada/rts
during GIMPLE pass: vect
+===========================GNAT BUG DETECTED==============================+
| 13.0.0 20221012 (experimental) [master ca7f7c3f140] (x86_64-suse-linux) GCC error:|
| in exact_div, at poly-int.h:2232                                         |
| Error detected around loop_optimization23_pkg.adb:5:3                    |
| Compiling loop_optimization23_pkg.adb                                    |
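For reference, exact_div in poly-int.h is the poly_int division helper that
asserts (under checking) that the divisor divides the dividend exactly; the
ICE therefore means this path handed it two quantities that do not divide
evenly, though which quantities those are is not visible from this trace. A
contrived illustration of the failure mode, with made-up values:

  poly_uint64 size = 24;
  /* exact_div requires 7 to divide 24 exactly; with checking enabled the
     assert inside it fires, which is the kind of ICE reported above.  */
  poly_uint64 q = exact_div (size, 7);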

-- 
Eric Botcazou



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-10-12  9:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-26 10:00 [RFC] Teach vectorizer to deal with bitfield reads Andre Vieira (lists)
2022-07-27 11:37 ` Richard Biener
2022-07-29  8:57   ` Andre Vieira (lists)
2022-07-29  9:11     ` Richard Biener
2022-07-29 10:31     ` Jakub Jelinek
2022-07-29 10:52       ` Richard Biener
2022-08-01 10:21         ` Andre Vieira (lists)
2022-08-01 13:16           ` Richard Biener
2022-08-08 14:06             ` [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads) Andre Vieira (lists)
2022-08-09 14:34               ` Richard Biener
2022-08-16 10:24                 ` Andre Vieira (lists)
2022-08-17 12:49                   ` Richard Biener
2022-08-25  9:09                     ` Andre Vieira (lists)
2022-09-08  9:07                       ` Andre Vieira (lists)
2022-09-08 11:51                       ` Richard Biener
2022-09-26 15:23                         ` Andre Vieira (lists)
2022-09-27 12:34                           ` Richard Biener
2022-09-28  9:43                             ` Andre Vieira (lists)
2022-09-28 17:31                               ` Andre Vieira (lists)
2022-09-29  7:54                                 ` Richard Biener
2022-10-07 14:20                                   ` Andre Vieira (lists)
2022-10-12  1:55                                     ` Hongtao Liu
2022-10-12  2:11                                       ` Hongtao Liu
2022-08-01 10:13       ` [RFC] Teach vectorizer to deal with bitfield reads Andre Vieira (lists)
2022-10-12  9:02 ` Eric Botcazou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).