Date: Tue, 9 Aug 2022 14:34:30 +0000 (UTC)
From: Richard Biener
To: "Andre Vieira (lists)"
Cc: Jakub Jelinek, Richard Sandiford, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
In-Reply-To: <8f805fb1-d4ae-b0e3-ff26-57fd2c1fc1f7@arm.com>
References: <4a6f2350-f070-1473-63a5-3232968d3a89@arm.com> <4ea91767-de43-9d30-5bd2-a9a0759760c7@arm.com> <69abe824-94f5-95b5-fb7f-6fa076973e05@arm.com> <8f805fb1-d4ae-b0e3-ff26-57fd2c1fc1f7@arm.com>

On Mon, 8 Aug 2022, Andre Vieira (lists) wrote:

> Hi,
>
> So I've changed the approach from the RFC as suggested, moving the
> bitfield lowering to the if-convert pass.
>
> To reiterate, ifcvt will lower COMPONENT_REFs of DECL_BIT_FIELD fields
> to either BIT_FIELD_REF if they are reads or BIT_INSERT_EXPR if they
> are writes, using loads and stores of 'representatives' that are big
> enough to contain the bitfield value.
>
> In vect_recog I added two patterns to replace these BIT_FIELD_REF and
> BIT_INSERT_EXPR with shifts and masks as appropriate.
>
> I'd like to see whether it is possible to remove the 'load' part of a
> BIT_INSERT_EXPR if the representative write doesn't change any
> relevant bits.  For example:
>
> struct s {
>   int dont_care;
>   char a : 3;
> };
>
> s.a = <value>;
>
> should not require a load & write cycle; in fact, it wouldn't require
> any masking either.  Though to achieve this we'd need to make sure the
> representative doesn't overlap with any other field.  Any suggestions
> on how to do this would be great, though I don't think we need to wait
> for that, as it's merely a nice-to-have optimization, I guess.

Hmm.  I'm not sure the middle-end can simply ignore padding.  If some
language standard says that would be OK then I think we should exploit
this during lowering, while the frontend is still around to ask - which
means sometime during early optimization.
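For illustration, the plain-C sketch below shows roughly what such a
lowered read and write compute for the struct above; the representative
placement, offsets and bit numbering are assumptions about a typical
little-endian layout, not what ifcvt literally emits.

  #include <stdio.h>

  struct s { int dont_care; char a : 3; };

  /* Read s.a via its representative: load the containing byte and mask.  */
  static unsigned read_a (const struct s *p)
  {
    const unsigned char *rep = (const unsigned char *) p + sizeof (int);
    return *rep & 0x7u;                     /* field assumed in bits 0..2 */
  }

  /* Write s.a: load the representative, clear the field's bits, insert
     the new value, store the representative back.  The load here is the
     one that could be elided when nothing else lives in the
     representative.  */
  static void write_a (struct s *p, unsigned val)
  {
    unsigned char *rep = (unsigned char *) p + sizeof (int);
    *rep = (unsigned char) ((*rep & ~0x7u) | (val & 0x7u));
  }

  int main (void)
  {
    struct s x = { 0, 0 };
    write_a (&x, 5);
    printf ("%u\n", read_a (&x));           /* prints 5 */
    return 0;
  }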
> I am not sure where I should 'document' this change of behavior of
> ifcvt, and/or whether we should change the name of the pass, since
> it's doing more than if-conversion now?

It's preparation for vectorization anyway since it will emit
.MASK_LOAD/STORE and friends already.  So I don't think anything needs
to change there.

@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;

   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
     return false;

   body = get_loop_body (loop);

This doesn't appear in the ChangeLog, nor is it clear to me why it's
needed.  Likewise

-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+        saved_preds[i] = ifc_bbs[i]->aux;
+    }

is that just premature optimization?

+  /* BITSTART and BITEND describe the region we can safely load from inside the
+     structure.  BITPOS is the bit position of the value inside the
+     representative that we will end up loading OFFSET bytes from the start
+     of the struct.  BEST_MODE is the mode describing the optimal size of the
+     representative chunk we load.  If this is a write we will store the same
+     sized representative back, after we have changed the appropriate bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);

I think you need to give up when get_bit_range sets bitstart = bitend = 0.

+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+                     TYPE_ALIGN (TREE_TYPE (struct_expr)),
+                     INT_MAX, false, &best_mode))

+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+                              NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();

You shouldn't build a new FIELD_DECL.  Either you use
DECL_BIT_FIELD_REPRESENTATIVE directly or you use a BIT_FIELD_REF
accessing the "representative".  DECL_BIT_FIELD_REPRESENTATIVE exists
so it can maintain a variable field offset; you can also subset it with
an intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is too
large for your taste.

I'm not sure all the offset calculations you do are correct, but since
you shouldn't invent a new FIELD_DECL they probably need to change
anyway ...

Note that for optimization it will be important that all accesses to
the bitfield members of the same bitfield region use the same
underlying area (CSE and store-forwarding will thank you).
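To make that concrete, here is a plain-C sketch (not taken from the
patch; the struct, names and bit positions are illustrative and assume
the usual little-endian bitfield layout) in which both field reads go
through one and the same 32-bit representative load, which is what lets
CSE and store-forwarding do their job:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  struct p { unsigned a : 4; unsigned b : 4; unsigned c : 8; };

  int main (void)
  {
    struct p x = { 1, 2, 3 };

    /* Lowered form of reading x.a and x.b: a single load of the shared
       representative, then per-field shifts and masks.  */
    uint32_t rep = 0;
    memcpy (&rep, &x, sizeof x < sizeof rep ? sizeof x : sizeof rep);
    unsigned a = rep & 0xfu;            /* bits 0..3 (assumed layout) */
    unsigned b = (rep >> 4) & 0xfu;     /* bits 4..7 (assumed layout) */

    printf ("%u %u\n", a, b);           /* prints 1 2 on such layouts */
    return 0;
  }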
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;

So we lower bitfields even when we cannot split critical edges?  Why?

+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;

Likewise - if_convertible_loop_p performs other checks; the only one we
want to elide is the loop->num_nodes <= 2 check, since we want to lower
bitfields in single-block loops as well.  That means we only have to
scan for bitfield accesses in the first block "prematurely".  So I
would intertwine the need_to_lower_bitfields check into
if_convertible_loop_p and if_convertible_loop_p_1 and do the
loop->num_nodes <= 2 check after it when !need_to_lower_bitfields.

+      tree op = gimple_get_lhs (stmt);
+      bool write = TREE_CODE (op) == COMPONENT_REF;
+
+      if (!write)
+        op = gimple_assign_rhs1 (stmt);
+
+      if (TREE_CODE (op) != COMPONENT_REF)
+        continue;
+
+      if (DECL_BIT_FIELD (TREE_OPERAND (op, 1)))

Note the canonical test for a bitfield access is to check
DECL_BIT_FIELD_TYPE, not DECL_BIT_FIELD.  In particular for

  struct { int a : 4; int b : 4; int c : 8; int d : 4; int e : 12; };

'c' will _not_ have DECL_BIT_FIELD set, but you want to lower its
access as well, since otherwise you will likely get conflicting
accesses with the other fields (store forwarding).

+static bool
+bitfields_to_lower_p (class loop *loop, auto_vec *to_lower)

Don't pass auto_vec<> *, just pass vec<> &; auto_vec will properly
decay.

+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+                                 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR

Use CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (nop_stmt)) here.

+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);

Use bit_field_{size,offset}.

+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;

I think you want a "positive" check for the kind of type you handle for
the load - an (unsigned?) INTEGRAL_TYPE_P one, I guess.

+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+

Use gimple_assign_lhs.

+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+        = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+                               NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }

Hm - so you have, for example,

  int _1 = MEM;
  int:3 _2 = BIT_FIELD_REF <_1, ...>;
  type _3 = (type) _2;

and that _3 = (type) _2 is there because of integer promotion, and you
perform all the shifting in that type.  I suppose you should verify
that the cast is indeed promoting, not narrowing, since otherwise
you'll produce wrong code?  That said, shouldn't you perform the
shift / mask in the type of _1 instead?  (The hope is, of course, that
typeof (_1) == type in most cases.)
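As a plain-C illustration of that last point (field width, position and
values are made up): doing the shift and mask in the type of the loaded
word and only then converting to the promoted type is value-preserving,
whereas a narrowing conversion could silently drop bits.

  #include <stdint.h>
  #include <stdio.h>

  /* Extract an 8-bit field starting at bit 8 of a 32-bit representative.  */
  static int extract_field (uint32_t rep)
  {
    /* Shift and mask in the type of the loaded word ...  */
    uint32_t val = (rep >> 8) & 0xffu;
    /* ... then convert; 'int' is wider than the 8-bit field, so this is
       a promotion and no bits are lost.  */
    return (int) val;
  }

  int main (void)
  {
    uint32_t rep = 0x00a5b700u;               /* bits 8..15 hold 0xb7 */
    printf ("0x%x\n", extract_field (rep));   /* prints 0xb7 */
    return 0;
  }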
Similar comments apply to vect_recog_bit_insert_pattern.

Overall it looks reasonable, but it does still need some work.

Thanks,
Richard.

> Bootstrapped and regression tested this patch on aarch64-none-linux-gnu.
>
> gcc/ChangeLog:
> 2022-08-08  Andre Vieira
>
>         * tree-if-conv.cc (includes): Add expr.h and langhooks.h to
>         list of includes.
>         (need_to_lower_bitfields): New static bool.
>         (need_to_ifcvt): Likewise.
>         (version_loop_for_if_conversion): Adapt to work for bitfield
>         lowering-only path.
>         (bitfield_data_t): New typedef.
>         (get_bitfield_data): New function.
>         (lower_bitfield): New function.
>         (bitfields_to_lower_p): New function.
>         (tree_if_conversion): Change to lower bitfields too.
>         * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
>         Modify dump message to be more accurate.
>         * tree-vect-patterns.cc (includes): Add gimplify-me.h include.
>         (vect_recog_bitfield_ref_pattern): New function.
>         (vect_recog_bit_insert_pattern): New function.
>         (vect_vect_recog_func_ptrs): Add two new patterns.
>
> gcc/testsuite/ChangeLog:
> 2022-08-08  Andre Vieira
>
>         * gcc.dg/vect/vect-bitfield-read-1.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-2.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-3.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-4.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-1.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-2.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-3.c: New test.
>
> Kind regards,
> Andre