Date: Tue, 9 Aug 2022 14:34:30 +0000 (UTC)
From: Richard Biener
To: "Andre Vieira (lists)"
Cc: Jakub Jelinek, Richard Sandiford, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
In-Reply-To: <8f805fb1-d4ae-b0e3-ff26-57fd2c1fc1f7@arm.com>
References: <4a6f2350-f070-1473-63a5-3232968d3a89@arm.com> <4ea91767-de43-9d30-5bd2-a9a0759760c7@arm.com> <69abe824-94f5-95b5-fb7f-6fa076973e05@arm.com> <8f805fb1-d4ae-b0e3-ff26-57fd2c1fc1f7@arm.com>

On Mon, 8 Aug 2022, Andre Vieira (lists) wrote:

> Hi,
>
> So I've changed the approach from the RFC as suggested, moving the
> bitfield lowering to the if-convert pass.
>
> To reiterate, ifcvt will lower COMPONENT_REFs of DECL_BIT_FIELD fields
> to either BIT_FIELD_REF if they are reads or BIT_INSERT_EXPR if they
> are writes, using loads and stores of 'representatives' that are big
> enough to contain the bitfield value.
>
> In vect_recog I added two patterns to replace these BIT_FIELD_REF and
> BIT_INSERT_EXPR with shifts and masks as appropriate.
>
> I'd like to see whether it is possible to remove the 'load' part of a
> BIT_INSERT_EXPR if the representative write doesn't change any
> relevant bits.  For example:
>
> struct s {
>   int dont_care;
>   char a : 3;
> };
>
> s.a = <value>;
>
> should not require a load & write cycle; in fact, it wouldn't require
> any masking either.  Though to achieve this we'd need to make sure the
> representative doesn't overlap with any other field.  Any suggestions
> on how to do this would be great, though I don't think we need to wait
> for that, as it's merely a nice-to-have optimization, I guess.

Hmm.  I'm not sure the middle-end can simply ignore padding.  If some
language standard says that would be OK then I think we should exploit
this during lowering, while the frontend is still around to ask - which
means sometime during early optimization.
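For illustration, the plain-C sketch below shows roughly what such a
lowered read and write compute for the struct above; the representative
placement, offsets and bit numbering are assumptions about a typical
little-endian layout, not what ifcvt literally emits.

  #include <stdio.h>

  struct s { int dont_care; char a : 3; };

  /* Read s.a via its representative: load the containing byte and mask.  */
  static unsigned read_a (const struct s *p)
  {
    const unsigned char *rep = (const unsigned char *) p + sizeof (int);
    return *rep & 0x7u;                     /* field assumed in bits 0..2 */
  }

  /* Write s.a: load the representative, clear the field's bits, insert
     the new value, store the representative back.  The load here is the
     one that could be elided when nothing else lives in the
     representative.  */
  static void write_a (struct s *p, unsigned val)
  {
    unsigned char *rep = (unsigned char *) p + sizeof (int);
    *rep = (unsigned char) ((*rep & ~0x7u) | (val & 0x7u));
  }

  int main (void)
  {
    struct s x = { 0, 0 };
    write_a (&x, 5);
    printf ("%u\n", read_a (&x));           /* prints 5 */
    return 0;
  }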
> I am not sure where I should 'document' this change of behavior of
> ifcvt, and/or whether we should change the name of the pass, since
> it's doing more than if-conversion now?

It's preparation for vectorization anyway since it will emit
.MASK_LOAD/STORE and friends already.  So I don't think anything needs
to change there.

@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;

   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
     return false;

   body = get_loop_body (loop);

This doesn't appear in the ChangeLog, nor is it clear to me why it's
needed.  Likewise

-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+        saved_preds[i] = ifc_bbs[i]->aux;
+    }

is that just premature optimization?

+  /* BITSTART and BITEND describe the region we can safely load from inside the
+     structure.  BITPOS is the bit position of the value inside the
+     representative that we will end up loading OFFSET bytes from the start
+     of the struct.  BEST_MODE is the mode describing the optimal size of the
+     representative chunk we load.  If this is a write we will store the same
+     sized representative back, after we have changed the appropriate bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);

I think you need to give up when get_bit_range sets bitstart = bitend = 0.

+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+                     TYPE_ALIGN (TREE_TYPE (struct_expr)),
+                     INT_MAX, false, &best_mode))

+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+                              NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();

You shouldn't build a new FIELD_DECL.  Either you use
DECL_BIT_FIELD_REPRESENTATIVE directly or you use a BIT_FIELD_REF
accessing the "representative".  DECL_BIT_FIELD_REPRESENTATIVE exists
so it can maintain a variable field offset; you can also subset it with
an intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is too
large for your taste.

I'm not sure all the offset calculations you do are correct, but since
you shouldn't invent a new FIELD_DECL they probably need to change
anyway ...

Note that for optimization it will be important that all accesses to
the bitfield members of the same bitfield region use the same
underlying area (CSE and store-forwarding will thank you).
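To make that concrete, here is a plain-C sketch (not taken from the
patch; the struct, names and bit positions are illustrative and assume
the usual little-endian bitfield layout) in which both field reads go
through one and the same 32-bit representative load, which is what lets
CSE and store-forwarding do their job:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  struct p { unsigned a : 4; unsigned b : 4; unsigned c : 8; };

  int main (void)
  {
    struct p x = { 1, 2, 3 };

    /* Lowered form of reading x.a and x.b: a single load of the shared
       representative, then per-field shifts and masks.  */
    uint32_t rep = 0;
    memcpy (&rep, &x, sizeof x < sizeof rep ? sizeof x : sizeof rep);
    unsigned a = rep & 0xfu;            /* bits 0..3 (assumed layout) */
    unsigned b = (rep >> 4) & 0xfu;     /* bits 4..7 (assumed layout) */

    printf ("%u %u\n", a, b);           /* prints 1 2 on such layouts */
    return 0;
  }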
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;

So we lower bitfields even when we cannot split critical edges?  Why?

+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;

Likewise - if_convertible_loop_p performs other checks; the only one we
want to elide is the loop->num_nodes <= 2 check, since we want to lower
bitfields in single-block loops as well.  That means we only have to
scan for bitfield accesses in the first block "prematurely".  So I
would intertwine the need_to_lower_bitfields check into
if_convertible_loop_p and if_convertible_loop_p_1 and do the
loop->num_nodes <= 2 check after it when !need_to_lower_bitfields.

+      tree op = gimple_get_lhs (stmt);
+      bool write = TREE_CODE (op) == COMPONENT_REF;
+
+      if (!write)
+        op = gimple_assign_rhs1 (stmt);
+
+      if (TREE_CODE (op) != COMPONENT_REF)
+        continue;
+
+      if (DECL_BIT_FIELD (TREE_OPERAND (op, 1)))

Note the canonical test for a bitfield access is to check
DECL_BIT_FIELD_TYPE, not DECL_BIT_FIELD.  In particular for

  struct { int a : 4; int b : 4; int c : 8; int d : 4; int e : 12; };

'c' will _not_ have DECL_BIT_FIELD set, but you want to lower its
access as well, since otherwise you will likely get conflicting
accesses with the other fields (store forwarding).

+static bool
+bitfields_to_lower_p (class loop *loop, auto_vec *to_lower)

Don't pass auto_vec<> *, just pass vec<> &; auto_vec will properly
decay.

+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+                                 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR

Use CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (nop_stmt)) here.

+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);

Use bit_field_{size,offset}.

+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;

I think you want a "positive" check for the kind of type you handle for
the load - an (unsigned?) INTEGRAL_TYPE_P one, I guess.

+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+

Use gimple_assign_lhs.

+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+        = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+                               NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }

Hm - so you have, for example,

  int _1 = MEM;
  int:3 _2 = BIT_FIELD_REF <_1, ...>;
  type _3 = (type) _2;

and that _3 = (type) _2 is there because of integer promotion, and you
perform all the shifting in that type.  I suppose you should verify
that the cast is indeed promoting, not narrowing, since otherwise
you'll produce wrong code?  That said, shouldn't you perform the
shift / mask in the type of _1 instead?  (The hope is, of course, that
typeof (_1) == type in most cases.)
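As a plain-C illustration of that last point (field width, position and
values are made up): doing the shift and mask in the type of the loaded
word and only then converting to the promoted type is value-preserving,
whereas a narrowing conversion could silently drop bits.

  #include <stdint.h>
  #include <stdio.h>

  /* Extract an 8-bit field starting at bit 8 of a 32-bit representative.  */
  static int extract_field (uint32_t rep)
  {
    /* Shift and mask in the type of the loaded word ...  */
    uint32_t val = (rep >> 8) & 0xffu;
    /* ... then convert; 'int' is wider than the 8-bit field, so this is
       a promotion and no bits are lost.  */
    return (int) val;
  }

  int main (void)
  {
    uint32_t rep = 0x00a5b700u;               /* bits 8..15 hold 0xb7 */
    printf ("0x%x\n", extract_field (rep));   /* prints 0xb7 */
    return 0;
  }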
Similar comments apply to vect_recog_bit_insert_pattern.

Overall it looks reasonable, but it does still need some work.

Thanks,
Richard.

> Bootstrapped and regression tested this patch on aarch64-none-linux-gnu.
>
> gcc/ChangeLog:
> 2022-08-08  Andre Vieira
>
>         * tree-if-conv.cc (includes): Add expr.h and langhooks.h to
>         list of includes.
>         (need_to_lower_bitfields): New static bool.
>         (need_to_ifcvt): Likewise.
>         (version_loop_for_if_conversion): Adapt to work for bitfield
>         lowering-only path.
>         (bitfield_data_t): New typedef.
>         (get_bitfield_data): New function.
>         (lower_bitfield): New function.
>         (bitfields_to_lower_p): New function.
>         (tree_if_conversion): Change to lower bitfields too.
>         * tree-vect-data-refs.cc (vect_find_stmt_data_reference):
>         Modify dump message to be more accurate.
>         * tree-vect-patterns.cc (includes): Add gimplify-me.h include.
>         (vect_recog_bitfield_ref_pattern): New function.
>         (vect_recog_bit_insert_pattern): New function.
>         (vect_vect_recog_func_ptrs): Add two new patterns.
>
> gcc/testsuite/ChangeLog:
> 2022-08-08  Andre Vieira
>
>         * gcc.dg/vect/vect-bitfield-read-1.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-2.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-3.c: New test.
>         * gcc.dg/vect/vect-bitfield-read-4.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-1.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-2.c: New test.
>         * gcc.dg/vect/vect-bitfield-write-3.c: New test.
>
> Kind regards,
> Andre