From: Richard Sandiford
To: Robin Dapp
Cc: Robin Dapp via Gcc-patches, linkw@linux.ibm.com
Subject: Re: [RFC] Partial vectors for s390
Date: Tue, 26 Oct 2021 15:18:34 +0100
In-Reply-To: (Robin Dapp's message of "Tue, 26 Oct 2021 15:04:41 +0200")
List-Id: Gcc-patches mailing list

Robin Dapp writes:
> Hi Richard,
>
>> We already have code to probe the predicates of the underlying
>> define_expands/insns to see whether they support certain constant
>> IFN arguments; see e.g. internal_gather_scatter_fn_supported_p.
>> We could do something similar here: add an extra operand to the optab,
>> and an extra argument to the IFN, that gives a bias amount.
>> The PowerPC version would require 0, the System Z version would
>> require -1.  The vectoriser would probe to see which value
>> it should use.
>>
>> Doing it that way ensures that the gimple is still self-describing.
>> It avoids gimple semantics depending on target hooks.
>
> As I don't have much previous exposure to the vectoriser code, I cobbled
> together something pretty ad-hoc (attached).  Does this come somehow
> close to what you have in mind?

Yeah, looks good.

> internal_len_load_supported_p should rather be called
> internal_len_load_bias_supported_p or so I guess and the part where we
> exclude multiple loop_lens is still missing.

Since we only support one bias, it might be better to make the
internal-fn.c function return the bias as an int (with some marker
value for “not supported”), so that the caller doesn't need to probe
both values.

> Would we also check for a viable bias there and then either accept
> multiple lens or not?

Yeah, I think so.
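To make the "return the bias as an int" idea concrete, here is a rough standalone sketch rather than real GCC code. The names probe_len_bias, adjusted_len, bias_predicate and LEN_BIAS_UNSUPPORTED are all hypothetical, and the marker value is an arbitrary assumption; the predicate callback only stands in for the insn_operand_matches check in the quoted patch.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>

// Hypothetical marker meaning "no bias is supported"; the thread does not
// name or fix a value for it.
constexpr int LEN_BIAS_UNSUPPORTED = 127;

// Stand-in for the operand-predicate check (insn_operand_matches in the
// quoted patch): returns true if the target pattern accepts this bias.
using bias_predicate = std::function<bool (int)>;

// Return the single bias the target accepts (0 or -1), trying 0 first,
// or LEN_BIAS_UNSUPPORTED if neither is accepted.
int
probe_len_bias (const bias_predicate &target_accepts)
{
  for (int bias : {0, -1})
    if (target_accepts (bias))
      return bias;
  return LEN_BIAS_UNSUPPORTED;
}

// Length argument the caller would pass to the IFN, mirroring the quoted
// patch, which subtracts one from the length when the bias is -1.
long
adjusted_len (long final_len, int bias)
{
  return bias == -1 ? final_len - 1 : final_len;
}
```

With something of this shape, the vectoriser side would probe once, compare the result against the marker, and only then build the bias constant and the adjusted length, instead of calling a bool predicate once per candidate bias.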
Thanks,
Richard

>
> Regards
>  Robin
>
> commit 2320dbfdfe1477b15a2ac59847d2a52e68de49ab
> Author: Robin Dapp
> Date:   Tue Oct 26 14:36:08 2021 +0200
>
>     bias1
>
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 8312d08aab2..bf97d3e471a 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
>  static void
>  expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  {
> -  class expand_operand ops[3];
> -  tree type, lhs, rhs, maskt;
> -  rtx mem, target, mask;
> +  class expand_operand ops[4];
> +  tree type, lhs, rhs, maskt, biast;
> +  rtx mem, target, mask, bias;
>    insn_code icode;
>
>    maskt = gimple_call_arg (stmt, 2);
> @@ -2727,7 +2727,18 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>                           TYPE_UNSIGNED (TREE_TYPE (maskt)));
>    else
>      create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (icode, 3, ops);
> +  if (optab == len_load_optab)
> +    {
> +      biast = gimple_call_arg (stmt, 3);
> +      bias = expand_normal (biast);
> +      create_input_operand (&ops[3], bias, SImode);
> +    }
> +
> +  if (optab != len_load_optab)
> +    expand_insn (icode, 3, ops);
> +  else
> +    expand_insn (icode, 4, ops);
> +
>    if (!rtx_equal_p (target, ops[0].value))
>      emit_move_insn (target, ops[0].value);
>  }
> @@ -2741,9 +2752,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  static void
>  expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  {
> -  class expand_operand ops[3];
> -  tree type, lhs, rhs, maskt;
> -  rtx mem, reg, mask;
> +  class expand_operand ops[4];
> +  tree type, lhs, rhs, maskt, biast;
> +  rtx mem, reg, mask, bias;
>    insn_code icode;
>
>    maskt = gimple_call_arg (stmt, 2);
> @@ -2770,7 +2781,17 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>                           TYPE_UNSIGNED (TREE_TYPE (maskt)));
>    else
>      create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (icode, 3, ops);
> +  if (optab == len_store_optab)
> +    {
> +      biast = gimple_call_arg (stmt, 4);
> +      bias = expand_normal (biast);
> +      create_input_operand (&ops[3], bias, SImode);
> +    }
> +
> +  if (optab != len_store_optab)
> +    expand_insn (icode, 3, ops);
> +  else
> +    expand_insn (icode, 4, ops);
>  }
>
>  #define expand_mask_store_optab_fn expand_partial_store_optab_fn
> @@ -4154,6 +4175,25 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
>            && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
>  }
>
> +bool
> +internal_len_load_supported_p (internal_fn ifn, tree load_type, int bias)
> +{
> +  if (bias > 0 || bias < -1)
> +    return false;
> +
> +  machine_mode mode = TYPE_MODE (load_type);
> +
> +  optab optab = direct_internal_fn_optab (ifn);
> +  insn_code icode = direct_optab_handler (optab, mode);
> +  int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
> +
> +  if (icode != CODE_FOR_nothing
> +      && insn_operand_matches (icode, 2 + output_ops, GEN_INT (bias)))
> +    return true;
> +
> +  return false;
> +}
> +
>  /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
>     for pointers of type TYPE when the accesses have LENGTH bytes and their
>     common byte alignment is ALIGN.  */
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 19d0f849a5a..d0bf9941bcc 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -225,6 +225,7 @@ extern int internal_fn_mask_index (internal_fn);
>  extern int internal_fn_stored_value_index (internal_fn);
>  extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
>                                                      tree, tree, int);
> +extern bool internal_len_load_supported_p (internal_fn ifn, tree, int);
>  extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
>                                                  poly_uint64, unsigned int);
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index d7723b1a92a..50537763ace 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -8272,12 +8272,14 @@ vectorizable_store (vec_info *vinfo,
>                    opt_machine_mode new_ovmode
>                      = get_len_load_store_mode (vmode, false);
>                    machine_mode new_vmode = new_ovmode.require ();
> +                  tree vtype = vectype;
>                    /* Need conversion if it's wrapped with VnQI.  */
>                    if (vmode != new_vmode)
>                      {
>                        tree new_vtype
>                          = build_vector_type_for_mode (unsigned_intQI_type_node,
> -                                                      new_vmode);
> +                                                      new_vmode);
> +                      vtype = new_vtype;
>                        tree var
>                          = vect_get_new_ssa_name (new_vtype, vect_simple_var);
>                        vec_oprnd
> @@ -8289,9 +8291,29 @@ vectorizable_store (vec_info *vinfo,
>                                                     gsi);
>                        vec_oprnd = var;
>                      }
> +
> +                  /* Check which bias value to use.  Default is 0.  */
> +                  tree bias = build_int_cst (intSI_type_node, 0);
> +                  tree new_len = final_len;
> +                  if (!internal_len_load_supported_p (IFN_LEN_LOAD, vtype, 0)
> +                      && internal_len_load_supported_p (IFN_LEN_LOAD,
> +                                                        vtype, -1))
> +                    {
> +                      bias = build_int_cst (intSI_type_node, -1);
> +                      new_len = make_ssa_name (TREE_TYPE (final_len));
> +                      gassign *m1 = gimple_build_assign (new_len,
> +                                                         MINUS_EXPR,
> +                                                         final_len,
> +                                                         build_one_cst
> +                                                         (TREE_TYPE
> +                                                          (final_len)));
> +                      vect_finish_stmt_generation (vinfo, stmt_info, m1,
> +                                                   gsi);
> +                    }
>                    gcall *call
> -                    = gimple_build_call_internal (IFN_LEN_STORE, 4, dataref_ptr,
> -                                                  ptr, final_len, vec_oprnd);
> +                    = gimple_build_call_internal (IFN_LEN_STORE, 5, dataref_ptr,
> +                                                  ptr, new_len, vec_oprnd,
> +                                                  bias);
>                    gimple_call_set_nothrow (call, true);
>                    vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
>                    new_stmt = call;
> @@ -9588,24 +9610,50 @@ vectorizable_load (vec_info *vinfo,
>                                                 vec_num * j + i);
>                        tree ptr = build_int_cst (ref_type,
>                                                  align * BITS_PER_UNIT);
> +
> +                      machine_mode vmode = TYPE_MODE (vectype);
> +                      opt_machine_mode new_ovmode
> +                        = get_len_load_store_mode (vmode, true);
> +                      machine_mode new_vmode = new_ovmode.require ();
> +                      tree qi_type = unsigned_intQI_type_node;
> +                      tree new_vtype
> +                        = build_vector_type_for_mode (qi_type, new_vmode);
> +
> +                      tree vtype = vectype;
> +                      if (vmode != new_vmode)
> +                        vtype = new_vtype;
> +
> +                      /* Check which bias value to use.  Default is 0.  */
> +                      tree bias = build_int_cst (intSI_type_node, 0);
> +                      tree new_len = final_len;
> +                      if (!internal_len_load_supported_p (IFN_LEN_LOAD,
> +                                                          vtype, 0)
> +                          && internal_len_load_supported_p (IFN_LEN_LOAD,
> +                                                            vtype, -1))
> +                        {
> +                          bias = build_int_cst (intSI_type_node, -1);
> +                          new_len = make_ssa_name (TREE_TYPE (final_len));
> +                          gassign *m1 = gimple_build_assign (new_len,
> +                                                             MINUS_EXPR,
> +                                                             final_len,
> +                                                             build_one_cst
> +                                                             (TREE_TYPE
> +                                                              (final_len)));
> +                          vect_finish_stmt_generation (vinfo, stmt_info, m1,
> +                                                       gsi);
> +                        }
> +
>                        gcall *call
> -                        = gimple_build_call_internal (IFN_LEN_LOAD, 3,
> +                        = gimple_build_call_internal (IFN_LEN_LOAD, 4,
>                                                        dataref_ptr, ptr,
> -                                                      final_len);
> +                                                      new_len, bias);
>                        gimple_call_set_nothrow (call, true);
>                        new_stmt = call;
>                        data_ref = NULL_TREE;
>
>                        /* Need conversion if it's wrapped with VnQI.  */
> -                      machine_mode vmode = TYPE_MODE (vectype);
> -                      opt_machine_mode new_ovmode
> -                        = get_len_load_store_mode (vmode, true);
> -                      machine_mode new_vmode = new_ovmode.require ();
>                        if (vmode != new_vmode)
>                          {
> -                          tree qi_type = unsigned_intQI_type_node;
> -                          tree new_vtype
> -                            = build_vector_type_for_mode (qi_type, new_vmode);
>                            tree var = vect_get_new_ssa_name (new_vtype,
>                                                              vect_simple_var);
>                            gimple_set_lhs (call, var);