From: Richard Biener
Date: Tue, 22 Aug 2023 15:28:09 +0200
Subject: Re: [PATCH 3/3] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest
To: "Kewen.Lin"
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Peter Bergner

On Tue, Aug 22, 2023 at 10:52 AM Kewen.Lin wrote:
>
> Hi,
>
> Like r14-3317, which moved the handling of the memory access
> type VMAT_GATHER_SCATTER out of the vectorizable_load final loop
> nest, this one deals with the vectorizable_store side.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK.

> BR,
> Kewen
> -----
>
> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_store): Move the handling of
>         VMAT_GATHER_SCATTER in the final loop nest to its own loop,
>         and update the final nest accordingly.
> ---
>  gcc/tree-vect-stmts.cc | 258 +++++++++++++++++++++++++----------------
>  1 file changed, 159 insertions(+), 99 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 18f5ebcc09c..b959c1861ad 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -8930,44 +8930,23 @@ vectorizable_store (vec_info *vinfo,
>        return true;
>      }
>
> -  auto_vec<tree> result_chain (group_size);
> -  auto_vec<tree> vec_offsets;
> -  auto_vec<tree> vec_oprnds;
> -  for (j = 0; j < ncopies; j++)
> +  if (memory_access_type == VMAT_GATHER_SCATTER)
>      {
> -      gimple *new_stmt;
> -      if (j == 0)
> +      gcc_assert (!slp && !grouped_store);
> +      auto_vec<tree> vec_offsets;
> +      for (j = 0; j < ncopies; j++)
>          {
> -          if (slp)
> -            {
> -              /* Get vectorized arguments for SLP_NODE.  */
> -              vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
> -                                 &vec_oprnds);
> -              vec_oprnd = vec_oprnds[0];
> -            }
> -          else
> +          gimple *new_stmt;
> +          if (j == 0)
>              {
> -              /* For interleaved stores we collect vectorized defs for all the
> -                 stores in the group in DR_CHAIN.  DR_CHAIN is then used as an
> -                 input to vect_permute_store_chain().
> -
> -                 If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN
> -                 is of size 1.  */
> -              stmt_vec_info next_stmt_info = first_stmt_info;
> -              for (i = 0; i < group_size; i++)
> -                {
> -                  /* Since gaps are not supported for interleaved stores,
> -                     DR_GROUP_SIZE is the exact number of stmts in the chain.
> -                     Therefore, NEXT_STMT_INFO can't be NULL_TREE.  In case
> -                     that there is no interleaving, DR_GROUP_SIZE is 1,
> -                     and only one iteration of the loop will be executed.  */
> -                  op = vect_get_store_rhs (next_stmt_info);
> -                  vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
> -                                                 op, gvec_oprnds[i]);
> -                  vec_oprnd = (*gvec_oprnds[i])[0];
> -                  dr_chain.quick_push (vec_oprnd);
> -                  next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
> -                }
> +              /* Since the store is not grouped, DR_GROUP_SIZE is 1, and
> +                 DR_CHAIN is of size 1.  */
> +              gcc_assert (group_size == 1);
> +              op = vect_get_store_rhs (first_stmt_info);
> +              vect_get_vec_defs_for_operand (vinfo, first_stmt_info, ncopies,
> +                                             op, gvec_oprnds[0]);
> +              vec_oprnd = (*gvec_oprnds[0])[0];
> +              dr_chain.quick_push (vec_oprnd);
>                if (mask)
>                  {
>                    vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies,
> @@ -8975,91 +8954,55 @@ vectorizable_store (vec_info *vinfo,
>                                                   mask_vectype);
>                    vec_mask = vec_masks[0];
>                  }
> -            }
>
> -          /* We should have catched mismatched types earlier.  */
> -          gcc_assert (useless_type_conversion_p (vectype,
> -                                                 TREE_TYPE (vec_oprnd)));
> -          bool simd_lane_access_p
> -            = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
> -          if (simd_lane_access_p
> -              && !loop_masks
> -              && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> -              && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
> -              && integer_zerop (get_dr_vinfo_offset (vinfo, first_dr_info))
> -              && integer_zerop (DR_INIT (first_dr_info->dr))
> -              && alias_sets_conflict_p (get_alias_set (aggr_type),
> -                                        get_alias_set (TREE_TYPE (ref_type))))
> -            {
> -              dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr_info->dr));
> -              dataref_offset = build_int_cst (ref_type, 0);
> +              /* We should have catched mismatched types earlier.  */
> +              gcc_assert (useless_type_conversion_p (vectype,
> +                                                     TREE_TYPE (vec_oprnd)));
> +              if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> +                vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
> +                                             slp_node, &gs_info, &dataref_ptr,
> +                                             &vec_offsets);
> +              else
> +                dataref_ptr
> +                  = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
> +                                              NULL, offset, &dummy, gsi,
> +                                              &ptr_incr, false, bump);
>              }
> -          else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> -            vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, slp_node,
> -                                         &gs_info, &dataref_ptr, &vec_offsets);
>            else
> -            dataref_ptr
> -              = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
> -                                          simd_lane_access_p ? loop : NULL,
> -                                          offset, &dummy, gsi, &ptr_incr,
> -                                          simd_lane_access_p, bump);
> -        }
> -      else
> -        {
> -          gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> -          /* DR_CHAIN is then used as an input to vect_permute_store_chain().
> -             If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN is
> -             of size 1.  */
> -          for (i = 0; i < group_size; i++)
>              {
> -              vec_oprnd = (*gvec_oprnds[i])[j];
> -              dr_chain[i] = vec_oprnd;
> +              gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> +              vec_oprnd = (*gvec_oprnds[0])[j];
> +              dr_chain[0] = vec_oprnd;
> +              if (mask)
> +                vec_mask = vec_masks[j];
> +              if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> +                dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
> +                                               gsi, stmt_info, bump);
>              }
> -          if (mask)
> -            vec_mask = vec_masks[j];
> -          if (dataref_offset)
> -            dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> -          else if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> -            dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
> -                                           stmt_info, bump);
> -        }
> -
> -      new_stmt = NULL;
> -      if (grouped_store)
> -        /* Permute.  */
> -        vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info, gsi,
> -                                  &result_chain);
>
> -      stmt_vec_info next_stmt_info = first_stmt_info;
> -      for (i = 0; i < vec_num; i++)
> -        {
> -          unsigned misalign;
> +          new_stmt = NULL;
>            unsigned HOST_WIDE_INT align;
> -
>            tree final_mask = NULL_TREE;
>            tree final_len = NULL_TREE;
>            tree bias = NULL_TREE;
>            if (loop_masks)
>              final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> -                                             vec_num * ncopies, vectype,
> -                                             vec_num * j + i);
> +                                             ncopies, vectype, j);
>            if (vec_mask)
>              final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, final_mask,
>                                             vec_mask, gsi);
>
> -          if (memory_access_type == VMAT_GATHER_SCATTER
> -              && gs_info.ifn != IFN_LAST)
> +          if (gs_info.ifn != IFN_LAST)
>              {
>                if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> -                vec_offset = vec_offsets[vec_num * j + i];
> +                vec_offset = vec_offsets[j];
>                tree scale = size_int (gs_info.scale);
>
>                if (gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE)
>                  {
>                    if (loop_lens)
>                      final_len = vect_get_loop_len (loop_vinfo, gsi, loop_lens,
> -                                                   vec_num * ncopies, vectype,
> -                                                   vec_num * j + i, 1);
> +                                                   ncopies, vectype, j, 1);
>                    else
>                      final_len = build_int_cst (sizetype,
>                                                 TYPE_VECTOR_SUBPARTS (vectype));
> @@ -9091,9 +9034,8 @@ vectorizable_store (vec_info *vinfo,
>                gimple_call_set_nothrow (call, true);
>                vect_finish_stmt_generation (vinfo, stmt_info, call, gsi);
>                new_stmt = call;
> -              break;
>              }
> -          else if (memory_access_type == VMAT_GATHER_SCATTER)
> +          else
>              {
>                /* Emulated scatter.  */
>                gcc_assert (!final_mask);
> @@ -9142,8 +9084,126 @@ vectorizable_store (vec_info *vinfo,
>                    new_stmt = gimple_build_assign (ref, elt);
>                    vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
>                  }
> -              break;
>              }
> +          if (j == 0)
> +            *vec_stmt = new_stmt;
> +          STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> +        }
> +      return true;
> +    }
> +
> +  auto_vec<tree> result_chain (group_size);
> +  auto_vec<tree> vec_oprnds;
> +  for (j = 0; j < ncopies; j++)
> +    {
> +      gimple *new_stmt;
> +      if (j == 0)
> +        {
> +          if (slp)
> +            {
> +              /* Get vectorized arguments for SLP_NODE.  */
> +              vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
> +                                 &vec_oprnds);
> +              vec_oprnd = vec_oprnds[0];
> +            }
> +          else
> +            {
> +              /* For interleaved stores we collect vectorized defs for all the
> +                 stores in the group in DR_CHAIN.  DR_CHAIN is then used as an
> +                 input to vect_permute_store_chain().
> +
> +                 If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN
> +                 is of size 1.  */
> +              stmt_vec_info next_stmt_info = first_stmt_info;
> +              for (i = 0; i < group_size; i++)
> +                {
> +                  /* Since gaps are not supported for interleaved stores,
> +                     DR_GROUP_SIZE is the exact number of stmts in the chain.
> +                     Therefore, NEXT_STMT_INFO can't be NULL_TREE.  In case
> +                     that there is no interleaving, DR_GROUP_SIZE is 1,
> +                     and only one iteration of the loop will be executed.  */
> +                  op = vect_get_store_rhs (next_stmt_info);
> +                  vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
> +                                                 op, gvec_oprnds[i]);
> +                  vec_oprnd = (*gvec_oprnds[i])[0];
> +                  dr_chain.quick_push (vec_oprnd);
> +                  next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
> +                }
> +              if (mask)
> +                {
> +                  vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies,
> +                                                 mask, &vec_masks,
> +                                                 mask_vectype);
> +                  vec_mask = vec_masks[0];
> +                }
> +            }
> +
> +          /* We should have catched mismatched types earlier.  */
> +          gcc_assert (useless_type_conversion_p (vectype,
> +                                                 TREE_TYPE (vec_oprnd)));
> +          bool simd_lane_access_p
> +            = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
> +          if (simd_lane_access_p
> +              && !loop_masks
> +              && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> +              && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
> +              && integer_zerop (get_dr_vinfo_offset (vinfo, first_dr_info))
> +              && integer_zerop (DR_INIT (first_dr_info->dr))
> +              && alias_sets_conflict_p (get_alias_set (aggr_type),
> +                                        get_alias_set (TREE_TYPE (ref_type))))
> +            {
> +              dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr_info->dr));
> +              dataref_offset = build_int_cst (ref_type, 0);
> +            }
> +          else
> +            dataref_ptr
> +              = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
> +                                          simd_lane_access_p ? loop : NULL,
> +                                          offset, &dummy, gsi, &ptr_incr,
> +                                          simd_lane_access_p, bump);
> +        }
> +      else
> +        {
> +          gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> +          /* DR_CHAIN is then used as an input to vect_permute_store_chain().
> +             If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN is
> +             of size 1.  */
> +          for (i = 0; i < group_size; i++)
> +            {
> +              vec_oprnd = (*gvec_oprnds[i])[j];
> +              dr_chain[i] = vec_oprnd;
> +            }
> +          if (mask)
> +            vec_mask = vec_masks[j];
> +          if (dataref_offset)
> +            dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> +          else
> +            dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
> +                                           stmt_info, bump);
> +        }
> +
> +      new_stmt = NULL;
> +      if (grouped_store)
> +        /* Permute.  */
> +        vect_permute_store_chain (vinfo, dr_chain, group_size, stmt_info, gsi,
> +                                  &result_chain);
> +
> +      stmt_vec_info next_stmt_info = first_stmt_info;
> +      for (i = 0; i < vec_num; i++)
> +        {
> +          unsigned misalign;
> +          unsigned HOST_WIDE_INT align;
> +
> +          tree final_mask = NULL_TREE;
> +          tree final_len = NULL_TREE;
> +          tree bias = NULL_TREE;
> +          if (loop_masks)
> +            final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> +                                             vec_num * ncopies, vectype,
> +                                             vec_num * j + i);
> +          if (vec_mask)
> +            final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, final_mask,
> +                                           vec_mask, gsi);
>
>            if (i > 0)
>              /* Bump the vector pointer.  */
> --
> 2.31.1