From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Kewen.Lin"
Date: Mon, 14 Aug 2023 20:49:48 +0800
Subject: Re: [PATCH] vect: Move VMAT_LOAD_STORE_LANES handlings from final loop nest
To: Richard Biener
Cc: GCC Patches, Richard Sandiford
Hi Richi,

on 2023/8/14 20:04, Richard Biener wrote:
> On Mon, Aug 14, 2023 at 10:54 AM Kewen.Lin wrote:
>>
>> Hi,
>>
>> Following Richi's suggestion [1], this patch moves the handling of
>> VMAT_LOAD_STORE_LANES out of the final loop nest of function
>> vectorizable_load into its own loop.  Basically it duplicates the
>> final loop nest, cleans up some useless set-up code for the
>> VMAT_LOAD_STORE_LANES case, and removes some unreachable code.  It
>> also removes the corresponding handling from the final loop nest.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> OK (I guess the big diff is mostly because of re-indenting).

Thanks!  Yes, there is some code in the original final loop nest like

  if (memory_access_type == VMAT_LOAD_STORE_LANES)
    {
      ...
    }
  else
    {
      ...
    }

Then the else arm is fully re-indented.  The other patch on
VMAT_GATHER_SCATTER looks a bit better since it doesn't need
re-indenting.

BR,
Kewen

>
> Thanks,
> Richard.
>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html
>>
>> gcc/ChangeLog:
>>
>>         * tree-vect-stmts.cc (vectorizable_load): Move the handling of
>>         VMAT_LOAD_STORE_LANES in the final loop nest to its own loop,
>>         and update the final nest accordingly.
>> ---
>>  gcc/tree-vect-stmts.cc | 1275 ++++++++++++++++++++--------------------
>>  1 file changed, 634 insertions(+), 641 deletions(-)
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 4f2d088484c..c361e16cb7b 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -10332,7 +10332,129 @@ vectorizable_load (vec_info *vinfo,
>>        vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, mask,
>>                                       &vec_masks, mask_vectype);
>>      }
>> +
>>    tree vec_mask = NULL_TREE;
>> +  if (memory_access_type == VMAT_LOAD_STORE_LANES)
>> +    {
>> +      gcc_assert (alignment_support_scheme == dr_aligned
>> +                  || alignment_support_scheme == dr_unaligned_supported);
>> +      gcc_assert (grouped_load && !slp);
>> +
>> +      unsigned int inside_cost = 0, prologue_cost = 0;
>> +      for (j = 0; j < ncopies; j++)
>> +        {
>> +          if (costing_p)
>> +            {
>> +              /* An IFN_LOAD_LANES will load all its vector results,
>> +                 regardless of which ones we actually need.  Account
>> +                 for the cost of unused results.
*/ >> + if (first_stmt_info == stmt_info) >> + { >> + unsigned int gaps = DR_GROUP_SIZE (first_stmt_info); >> + stmt_vec_info next_stmt_info = first_stmt_info; >> + do >> + { >> + gaps -= 1; >> + next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info); >> + } >> + while (next_stmt_info); >> + if (gaps) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "vect_model_load_cost: %d " >> + "unused vectors.\n", >> + gaps); >> + vect_get_load_cost (vinfo, stmt_info, gaps, >> + alignment_support_scheme, >> + misalignment, false, &inside_cost, >> + &prologue_cost, cost_vec, cost_vec, >> + true); >> + } >> + } >> + vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_scheme, >> + misalignment, false, &inside_cost, >> + &prologue_cost, cost_vec, cost_vec, true); >> + continue; >> + } >> + >> + /* 1. Create the vector or array pointer update chain. */ >> + if (j == 0) >> + dataref_ptr >> + = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type, >> + at_loop, offset, &dummy, gsi, >> + &ptr_incr, false, bump); >> + else >> + { >> + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); >> + dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi, >> + stmt_info, bump); >> + } >> + if (mask) >> + vec_mask = vec_masks[j]; >> + >> + tree vec_array = create_vector_array (vectype, vec_num); >> + >> + tree final_mask = NULL_TREE; >> + if (loop_masks) >> + final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, >> + ncopies, vectype, j); >> + if (vec_mask) >> + final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, final_mask, >> + vec_mask, gsi); >> + >> + gcall *call; >> + if (final_mask) >> + { >> + /* Emit: >> + VEC_ARRAY = MASK_LOAD_LANES (DATAREF_PTR, ALIAS_PTR, >> + VEC_MASK). */ >> + unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype)); >> + tree alias_ptr = build_int_cst (ref_type, align); >> + call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3, >> + dataref_ptr, alias_ptr, >> + final_mask); >> + } >> + else >> + { >> + /* Emit: >> + VEC_ARRAY = LOAD_LANES (MEM_REF[...all elements...]). */ >> + data_ref = create_array_ref (aggr_type, dataref_ptr, ref_type); >> + call = gimple_build_call_internal (IFN_LOAD_LANES, 1, data_ref); >> + } >> + gimple_call_set_lhs (call, vec_array); >> + gimple_call_set_nothrow (call, true); >> + vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); >> + >> + dr_chain.create (vec_num); >> + /* Extract each vector into an SSA_NAME. */ >> + for (i = 0; i < vec_num; i++) >> + { >> + new_temp = read_vector_array (vinfo, stmt_info, gsi, scalar_dest, >> + vec_array, i); >> + dr_chain.quick_push (new_temp); >> + } >> + >> + /* Record the mapping between SSA_NAMEs and statements. */ >> + vect_record_grouped_load_vectors (vinfo, stmt_info, dr_chain); >> + >> + /* Record that VEC_ARRAY is now dead. 
*/ >> + vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); >> + >> + dr_chain.release (); >> + >> + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; >> + } >> + >> + if (costing_p && dump_enabled_p ()) >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "vect_model_load_cost: inside_cost = %u, " >> + "prologue_cost = %u .\n", >> + inside_cost, prologue_cost); >> + >> + return true; >> + } >> + >> poly_uint64 group_elt = 0; >> unsigned int inside_cost = 0, prologue_cost = 0; >> for (j = 0; j < ncopies; j++) >> @@ -10414,685 +10538,558 @@ vectorizable_load (vec_info *vinfo, >> dr_chain.create (vec_num); >> >> gimple *new_stmt = NULL; >> - if (memory_access_type == VMAT_LOAD_STORE_LANES) >> + for (i = 0; i < vec_num; i++) >> { >> - if (costing_p) >> - { >> - /* An IFN_LOAD_LANES will load all its vector results, >> - regardless of which ones we actually need. Account >> - for the cost of unused results. */ >> - if (grouped_load && first_stmt_info == stmt_info) >> - { >> - unsigned int gaps = DR_GROUP_SIZE (first_stmt_info); >> - stmt_vec_info next_stmt_info = first_stmt_info; >> - do >> - { >> - gaps -= 1; >> - next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info); >> - } >> - while (next_stmt_info); >> - if (gaps) >> - { >> - if (dump_enabled_p ()) >> - dump_printf_loc (MSG_NOTE, vect_location, >> - "vect_model_load_cost: %d " >> - "unused vectors.\n", >> - gaps); >> - vect_get_load_cost (vinfo, stmt_info, gaps, >> - alignment_support_scheme, >> - misalignment, false, &inside_cost, >> - &prologue_cost, cost_vec, cost_vec, >> - true); >> - } >> - } >> - vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_scheme, >> - misalignment, false, &inside_cost, >> - &prologue_cost, cost_vec, cost_vec, true); >> - continue; >> - } >> - tree vec_array; >> - >> - vec_array = create_vector_array (vectype, vec_num); >> - >> tree final_mask = NULL_TREE; >> - if (loop_masks) >> - final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, >> - ncopies, vectype, j); >> - if (vec_mask) >> - final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, >> - final_mask, vec_mask, gsi); >> - >> - gcall *call; >> - if (final_mask) >> - { >> - /* Emit: >> - VEC_ARRAY = MASK_LOAD_LANES (DATAREF_PTR, ALIAS_PTR, >> - VEC_MASK). */ >> - unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype)); >> - tree alias_ptr = build_int_cst (ref_type, align); >> - call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3, >> - dataref_ptr, alias_ptr, >> - final_mask); >> - } >> - else >> + tree final_len = NULL_TREE; >> + tree bias = NULL_TREE; >> + if (!costing_p) >> { >> - /* Emit: >> - VEC_ARRAY = LOAD_LANES (MEM_REF[...all elements...]). */ >> - data_ref = create_array_ref (aggr_type, dataref_ptr, ref_type); >> - call = gimple_build_call_internal (IFN_LOAD_LANES, 1, data_ref); >> - } >> - gimple_call_set_lhs (call, vec_array); >> - gimple_call_set_nothrow (call, true); >> - vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); >> - new_stmt = call; >> + if (loop_masks) >> + final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, >> + vec_num * ncopies, vectype, >> + vec_num * j + i); >> + if (vec_mask) >> + final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, >> + final_mask, vec_mask, gsi); >> >> - /* Extract each vector into an SSA_NAME. 
*/ >> - for (i = 0; i < vec_num; i++) >> - { >> - new_temp = read_vector_array (vinfo, stmt_info, gsi, scalar_dest, >> - vec_array, i); >> - dr_chain.quick_push (new_temp); >> + if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) >> + dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, >> + gsi, stmt_info, bump); >> } >> >> - /* Record the mapping between SSA_NAMEs and statements. */ >> - vect_record_grouped_load_vectors (vinfo, stmt_info, dr_chain); >> - >> - /* Record that VEC_ARRAY is now dead. */ >> - vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); >> - } >> - else >> - { >> - for (i = 0; i < vec_num; i++) >> + /* 2. Create the vector-load in the loop. */ >> + switch (alignment_support_scheme) >> { >> - tree final_mask = NULL_TREE; >> - tree final_len = NULL_TREE; >> - tree bias = NULL_TREE; >> - if (!costing_p) >> - { >> - if (loop_masks) >> - final_mask >> - = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, >> - vec_num * ncopies, vectype, >> - vec_num * j + i); >> - if (vec_mask) >> - final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, >> - final_mask, vec_mask, gsi); >> - >> - if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) >> - dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, >> - gsi, stmt_info, bump); >> - } >> + case dr_aligned: >> + case dr_unaligned_supported: >> + { >> + unsigned int misalign; >> + unsigned HOST_WIDE_INT align; >> >> - /* 2. Create the vector-load in the loop. */ >> - switch (alignment_support_scheme) >> - { >> - case dr_aligned: >> - case dr_unaligned_supported: >> + if (memory_access_type == VMAT_GATHER_SCATTER >> + && gs_info.ifn != IFN_LAST) >> { >> - unsigned int misalign; >> - unsigned HOST_WIDE_INT align; >> - >> - if (memory_access_type == VMAT_GATHER_SCATTER >> - && gs_info.ifn != IFN_LAST) >> + if (costing_p) >> { >> - if (costing_p) >> - { >> - unsigned int cnunits >> - = vect_nunits_for_cost (vectype); >> - inside_cost >> - = record_stmt_cost (cost_vec, cnunits, >> - scalar_load, stmt_info, 0, >> - vect_body); >> - break; >> - } >> - if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) >> - vec_offset = vec_offsets[vec_num * j + i]; >> - tree zero = build_zero_cst (vectype); >> - tree scale = size_int (gs_info.scale); >> - >> - if (gs_info.ifn == IFN_MASK_LEN_GATHER_LOAD) >> - { >> - if (loop_lens) >> - final_len >> - = vect_get_loop_len (loop_vinfo, gsi, loop_lens, >> - vec_num * ncopies, vectype, >> - vec_num * j + i, 1); >> - else >> - final_len = build_int_cst (sizetype, >> - TYPE_VECTOR_SUBPARTS ( >> - vectype)); >> - signed char biasval >> - = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); >> - bias = build_int_cst (intQI_type_node, biasval); >> - if (!final_mask) >> - { >> - mask_vectype = truth_type_for (vectype); >> - final_mask = build_minus_one_cst (mask_vectype); >> - } >> - } >> - >> - gcall *call; >> - if (final_len && final_mask) >> - call = gimple_build_call_internal ( >> - IFN_MASK_LEN_GATHER_LOAD, 7, dataref_ptr, >> - vec_offset, scale, zero, final_mask, final_len, >> - bias); >> - else if (final_mask) >> - call = gimple_build_call_internal >> - (IFN_MASK_GATHER_LOAD, 5, dataref_ptr, >> - vec_offset, scale, zero, final_mask); >> - else >> - call = gimple_build_call_internal >> - (IFN_GATHER_LOAD, 4, dataref_ptr, >> - vec_offset, scale, zero); >> - gimple_call_set_nothrow (call, true); >> - new_stmt = call; >> - data_ref = NULL_TREE; >> + unsigned int cnunits = vect_nunits_for_cost (vectype); >> + inside_cost >> + = record_stmt_cost (cost_vec, cnunits, scalar_load, >> + stmt_info, 0, 
vect_body); >> break; >> } >> - else if (memory_access_type == VMAT_GATHER_SCATTER) >> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) >> + vec_offset = vec_offsets[vec_num * j + i]; >> + tree zero = build_zero_cst (vectype); >> + tree scale = size_int (gs_info.scale); >> + >> + if (gs_info.ifn == IFN_MASK_LEN_GATHER_LOAD) >> { >> - /* Emulated gather-scatter. */ >> - gcc_assert (!final_mask); >> - unsigned HOST_WIDE_INT const_nunits >> - = nunits.to_constant (); >> - if (costing_p) >> - { >> - /* For emulated gathers N offset vector element >> - offset add is consumed by the load). */ >> - inside_cost >> - = record_stmt_cost (cost_vec, const_nunits, >> - vec_to_scalar, stmt_info, 0, >> - vect_body); >> - /* N scalar loads plus gathering them into a >> - vector. */ >> - inside_cost >> - = record_stmt_cost (cost_vec, const_nunits, >> - scalar_load, stmt_info, 0, >> - vect_body); >> - inside_cost >> - = record_stmt_cost (cost_vec, 1, vec_construct, >> - stmt_info, 0, vect_body); >> - break; >> - } >> - unsigned HOST_WIDE_INT const_offset_nunits >> - = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype) >> - .to_constant (); >> - vec *ctor_elts; >> - vec_alloc (ctor_elts, const_nunits); >> - gimple_seq stmts = NULL; >> - /* We support offset vectors with more elements >> - than the data vector for now. */ >> - unsigned HOST_WIDE_INT factor >> - = const_offset_nunits / const_nunits; >> - vec_offset = vec_offsets[j / factor]; >> - unsigned elt_offset = (j % factor) * const_nunits; >> - tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset)); >> - tree scale = size_int (gs_info.scale); >> - align >> - = get_object_alignment (DR_REF (first_dr_info->dr)); >> - tree ltype = build_aligned_type (TREE_TYPE (vectype), >> - align); >> - for (unsigned k = 0; k < const_nunits; ++k) >> + if (loop_lens) >> + final_len >> + = vect_get_loop_len (loop_vinfo, gsi, loop_lens, >> + vec_num * ncopies, vectype, >> + vec_num * j + i, 1); >> + else >> + final_len >> + = build_int_cst (sizetype, >> + TYPE_VECTOR_SUBPARTS (vectype)); >> + signed char biasval >> + = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); >> + bias = build_int_cst (intQI_type_node, biasval); >> + if (!final_mask) >> { >> - tree boff = size_binop (MULT_EXPR, >> - TYPE_SIZE (idx_type), >> - bitsize_int >> - (k + elt_offset)); >> - tree idx = gimple_build (&stmts, BIT_FIELD_REF, >> - idx_type, vec_offset, >> - TYPE_SIZE (idx_type), >> - boff); >> - idx = gimple_convert (&stmts, sizetype, idx); >> - idx = gimple_build (&stmts, MULT_EXPR, >> - sizetype, idx, scale); >> - tree ptr = gimple_build (&stmts, PLUS_EXPR, >> - TREE_TYPE (dataref_ptr), >> - dataref_ptr, idx); >> - ptr = gimple_convert (&stmts, ptr_type_node, ptr); >> - tree elt = make_ssa_name (TREE_TYPE (vectype)); >> - tree ref = build2 (MEM_REF, ltype, ptr, >> - build_int_cst (ref_type, 0)); >> - new_stmt = gimple_build_assign (elt, ref); >> - gimple_set_vuse (new_stmt, >> - gimple_vuse (gsi_stmt (*gsi))); >> - gimple_seq_add_stmt (&stmts, new_stmt); >> - CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, elt); >> + mask_vectype = truth_type_for (vectype); >> + final_mask = build_minus_one_cst (mask_vectype); >> } >> - gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); >> - new_stmt = gimple_build_assign (NULL_TREE, >> - build_constructor >> - (vectype, ctor_elts)); >> - data_ref = NULL_TREE; >> - break; >> } >> >> - if (costing_p) >> - break; >> - >> - align = >> - known_alignment (DR_TARGET_ALIGNMENT (first_dr_info)); >> - if (alignment_support_scheme == dr_aligned) >> - misalign = 0; >> - else if 
(misalignment == DR_MISALIGNMENT_UNKNOWN) >> - { >> - align = dr_alignment >> - (vect_dr_behavior (vinfo, first_dr_info)); >> - misalign = 0; >> - } >> + gcall *call; >> + if (final_len && final_mask) >> + call = gimple_build_call_internal ( >> + IFN_MASK_LEN_GATHER_LOAD, 7, dataref_ptr, vec_offset, >> + scale, zero, final_mask, final_len, bias); >> + else if (final_mask) >> + call >> + = gimple_build_call_internal (IFN_MASK_GATHER_LOAD, 5, >> + dataref_ptr, vec_offset, >> + scale, zero, final_mask); >> else >> - misalign = misalignment; >> - if (dataref_offset == NULL_TREE >> - && TREE_CODE (dataref_ptr) == SSA_NAME) >> - set_ptr_info_alignment (get_ptr_info (dataref_ptr), >> - align, misalign); >> - align = least_bit_hwi (misalign | align); >> - >> - /* Compute IFN when LOOP_LENS or final_mask valid. */ >> - machine_mode vmode = TYPE_MODE (vectype); >> - machine_mode new_vmode = vmode; >> - internal_fn partial_ifn = IFN_LAST; >> - if (loop_lens) >> + call >> + = gimple_build_call_internal (IFN_GATHER_LOAD, 4, >> + dataref_ptr, vec_offset, >> + scale, zero); >> + gimple_call_set_nothrow (call, true); >> + new_stmt = call; >> + data_ref = NULL_TREE; >> + break; >> + } >> + else if (memory_access_type == VMAT_GATHER_SCATTER) >> + { >> + /* Emulated gather-scatter. */ >> + gcc_assert (!final_mask); >> + unsigned HOST_WIDE_INT const_nunits = nunits.to_constant (); >> + if (costing_p) >> { >> - opt_machine_mode new_ovmode >> - = get_len_load_store_mode (vmode, true, >> - &partial_ifn); >> - new_vmode = new_ovmode.require (); >> - unsigned factor = (new_ovmode == vmode) >> - ? 1 >> - : GET_MODE_UNIT_SIZE (vmode); >> - final_len >> - = vect_get_loop_len (loop_vinfo, gsi, loop_lens, >> - vec_num * ncopies, vectype, >> - vec_num * j + i, factor); >> + /* For emulated gathers N offset vector element >> + offset add is consumed by the load). */ >> + inside_cost >> + = record_stmt_cost (cost_vec, const_nunits, >> + vec_to_scalar, stmt_info, 0, >> + vect_body); >> + /* N scalar loads plus gathering them into a >> + vector. */ >> + inside_cost = record_stmt_cost (cost_vec, const_nunits, >> + scalar_load, stmt_info, >> + 0, vect_body); >> + inside_cost >> + = record_stmt_cost (cost_vec, 1, vec_construct, >> + stmt_info, 0, vect_body); >> + break; >> } >> - else if (final_mask) >> + unsigned HOST_WIDE_INT const_offset_nunits >> + = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype) >> + .to_constant (); >> + vec *ctor_elts; >> + vec_alloc (ctor_elts, const_nunits); >> + gimple_seq stmts = NULL; >> + /* We support offset vectors with more elements >> + than the data vector for now. 
*/ >> + unsigned HOST_WIDE_INT factor >> + = const_offset_nunits / const_nunits; >> + vec_offset = vec_offsets[j / factor]; >> + unsigned elt_offset = (j % factor) * const_nunits; >> + tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset)); >> + tree scale = size_int (gs_info.scale); >> + align = get_object_alignment (DR_REF (first_dr_info->dr)); >> + tree ltype >> + = build_aligned_type (TREE_TYPE (vectype), align); >> + for (unsigned k = 0; k < const_nunits; ++k) >> { >> - if (!can_vec_mask_load_store_p ( >> - vmode, TYPE_MODE (TREE_TYPE (final_mask)), true, >> - &partial_ifn)) >> - gcc_unreachable (); >> + tree boff = size_binop (MULT_EXPR, TYPE_SIZE (idx_type), >> + bitsize_int (k + elt_offset)); >> + tree idx = gimple_build (&stmts, BIT_FIELD_REF, >> + idx_type, vec_offset, >> + TYPE_SIZE (idx_type), boff); >> + idx = gimple_convert (&stmts, sizetype, idx); >> + idx = gimple_build (&stmts, MULT_EXPR, sizetype, idx, >> + scale); >> + tree ptr = gimple_build (&stmts, PLUS_EXPR, >> + TREE_TYPE (dataref_ptr), >> + dataref_ptr, idx); >> + ptr = gimple_convert (&stmts, ptr_type_node, ptr); >> + tree elt = make_ssa_name (TREE_TYPE (vectype)); >> + tree ref = build2 (MEM_REF, ltype, ptr, >> + build_int_cst (ref_type, 0)); >> + new_stmt = gimple_build_assign (elt, ref); >> + gimple_set_vuse (new_stmt, >> + gimple_vuse (gsi_stmt (*gsi))); >> + gimple_seq_add_stmt (&stmts, new_stmt); >> + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, elt); >> } >> + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); >> + new_stmt = gimple_build_assign ( >> + NULL_TREE, build_constructor (vectype, ctor_elts)); >> + data_ref = NULL_TREE; >> + break; >> + } >> >> - if (partial_ifn == IFN_MASK_LEN_LOAD) >> + if (costing_p) >> + break; >> + >> + align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info)); >> + if (alignment_support_scheme == dr_aligned) >> + misalign = 0; >> + else if (misalignment == DR_MISALIGNMENT_UNKNOWN) >> + { >> + align >> + = dr_alignment (vect_dr_behavior (vinfo, first_dr_info)); >> + misalign = 0; >> + } >> + else >> + misalign = misalignment; >> + if (dataref_offset == NULL_TREE >> + && TREE_CODE (dataref_ptr) == SSA_NAME) >> + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, >> + misalign); >> + align = least_bit_hwi (misalign | align); >> + >> + /* Compute IFN when LOOP_LENS or final_mask valid. */ >> + machine_mode vmode = TYPE_MODE (vectype); >> + machine_mode new_vmode = vmode; >> + internal_fn partial_ifn = IFN_LAST; >> + if (loop_lens) >> + { >> + opt_machine_mode new_ovmode >> + = get_len_load_store_mode (vmode, true, &partial_ifn); >> + new_vmode = new_ovmode.require (); >> + unsigned factor >> + = (new_ovmode == vmode) ? 1 : GET_MODE_UNIT_SIZE (vmode); >> + final_len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, >> + vec_num * ncopies, vectype, >> + vec_num * j + i, factor); >> + } >> + else if (final_mask) >> + { >> + if (!can_vec_mask_load_store_p ( >> + vmode, TYPE_MODE (TREE_TYPE (final_mask)), true, >> + &partial_ifn)) >> + gcc_unreachable (); >> + } >> + >> + if (partial_ifn == IFN_MASK_LEN_LOAD) >> + { >> + if (!final_len) >> { >> - if (!final_len) >> - { >> - /* Pass VF value to 'len' argument of >> - MASK_LEN_LOAD if LOOP_LENS is invalid. */ >> - final_len >> - = size_int (TYPE_VECTOR_SUBPARTS (vectype)); >> - } >> - if (!final_mask) >> - { >> - /* Pass all ones value to 'mask' argument of >> - MASK_LEN_LOAD if final_mask is invalid. 
*/ >> - mask_vectype = truth_type_for (vectype); >> - final_mask = build_minus_one_cst (mask_vectype); >> - } >> + /* Pass VF value to 'len' argument of >> + MASK_LEN_LOAD if LOOP_LENS is invalid. */ >> + final_len = size_int (TYPE_VECTOR_SUBPARTS (vectype)); >> } >> - if (final_len) >> + if (!final_mask) >> { >> - signed char biasval >> - = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); >> - >> - bias = build_int_cst (intQI_type_node, biasval); >> + /* Pass all ones value to 'mask' argument of >> + MASK_LEN_LOAD if final_mask is invalid. */ >> + mask_vectype = truth_type_for (vectype); >> + final_mask = build_minus_one_cst (mask_vectype); >> } >> + } >> + if (final_len) >> + { >> + signed char biasval >> + = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); >> >> - if (final_len) >> + bias = build_int_cst (intQI_type_node, biasval); >> + } >> + >> + if (final_len) >> + { >> + tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT); >> + gcall *call; >> + if (partial_ifn == IFN_MASK_LEN_LOAD) >> + call = gimple_build_call_internal (IFN_MASK_LEN_LOAD, 5, >> + dataref_ptr, ptr, >> + final_mask, final_len, >> + bias); >> + else >> + call = gimple_build_call_internal (IFN_LEN_LOAD, 4, >> + dataref_ptr, ptr, >> + final_len, bias); >> + gimple_call_set_nothrow (call, true); >> + new_stmt = call; >> + data_ref = NULL_TREE; >> + >> + /* Need conversion if it's wrapped with VnQI. */ >> + if (vmode != new_vmode) >> { >> - tree ptr >> - = build_int_cst (ref_type, align * BITS_PER_UNIT); >> - gcall *call; >> - if (partial_ifn == IFN_MASK_LEN_LOAD) >> - call = gimple_build_call_internal (IFN_MASK_LEN_LOAD, >> - 5, dataref_ptr, >> - ptr, final_mask, >> - final_len, bias); >> - else >> - call = gimple_build_call_internal (IFN_LEN_LOAD, 4, >> - dataref_ptr, ptr, >> - final_len, bias); >> - gimple_call_set_nothrow (call, true); >> - new_stmt = call; >> - data_ref = NULL_TREE; >> - >> - /* Need conversion if it's wrapped with VnQI. 
*/ >> - if (vmode != new_vmode) >> - { >> - tree new_vtype = build_vector_type_for_mode ( >> - unsigned_intQI_type_node, new_vmode); >> - tree var = vect_get_new_ssa_name (new_vtype, >> - vect_simple_var); >> - gimple_set_lhs (call, var); >> - vect_finish_stmt_generation (vinfo, stmt_info, call, >> - gsi); >> - tree op = build1 (VIEW_CONVERT_EXPR, vectype, var); >> - new_stmt >> - = gimple_build_assign (vec_dest, >> - VIEW_CONVERT_EXPR, op); >> - } >> + tree new_vtype = build_vector_type_for_mode ( >> + unsigned_intQI_type_node, new_vmode); >> + tree var >> + = vect_get_new_ssa_name (new_vtype, vect_simple_var); >> + gimple_set_lhs (call, var); >> + vect_finish_stmt_generation (vinfo, stmt_info, call, >> + gsi); >> + tree op = build1 (VIEW_CONVERT_EXPR, vectype, var); >> + new_stmt = gimple_build_assign (vec_dest, >> + VIEW_CONVERT_EXPR, op); >> } >> - else if (final_mask) >> + } >> + else if (final_mask) >> + { >> + tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT); >> + gcall *call = gimple_build_call_internal (IFN_MASK_LOAD, 3, >> + dataref_ptr, ptr, >> + final_mask); >> + gimple_call_set_nothrow (call, true); >> + new_stmt = call; >> + data_ref = NULL_TREE; >> + } >> + else >> + { >> + tree ltype = vectype; >> + tree new_vtype = NULL_TREE; >> + unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info); >> + unsigned int vect_align >> + = vect_known_alignment_in_bytes (first_dr_info, vectype); >> + unsigned int scalar_dr_size >> + = vect_get_scalar_dr_size (first_dr_info); >> + /* If there's no peeling for gaps but we have a gap >> + with slp loads then load the lower half of the >> + vector only. See get_group_load_store_type for >> + when we apply this optimization. */ >> + if (slp >> + && loop_vinfo >> + && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && gap != 0 >> + && known_eq (nunits, (group_size - gap) * 2) >> + && known_eq (nunits, group_size) >> + && gap >= (vect_align / scalar_dr_size)) >> { >> - tree ptr = build_int_cst (ref_type, >> - align * BITS_PER_UNIT); >> - gcall *call >> - = gimple_build_call_internal (IFN_MASK_LOAD, 3, >> - dataref_ptr, ptr, >> - final_mask); >> - gimple_call_set_nothrow (call, true); >> - new_stmt = call; >> - data_ref = NULL_TREE; >> + tree half_vtype; >> + new_vtype >> + = vector_vector_composition_type (vectype, 2, >> + &half_vtype); >> + if (new_vtype != NULL_TREE) >> + ltype = half_vtype; >> } >> + tree offset >> + = (dataref_offset ? dataref_offset >> + : build_int_cst (ref_type, 0)); >> + if (ltype != vectype >> + && memory_access_type == VMAT_CONTIGUOUS_REVERSE) >> + { >> + unsigned HOST_WIDE_INT gap_offset >> + = gap * tree_to_uhwi (TYPE_SIZE_UNIT (elem_type)); >> + tree gapcst = build_int_cst (ref_type, gap_offset); >> + offset = size_binop (PLUS_EXPR, offset, gapcst); >> + } >> + data_ref >> + = fold_build2 (MEM_REF, ltype, dataref_ptr, offset); >> + if (alignment_support_scheme == dr_aligned) >> + ; >> else >> + TREE_TYPE (data_ref) >> + = build_aligned_type (TREE_TYPE (data_ref), >> + align * BITS_PER_UNIT); >> + if (ltype != vectype) >> { >> - tree ltype = vectype; >> - tree new_vtype = NULL_TREE; >> - unsigned HOST_WIDE_INT gap >> - = DR_GROUP_GAP (first_stmt_info); >> - unsigned int vect_align >> - = vect_known_alignment_in_bytes (first_dr_info, >> - vectype); >> - unsigned int scalar_dr_size >> - = vect_get_scalar_dr_size (first_dr_info); >> - /* If there's no peeling for gaps but we have a gap >> - with slp loads then load the lower half of the >> - vector only. 
See get_group_load_store_type for >> - when we apply this optimization. */ >> - if (slp >> - && loop_vinfo >> - && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) >> - && gap != 0 >> - && known_eq (nunits, (group_size - gap) * 2) >> - && known_eq (nunits, group_size) >> - && gap >= (vect_align / scalar_dr_size)) >> + vect_copy_ref_info (data_ref, >> + DR_REF (first_dr_info->dr)); >> + tree tem = make_ssa_name (ltype); >> + new_stmt = gimple_build_assign (tem, data_ref); >> + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, >> + gsi); >> + data_ref = NULL; >> + vec *v; >> + vec_alloc (v, 2); >> + if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) >> { >> - tree half_vtype; >> - new_vtype >> - = vector_vector_composition_type (vectype, 2, >> - &half_vtype); >> - if (new_vtype != NULL_TREE) >> - ltype = half_vtype; >> + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, >> + build_zero_cst (ltype)); >> + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem); >> } >> - tree offset >> - = (dataref_offset ? dataref_offset >> - : build_int_cst (ref_type, 0)); >> - if (ltype != vectype >> - && memory_access_type == VMAT_CONTIGUOUS_REVERSE) >> + else >> { >> - unsigned HOST_WIDE_INT gap_offset >> - = gap * tree_to_uhwi (TYPE_SIZE_UNIT (elem_type)); >> - tree gapcst = build_int_cst (ref_type, gap_offset); >> - offset = size_binop (PLUS_EXPR, offset, gapcst); >> + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem); >> + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, >> + build_zero_cst (ltype)); >> } >> - data_ref >> - = fold_build2 (MEM_REF, ltype, dataref_ptr, offset); >> - if (alignment_support_scheme == dr_aligned) >> - ; >> + gcc_assert (new_vtype != NULL_TREE); >> + if (new_vtype == vectype) >> + new_stmt = gimple_build_assign ( >> + vec_dest, build_constructor (vectype, v)); >> else >> - TREE_TYPE (data_ref) >> - = build_aligned_type (TREE_TYPE (data_ref), >> - align * BITS_PER_UNIT); >> - if (ltype != vectype) >> { >> - vect_copy_ref_info (data_ref, >> - DR_REF (first_dr_info->dr)); >> - tree tem = make_ssa_name (ltype); >> - new_stmt = gimple_build_assign (tem, data_ref); >> + tree new_vname = make_ssa_name (new_vtype); >> + new_stmt = gimple_build_assign ( >> + new_vname, build_constructor (new_vtype, v)); >> vect_finish_stmt_generation (vinfo, stmt_info, >> new_stmt, gsi); >> - data_ref = NULL; >> - vec *v; >> - vec_alloc (v, 2); >> - if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) >> - { >> - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, >> - build_zero_cst (ltype)); >> - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem); >> - } >> - else >> - { >> - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem); >> - CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, >> - build_zero_cst (ltype)); >> - } >> - gcc_assert (new_vtype != NULL_TREE); >> - if (new_vtype == vectype) >> - new_stmt = gimple_build_assign ( >> - vec_dest, build_constructor (vectype, v)); >> - else >> - { >> - tree new_vname = make_ssa_name (new_vtype); >> - new_stmt = gimple_build_assign ( >> - new_vname, build_constructor (new_vtype, v)); >> - vect_finish_stmt_generation (vinfo, stmt_info, >> - new_stmt, gsi); >> - new_stmt = gimple_build_assign ( >> - vec_dest, build1 (VIEW_CONVERT_EXPR, vectype, >> - new_vname)); >> - } >> + new_stmt = gimple_build_assign ( >> + vec_dest, >> + build1 (VIEW_CONVERT_EXPR, vectype, new_vname)); >> } >> } >> - break; >> } >> - case dr_explicit_realign: >> - { >> - if (costing_p) >> - break; >> - tree ptr, bump; >> - >> - tree vs = size_int (TYPE_VECTOR_SUBPARTS (vectype)); >> + break; >> + } >> + case dr_explicit_realign: >> + { >> + if (costing_p) >> + break; >> + 
tree ptr, bump; >> >> - if (compute_in_loop) >> - msq = vect_setup_realignment (vinfo, first_stmt_info, gsi, >> - &realignment_token, >> - dr_explicit_realign, >> - dataref_ptr, NULL); >> + tree vs = size_int (TYPE_VECTOR_SUBPARTS (vectype)); >> >> - if (TREE_CODE (dataref_ptr) == SSA_NAME) >> - ptr = copy_ssa_name (dataref_ptr); >> - else >> - ptr = make_ssa_name (TREE_TYPE (dataref_ptr)); >> - // For explicit realign the target alignment should be >> - // known at compile time. >> - unsigned HOST_WIDE_INT align = >> - DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); >> - new_stmt = gimple_build_assign >> - (ptr, BIT_AND_EXPR, dataref_ptr, >> - build_int_cst >> - (TREE_TYPE (dataref_ptr), >> - -(HOST_WIDE_INT) align)); >> - vect_finish_stmt_generation (vinfo, stmt_info, >> - new_stmt, gsi); >> - data_ref >> - = build2 (MEM_REF, vectype, ptr, >> - build_int_cst (ref_type, 0)); >> - vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); >> - vec_dest = vect_create_destination_var (scalar_dest, >> - vectype); >> - new_stmt = gimple_build_assign (vec_dest, data_ref); >> - new_temp = make_ssa_name (vec_dest, new_stmt); >> - gimple_assign_set_lhs (new_stmt, new_temp); >> - gimple_move_vops (new_stmt, stmt_info->stmt); >> - vect_finish_stmt_generation (vinfo, stmt_info, >> - new_stmt, gsi); >> - msq = new_temp; >> - >> - bump = size_binop (MULT_EXPR, vs, >> - TYPE_SIZE_UNIT (elem_type)); >> - bump = size_binop (MINUS_EXPR, bump, size_one_node); >> - ptr = bump_vector_ptr (vinfo, dataref_ptr, NULL, gsi, >> - stmt_info, bump); >> - new_stmt = gimple_build_assign >> - (NULL_TREE, BIT_AND_EXPR, ptr, >> - build_int_cst >> - (TREE_TYPE (ptr), -(HOST_WIDE_INT) align)); >> - if (TREE_CODE (ptr) == SSA_NAME) >> - ptr = copy_ssa_name (ptr, new_stmt); >> - else >> - ptr = make_ssa_name (TREE_TYPE (ptr), new_stmt); >> - gimple_assign_set_lhs (new_stmt, ptr); >> - vect_finish_stmt_generation (vinfo, stmt_info, >> - new_stmt, gsi); >> - data_ref >> - = build2 (MEM_REF, vectype, ptr, >> - build_int_cst (ref_type, 0)); >> - break; >> - } >> - case dr_explicit_realign_optimized: >> - { >> - if (costing_p) >> - break; >> - if (TREE_CODE (dataref_ptr) == SSA_NAME) >> - new_temp = copy_ssa_name (dataref_ptr); >> - else >> - new_temp = make_ssa_name (TREE_TYPE (dataref_ptr)); >> - // We should only be doing this if we know the target >> - // alignment at compile time. >> - unsigned HOST_WIDE_INT align = >> - DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); >> - new_stmt = gimple_build_assign >> - (new_temp, BIT_AND_EXPR, dataref_ptr, >> - build_int_cst (TREE_TYPE (dataref_ptr), >> - -(HOST_WIDE_INT) align)); >> - vect_finish_stmt_generation (vinfo, stmt_info, >> - new_stmt, gsi); >> - data_ref >> - = build2 (MEM_REF, vectype, new_temp, >> - build_int_cst (ref_type, 0)); >> - break; >> - } >> - default: >> - gcc_unreachable (); >> - } >> + if (compute_in_loop) >> + msq = vect_setup_realignment (vinfo, first_stmt_info, gsi, >> + &realignment_token, >> + dr_explicit_realign, >> + dataref_ptr, NULL); >> + >> + if (TREE_CODE (dataref_ptr) == SSA_NAME) >> + ptr = copy_ssa_name (dataref_ptr); >> + else >> + ptr = make_ssa_name (TREE_TYPE (dataref_ptr)); >> + // For explicit realign the target alignment should be >> + // known at compile time. 
>> + unsigned HOST_WIDE_INT align >> + = DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); >> + new_stmt = gimple_build_assign ( >> + ptr, BIT_AND_EXPR, dataref_ptr, >> + build_int_cst (TREE_TYPE (dataref_ptr), >> + -(HOST_WIDE_INT) align)); >> + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); >> + data_ref >> + = build2 (MEM_REF, vectype, ptr, build_int_cst (ref_type, 0)); >> + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); >> + vec_dest = vect_create_destination_var (scalar_dest, vectype); >> + new_stmt = gimple_build_assign (vec_dest, data_ref); >> + new_temp = make_ssa_name (vec_dest, new_stmt); >> + gimple_assign_set_lhs (new_stmt, new_temp); >> + gimple_move_vops (new_stmt, stmt_info->stmt); >> + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); >> + msq = new_temp; >> + >> + bump = size_binop (MULT_EXPR, vs, TYPE_SIZE_UNIT (elem_type)); >> + bump = size_binop (MINUS_EXPR, bump, size_one_node); >> + ptr = bump_vector_ptr (vinfo, dataref_ptr, NULL, gsi, stmt_info, >> + bump); >> + new_stmt = gimple_build_assign ( >> + NULL_TREE, BIT_AND_EXPR, ptr, >> + build_int_cst (TREE_TYPE (ptr), -(HOST_WIDE_INT) align)); >> + if (TREE_CODE (ptr) == SSA_NAME) >> + ptr = copy_ssa_name (ptr, new_stmt); >> + else >> + ptr = make_ssa_name (TREE_TYPE (ptr), new_stmt); >> + gimple_assign_set_lhs (new_stmt, ptr); >> + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); >> + data_ref >> + = build2 (MEM_REF, vectype, ptr, build_int_cst (ref_type, 0)); >> + break; >> + } >> + case dr_explicit_realign_optimized: >> + { >> + if (costing_p) >> + break; >> + if (TREE_CODE (dataref_ptr) == SSA_NAME) >> + new_temp = copy_ssa_name (dataref_ptr); >> + else >> + new_temp = make_ssa_name (TREE_TYPE (dataref_ptr)); >> + // We should only be doing this if we know the target >> + // alignment at compile time. >> + unsigned HOST_WIDE_INT align >> + = DR_TARGET_ALIGNMENT (first_dr_info).to_constant (); >> + new_stmt = gimple_build_assign ( >> + new_temp, BIT_AND_EXPR, dataref_ptr, >> + build_int_cst (TREE_TYPE (dataref_ptr), >> + -(HOST_WIDE_INT) align)); >> + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); >> + data_ref = build2 (MEM_REF, vectype, new_temp, >> + build_int_cst (ref_type, 0)); >> + break; >> + } >> + default: >> + gcc_unreachable (); >> + } >> >> - /* One common place to cost the above vect load for different >> - alignment support schemes. */ >> - if (costing_p) >> - { >> - /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we >> - only need to take care of the first stmt, whose >> - stmt_info is first_stmt_info, vec_num iterating on it >> - will cover the cost for the remaining, it's consistent >> - with transforming. For the prologue cost for realign, >> - we only need to count it once for the whole group. */ >> - bool first_stmt_info_p = first_stmt_info == stmt_info; >> - bool add_realign_cost = first_stmt_info_p && i == 0; >> - if (memory_access_type == VMAT_CONTIGUOUS >> - || memory_access_type == VMAT_CONTIGUOUS_REVERSE >> - || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE >> - && (!grouped_load || first_stmt_info_p))) >> - vect_get_load_cost (vinfo, stmt_info, 1, >> - alignment_support_scheme, misalignment, >> - add_realign_cost, &inside_cost, >> - &prologue_cost, cost_vec, cost_vec, >> - true); >> - } >> - else >> + /* One common place to cost the above vect load for different >> + alignment support schemes. 
*/ >> + if (costing_p) >> + { >> + /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we >> + only need to take care of the first stmt, whose >> + stmt_info is first_stmt_info, vec_num iterating on it >> + will cover the cost for the remaining, it's consistent >> + with transforming. For the prologue cost for realign, >> + we only need to count it once for the whole group. */ >> + bool first_stmt_info_p = first_stmt_info == stmt_info; >> + bool add_realign_cost = first_stmt_info_p && i == 0; >> + if (memory_access_type == VMAT_CONTIGUOUS >> + || memory_access_type == VMAT_CONTIGUOUS_REVERSE >> + || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE >> + && (!grouped_load || first_stmt_info_p))) >> + vect_get_load_cost (vinfo, stmt_info, 1, >> + alignment_support_scheme, misalignment, >> + add_realign_cost, &inside_cost, >> + &prologue_cost, cost_vec, cost_vec, true); >> + } >> + else >> + { >> + vec_dest = vect_create_destination_var (scalar_dest, vectype); >> + /* DATA_REF is null if we've already built the statement. */ >> + if (data_ref) >> { >> - vec_dest = vect_create_destination_var (scalar_dest, vectype); >> - /* DATA_REF is null if we've already built the statement. */ >> - if (data_ref) >> - { >> - vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); >> - new_stmt = gimple_build_assign (vec_dest, data_ref); >> - } >> - new_temp = make_ssa_name (vec_dest, new_stmt); >> - gimple_set_lhs (new_stmt, new_temp); >> - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); >> + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); >> + new_stmt = gimple_build_assign (vec_dest, data_ref); >> } >> + new_temp = make_ssa_name (vec_dest, new_stmt); >> + gimple_set_lhs (new_stmt, new_temp); >> + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); >> + } >> >> - /* 3. Handle explicit realignment if necessary/supported. >> - Create in loop: >> - vec_dest = realign_load (msq, lsq, realignment_token) */ >> - if (!costing_p >> - && (alignment_support_scheme == dr_explicit_realign_optimized >> - || alignment_support_scheme == dr_explicit_realign)) >> - { >> - lsq = gimple_assign_lhs (new_stmt); >> - if (!realignment_token) >> - realignment_token = dataref_ptr; >> - vec_dest = vect_create_destination_var (scalar_dest, vectype); >> - new_stmt = gimple_build_assign (vec_dest, REALIGN_LOAD_EXPR, >> - msq, lsq, realignment_token); >> - new_temp = make_ssa_name (vec_dest, new_stmt); >> - gimple_assign_set_lhs (new_stmt, new_temp); >> - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); >> + /* 3. Handle explicit realignment if necessary/supported. 
>> + Create in loop: >> + vec_dest = realign_load (msq, lsq, realignment_token) */ >> + if (!costing_p >> + && (alignment_support_scheme == dr_explicit_realign_optimized >> + || alignment_support_scheme == dr_explicit_realign)) >> + { >> + lsq = gimple_assign_lhs (new_stmt); >> + if (!realignment_token) >> + realignment_token = dataref_ptr; >> + vec_dest = vect_create_destination_var (scalar_dest, vectype); >> + new_stmt = gimple_build_assign (vec_dest, REALIGN_LOAD_EXPR, msq, >> + lsq, realignment_token); >> + new_temp = make_ssa_name (vec_dest, new_stmt); >> + gimple_assign_set_lhs (new_stmt, new_temp); >> + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); >> >> - if (alignment_support_scheme == dr_explicit_realign_optimized) >> - { >> - gcc_assert (phi); >> - if (i == vec_num - 1 && j == ncopies - 1) >> - add_phi_arg (phi, lsq, >> - loop_latch_edge (containing_loop), >> - UNKNOWN_LOCATION); >> - msq = lsq; >> - } >> + if (alignment_support_scheme == dr_explicit_realign_optimized) >> + { >> + gcc_assert (phi); >> + if (i == vec_num - 1 && j == ncopies - 1) >> + add_phi_arg (phi, lsq, loop_latch_edge (containing_loop), >> + UNKNOWN_LOCATION); >> + msq = lsq; >> } >> + } >> >> - if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) >> + if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) >> + { >> + if (costing_p) >> + inside_cost = record_stmt_cost (cost_vec, 1, vec_perm, >> + stmt_info, 0, vect_body); >> + else >> { >> - if (costing_p) >> - inside_cost = record_stmt_cost (cost_vec, 1, vec_perm, >> - stmt_info, 0, vect_body); >> - else >> - { >> - tree perm_mask = perm_mask_for_reverse (vectype); >> - new_temp >> - = permute_vec_elements (vinfo, new_temp, new_temp, >> - perm_mask, stmt_info, gsi); >> - new_stmt = SSA_NAME_DEF_STMT (new_temp); >> - } >> + tree perm_mask = perm_mask_for_reverse (vectype); >> + new_temp = permute_vec_elements (vinfo, new_temp, new_temp, >> + perm_mask, stmt_info, gsi); >> + new_stmt = SSA_NAME_DEF_STMT (new_temp); >> } >> + } >> >> - /* Collect vector loads and later create their permutation in >> - vect_transform_grouped_load (). */ >> - if (!costing_p && (grouped_load || slp_perm)) >> - dr_chain.quick_push (new_temp); >> + /* Collect vector loads and later create their permutation in >> + vect_transform_grouped_load (). */ >> + if (!costing_p && (grouped_load || slp_perm)) >> + dr_chain.quick_push (new_temp); >> >> - /* Store vector loads in the corresponding SLP_NODE. */ >> - if (!costing_p && slp && !slp_perm) >> - slp_node->push_vec_def (new_stmt); >> + /* Store vector loads in the corresponding SLP_NODE. */ >> + if (!costing_p && slp && !slp_perm) >> + slp_node->push_vec_def (new_stmt); >> >> - /* With SLP permutation we load the gaps as well, without >> - we need to skip the gaps after we manage to fully load >> - all elements. group_gap_adj is DR_GROUP_SIZE here. */ >> - group_elt += nunits; >> - if (!costing_p >> - && maybe_ne (group_gap_adj, 0U) >> - && !slp_perm >> - && known_eq (group_elt, group_size - group_gap_adj)) >> - { >> - poly_wide_int bump_val >> - = (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) >> - * group_gap_adj); >> - if (tree_int_cst_sgn >> - (vect_dr_behavior (vinfo, dr_info)->step) == -1) >> - bump_val = -bump_val; >> - tree bump = wide_int_to_tree (sizetype, bump_val); >> - dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, >> - gsi, stmt_info, bump); >> - group_elt = 0; >> - } >> - } >> - /* Bump the vector pointer to account for a gap or for excess >> - elements loaded for a permuted SLP load. 
*/ >> + /* With SLP permutation we load the gaps as well, without >> + we need to skip the gaps after we manage to fully load >> + all elements. group_gap_adj is DR_GROUP_SIZE here. */ >> + group_elt += nunits; >> if (!costing_p >> && maybe_ne (group_gap_adj, 0U) >> - && slp_perm) >> + && !slp_perm >> + && known_eq (group_elt, group_size - group_gap_adj)) >> { >> poly_wide_int bump_val >> - = (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) >> - * group_gap_adj); >> - if (tree_int_cst_sgn >> - (vect_dr_behavior (vinfo, dr_info)->step) == -1) >> + = (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) * group_gap_adj); >> + if (tree_int_cst_sgn (vect_dr_behavior (vinfo, dr_info)->step) >> + == -1) >> bump_val = -bump_val; >> tree bump = wide_int_to_tree (sizetype, bump_val); >> dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi, >> stmt_info, bump); >> + group_elt = 0; >> } >> } >> + /* Bump the vector pointer to account for a gap or for excess >> + elements loaded for a permuted SLP load. */ >> + if (!costing_p >> + && maybe_ne (group_gap_adj, 0U) >> + && slp_perm) >> + { >> + poly_wide_int bump_val >> + = (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) * group_gap_adj); >> + if (tree_int_cst_sgn (vect_dr_behavior (vinfo, dr_info)->step) == -1) >> + bump_val = -bump_val; >> + tree bump = wide_int_to_tree (sizetype, bump_val); >> + dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi, >> + stmt_info, bump); >> + } >> >> if (slp && !slp_perm) >> continue; >> @@ -11120,39 +11117,36 @@ vectorizable_load (vec_info *vinfo, >> } >> } >> else >> - { >> - if (grouped_load) >> - { >> - if (memory_access_type != VMAT_LOAD_STORE_LANES) >> + { >> + if (grouped_load) >> + { >> + gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE); >> + /* We assume that the cost of a single load-lanes instruction >> + is equivalent to the cost of DR_GROUP_SIZE separate loads. >> + If a grouped access is instead being provided by a >> + load-and-permute operation, include the cost of the >> + permutes. */ >> + if (costing_p && first_stmt_info == stmt_info) >> { >> - gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE); >> - /* We assume that the cost of a single load-lanes instruction >> - is equivalent to the cost of DR_GROUP_SIZE separate loads. >> - If a grouped access is instead being provided by a >> - load-and-permute operation, include the cost of the >> - permutes. */ >> - if (costing_p && first_stmt_info == stmt_info) >> - { >> - /* Uses an even and odd extract operations or shuffle >> - operations for each needed permute. */ >> - int group_size = DR_GROUP_SIZE (first_stmt_info); >> - int nstmts = ceil_log2 (group_size) * group_size; >> - inside_cost >> - += record_stmt_cost (cost_vec, nstmts, vec_perm, >> - stmt_info, 0, vect_body); >> + /* Uses an even and odd extract operations or shuffle >> + operations for each needed permute. 
*/ >> + int group_size = DR_GROUP_SIZE (first_stmt_info); >> + int nstmts = ceil_log2 (group_size) * group_size; >> + inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm, >> + stmt_info, 0, vect_body); >> >> - if (dump_enabled_p ()) >> - dump_printf_loc ( >> - MSG_NOTE, vect_location, >> - "vect_model_load_cost: strided group_size = %d .\n", >> - group_size); >> - } >> - else if (!costing_p) >> - vect_transform_grouped_load (vinfo, stmt_info, dr_chain, >> - group_size, gsi); >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "vect_model_load_cost:" >> + "strided group_size = %d .\n", >> + group_size); >> + } >> + else if (!costing_p) >> + { >> + vect_transform_grouped_load (vinfo, stmt_info, dr_chain, >> + group_size, gsi); >> + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; >> } >> - if (!costing_p) >> - *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; >> } >> else if (!costing_p) >> STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); >> @@ -11166,7 +11160,8 @@ vectorizable_load (vec_info *vinfo, >> { >> gcc_assert (memory_access_type != VMAT_INVARIANT >> && memory_access_type != VMAT_ELEMENTWISE >> - && memory_access_type != VMAT_STRIDED_SLP); >> + && memory_access_type != VMAT_STRIDED_SLP >> + && memory_access_type != VMAT_LOAD_STORE_LANES); >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_NOTE, vect_location, >> "vect_model_load_cost: inside_cost = %u, " >> -- >> 2.31.1 >
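For illustration only, here is a minimal, self-contained sketch (hypothetical C++, not GCC code; the names memory_access_kind, transform_before and transform_after are invented) of the restructuring pattern the patch applies to vectorizable_load: the special-cased access kind is hoisted out of the shared per-copy loop into its own loop that returns early, so the general-case loop loses the if/else and one level of indentation, which is why the real diff is mostly re-indentation.

    /* Hypothetical sketch of the restructuring pattern -- not GCC code.  */

    #include <cstdio>

    enum memory_access_kind { ACCESS_LOAD_STORE_LANES, ACCESS_CONTIGUOUS };

    /* Before: one shared loop, with the special case handled by an
       if/else nested inside it.  */
    static void
    transform_before (memory_access_kind kind, int ncopies)
    {
      for (int j = 0; j < ncopies; j++)
        {
          if (kind == ACCESS_LOAD_STORE_LANES)
            std::printf ("copy %d: emit LOAD_LANES\n", j);
          else
            std::printf ("copy %d: emit ordinary vector load\n", j);
        }
    }

    /* After: the special case gets its own loop and returns early; the
       remaining loop handles only the general case and drops one level
       of nesting.  */
    static bool
    transform_after (memory_access_kind kind, int ncopies)
    {
      if (kind == ACCESS_LOAD_STORE_LANES)
        {
          for (int j = 0; j < ncopies; j++)
            std::printf ("copy %d: emit LOAD_LANES\n", j);
          return true;
        }

      for (int j = 0; j < ncopies; j++)
        std::printf ("copy %d: emit ordinary vector load\n", j);
      return true;
    }

    int
    main ()
    {
      transform_before (ACCESS_LOAD_STORE_LANES, 2);
      return transform_after (ACCESS_CONTIGUOUS, 2) ? 0 : 1;
    }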