From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 15 Aug 2023 20:13:45 +0800
Subject: Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest
To: Richard Biener
Cc: richard.sandiford@arm.com, GCC Patches
References: <7314a4eb-26d0-e33e-94c2-31daca9f490e@linux.ibm.com> <4275f99b-cd1f-da4b-0bd3-3c5b50f0e93d@linux.ibm.com>
From: "Kewen.Lin"
Content-Type: text/plain; charset=UTF-8

on 2023/8/15 20:07, Richard Biener wrote:
> On Tue, Aug 15, 2023 at 1:47 PM Kewen.Lin wrote:
>>
>> on 2023/8/15 15:53, Richard Biener wrote:
>>> On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin wrote:
>>>>
>>>> on 2023/8/14 22:16, Richard Sandiford wrote:
>>>>> "Kewen.Lin" writes:
>>>>>> Hi Richard,
>>>>>>
>>>>>> on 2023/8/14 20:20, Richard Sandiford wrote:
>>>>>>> Thanks for the clean-ups.  But...
>>>>>>>
>>>>>>> "Kewen.Lin" writes:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Following Richi's suggestion [1], this patch moves the
>>>>>>>> handlings of VMAT_GATHER_SCATTER in the final loop nest
>>>>>>>> of function vectorizable_load to its own loop.  Basically
>>>>>>>> it duplicates the final loop nest, cleans up some useless
>>>>>>>> setup code for the VMAT_GATHER_SCATTER case, and removes
>>>>>>>> some unreachable code.  It also removes the corresponding
>>>>>>>> handlings from the final loop nest.
>>>>>>>>
>>>>>>>> Bootstrapped and regtested on x86_64-redhat-linux,
>>>>>>>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>>>>>>>
>>>>>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html
>>>>>>>>
>>>>>>>> Is it ok for trunk?
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> Kewen
>>>>>>>> -----
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>         * tree-vect-stmts.cc (vectorizable_load): Move the handlings on
>>>>>>>>         VMAT_GATHER_SCATTER in the final loop nest to its own loop,
>>>>>>>>         and update the final nest accordingly.
>>>>>>>> ---
>>>>>>>>  gcc/tree-vect-stmts.cc | 361 +++++++++++++++++++++++++----------------
>>>>>>>>  1 file changed, 219 insertions(+), 142 deletions(-)
>>>>>>>
>>>>>>> ...that seems like quite a lot of +s.  Is there nothing we can do to
>>>>>>> avoid the cut-&-paste?
>>>>>>
>>>>>> Thanks for the comments!  I'm not sure I get your question: if we
>>>>>> want to move out the handlings of VMAT_GATHER_SCATTER, the new +s seem
>>>>>> inevitable?  Your concern is mainly about git blame history?
>>>>>
>>>>> No, it was more that 219-142=77, so it seems like a lot of lines
>>>>> are being duplicated rather than simply being moved.  (Unlike for
>>>>> VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
>>>>> was a clear improvement.)
>>>>>
>>>>> So I was just wondering if there was any obvious factoring-out that
>>>>> could be done to reduce the duplication.
>>>>
>>>> ah, thanks for the clarification!
>>>>
>>>> I think the main duplications are at the beginning and end of the loop
>>>> body; let's look at them in detail:
>>>>
>>>> +  if (memory_access_type == VMAT_GATHER_SCATTER)
>>>> +    {
>>>> +      gcc_assert (alignment_support_scheme == dr_aligned
>>>> +                  || alignment_support_scheme == dr_unaligned_supported);
>>>> +      gcc_assert (!grouped_load && !slp_perm);
>>>> +
>>>> +      unsigned int inside_cost = 0, prologue_cost = 0;
>>>>
>>>> // These above are newly added.
>>>>
>>>> +      for (j = 0; j < ncopies; j++)
>>>> +        {
>>>> +          /* 1. Create the vector or array pointer update chain.  */
>>>> +          if (j == 0 && !costing_p)
>>>> +            {
>>>> +              if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
>>>> +                vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
>>>> +                                             slp_node, &gs_info, &dataref_ptr,
>>>> +                                             &vec_offsets);
>>>> +              else
>>>> +                dataref_ptr
>>>> +                  = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
>>>> +                                              at_loop, offset, &dummy, gsi,
>>>> +                                              &ptr_incr, false, bump);
>>>> +            }
>>>> +          else if (!costing_p)
>>>> +            {
>>>> +              gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
>>>> +              if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
>>>> +                dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
>>>> +                                               gsi, stmt_info, bump);
>>>> +            }
>>>>
>>>> // These are for dataref_ptr.  In the final loop nest we deal with more
>>>> cases on simd_lane_access_p and diff_first_stmt_info, but don't handle
>>>> STMT_VINFO_GATHER_SCATTER_P any more; very little (one case) can be shared
>>>> between them.  IMHO, factoring it out seems like overkill.
>>>>
>>>> +
>>>> +          if (mask && !costing_p)
>>>> +            vec_mask = vec_masks[j];
>>>>
>>>> // This is merged out from the j == 0 and j != 0 cases.
>>>>
>>>> +
>>>> +          gimple *new_stmt = NULL;
>>>> +          for (i = 0; i < vec_num; i++)
>>>> +            {
>>>> +              tree final_mask = NULL_TREE;
>>>> +              tree final_len = NULL_TREE;
>>>> +              tree bias = NULL_TREE;
>>>> +              if (!costing_p)
>>>> +                {
>>>> +                  if (loop_masks)
>>>> +                    final_mask
>>>> +                      = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
>>>> +                                            vec_num * ncopies, vectype,
>>>> +                                            vec_num * j + i);
>>>> +                  if (vec_mask)
>>>> +                    final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
>>>> +                                                   final_mask, vec_mask, gsi);
>>>> +
>>>> +                  if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
>>>> +                    dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
>>>> +                                                   gsi, stmt_info, bump);
>>>> +                }
>>>>
>>>> // This part is directly copied from the original; the original gets updated
>>>> by removing && !STMT_VINFO_GATHER_SCATTER_P.  Due to its size, I didn't
>>>> consider this before; do you prefer me to factor this part out?
>>>>
>>>> +              if (gs_info.ifn != IFN_LAST)
>>>> +                {
>>>> ...
>>>> +                }
>>>> +              else
>>>> +                {
>>>> +                  /* Emulated gather-scatter.  */
>>>> ...
>>>>
>>>> // This part is just moved from the original.
>>>>
>>>> +              vec_dest = vect_create_destination_var (scalar_dest, vectype);
>>>> +              /* DATA_REF is null if we've already built the statement.  */
>>>> +              if (data_ref)
>>>> +                {
>>>> +                  vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr));
>>>> +                  new_stmt = gimple_build_assign (vec_dest, data_ref);
>>>> +                }
>>>> +              new_temp = make_ssa_name (vec_dest, new_stmt);
>>>> +              gimple_set_lhs (new_stmt, new_temp);
>>>> +              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
>>>> +
>>>> +              /* Store vector loads in the corresponding SLP_NODE.  */
>>>> +              if (slp)
>>>> +                slp_node->push_vec_def (new_stmt);
>>>> +
>>>> +              if (!slp && !costing_p)
>>>> +                STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
>>>> +            }
>>>> +
>>>> +          if (!slp && !costing_p)
>>>> +            *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
>>>>
>>>> // This part is some subsequent handling; it's duplicated from the original
>>>> but removes some more useless code.  I guess this part is not worth being
>>>> factored out?
>>>>
>>>> +      if (costing_p)
>>>> +        {
>>>> +          if (dump_enabled_p ())
>>>> +            dump_printf_loc (MSG_NOTE, vect_location,
>>>> +                             "vect_model_load_cost: inside_cost = %u, "
>>>> +                             "prologue_cost = %u .\n",
>>>> +                             inside_cost, prologue_cost);
>>>> +        }
>>>> +      return true;
>>>> +    }
>>>>
>>>> // Duplicating the dumping; I guess it's unnecessary to factor it out.
>>>>
>>>> Oh, I just noticed that this should be shortened to
>>>> "if (costing_p && dump_enabled_p ())" instead, the same as what's
>>>> adopted for the VMAT_LOAD_STORE_LANES dumping.
>>>
>>> Just to mention, the original motivating idea was that even though we
>>> duplicate some code, we make it overall more readable and thus maintainable.
>>> In the end we might have vectorizable_load () for analysis but have not only
>>> load_vec_info_type but one for each VMAT_*, which means multiple separate
>>> vect_transform_load () functions.  Currently vectorizable_load is structured
>>> very inconsistently; having the transforms all hang off a single
>>> switch (vmat-kind) {} would be an improvement IMHO.
>>
>> Thanks for the comments!  With these two patches, the final loop nest now
>> only handles VMAT_CONTIGUOUS, VMAT_CONTIGUOUS_REVERSE and
>> VMAT_CONTIGUOUS_PERMUTE.  IMHO, their handlings are highly bundled;
>> restructuring them could introduce more duplicated code and risks of
>> incomplete bug fixes, as Richard pointed out.  But if I read the above
>> comments right, our final goal seems to be to separate all of them?  I wonder
>> if you both prefer to separate them further?
>
> I'd leave those together, they share too much code.

Got it, thanks for clarifying, it matches my previous thought. :)

BR,
Kewen
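
[Illustration appended for readers following the restructuring idea above: a
minimal, self-contained sketch of what "hanging the transforms off a single
switch (vmat-kind)" could look like.  Every type and function name below is a
hypothetical stand-in, not the actual tree-vect-stmts.cc code; the real
vectorizable_load transform takes stmt_vec_info, gsi and friends and is far
more involved.]

/* Sketch only: dispatch the load transform on the memory access kind,
   with one helper per VMAT_* family.  */
#include <cstdio>

/* Stand-in for GCC's vect_memory_access_type enumerators.  */
enum vect_memory_access_type
{
  VMAT_LOAD_STORE_LANES,
  VMAT_GATHER_SCATTER,
  VMAT_CONTIGUOUS,
  VMAT_CONTIGUOUS_REVERSE,
  VMAT_CONTIGUOUS_PERMUTE
};

/* Hypothetical per-kind transform helpers; in GCC these would emit the
   vectorized statements for their respective access kind.  */
static bool
transform_load_lanes () { std::puts ("load/store lanes"); return true; }

static bool
transform_load_gather_scatter () { std::puts ("gather/scatter"); return true; }

static bool
transform_load_contiguous () { std::puts ("contiguous"); return true; }

/* The dispatch shape: one switch, one transform routine per kind.  */
static bool
transform_load (vect_memory_access_type memory_access_type)
{
  switch (memory_access_type)
    {
    case VMAT_LOAD_STORE_LANES:
      return transform_load_lanes ();
    case VMAT_GATHER_SCATTER:
      return transform_load_gather_scatter ();
    case VMAT_CONTIGUOUS:
    case VMAT_CONTIGUOUS_REVERSE:
    case VMAT_CONTIGUOUS_PERMUTE:
      /* Kept together: as agreed above, these three share too much code.  */
      return transform_load_contiguous ();
    }
  return false;
}

int
main ()
{
  return transform_load (VMAT_GATHER_SCATTER) ? 0 : 1;
}

The grouping of the three VMAT_CONTIGUOUS* cases mirrors the conclusion of the
thread: they stay in one transform path while VMAT_LOAD_STORE_LANES and
VMAT_GATHER_SCATTER get their own loops.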