From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 5/7 v6] vect: Support vector load/store with length in vectorizer
To: GCC Patches, richard.sandiford@arm.com
Cc: Bill Schmidt, dje.gcc@gmail.com, Segher Boessenkool
From: "Kewen.Lin"
Message-ID: <322755de-d8f2-3b34-6e59-9824d7bb690f@linux.ibm.com>
Date: Wed, 8 Jul 2020 14:52:54 +0800
References: <30906c0d-3b9f-e1e6-156f-c01fcf229cb9@linux.ibm.com>
 <4b7f2daf-467e-d940-b79c-31c1c30a1dd4@linux.ibm.com>
 <2fa0452b-9895-15f3-6db5-4233ff16b408@linux.ibm.com>
 <68981da8-f7a1-4c95-6d64-2a1d8b748b9f@linux.ibm.com>
 <5634e168-13fa-b870-3378-b78779793e1f@linux.ibm.com>
 <0eb2d1c1-f8e6-4f62-cc56-200228252650@linux.ibm.com>
 <9c8a665c-6ac7-431e-6eb1-aa6f6ecba69d@linux.ibm.com>
 <8058b249-b8d3-7b0b-8322-8b0fba17df9b@linux.ibm.com>
 <46923817-05c2-629a-5d59-3378da17d835@linux.ibm.com>

Hi Richard,

on 2020/7/7 6:44 PM, Richard Sandiford wrote:
> "Kewen.Lin" writes:
>> on 2020/7/2 1:20 PM, Kewen.Lin via Gcc-patches wrote:
>>> on 2020/7/1 11:17 PM, Richard Sandiford wrote:
>>>> "Kewen.Lin" writes:
>>>>> on 2020/7/1 3:53 AM, Richard Sandiford wrote:
>>>>>> "Kewen.Lin" writes:
>>>>>>> +  /* Decide whether to use fully-masked approach. */
>>>>>>> +  if (vect_verify_full_masking (loop_vinfo))
>>>>>>> +    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
>>>>>>> +  /* Decide whether to use length-based approach. */
>>>>>>> +  else if (vect_verify_loop_lens (loop_vinfo))
>>>>>>> +    {
>>>>>>> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>>>>>> +          || LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo))
>>>>>>> +        {
>>>>>>> +          if (dump_enabled_p ())
>>>>>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>>>>>> +                             "can't vectorize this loop with length-based"
>>>>>>> +                             " partial vectors approach becuase peeling"
>>>>>>> +                             " for alignment or gaps is required.\n");
>>>>>>> +          LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
>>>>>>> +        }
>>>>>>
>>>>>> Why are these peeling cases necessary? Peeling for gaps should
>>>>>> just mean subtracting one scalar iteration from the iteration count
>>>>>> and shouldn't otherwise affect the main loop. Similarly, peeling for
>>>>>> alignment can be handled in the normal way, with a scalar prologue loop.
>>>>>>
>>>>>
>>>>> I was thinking to relax this later and to avoid to handle too many cases
>>>>> in the first enablement patch.
>>>>> Since Power hw whose level is able to support
>>>>> vector with length, it supports unaligned load/store, need to construct
>>>>> some cases for them. May I postpone it a bit? Or you prefer me to support
>>>>> it here?
>>>>
>>>> I've no objection to postponing it if there are specific known
>>>> problems that make it difficult, but I think we should at least
>>>> say what they are. On the face of it, I'm not sure why it doesn't
>>>> Just Work, since the way that we control the main loop should be
>>>> mostly orthogonal to how we handle peeled prologue iterations
>>>> and how we handle a single peeled epilogue iteration.
>>>>
>>>
>>> OK, I will remove it to see the impact. By the way, do you think to
>>> use partial vectors for prologue is something worth to trying in future?
>>>
>>
>> I tested the updated patch with this releasing, LOOP_VINFO_PEELING_FOR_GAPS
>> part looks fine, but LOOP_VINFO_PEELING_FOR_ALIGNMENT caused one case to
>> fail at execution during vect-partial-vector-usage=2. So far the patch
>> doesn't handle any niters_skip cases. I think if we want to support it,
>> we have to add some handlings in/like what we have for masking, such as:
>> mask_skip_niters, vect_prepare_for_masked_peels etc.
>>
>> Do you prefer me to extend the support in this patch series?
>
> It's not so much whether it has to be supported now, but more why
> it doesn't work now. What was the reason for the failure?
>
> The peeling-with-masking thing is just an optimisation, so that we
> can vectorise the peeled iterations rather than falling back to
> scalar code for them. It shouldn't be needed for correctness.
>

Whoops, thanks for the clarification! Nice, I just realized it's a way to
adopt partial vectors for the prologue.

The failing case is gcc.dg/vect/vect-ifcvt-11.c. There the first iteration
is optimized out because the AND result for IV 0 is known. The vectorizer
then tries to peel 3 iterations, so the number of remaining iterations for
the vectorization body is expected to be 12.
But it still uses 15, which causes an out-of-bounds access.

The fix below resolves the failure. The justification is that, with partial
vectors, the vectorization body needs to use the niters value that was fixed
up after peeling the prologue. I'm not sure why the other cases, which don't
use partial vectors, don't need the fixed-up niters; to avoid trouble I
guarded it explicitly with LOOP_VINFO_USING_PARTIAL_VECTORS_P.

Does it make sense?

--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8888,6 +8896,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                              LOOP_VINFO_INT_NITERS (loop_vinfo) / lowest_vf);
       step_vector = build_one_cst (TREE_TYPE (niters));
     }
+  else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+           && !vect_use_loop_mask_for_alignment_p (loop_vinfo))
+    vect_gen_vector_loop_niters (loop_vinfo, LOOP_VINFO_NITERS (loop_vinfo),
+                                 &niters_vector, &step_vector,
+                                 niters_no_overflow);
   else
     vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector,
                                  &step_vector, niters_no_overflow);

BR,
Kewen
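
P.S. To make the iteration arithmetic above concrete, here is a small
standalone sketch. It is not GCC code: the helper last_index_touched is
hypothetical, and the vectorization factor of 4 and the 16-element array are
assumptions chosen for illustration. Only the counts 15, 3 and 12 come from
the failing case. It models a length-controlled loop that handles
min (remaining, vf) elements per vector iteration and reports the last
element index it touches.

```c
#include <assert.h>

/* Hypothetical model, not GCC code: a length-controlled vector loop starts
   at element START and is told that NITERS scalar iterations remain.  Each
   vector iteration processes min (remaining, VF) elements.  Return the
   index of the last element the loop touches.  */
static int
last_index_touched (int start, int niters, int vf)
{
  assert (vf > 0 && niters > 0);

  int idx = start;
  for (int remaining = niters; remaining > 0; remaining -= vf)
    {
      /* Length used for this vector iteration.  */
      int len = remaining < vf ? remaining : vf;
      idx += len;
    }
  return idx - 1;
}
```

With start = 4 (one iteration optimized out plus three peeled) and the
assumed vf of 4, last_index_touched (4, 12, 4) yields 15, the last valid
index of a 16-element array, whereas the stale count gives
last_index_touched (4, 15, 4) == 18, which runs past the end.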