From: "Kewen.Lin"
To: GCC Patches
Cc: Richard Biener, Richard Sandiford, Peter Bergner, Segher Boessenkool
Subject: [PATCH] vect: Don't vectorize a single scalar iteration loop [PR110740]
Date: Fri, 21 Jul 2023 14:03:23 +0800
Message-ID: <07426582-50b4-de62-f5d7-d36e470f7fcb@linux.ibm.com>

Hi,

The function vect_update_epilogue_niters, which was removed by
r14-2281, had some code ensuring that if only one scalar iteration
is left for the epilogue, we do not try to vectorize it any more.
Although costing should be able to take care of it eventually, I
think we still want this special casing without costing enabled,
so this patch adds it back in function vect_analyze_loop_costing
and makes it more general, covering both main and epilogue loops
as Richi suggested.  It fixes some exposed failures on Power10:

  - gcc.target/powerpc/p9-vec-length-epil-{1,8}.c
  - gcc.dg/vect/slp-perm-{1,5,6,7}.c

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu, powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
-----
        PR tree-optimization/110740

gcc/ChangeLog:

        * tree-vect-loop.cc (vect_analyze_loop_costing): Do not vectorize
        a loop with a single scalar iteration.
---
 gcc/tree-vect-loop.cc | 55 ++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..92d2abde094 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2158,8 +2158,7 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
      epilogue we can also decide whether the main loop leaves us
      with enough iterations, prefering a smaller vector epilog then
      also possibly used for the case we skip the vector loop.  */
-  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
-      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
       widest_int scalar_niters
         = wi::to_widest (LOOP_VINFO_NITERSM1 (loop_vinfo)) + 1;
@@ -2182,32 +2181,46 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
                                 % lowest_vf + gap);
             }
         }
-
-      /* Check that the loop processes at least one full vector.  */
-      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-      if (known_lt (scalar_niters, vf))
+      /* Reject vectorizing for a single scalar iteration, even if
+         we could in principle implement that using partial vectors.  */
+      unsigned peeling_gap = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
+      if (scalar_niters <= peeling_gap + 1)
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "loop does not have enough iterations "
-                             "to support vectorization.\n");
+                             "not vectorized: loop only has a single "
+                             "scalar iteration.\n");
           return 0;
         }
 
-      /* If we need to peel an extra epilogue iteration to handle data
-         accesses with gaps, check that there are enough scalar iterations
-         available.
-
-         The check above is redundant with this one when peeling for gaps,
-         but the distinction is useful for diagnostics.  */
-      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-          && known_le (scalar_niters, vf))
+      if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
         {
-          if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "loop does not have enough iterations "
-                             "to support peeling for gaps.\n");
-          return 0;
+          /* Check that the loop processes at least one full vector.  */
+          poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+          if (known_lt (scalar_niters, vf))
+            {
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                 "loop does not have enough iterations "
+                                 "to support vectorization.\n");
+              return 0;
+            }
+
+          /* If we need to peel an extra epilogue iteration to handle data
+             accesses with gaps, check that there are enough scalar iterations
+             available.
+
+             The check above is redundant with this one when peeling for gaps,
+             but the distinction is useful for diagnostics.  */
+          if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+              && known_le (scalar_niters, vf))
+            {
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                 "loop does not have enough iterations "
+                                 "to support peeling for gaps.\n");
+              return 0;
+            }
         }
     }
 
-- 
2.39.3
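
For readers who want to see the order of the checks in isolation, here is a minimal standalone sketch of the decision logic after this patch. It is not GCC code: the struct LoopInfoSketch and the function reject_reason are hypothetical names made up for illustration, plain integers stand in for widest_int and poly_uint64, and the vectorization factor is assumed to be a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the few loop_vec_info fields the checks read.
// The real code uses widest_int and poly_uint64; plain integers are an
// assumption made here to keep the sketch self-contained.
struct LoopInfoSketch {
  std::uint64_t scalar_niters;  // known scalar iteration count
  std::uint64_t vf;             // vectorization factor, assumed constant
  bool peeling_for_gaps;        // an extra scalar iteration is peeled
  bool using_partial_vectors;   // loop would use partial vectors
};

// Returns an empty string when the iteration-count checks pass, otherwise
// a short reason.  The reason strings are illustrative, not GCC's dumps.
std::string reject_reason(const LoopInfoSketch &loop) {
  // The single-iteration check applies with or without partial vectors.
  const std::uint64_t peeling_gap = loop.peeling_for_gaps ? 1 : 0;
  if (loop.scalar_niters <= peeling_gap + 1)
    return "single scalar iteration";

  // The full-vector checks only matter when partial vectors are not used.
  if (!loop.using_partial_vectors) {
    if (loop.scalar_niters < loop.vf)
      return "not enough iterations for vectorization";
    if (loop.peeling_for_gaps && loop.scalar_niters <= loop.vf)
      return "not enough iterations for peeling for gaps";
  }
  return "";
}
```

Under these assumptions, a loop with two scalar iterations and peeling for gaps is rejected even when partial vectors are in use, while a loop with three iterations and VF 4 still passes if partial vectors are used and fails the full-vector check otherwise.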