From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id BD5863858C52 for ; Mon, 28 Nov 2022 02:57:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BD5863858C52 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2AS29sFI016355; Mon, 28 Nov 2022 02:57:39 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=content-type : message-id : date : mime-version : subject : to : references : cc : from : in-reply-to; s=pp1; bh=xYbYrmfe/L7rnejbs13tHocNi2E4GftSIZcsuPq4GA8=; b=FUyj8E6wmgfGjGLvhgmaCgQEo0jhF7FYCsSvWCUFYElerxiaOgV5H1MmDwUM+370hFPs L+s7N5LfB0sMrBro3t/lukrB/WjKzM8WpRkQ/QzpW1nweNU52DHubJzlo1ceWgXsffsH u2rRf6OI05kqEKXGQhmvdeJeR9sCMDFeUs6+wDtGN53CHGkOd8UX4tufeQs9xd9rvG4q pnALTqCgf2tJ7ovClAj6wR7pPOfQYc83bmCHMjHHQexFHKRnQXcloOAX7AcjzXnh7+Lo /mzNeFxHJLwyYLW5irKjmgInlOfzcRp+xc7WoDUUOnfuti7f+6OsFqOGTobOBjFSDv6L EQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3m3vmr40k7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Nov 2022 02:57:39 +0000 Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AS2cgMP015899; Mon, 28 Nov 2022 02:57:38 GMT Received: from ppma04fra.de.ibm.com (6a.4a.5195.ip4.static.sl-reverse.com [149.81.74.106]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3m3vmr40je-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Nov 2022 02:57:38 +0000 Received: from pps.filterd (ppma04fra.de.ibm.com [127.0.0.1]) by ppma04fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 
2AS2plgd017528; Mon, 28 Nov 2022 02:57:36 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma04fra.de.ibm.com with ESMTP id 3m3ae91bju-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 28 Nov 2022 02:57:36 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AS2vXGq65470840 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 28 Nov 2022 02:57:33 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C67D8A405B; Mon, 28 Nov 2022 02:57:33 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C7FB6A4054; Mon, 28 Nov 2022 02:57:31 +0000 (GMT) Received: from [9.200.36.84] (unknown [9.200.36.84]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 28 Nov 2022 02:57:31 +0000 (GMT) Content-Type: multipart/mixed; boundary="------------guvwlKvHIwQgPDDP8pNVAKx0" Message-ID: <20f398c0-4eb9-908d-d782-445b9ce5a79d@linux.ibm.com> Date: Mon, 28 Nov 2022 10:57:29 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [PATCH] vect: Fold LEN_{LOAD,STORE} if it's for the whole vector [PR107412] Content-Language: en-US To: richard.sandiford@arm.com References: <94ac390b-a770-c868-051b-75319eb7f81d@linux.ibm.com> Cc: Richard Biener , GCC Patches , Segher Boessenkool , Peter Bergner From: "Kewen.Lin" In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: DRY8vFpUQPpck7DedGCuOfKQPabPrgYP X-Proofpoint-ORIG-GUID: kNTWqsDZMPrRl84JkbxZTgNTKxhEH5P2 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-28_02,2022-11-25_01,2022-06-22_01 
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 spamscore=0 priorityscore=1501 mlxlogscore=999 impostorscore=0 mlxscore=0 bulkscore=0 phishscore=0 lowpriorityscore=0 malwarescore=0 adultscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211280018 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: This is a multi-part message in MIME format. --------------guvwlKvHIwQgPDDP8pNVAKx0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Hi Richard, on 2022/11/24 17:24, Richard Sandiford wrote: > "Kewen.Lin" writes: >> Hi, >> >> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD, >> STORE} into normal vector load/store if the given length is known >> to be equal to the length of the whole vector. It would help to >> improve overall cycles as normally the latency of vector access >> with length in bytes is bigger than normal vector access, and it >> also saves the preparation for length if constant length can not >> be encoded into instruction (such as on power). >> >> Bootstrapped and regtested on x86_64-redhat-linux, >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. >> >> Is it ok for trunk? >> >> BR, >> Kewen >> ----- >> PR tree-optimization/107412 >> >> gcc/ChangeLog: >> >> * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ... >> (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter >> mask_p indicating it's for mask or length, and add some handlings for >> IFN LEN_{LOAD,STORE}. >> (gimple_fold_mask_load): Rename to ... >> (gimple_fold_partial_load): ... this, add one parameter mask_p. >> (gimple_fold_mask_store): Rename to ... 
>> (gimple_fold_partial_store): ... this, add one parameter mask_p. >> (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE}, >> and adjust calls on gimple_fold_mask_load_store_mem_ref to >> gimple_fold_partial_load_store_mem_ref. > > Sorry to reply to late (still catching up on email), but: > >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr107412.c: New test. >> * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for >> folded LEN_LOAD. >> --- >> gcc/gimple-fold.cc | 57 ++++++++++++++----- >> .../gcc.target/powerpc/p9-vec-length-epil-8.c | 2 +- >> gcc/testsuite/gcc.target/powerpc/pr107412.c | 19 +++++++ >> 3 files changed, 64 insertions(+), 14 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c >> >> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc >> index a1704784bc9..e3a087defa6 100644 >> --- a/gcc/gimple-fold.cc >> +++ b/gcc/gimple-fold.cc >> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type, >> return wi::min_precision (wres, sign) > TYPE_PRECISION (type); >> } >> >> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF >> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF >> for the memory it references, otherwise return null. VECTYPE is the >> - type of the memory vector. */ >> + type of the memory vector. MASK_P indicates it's for MASK if true, >> + otherwise it's for LEN. 
*/ >> >> static tree >> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype) >> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p) >> { >> tree ptr = gimple_call_arg (call, 0); >> tree alias_align = gimple_call_arg (call, 1); >> - tree mask = gimple_call_arg (call, 2); >> - if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask)) >> + if (!tree_fits_uhwi_p (alias_align)) >> return NULL_TREE; >> >> + if (mask_p) >> + { >> + tree mask = gimple_call_arg (call, 2); >> + if (!integer_all_onesp (mask)) >> + return NULL_TREE; >> + } else { > > Minor nit: }, else, and { should be on separate lines. But the thing > I actually wanted to say was... Thanks for catching, I must have forgotten to reformat these lines. > >> + tree basic_len = gimple_call_arg (call, 2); >> + if (!tree_fits_uhwi_p (basic_len)) >> + return NULL_TREE; >> + unsigned int nargs = gimple_call_num_args (call); >> + tree bias = gimple_call_arg (call, nargs - 1); >> + gcc_assert (tree_fits_uhwi_p (bias)); >> + tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias); >> + unsigned int len = tree_to_uhwi (biased_len); >> + unsigned int vect_len >> + = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant (); >> + if (vect_len != len) >> + return NULL_TREE; > > Using "unsigned int" truncates the value. I realise that's probably > safe in this context, since large values have undefined behaviour. > But it still seems better to use an untruncated type, so that it > looks less like an oversight. (I think this is one case where "auto" > can help, since it gets the type right automatically.) > > It would also be better to avoid the to_constant, since we haven't > proven is_constant. 
How about: > > tree basic_len = gimple_call_arg (call, 2); > if (!poly_int_tree_p (basic_len)) > return NULL_TREE; > unsigned int nargs = gimple_call_num_args (call); > tree bias = gimple_call_arg (call, nargs - 1); > gcc_assert (TREE_CODE (bias) == INTEGER_CST); > if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias), > GET_MODE_SIZE (TYPE_MODE (vectype)))) > return NULL_TREE; > > which also avoids using tree arithmetic for the subtraction? I agree your proposed code is more robust, thanks! Sorry, the original patch had already been committed, so I made a follow-up patch, attached. It's bootstrapped and regression-tested on powerpc64-linux-gnu P8, and powerpc64le-linux-gnu P9 and P10. Is it ok for trunk? BR, Kewen --------------guvwlKvHIwQgPDDP8pNVAKx0 Content-Type: text/plain; charset=UTF-8; name="0001-gimple-fold-Refine-gimple_fold_partial_load_store_me.patch" Content-Disposition: attachment; filename*0="0001-gimple-fold-Refine-gimple_fold_partial_load_store_me.pa"; filename*1="tch" Content-Transfer-Encoding: base64 RnJvbSAzOTg0YTdmODZhMzVkMTNlMWZkNDBiYzBjMTJlZDVhZDViMjM0MDQ3IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBLZXdlbiBMaW4gPGxpbmt3QGxpbnV4LmlibS5jb20+ CkRhdGU6IFN1biwgMjcgTm92IDIwMjIgMjA6Mjk6NTcgLTA2MDAKU3ViamVjdDogW1BBVENI XSBnaW1wbGUtZm9sZDogUmVmaW5lIGdpbXBsZV9mb2xkX3BhcnRpYWxfbG9hZF9zdG9yZV9t ZW1fcmVmCgpGb2xsb3dpbmcgUmljaGFyZCdzIHJldmlldyBjb21tZW50cywgdGhpcyBwYXRj aCBpcyB0byB1c2UKdW50cnVuY2F0ZWQgdHlwZSBmb3IgdGhlIGxlbmd0aCB1c2VkIGZvciBJ Rk5fTEVOX3tMT0FELFNUT1JFfQppbnN0ZWFkIG9mICJ1bnNpZ25lZCBpbnQiIGZvciBiZXR0 ZXIgcm9idXN0bmVzcy4gIEl0IGFsc28KYXZvaWQgdG8gdXNlIHRvX2NvbnN0YW50IGFuZCB0 cmVlIGFyaXRobWV0aWMgZm9yIHN1YnRyYWN0aW9uLgoKQ28tYXV0aG9yZWQtYnk6IFJpY2hh cmQgU2FuZGlmb3JkICA8cmljaGFyZC5zYW5kaWZvcmRAYXJtLmNvbT4KCmdjYy9DaGFuZ2VM b2c6CgoJKiBnaW1wbGUtZm9sZC5jYyAoZ2ltcGxlX2ZvbGRfcGFydGlhbF9sb2FkX3N0b3Jl X21lbV9yZWYpOiBVc2UKCXVudHJ1bmNhdGVkIHR5cGUgZm9yIHRoZSBsZW5ndGgsIGFuZCBh dm9pZCB0b19jb25zdGFudCBhbmQgdHJlZQoJYXJpdGhtZXRpYyBmb3Igc3VidHJhY3Rpb24u 
Ci0tLQogZ2NjL2dpbXBsZS1mb2xkLmNjIHwgMTUgKysrKysrKy0tLS0tLS0tCiAxIGZpbGUg Y2hhbmdlZCwgNyBpbnNlcnRpb25zKCspLCA4IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBh L2djYy9naW1wbGUtZm9sZC5jYyBiL2djYy9naW1wbGUtZm9sZC5jYwppbmRleCBjMmQ5Yzgw NmFlZS4uODhkMTRjN2FkY2MgMTAwNjQ0Ci0tLSBhL2djYy9naW1wbGUtZm9sZC5jYworKysg Yi9nY2MvZ2ltcGxlLWZvbGQuY2MKQEAgLTUzODcsMTggKzUzODcsMTcgQEAgZ2ltcGxlX2Zv bGRfcGFydGlhbF9sb2FkX3N0b3JlX21lbV9yZWYgKGdjYWxsICpjYWxsLCB0cmVlIHZlY3R5 cGUsIGJvb2wgbWFza19wKQogICAgICAgdHJlZSBtYXNrID0gZ2ltcGxlX2NhbGxfYXJnIChj YWxsLCAyKTsKICAgICAgIGlmICghaW50ZWdlcl9hbGxfb25lc3AgKG1hc2spKQogCXJldHVy biBOVUxMX1RSRUU7Ci0gICAgfSBlbHNlIHsKKyAgICB9CisgIGVsc2UKKyAgICB7CiAgICAg ICB0cmVlIGJhc2ljX2xlbiA9IGdpbXBsZV9jYWxsX2FyZyAoY2FsbCwgMik7Ci0gICAgICBp ZiAoIXRyZWVfZml0c191aHdpX3AgKGJhc2ljX2xlbikpCisgICAgICBpZiAoIXBvbHlfaW50 X3RyZWVfcCAoYmFzaWNfbGVuKSkKIAlyZXR1cm4gTlVMTF9UUkVFOwogICAgICAgdW5zaWdu ZWQgaW50IG5hcmdzID0gZ2ltcGxlX2NhbGxfbnVtX2FyZ3MgKGNhbGwpOwogICAgICAgdHJl ZSBiaWFzID0gZ2ltcGxlX2NhbGxfYXJnIChjYWxsLCBuYXJncyAtIDEpOwotICAgICAgZ2Nj X2Fzc2VydCAodHJlZV9maXRzX3Nod2lfcCAoYmlhcykpOwotICAgICAgdHJlZSBiaWFzZWRf bGVuID0gaW50X2NvbnN0X2Jpbm9wIChNSU5VU19FWFBSLCBiYXNpY19sZW4sIGJpYXMpOwot ICAgICAgdW5zaWduZWQgaW50IGxlbiA9IHRyZWVfdG9fdWh3aSAoYmlhc2VkX2xlbik7Ci0g ICAgICB1bnNpZ25lZCBpbnQgdmVjdF9sZW4KLQk9IEdFVF9NT0RFX1NJWkUgKFRZUEVfTU9E RSAodmVjdHlwZSkpLnRvX2NvbnN0YW50ICgpOwotICAgICAgaWYgKHZlY3RfbGVuICE9IGxl bikKKyAgICAgIGdjY19hc3NlcnQgKFRSRUVfQ09ERSAoYmlhcykgPT0gSU5URUdFUl9DU1Qp OworICAgICAgaWYgKG1heWJlX25lICh3aTo6dG9fcG9seV93aWRlc3QgKGJhc2ljX2xlbikg LSB3aTo6dG9fd2lkZXN0IChiaWFzKSwKKwkJICAgIEdFVF9NT0RFX1NJWkUgKFRZUEVfTU9E RSAodmVjdHlwZSkpKSkKIAlyZXR1cm4gTlVMTF9UUkVFOwogICAgIH0KIAotLSAKMi4yNy4w Cgo= --------------guvwlKvHIwQgPDDP8pNVAKx0--
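The truncation concern raised in the review can be illustrated outside of GCC. The following is only a minimal standalone sketch, not code from gimple-fold.cc: the names kVectorBytes, covers_whole_vector_truncating and covers_whole_vector_wide are hypothetical, std::uint32_t stands in for "unsigned int", and the bias handling is simplified to a plain unsigned subtraction. It shows how narrowing the biased length before the comparison could make an oversized length look like an exact whole-vector length, whereas comparing at full width does not.

```cpp
#include <cstdint>

// Hypothetical stand-in for the vector size in bytes (a 128-bit vector).
// In GCC this would come from GET_MODE_SIZE, which may not be a constant.
constexpr std::uint64_t kVectorBytes = 16;

// Comparison in the style of the original check: the biased length is
// narrowed to 32 bits (standing in for "unsigned int") before comparing,
// so the upper bits of a large length are silently dropped.
bool
covers_whole_vector_truncating (std::uint64_t len, std::uint64_t bias)
{
  std::uint32_t biased_len = static_cast<std::uint32_t> (len - bias);
  std::uint32_t vect_len = static_cast<std::uint32_t> (kVectorBytes);
  return biased_len == vect_len;
}

// Comparison at full width: no bits are dropped, so only a length that
// really equals the vector size is accepted.
bool
covers_whole_vector_wide (std::uint64_t len, std::uint64_t bias)
{
  return len - bias == kVectorBytes;
}
```

With a length of 2^32 + 16 and a bias of 0, the truncating variant reports a whole-vector access while the wide variant rejects it; for a length of exactly 16 both agree. The suggested code in the review goes further than this sketch by also handling non-constant (poly_int) sizes via maybe_ne, which a fixed-width integer comparison cannot model.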