From: Richard Sandiford
To: "Kewen.Lin"
Cc: Richard Biener, GCC Patches, Segher Boessenkool, Peter Bergner
Subject: Re: [PATCH] vect: Fold LEN_{LOAD,STORE} if it's for the whole vector [PR107412]
Date: Thu, 01 Dec 2022 11:28:00 +0000
In-Reply-To: <20f398c0-4eb9-908d-d782-445b9ce5a79d@linux.ibm.com> (Kewen Lin's message of "Mon, 28 Nov 2022 10:57:29 +0800")

"Kewen.Lin" writes:
> Hi Richard,
>
> on 2022/11/24 17:24, Richard Sandiford wrote:
>> "Kewen.Lin" writes:
>>> Hi,
>>>
>>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
>>> STORE} into a normal vector load/store if the given length is known
>>> to be equal to the length of the whole vector.  This helps overall
>>> cycle counts, since a vector access with a length in bytes normally
>>> has higher latency than a plain vector access, and it also saves
>>> setting up the length when a constant length cannot be encoded into
>>> the instruction (such as on Power).
>>>
>>> Bootstrapped and regtested on x86_64-redhat-linux,
>>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -----
>>> 	PR tree-optimization/107412
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
>>> 	(gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
>>> 	mask_p indicating whether it's for mask or length, and add handling
>>> 	for IFN LEN_{LOAD,STORE}.
>>> 	(gimple_fold_mask_load): Rename to ...
>>> 	(gimple_fold_partial_load): ... this, add one parameter mask_p.
>>> 	(gimple_fold_mask_store): Rename to ...
>>> 	(gimple_fold_partial_store): ... this, add one parameter mask_p.
>>> 	(gimple_fold_call): Add handling for IFN LEN_{LOAD,STORE}, and
>>> 	adjust calls on gimple_fold_mask_load_store_mem_ref to
>>> 	gimple_fold_partial_load_store_mem_ref.
>>
>> Sorry to reply too late (still catching up on email), but:
>>
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/pr107412.c: New test.
>>> 	* gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
>>> 	folded LEN_LOAD.
>>> ---
>>>  gcc/gimple-fold.cc                            | 57 ++++++++++++++-----
>>>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>>>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++++++
>>>  3 files changed, 64 insertions(+), 14 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>>>
>>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>>> index a1704784bc9..e3a087defa6 100644
>>> --- a/gcc/gimple-fold.cc
>>> +++ b/gcc/gimple-fold.cc
>>> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type,
>>>    return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>>>  }
>>>
>>> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
>>> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
>>>     for the memory it references, otherwise return null.  VECTYPE is the
>>> -   type of the memory vector.  */
>>> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
>>> +   otherwise it's for LEN.  */
>>>
>>>  static tree
>>> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
>>> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
>>>  {
>>>    tree ptr = gimple_call_arg (call, 0);
>>>    tree alias_align = gimple_call_arg (call, 1);
>>> -  tree mask = gimple_call_arg (call, 2);
>>> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
>>> +  if (!tree_fits_uhwi_p (alias_align))
>>>      return NULL_TREE;
>>>
>>> +  if (mask_p)
>>> +    {
>>> +      tree mask = gimple_call_arg (call, 2);
>>> +      if (!integer_all_onesp (mask))
>>> +	return NULL_TREE;
>>> +    } else {
>>
>> Minor nit: }, else, and { should be on separate lines.  But the thing
>> I actually wanted to say was...
>
> Thanks for catching, I must have forgotten to reformat these lines.
>
>>
>>> +      tree basic_len = gimple_call_arg (call, 2);
>>> +      if (!tree_fits_uhwi_p (basic_len))
>>> +	return NULL_TREE;
>>> +      unsigned int nargs = gimple_call_num_args (call);
>>> +      tree bias = gimple_call_arg (call, nargs - 1);
>>> +      gcc_assert (tree_fits_uhwi_p (bias));
>>> +      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
>>> +      unsigned int len = tree_to_uhwi (biased_len);
>>> +      unsigned int vect_len
>>> +	= GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
>>> +      if (vect_len != len)
>>> +	return NULL_TREE;
>>
>> Using "unsigned int" truncates the value.  I realise that's probably
>> safe in this context, since large values have undefined behaviour.
>> But it still seems better to use an untruncated type, so that it
>> looks less like an oversight.  (I think this is one case where "auto"
>> can help, since it gets the type right automatically.)
>>
>> It would also be better to avoid the to_constant, since we haven't
>> proven is_constant.  How about:
>>
>>   tree basic_len = gimple_call_arg (call, 2);
>>   if (!poly_int_tree_p (basic_len))
>>     return NULL_TREE;
>>   unsigned int nargs = gimple_call_num_args (call);
>>   tree bias = gimple_call_arg (call, nargs - 1);
>>   gcc_assert (TREE_CODE (bias) == INTEGER_CST);
>>   if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
>>                 GET_MODE_SIZE (TYPE_MODE (vectype))))
>>     return NULL_TREE;
>>
>> which also avoids using tree arithmetic for the subtraction?
>
> I agree your proposed code is more robust, thanks!
>
> Sorry, the original patch has already been committed, so I made a
> follow-up patch as attached.  It's bootstrapped and regression-tested
> on powerpc64-linux-gnu P8, and powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?

OK, thanks.
Richard

> BR,
> Kewen
>
> From 3984a7f86a35d13e1fd40bc0c12ed5ad5b234047 Mon Sep 17 00:00:00 2001
> From: Kewen Lin
> Date: Sun, 27 Nov 2022 20:29:57 -0600
> Subject: [PATCH] gimple-fold: Refine gimple_fold_partial_load_store_mem_ref
>
> Following Richard's review comments, this patch uses an untruncated
> type for the length used for IFN_LEN_{LOAD,STORE} instead of
> "unsigned int" for better robustness.  It also avoids using
> to_constant and tree arithmetic for the subtraction.
>
> Co-authored-by: Richard Sandiford
>
> gcc/ChangeLog:
>
> 	* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Use an
> 	untruncated type for the length, and avoid to_constant and tree
> 	arithmetic for the subtraction.
> ---
>  gcc/gimple-fold.cc | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index c2d9c806aee..88d14c7adcc 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -5387,18 +5387,17 @@ gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
>        tree mask = gimple_call_arg (call, 2);
>        if (!integer_all_onesp (mask))
>  	return NULL_TREE;
> -    } else {
> +    }
> +  else
> +    {
>        tree basic_len = gimple_call_arg (call, 2);
> -      if (!tree_fits_uhwi_p (basic_len))
> +      if (!poly_int_tree_p (basic_len))
>  	return NULL_TREE;
>        unsigned int nargs = gimple_call_num_args (call);
>        tree bias = gimple_call_arg (call, nargs - 1);
> -      gcc_assert (tree_fits_shwi_p (bias));
> -      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
> -      unsigned int len = tree_to_uhwi (biased_len);
> -      unsigned int vect_len
> -	= GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
> -      if (vect_len != len)
> +      gcc_assert (TREE_CODE (bias) == INTEGER_CST);
> +      if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
> +		    GET_MODE_SIZE (TYPE_MODE (vectype))))
>  	return NULL_TREE;
>      }
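
To make the effect of the fold concrete: with a 16-byte vector type and a
zero bias, a full-length .LEN_LOAD becomes a plain unconditional vector
access.  Roughly (an illustrative GIMPLE sketch with made-up SSA names and
operands, not the dump from the PR107412 testcase):

  /* Before: length 16 minus bias 0 equals the 16-byte vector size,
     so the length covers the whole vector.  */
  vect_1 = .LEN_LOAD (addr_2, 16B, 16, 0);

  /* After: an ordinary vector load through a MEM_REF, as returned by
     gimple_fold_partial_load_store_mem_ref.  */
  vect_1 = MEM <vector(16) char> [(char *) addr_2];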