From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 833BA3858C2D for ; Fri, 25 Mar 2022 12:11:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 833BA3858C2D Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9FE0E210FC; Fri, 25 Mar 2022 12:11:19 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 82296132E9; Fri, 25 Mar 2022 12:11:19 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id tPp3HmexPWI0UgAAMHmgww (envelope-from ); Fri, 25 Mar 2022 12:11:19 +0000 Date: Fri, 25 Mar 2022 13:11:19 +0100 (CET) From: Richard Biener To: gcc-patches@gcc.gnu.org cc: richard.sandiford@arm.com, hongtao.liu@intel.com Subject: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Message-Id: <20220325121119.82296132E9@imap2.suse-dmz.suse.de> X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Mar 2022 12:11:24 -0000 Since we're now vectorizing by default at -O2 issues like PR101908 become more important where we apply basic-block vectorization to parts of the function covering loads from function parameters passed on the stack. Since we have no good idea how the stack pushing was performed but we do have a good idea that it was done recent when at the start of the function a STLF failure is inevitable if the argument passing code didn't use the same or larger vector stores and also has them aligned the same way. Until there's a robust IPA based solution the following implements target independent heuristics in the vectorizer to retain scalar loads for loads from parameters likely passed in memory (I use a BLKmode DECL_MODE check for this rather than firing up cummulative-args). I've restricted this also to loads from the first "block" (that can be less than the first basic block if there's a call for example), since that covers the testcase. Note that for the testcase (but not c-ray from the bugreport) there's a x86 peephole2 that vectorizes things back, so the patch is not effective there. Any comments? I know we're also looking at x86 port specific mitigations but the issue will hit arm and power/z as well I think. Bootstrapped and tested on x86_64-unknown-linux-gnu. Thanks, Richard. 2022-03-25 Richard Biener PR tree-optimization/101908 * tree-vect-stmts.cc (get_group_load_store_type): Add heuristic to limit BB vectorization of function parameters. * gcc.dg/vect/bb-slp-pr101908.c: New testcase. --- gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++++++++++++++ gcc/tree-vect-stmts.cc | 27 ++++++++++++++++++++- 2 files changed, 44 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c new file mode 100644 index 00000000000..b7534a18f0e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_double } */ + +struct vec3 { + double x, y, z; +}; + +struct ray { + struct vec3 orig, dir; +}; + +void ray_sphere (struct ray ray, double *res) +{ + res[0] = ray.orig.y * ray.dir.x; + res[1] = ray.orig.z * ray.dir.y; +} + +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index f7449a79d1c..1e37e9678b6 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, /* Stores can't yet have gaps. */ gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0); - if (slp_node) + tree parm; + if (!loop_vinfo + && vls_type == VLS_LOAD + /* The access is based on a PARM_DECL. */ + && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR + && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), true) + && TREE_CODE (parm) == PARM_DECL + /* Likely passed on the stack. */ + && DECL_MODE (parm) == BLKmode + /* The access is in the first group. */ + && first_dr_info->group == 0) + { + /* When doing BB vectorizing force early loads from function parameters + passed on the stack and thus stored recently to be done elementwise + to avoid store-to-load forwarding penalties. + Note this will cause vectorization to fail for the load because of + the fear of underestimating the cost of elementwise accesses, + see the end of get_load_store_type. We are then going to effectively + do the same via handling the loads as external input to the SLP. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Not using vector loads from function parameter " + "for the fear of causing a STLF fail\n"); + *memory_access_type = VMAT_ELEMENTWISE; + } + else if (slp_node) { /* For SLP vectorization we directly vectorize a subchain without permutation. */ -- 2.34.1