From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk1-xa2c.google.com (mail-vk1-xa2c.google.com [IPv6:2607:f8b0:4864:20::a2c]) by sourceware.org (Postfix) with ESMTPS id 89A5B385741A for ; Fri, 25 Mar 2022 14:09:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 89A5B385741A Received: by mail-vk1-xa2c.google.com with SMTP id l184so4321107vkh.0 for ; Fri, 25 Mar 2022 07:09:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=teZDliQ5ysRtLF5X6tsRKLlw264DL1xrKbxR5jX7Bw0=; b=ShuNZ6PUICh3FBGQRQUOrctDW81PLa5kmrmGciv7V4nzWWlV6cnv6Alfw2en3Won3y aPNRPSNceA2mS0DJL0fnOqi42tvsoLceVo7UMu1Zu68OB8jtHLfEwpIA6KS0nAK2bBD0 tzcFuthCgTLuahIpssdFR9wC2wbqbwjRCocnatXlIMYhULfjvhqUmKWFWaFKZsv0IODv CcyFf3OcYZMPJZREdBZmHToDc09WW0MjuGXUroTi9HrdHqCjeLKe7DAcE7J95QHjtf8d GpCiboSfrRQiYrryZH0jgereJD3obyrwWKAQ6RD99QR5miRCMY7L92c2nu0OP5mSlCYA 31nQ== X-Gm-Message-State: AOAM5304Uer/Cplo2YO4iUiD27Tp/uQ3dC9aFy9Ugx3m2EhqTok3b4Z5 2OqsWfOE7+y0rQJY1uVofz7bhOA5QhxcemBR7zY= X-Google-Smtp-Source: ABdhPJzRpuTnBYfYT3502FD2gbBJcmDnp5U9z6N3fmMjn/iW9NnaOPxLeR2g/Yxtcv/roq6uYsH4IsyTfrwc7N15T74= X-Received: by 2002:ac5:c961:0:b0:33f:4c03:df44 with SMTP id t1-20020ac5c961000000b0033f4c03df44mr5484495vkm.19.1648217384357; Fri, 25 Mar 2022 07:09:44 -0700 (PDT) MIME-Version: 1.0 References: <20220325121119.82296132E9@imap2.suse-dmz.suse.de> In-Reply-To: From: Hongtao Liu Date: Fri, 25 Mar 2022 22:09:33 +0800 Message-ID: Subject: Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing To: Richard Biener Cc: GCC Patches , Richard Sandiford , "Liu, Hongtao" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Mar 2022 14:09:48 -0000 On Fri, Mar 25, 2022 at 9:42 PM Richard Biener wrote: > > On Fri, 25 Mar 2022, Hongtao Liu wrote: > > > On Fri, Mar 25, 2022 at 8:11 PM Richard Biener via Gcc-patches > > wrote: > > > > > > Since we're now vectorizing by default at -O2 issues like PR101908 > > > become more important where we apply basic-block vectorization to > > > parts of the function covering loads from function parameters passed > > > on the stack. Since we have no good idea how the stack pushing > > > was performed but we do have a good idea that it was done recent > > > when at the start of the function a STLF failure is inevitable > > > if the argument passing code didn't use the same or larger vector > > > stores and also has them aligned the same way. > > > > > > Until there's a robust IPA based solution the following implements > > > target independent heuristics in the vectorizer to retain scalar > > > loads for loads from parameters likely passed in memory (I use > > > a BLKmode DECL_MODE check for this rather than firing up > > > cummulative-args). I've restricted this also to loads from the > > > first "block" (that can be less than the first basic block if there's > > > a call for example), since that covers the testcase. > > I prefer this patch to the md-reorg way. > > I was mostly posting the variant because it is target agnostic. In > general I agree that mitigation is best done at the RTL level after > scheduling/RA so that the instruction sequence from function entry > is visible. > > Did you have any success in coding this up yet? Do you think it Not yet. > can be done in a way to be re-used by multiple targets? > > > Can vectorizer similarly handle by-reference passed arguments if their > > corresponding real parameters are local variables of the caller? > > .i.e. we can also prevent vectorization for the below case. > > > > struct vec3 { > > double x, y, z; > > }; > > > > struct ray { > > struct vec3 orig, dir; > > }; > > > > void > > __attribute__((noinline)) > > ray_sphere (struct ray* __restrict ray, double *__restrict res) > > { > > res[0] = ray->orig.y * ray->dir.x; > > res[1] = ray->orig.z * ray->dir.y; > > } > > > > extern struct ray g; > > void bar (double* res) > > { > > struct ray tmp = g; > > ray_sphere (&tmp, res); > > } > > Without IPA analysis this will be hard, if not impossible. > > Richard. > > > > > > > Note that for the testcase (but not c-ray from the bugreport) there's > > > a x86 peephole2 that vectorizes things back, so the patch is > > > not effective there. > > > > > > Any comments? I know we're also looking at x86 port specific > > > mitigations but the issue will hit arm and power/z as well I think. > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > Thanks, > > > Richard. > > > > > > 2022-03-25 Richard Biener > > > > > > PR tree-optimization/101908 > > > * tree-vect-stmts.cc (get_group_load_store_type): Add > > > heuristic to limit BB vectorization of function parameters. > > > > > > * gcc.dg/vect/bb-slp-pr101908.c: New testcase. > > > --- > > > gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c | 18 ++++++++++++++ > > > gcc/tree-vect-stmts.cc | 27 ++++++++++++++++++++- > > > 2 files changed, 44 insertions(+), 1 deletion(-) > > > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c > > > new file mode 100644 > > > index 00000000000..b7534a18f0e > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101908.c > > > @@ -0,0 +1,18 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-require-effective-target vect_double } */ > > > + > > > +struct vec3 { > > > + double x, y, z; > > > +}; > > > + > > > +struct ray { > > > + struct vec3 orig, dir; > > > +}; > > > + > > > +void ray_sphere (struct ray ray, double *res) > > > +{ > > > + res[0] = ray.orig.y * ray.dir.x; > > > + res[1] = ray.orig.z * ray.dir.y; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "STLF fail" "slp2" } } */ > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > index f7449a79d1c..1e37e9678b6 100644 > > > --- a/gcc/tree-vect-stmts.cc > > > +++ b/gcc/tree-vect-stmts.cc > > > @@ -2197,7 +2197,32 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, > > > /* Stores can't yet have gaps. */ > > > gcc_assert (slp_node || vls_type == VLS_LOAD || gap == 0); > > > > > > - if (slp_node) > > > + tree parm; > > > + if (!loop_vinfo > > > + && vls_type == VLS_LOAD > > > + /* The access is based on a PARM_DECL. */ > > > + && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR > > > + && ((parm = TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0)), true) > > > + && TREE_CODE (parm) == PARM_DECL > > > + /* Likely passed on the stack. */ > > > + && DECL_MODE (parm) == BLKmode > > > + /* The access is in the first group. */ > > > + && first_dr_info->group == 0) > > > + { > > > + /* When doing BB vectorizing force early loads from function parameters > > > + passed on the stack and thus stored recently to be done elementwise > > > + to avoid store-to-load forwarding penalties. > > > + Note this will cause vectorization to fail for the load because of > > > + the fear of underestimating the cost of elementwise accesses, > > > + see the end of get_load_store_type. We are then going to effectively > > > + do the same via handling the loads as external input to the SLP. */ > > > + if (dump_enabled_p ()) > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > + "Not using vector loads from function parameter " > > > + "for the fear of causing a STLF fail\n"); > > > + *memory_access_type = VMAT_ELEMENTWISE; > > > + } > > > + else if (slp_node) > > > { > > > /* For SLP vectorization we directly vectorize a subchain > > > without permutation. */ > > > -- > > > 2.34.1 > > > > > > > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg) -- BR, Hongtao