From: Richard Biener
Date: Mon, 14 Sep 2020 11:47:56 +0200
Subject: Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]
To: luoxhu
Cc: Segher Boessenkool, GCC Patches, David Edelsohn, Bill Schmidt, linkw@gcc.gnu.org
In-Reply-To: <572b36a7-2d52-2f46-05bd-140e16794a61@linux.ibm.com>
On Mon, Sep 14, 2020 at 10:05 AM luoxhu wrote:
>
> On 2020/9/10 18:08, Richard Biener wrote:
> > On Wed, Sep 9, 2020 at 6:03 PM Segher Boessenkool wrote:
> >>
> >> On Wed, Sep 09, 2020 at 04:28:19PM +0200, Richard Biener wrote:
> >>> On Wed, Sep 9, 2020 at 3:49 PM Segher Boessenkool wrote:
> >>>>
> >>>> Hi!
> >>>>
> >>>> On Tue, Sep 08, 2020 at 10:26:51AM +0200, Richard Biener wrote:
> >>>>> Hmm, yeah - I guess that's what should be addressed first then.
> >>>>> I'm quite sure that in case 'v' is not on the stack but in memory, like
> >>>>> in my case, an SImode store is better than what we get from
> >>>>> vec_insert - in fact vec_insert will likely introduce an RMW cycle
> >>>>> which is prone to inserting store-data-races?
> >>>>
> >>>> The other way around -- if it is in memory, and was stored as a vector
> >>>> recently, then reading back something shorter from it is prone to
> >>>> SHL/LHS (store-hit-load / load-hit-store) problems.  There is nothing
> >>>> special about the stack here, except of course it is more likely to
> >>>> have been stored recently if on the stack.  So which option is best
> >>>> depends on how often it has been stored recently.  On newer CPUs,
> >>>> although they can avoid SHL/LHS flushes more often, the penalty is
> >>>> relatively bigger, so memory does not often win.
> >>>>
> >>>> I.e.: it needs to be measured.  Intuition is often wrong here.
> >>>
> >>> But the current method would simply do a direct store to memory
> >>> without a preceding read of the whole vector.
> >>
> >> The problem is even worse the other way: you do a short store here,
> >> but then a full vector read later.  If the store and read are far
> >> apart, that is fine, but if they are close (that is, on the order of
> >> fifty or fewer insns), there can be problems.
> >
> > Sure, but you can't simply load/store a whole vector when the code
> > didn't, unless you know it will not introduce data races and it will
> > not trap (thus the whole vector needs to be at least naturally aligned).
> >
> > Also, if there's a preceding short store you would now load the whole
> > vector to avoid the short store ... catch-22.
> >
> >> There often are problems over function calls (where the compiler
> >> cannot usually *see* how something is used).
> >
> > Yep.  The best way would be to use small loads and larger stores,
> > which is what CPUs usually tend to handle fine (with alignment
> > constraints, etc.).  Of course that's not what either of the
> > "solutions" can do.
> >
> > That said, since you seem to be "first" in having an instruction
> > to insert into a vector at a variable position, the idea that we'd
> > have to spill anyway for this to be expanded - and thus that we expand
> > the vector to a stack location in the first place - falls down.  And
> > that's where I'd first try to improve things.
> >
> > So what can the CPU actually do?
> >
>
> I'm not sure whether this reflects the issues you discussed above.
> I constructed the test cases below and tested with and without this
> patch; only for "a+c" (which means store only) does the performance get
> worse with this patch, due to the extra load/store (about 50% slower).
> For the "v = vec_insert (i, v, n);" usage, v is always loaded after the
> store, so this patch always gives a big benefit.
> But for the "v[n % 4] = i;" usage, it depends on whether "v" is used
> again soon, inside the function or shortly after it returns.
> Does this mean that unifying the two usages into the same gimple code
> is sometimes not a good idea?  Or is it possible to detect whether the
> destination "v" of the generated IFN ".VEC_INSERT (&v, i_4(D), _1);"
> is used nearby, inside or outside of the function?
>
>
> #define TYPE int
>
> vector TYPE v = {1, 2, 3, 4};   // a. global vector.
> vector TYPE s = {4, 3, 2, 1};
>
> __attribute__ ((noinline))
> vector TYPE
> test (vector TYPE u, TYPE i, size_t n)
> {
>   v[n % 4] = i;

^^^ this should be u[n % 4] = i; I guess.  Is the % 4 mandated by the
vec_insert semantics, btw?

If you tested with the above error you probably need to re-measure.

On gimple the above function (after fixing it) looks like

  VIEW_CONVERT_EXPR<int[4]>(u)[_1] = i_4(D);

and the IFN idea I had would - for non-global memory 'u' only -
transform this to

  vector_register_2 = u;
  vector_register_3 = .IFN_VEC_SET (vector_register_2, _1, i_4(D));
  u = vector_register_3;

if vec_set can handle variable indexes.  This then becomes a vec_set on
a register and, if that was the only variable indexing of 'u', will also
cause 'u' to be expanded as a register rather than as stack memory.
Note we can't use the direct-optab method here since the vec_set optab
modifies operand 0, which isn't possible in SSA form.

That might hint that we eventually want to extend BIT_INSERT_EXPR to
handle a non-constant bit position, but for experiments using an
alternate internal function is certainly easier.

Richard.

>   return u;
> }
>
> int main ()
> {
>   //vector TYPE v = {1, 2, 3, 4};   // b. local vector.
>   //vector TYPE s = {4, 3, 2, 1};
>   vector TYPE r = {0};
>   for (long k = 0; k < 3989478945; k++)
>     {
>       r += test (s, 254.0f, k);      // c. access different vector; store only.
>       //r += test (v, 254.0f, k);    // d. access same vector; load after store in callee function.
>       //test (s, 254.0f, k); r += v; // e. load after store outside of function.
>     }
>   return r[0];
> }
>
>
> Thanks,
> Xionghu
>