From: Richard Biener
Date: Thu, 10 Sep 2020 12:08:44 +0200
Subject: Re: [PATCH v2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]
To: Segher Boessenkool
Cc: luoxhu, GCC Patches, David Edelsohn, Bill Schmidt, linkw@gcc.gnu.org

On Wed, Sep 9, 2020 at 6:03 PM Segher Boessenkool wrote:
>
> On Wed, Sep 09, 2020 at 04:28:19PM +0200, Richard Biener wrote:
> > On Wed, Sep 9, 2020 at 3:49 PM Segher Boessenkool
> > wrote:
> > >
> > > Hi!
> > >
> > > On Tue, Sep 08, 2020 at 10:26:51AM +0200, Richard Biener wrote:
> > > > Hmm, yeah - I guess that's what should be addressed first then.
> > > > I'm quite sure that in case 'v' is not on the stack but in memory, like
> > > > in my case, a SImode store is better than what we get from
> > > > vec_insert - in fact vec_insert will likely introduce a RMW cycle
> > > > which is prone to inserting store-data-races?
> > >
> > > The other way around -- if it is in memory, and was stored as vector
> > > recently, then reading back something shorter from it is prone to
> > > SHL/LHS problems. There is nothing special about the stack here, except
> > > of course it is more likely to have been stored recently if on the
> > > stack. So it depends how often it has been stored recently which option
> > > is best. On newer CPUs, although they can avoid SHL/LHS flushes more
> > > often, the penalty is relatively bigger, so memory does not often win.
> > >
> > > I.e.: it needs to be measured. Intuition is often wrong here.
> >
> > But the current method would simply do a direct store to memory
> > without a preceding read of the whole vector.
>
> The problem is even worse the other way: you do a short store here, but
> do a full vector read later.
> If the store and read are far apart, that is fine, but if they are close
> (that is on the order of fifty or more insns), there can be problems.

Sure, but you can't simply load/store a whole vector when the code didn't,
unless you know it will not introduce data races and it will not trap
(thus the whole vector needs to be at least naturally aligned).

Also, if there's a preceding short store, you will now load the whole
vector to avoid the short store ... a catch-22.

> There often are problems over function calls (where the compiler cannot
> usually *see* how something is used).

Yep.  The best way would be to use small loads and larger stores,
which is what CPUs usually tend to handle fine (with alignment
constraints, etc.).  Of course, that's not what either of the "solutions"
can do.

That said, since you seem to be "first" in having an instruction
to insert into a vector at a variable position, the idea that we'd
have to spill anyway for this to be expanded (and thus expand
the vector to a stack location in the first place) falls down.

And that's where I'd first try to improve things.

So what can the CPU actually do?

Richard.