Date: Fri, 21 Jan 2022 15:35:33 -0600
From: Segher Boessenkool
To: Michael Meissner, gcc-patches@gcc.gnu.org, David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt
Subject: Re: [PATCH] Mark XXSPLTIW/XXSPLTIDP as prefixed -- PR 104136
Message-ID: <20220121213533.GJ614@gate.crashing.org>

Hi!

On Fri, Jan 21, 2022 at 02:49:26PM -0500, Michael Meissner wrote:
> If you compile module_advect_em.F90 with -Ofast -mcpu=power10, one module
> is large enough that we can't use a single conditional jump to span the
> function.  Instead, GCC has to reverse the condition, and do a conditional
> jump around an unconditional branch.  It turns out when xxspltiw and
> xxspltidp instructions were generated, they were not marked as being
> prefixed (i.e. length of 12 bytes instead of 4 bytes).

(The prefixed insn itself is 8B, but there can be 4B more because
prefixed insns cannot cross 64B boundaries, necessitating an extra nop
insn or other 4B padding).

> This meant the
> calculations for the branch length were off, which in turn meant the
> assembler raised an error because it couldn't do the conditional jump.

That is the most common symptom, yup.  But there are other problems as
well (other correctness problems -- it obviously does not help
performance either).

> The fix is to explicitly set the prefixed attribute when we are loading up
> vector constants with the xxspltiw or xxspltidp instructions.

That attribute should be set on *all* xxsplti{w,dp} insns, and more in
general on all insns that are always prefixed.  The maybe_prefixed
attribute is only for insns for which both a prefixed and a non-prefixed
version exist, the prefixed version having a "p" prefixed to the
mnemonic.

> I have removed the code that sets the prefixed attribute for xxspltiw,
> xxspltidp, and xxsplti32dx instructions, since it no longer will be invoked.

Great cleanup / simplification!

> I have also explicitly set the prefixed attribute for load SF and DF mode
> constants with xxspltiw and xxspltidp.  Previously, it was not set on these
> insns, but when the insn was split to get the XXSPLTIW/XXSPLTIDP forms, those
> forms already had the prefixed attribute set.

So now we have more correct information before the insn is split.  Good.

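For anyone following along, the "12 bytes" figure discussed above can be
sketched in a few lines of C.  This is only an illustration of the
arithmetic, assuming 4-byte-aligned instruction addresses; the names
prefixed_padding, prefixed_size_at and prefixed_self_check are made up
for this example and are not GCC's or the assembler's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
enum { WORD_BYTES = 4, PREFIXED_BYTES = 8, BOUNDARY_BYTES = 64 };

/* Padding needed before an 8-byte prefixed insn at ADDR (assumed to be
   4-byte aligned) so that the insn does not cross a 64-byte boundary. */
static unsigned prefixed_padding(uint64_t addr)
{
    uint64_t offset = addr % BOUNDARY_BYTES;
    return (offset + PREFIXED_BYTES > BOUNDARY_BYTES) ? WORD_BYTES : 0;
}

/* Bytes actually occupied by a prefixed insn placed at ADDR:
   8 in the common case, 12 when a 4-byte nop has to be inserted. */
static unsigned prefixed_size_at(uint64_t addr)
{
    return prefixed_padding(addr) + PREFIXED_BYTES;
}

/* Small self-check of the three interesting cases. */
static void prefixed_self_check(void)
{
    assert(prefixed_size_at(0) == 8);    /* well inside a 64-byte block */
    assert(prefixed_size_at(56) == 8);   /* ends exactly on the boundary */
    assert(prefixed_size_at(60) == 12);  /* would cross, so nop + insn */
}
```

Since the final address is not known when branch lengths are computed,
the conservative choice is to assume the worst case (12 bytes) for every
prefixed insn, which is why an insn wrongly counted as 4 bytes can throw
off the branch distance estimate.
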
> -	 (eq_attr "type" "vecperm")
> -	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
> 	   (const_string "yes")
> 	   (const_string "no"))]

Excellent to see this go :-)

> +   (set_attr "prefixed"
> +	"*, *, *, *, *, *,
> +	 *, *, *, *, *, *,
> +	 *, *, *, *, yes")])

You could do some formula that computes it from isa==p10 btw.  But wrap
that in some helper, "is can have prefixed" or something.  Not really
worth it unless you need this often: the four we have now (which could
be two perhaps, by merging each pair of patterns again) aren't enough to
warrant the extra indirection.

Okay for trunk.  Also fine for backports if you need them.  Thanks!


Segher