From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 44698 invoked by alias); 24 Sep 2019 14:51:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 44562 invoked by uid 89); 24 Sep 2019 14:51:20 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-4.3 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.110.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 24 Sep 2019 14:51:18 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8F95F337; Tue, 24 Sep 2019 07:51:16 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id F28D43F59C; Tue, 24 Sep 2019 07:51:15 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener ,Martin =?utf-8?Q?Li=C5=A1ka?= , GCC Patches , richard.sandiford@arm.com Cc: Martin =?utf-8?Q?Li=C5=A1ka?= , GCC Patches Subject: Re: [PATCH][RFC] Come up with VEC_COND_OP_EXPRs. References: <366dfa22-58e7-e1df-62c3-cc98082a552c@suse.cz> Date: Tue, 24 Sep 2019 14:51:00 -0000 In-Reply-To: (Richard Biener's message of "Tue, 24 Sep 2019 14:17:45 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2019-09/txt/msg01399.txt.bz2 Richard Biener writes: > On Tue, Sep 24, 2019 at 1:57 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Tue, Sep 24, 2019 at 1:11 PM Richard Sandiford >> > wrote: >> >> >> >> Martin Li=C5=A1ka writes: >> >> > Hi. >> >> > >> >> > The patch introduces couple of new TREE_CODEs that will help us to = have >> >> > a proper GIMPLE representation of current VECT_COND_EXPR. Right now, >> >> > the first argument is typically a GENERIC tcc_expression tree with = 2 operands >> >> > that are visited at various places in GIMPLE code. That said, based= on the discussion >> >> > with Richi, I'm suggesting to come up with e.g. >> >> > VECT_COND_LT_EXPR. Such= a change logically >> >> > introduces new GIMPLE_QUATERNARY_RHS gassignments. For now, the VEC= _COND_EXPR remains >> >> > and is only valid in GENERIC and gimplifier will take care of the c= orresponding transition. >> >> > >> >> > The patch is a prototype and missing bits are: >> >> > - folding support addition for GIMPLE_QUATERNARY_RHS is missing >> >> > - fancy tcc_comparison expressions like LTGT_EXPR, UNORDERED_EXPR, = ORDERED_EXPR, >> >> > UNLT_EXPR and others are not supported right now >> >> > - comments are missing for various functions added >> >> > >> >> > Apart from that I was able to bootstrap and run tests with a quite = small fallout. >> >> > Thoughts? >> >> > Martin >> >> >> >> I think this is going in the wrong direction. There are some targets >> >> that can only handle VEC_COND_EXPRs well if we know the associated >> >> condition, and others where a compare-and-VEC_COND_EXPR will always be >> >> two operations. In that situation, it seems like the native gimple >> >> representation should be the simpler representation rather than the >> >> more complex one. That way the comparisons can be optimised >> >> independently of any VEC_COND_EXPRs on targets that benefit from that. >> >> >> >> So IMO it would be better to use three-operand VEC_COND_EXPRs with >> >> no embedded conditions as the preferred gimple representation and >> >> have internal functions for the fused operations that some targets >> >> prefer. This means that using fused operations is "just" an instruct= ion >> >> selection decision rather than hard-coded throughout gimple. (And th= at >> >> fits in well with the idea of doing more instruction selection in gim= ple.) >> > >> > So I've been doing that before, but more generally also for COND_EXPR. >> > We cannot rely on TER and the existing RTL expansion "magic" for the >> > instruction selection issue you mention because TER isn't reliable. W= ith >> > IFNs for optabs we could do actual [vector] condition instruction sele= ction >> > before RTL expansion, ignoring "single-use" issues - is that what you = are >> > hinting at? >> >> Yeah. It'd be similar to how most FMA selection happens after >> vectorisation but before expand. >> >> > How should the vectorizer deal with this? Should it directly >> > use the optab IFNs then when facing "split" COND_EXPRs? IIRC the >> > most fallout of a simple patch (adjusting is_gimple_condexpr) is in the >> > vectorizer. >> >> I guess that would be down to how well the vector costings work if we >> just stick to VEC_COND_EXPR and cost the comparison separately. Using >> optabs directly in the vectoriser definitely sounds OK if that ends up >> being necessary for good code. But if (like you say) the COND_EXPR is >> also split apart, we'd be costing the scalar comparison and selection >> separately as well. >> >> > Note I'm specifically looking for a solution that applies to both COND= _EXPR >> > and VEC_COND_EXPR since both suffer from the same issues. >> >> Yeah, think the same approach would work for COND_EXPR if it's needed. >> (And I think the same trade-off applies there too. Some targets will >> always need a separate comparison to implement a four-operand COND_EXPR.) >> >> > There was also recent work in putting back possibly trapping compariso= ns >> > into [VEC_]COND_EXPR because it doesn't interfere with EH and allows >> > better code. >> >> OK, that's a good counter-reason :-) But it seems quite special-purpose. >> I assume this works even for targets that do split the VEC_COND_EXPR >> because the result is undefined on entry to the EH receiver if the >> operation didn't complete. But that should be true of any non-trapping >> work done after the comparison, with the same proviso. >> >> So this still seems like an instruction-selection issue. We're just >> saying that it's OK to combine a trapping comparison and a VEC_COND_EXPR >> from the non-trapping path. The same would be true for any other >> instruction selection that combines trapping and non-trapping >> operations, provided that the speculated parts can never trap. > > Sure, but that case would necessarily be combining the compare and the > select to the compare place which is "backwards" (and would speculate > the select). Certainly something we don't do anywhere. This case btw > made me consider going the four-operand way (I've pondered with all avail= able > ops multiple times...). Yeah, but that was my point: speculating/moving back operations that are dependent on the result of the comparison is valid for any non-trapping operation, not just selects. E.g. maybe some future target will want to have a version of IFN_COND_ADD with an embedded condition, and the same thing would then be useful for integer additions based on FP comparison results. So I don't think VEC_COND_EXPR is such a special case that we need the four-operand form all the way through gimple. >> > Also you SVE people had VN issues with cond-exprs and >> > VN runs into the exact same issue (but would handle separate compariso= ns >> > better - with the caveat of breaking TER). >> >> The VN thing turned out to be a red herring there, sorry. I think >> I was remembering the state before ifcvt did its own value numbering. >> The remaining issue for the vectoriser is that we don't avoid duplicate >> cast conversions in vect_recog_mask_conversion_pattern, but that's >> mostly a cost thing. The redundancies do get removed by later passes. > > Well, I checked and value-numbering doesn't really handle non-trivial > "equalities" of the condition operand (if one of the operands of the > condition need to be valueized to be detected equal). > > So to go forward and to make sure we don't regress the appropriate > way would probably to tackle the expansion part first. I guess we'll > not notice for scalar COND_EXPRs (because those don't happen > very often) so we could "lower" VEC_COND_EXPRs to the desired > form (and key IL verificataion on PROP_gimple_lvec), which then > means late FRE/DOM have the chance to break things by doing > CSE. At the same time we'd remove the forwprop pieces that put > the condition back in. Then we can see to implement the > instruction selection somehow somewhere... (does it need to happen > at -O0? I think that might be desirable - looking at vectorizer > intrinsic code might help to decide). Not sure why we'd need it for correctness at -O0. Can't VEC_COND_EXPR always be emulated (albeit inefficiently)? Even if you only have FP compare-and-select, you can emulate VEC_COND_EXPRs on a 0/-1 mask. If the code produced really is too poor even for -O0, then keeping intrinsics as intrinsics during gimple would probably be better. > Does that sound sensible? I've searched my patch archieves and > could share several incomplete attempts on tackling this, dating > back to as far as 2010...) Sounds good to me FWIW. Thanks, Richard