From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14121 invoked by alias); 8 Aug 2019 15:24:43 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 2324 invoked by uid 89); 8 Aug 2019 15:24:36 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.1 spammy=barebone, honza, Honza X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 08 Aug 2019 15:24:34 +0000 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E8F3B65401; Thu, 8 Aug 2019 15:24:32 +0000 (UTC) Received: from localhost.localdomain (ovpn-112-43.rdu2.redhat.com [10.10.112.43]) by smtp.corp.redhat.com (Postfix) with ESMTP id 161B85C219; Thu, 8 Aug 2019 15:24:30 +0000 (UTC) Subject: Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs To: Uros Bizjak , Richard Biener Cc: "gcc-patches@gcc.gnu.org" , Jakub Jelinek , "H. J. Lu" , Jan Hubicka References: From: Jeff Law Openpgp: preference=signencrypt Message-ID: <5fa725f8-d821-cfb2-c5dc-b0cbec8801d4@redhat.com> Date: Thu, 08 Aug 2019 16:23:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2019-08/txt/msg00564.txt.bz2 On 8/5/19 6:32 AM, Uros Bizjak wrote: > On Mon, Aug 5, 2019 at 1:50 PM Richard Biener wrote: >> >> On Sun, 4 Aug 2019, Uros Bizjak wrote: >> >>> On Sat, Aug 3, 2019 at 7:26 PM Richard Biener wrote: >>>> >>>> On Thu, 1 Aug 2019, Uros Bizjak wrote: >>>> >>>>> On Thu, Aug 1, 2019 at 11:28 AM Richard Biener wrote: >>>>> >>>>>>>> So you unconditionally add a smaxdi3 pattern - indeed this looks >>>>>>>> necessary even when going the STV route. The actual regression >>>>>>>> for the testcase could also be solved by turing the smaxsi3 >>>>>>>> back into a compare and jump rather than a conditional move sequence. >>>>>>>> So I wonder how you'd do that given that there's pass_if_after_reload >>>>>>>> after pass_split_after_reload and I'm not sure we can split >>>>>>>> as late as pass_split_before_sched2 (there's also a split _after_ >>>>>>>> sched2 on x86 it seems). >>>>>>>> >>>>>>>> So how would you go implement {s,u}{min,max}{si,di}3 for the >>>>>>>> case STV doesn't end up doing any transform? >>>>>>> >>>>>>> If STV doesn't transform the insn, then a pre-reload splitter splits >>>>>>> the insn back to compare+cmove. >>>>>> >>>>>> OK, that would work. But there's no way to force a jumpy sequence then >>>>>> which we know is faster than compare+cmove because later RTL >>>>>> if-conversion passes happily re-discover the smax (or conditional move) >>>>>> sequence. >>>>>> >>>>>>> However, considering the SImode move >>>>>>> from/to int/xmm register is relatively cheap, the cost function should >>>>>>> be tuned so that STV always converts smaxsi3 pattern. >>>>>> >>>>>> Note that on both Zen and even more so bdverN the int/xmm transition >>>>>> makes it no longer profitable but a _lot_ slower than the cmp/cmov >>>>>> sequence... (for the loop in hmmer which is the only one I see >>>>>> any effect of any of my patches). So identifying chains that >>>>>> start/end in memory is important for cost reasons. >>>>> >>>>> Please note that the cost function also considers the cost of move >>>>> from/to xmm. So, the cost of the whole chain would disable the >>>>> transformation. >>>>> >>>>>> So I think the splitting has to happen after the last if-conversion >>>>>> pass (and thus we may need to allocate a scratch register for this >>>>>> purpose?) >>>>> >>>>> I really hope that the underlying issue will be solved by a machine >>>>> dependant pass inserted somewhere after the pre-reload split. This >>>>> way, we can split unconverted smax to the cmove, and this later pass >>>>> would handle jcc and cmove instructions. Until then... yes your >>>>> proposed approach is one of the ways to avoid unwanted if-conversion, >>>>> although sometimes we would like to split to cmove instead. >>>> >>>> So the following makes STV also consider SImode chains, re-using the >>>> DImode chain code. I've kept a simple incomplete smaxsi3 pattern >>>> and also did not alter the {SI,DI}mode chain cost function - it's >>>> quite off for TARGET_64BIT. With this I get the expected conversion >>>> for the testcase derived from hmmer. >>>> >>>> No further testing sofar. >>>> >>>> Is it OK to re-use the DImode chain code this way? I'll clean things >>>> up some more of course. >>> >>> Yes, the approach looks OK to me. It makes chain building mode >>> agnostic, and the chain building can be used for >>> a) DImode x86_32 (as is now), but maybe 64bit minmax operation can be added. >>> b) SImode x86_32 and x86_64 (this will be mainly used for SImode >>> minmax and surrounding SImode operations) >>> c) DImode x86_64 (also, mainly used for DImode minmax and surrounding >>> DImode operations) >>> >>>> Still need help with the actual patterns for minmax and how the splitters >>>> should look like. >>> >>> Please look at the attached patch. Maybe we can add memory_operand as >>> operand 1 and operand 2 predicate, but let's keep things simple for >>> now. >> >> Thanks. The attached patch makes the patch cleaner and it survives >> "some" barebone testing. It also touches the cost function to >> avoid being too overly trigger-happy. I've also ended up using >> ix86_cost->sse_op instead of COSTS_N_INSN-based magic. In >> particular we estimated GPR reg-reg move as COST_N_INSNS(2) while >> move costs shouldn't be wrapped in COST_N_INSNS. >> IMHO we should probably disregard any reg-reg moves for costing pre-RA. >> At least with the current code every reg-reg move biases in favor of >> SSE... > > This is currently a bit mixed-up area in x86 target support. HJ is > looking into this [1] and I hope Honza can review the patch. Yea, Honza's input on that would be greatly appreciated. Jeff >