From: Uros Bizjak
Date: Sun, 04 Aug 2019 17:11:00 -0000
Subject: Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
To: Richard Biener
Cc: "gcc-patches@gcc.gnu.org", Jakub Jelinek
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="000000000000b2ef4b058f4dad61"
X-SW-Source: 2019-08/txt/msg00215.txt.bz2

--000000000000b2ef4b058f4dad61
Content-Type: text/plain; charset="UTF-8"

On Sat, Aug 3,
2019 at 7:26 PM Richard Biener wrote:
>
> On Thu, 1 Aug 2019, Uros Bizjak wrote:
>
> > On Thu, Aug 1, 2019 at 11:28 AM Richard Biener wrote:
> >
> >>>> So you unconditionally add a smaxdi3 pattern - indeed this looks
> >>>> necessary even when going the STV route. The actual regression
> >>>> for the testcase could also be solved by turning the smaxsi3
> >>>> back into a compare and jump rather than a conditional move sequence.
> >>>> So I wonder how you'd do that given that there's pass_if_after_reload
> >>>> after pass_split_after_reload and I'm not sure we can split
> >>>> as late as pass_split_before_sched2 (there's also a split _after_
> >>>> sched2 on x86 it seems).
> >>>>
> >>>> So how would you go implement {s,u}{min,max}{si,di}3 for the
> >>>> case STV doesn't end up doing any transform?
> >>>
> >>> If STV doesn't transform the insn, then a pre-reload splitter splits
> >>> the insn back to compare+cmove.
> >>
> >> OK, that would work. But there's no way to force a jumpy sequence then
> >> which we know is faster than compare+cmove because later RTL
> >> if-conversion passes happily re-discover the smax (or conditional move)
> >> sequence.
> >>
> >>> However, considering the SImode move
> >>> from/to int/xmm register is relatively cheap, the cost function should
> >>> be tuned so that STV always converts smaxsi3 pattern.
> >>
> >> Note that on both Zen and even more so bdverN the int/xmm transition
> >> makes it no longer profitable but a _lot_ slower than the cmp/cmov
> >> sequence... (for the loop in hmmer which is the only one I see
> >> any effect of any of my patches). So identifying chains that
> >> start/end in memory is important for cost reasons.
> >
> > Please note that the cost function also considers the cost of move
> > from/to xmm. So, the cost of the whole chain would disable the
> > transformation.
> >
> >> So I think the splitting has to happen after the last if-conversion
> >> pass (and thus we may need to allocate a scratch register for this
> >> purpose?)
> >
> > I really hope that the underlying issue will be solved by a machine
> > dependent pass inserted somewhere after the pre-reload split. This
> > way, we can split unconverted smax to the cmove, and this later pass
> > would handle jcc and cmove instructions. Until then... yes your
> > proposed approach is one of the ways to avoid unwanted if-conversion,
> > although sometimes we would like to split to cmove instead.
>
> So the following makes STV also consider SImode chains, re-using the
> DImode chain code. I've kept a simple incomplete smaxsi3 pattern
> and also did not alter the {SI,DI}mode chain cost function - it's
> quite off for TARGET_64BIT. With this I get the expected conversion
> for the testcase derived from hmmer.
>
> No further testing so far.
>
> Is it OK to re-use the DImode chain code this way? I'll clean things
> up some more of course.

Yes, the approach looks OK to me. It makes chain building
mode-agnostic, and the chain building can be used for

a) DImode x86_32 (as is now), but maybe 64-bit minmax operation can be added.
b) SImode x86_32 and x86_64 (this will be mainly used for SImode
minmax and surrounding SImode operations)
c) DImode x86_64 (also, mainly used for DImode minmax and surrounding
DImode operations)

> Still need help with the actual patterns for minmax and how the splitters
> should look like.

Please look at the attached patch. Maybe we can add memory_operand as
operand 1 and operand 2 predicate, but let's keep things simple for
now.

Uros.
--000000000000b2ef4b058f4dad61
Content-Type: text/plain; charset="US-ASCII"; name="minmax-md.diff.txt"
Content-Disposition: attachment; filename="minmax-md.diff.txt"

Index: i386.md
===================================================================
--- i386.md     (revision 274008)
+++ i386.md     (working copy)
@@ -17721,6 +17721,27 @@
     std::swap (operands[4], operands[5]);
 })
 
+;; min/max patterns
+
+(define_code_attr smaxmin_rel [(smax "ge") (smin "le")])
+
+(define_insn_and_split "<code><mode>3"
+  [(set (match_operand:SWI48 0 "register_operand")
+       (smaxmin:SWI48 (match_operand:SWI48 1 "register_operand")
+                      (match_operand:SWI48 2 "register_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_STV && TARGET_SSE4_1
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCGC FLAGS_REG)
+       (compare:CCGC (match_dup 1)(match_dup 2)))
+   (set (match_dup 0)
+       (if_then_else:SWI48
+         (<smaxmin_rel> (reg:CCGC FLAGS_REG)(const_int 0))
+         (match_dup 1)
+         (match_dup 2)))])
+
 ;; Conditional addition patterns
 (define_expand "add<mode>cc"
   [(match_operand:SWI 0 "register_operand")

--000000000000b2ef4b058f4dad61--
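
For readers who do not read machine descriptions daily, the effect of the
splitter in the attached patch can be modelled in plain C. This is only a
rough sketch of the semantics, not GCC code: the attached pattern splits
smax/smin into a flags-setting compare followed by a conditional select on
"ge" (for smax) or "le" (for smin). The names flags_t, compare_ccgc,
smax_split and smin_split are hypothetical and chosen for illustration, and
int32_t stands in for the SImode case only.

```c
#include <stdint.h>

/* Hypothetical model of the flags produced by the compare.
   Only the two relations used by the splitter are recorded. */
typedef struct {
    int ge;  /* op1 >= op2, signed */
    int le;  /* op1 <= op2, signed */
} flags_t;

/* Models (set (reg:CCGC FLAGS_REG) (compare:CCGC op1 op2)). */
static flags_t compare_ccgc(int32_t op1, int32_t op2)
{
    flags_t f = { op1 >= op2, op1 <= op2 };
    return f;
}

/* Models the smax split: (if_then_else (ge flags 0) op1 op2). */
static int32_t smax_split(int32_t op1, int32_t op2)
{
    flags_t f = compare_ccgc(op1, op2);
    return f.ge ? op1 : op2;
}

/* Models the smin split: (if_then_else (le flags 0) op1 op2). */
static int32_t smin_split(int32_t op1, int32_t op2)
{
    flags_t f = compare_ccgc(op1, op2);
    return f.le ? op1 : op2;
}
```

When both operands are equal either arm gives the same value, so the choice
of "ge"/"le" over "gt"/"lt" does not change the result in this model.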