Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

public inbox for gcc-patches@gcc.gnu.org

From: Uros Bizjak <ubizjak@gmail.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Jakub Jelinek <jakub@redhat.com>,
		"H. J. Lu" <hjl.tools@gmail.com>, Jan Hubicka <hubicka@ucw.cz>
Subject: Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
Date: Mon, 05 Aug 2019 12:33:00 -0000
Message-ID: <CAFULd4aDDZCw0LkZbo7J1c=eEBqjeh-Q9NLWv94HLBDvNhd+WQ@mail.gmail.com>
In-Reply-To: <alpine.LSU.2.20.1908051316260.19626@zhemvz.fhfr.qr>

[-- Attachment #1: Type: text/plain, Size: 7285 bytes --]

On Mon, Aug 5, 2019 at 1:50 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Sun, 4 Aug 2019, Uros Bizjak wrote:
>
> > On Sat, Aug 3, 2019 at 7:26 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Thu, 1 Aug 2019, Uros Bizjak wrote:
> > >
> > > > On Thu, Aug 1, 2019 at 11:28 AM Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > >>>> So you unconditionally add a smaxdi3 pattern - indeed this looks
> > > >>>> necessary even when going the STV route.  The actual regression
> > > >>>> for the testcase could also be solved by turning the smaxsi3
> > > >>>> back into a compare and jump rather than a conditional move sequence.
> > > >>>> So I wonder how you'd do that given that there's pass_if_after_reload
> > > >>>> after pass_split_after_reload and I'm not sure we can split
> > > >>>> as late as pass_split_before_sched2 (there's also a split _after_
> > > >>>> sched2 on x86 it seems).
> > > >>>>
> > > >>>> So how would you go implement {s,u}{min,max}{si,di}3 for the
> > > >>>> case STV doesn't end up doing any transform?
> > > >>>
> > > >>> If STV doesn't transform the insn, then a pre-reload splitter splits
> > > >>> the insn back to compare+cmove.
> > > >>
> > > >> OK, that would work.  But then there's no way to force a jumpy
> > > >> sequence, which we know is faster than compare+cmove, because later RTL
> > > >> if-conversion passes happily re-discover the smax (or conditional move)
> > > >> sequence.
> > > >>
> > > >>> However, considering that the SImode move
> > > >>> from/to int/xmm registers is relatively cheap, the cost function should
> > > >>> be tuned so that STV always converts the smaxsi3 pattern.
> > > >>
> > > >> Note that on both Zen and even more so bdverN the int/xmm transition
> > > >> makes it no longer profitable but a _lot_ slower than the cmp/cmov
> > > >> sequence... (for the loop in hmmer, which is the only one where I see
> > > >> any effect from any of my patches).  So identifying chains that
> > > >> start/end in memory is important for cost reasons.
> > > >
> > > > Please note that the cost function also considers the cost of move
> > > > from/to xmm. So, the cost of the whole chain would disable the
> > > > transformation.
> > > >
> > > >> So I think the splitting has to happen after the last if-conversion
> > > >> pass (and thus we may need to allocate a scratch register for this
> > > >> purpose?)
> > > >
> > > > I really hope that the underlying issue will be solved by a machine
> > > > dependent pass inserted somewhere after the pre-reload split. This
> > > > way, we can split unconverted smax to the cmove, and this later pass
> > > > would handle jcc and cmove instructions. Until then... yes, your
> > > > proposed approach is one of the ways to avoid unwanted if-conversion,
> > > > although sometimes we would like to split to cmove instead.
> > >
> > > So the following makes STV also consider SImode chains, re-using the
> > > DImode chain code.  I've kept a simple incomplete smaxsi3 pattern
> > > and also did not alter the {SI,DI}mode chain cost function - it's
> > > quite off for TARGET_64BIT.  With this I get the expected conversion
> > > for the testcase derived from hmmer.
> > >
> > > No further testing so far.
> > >
> > > Is it OK to re-use the DImode chain code this way?  I'll clean things
> > > up some more of course.
> >
> > Yes, the approach looks OK to me. It makes chain building mode
> > agnostic, and the chain building can be used for
> > a) DImode x86_32 (as is now), but maybe 64bit minmax operation can be added.
> > b) SImode x86_32 and x86_64 (this will be mainly used for SImode
> > minmax and surrounding SImode operations)
> > c) DImode x86_64 (also, mainly used for DImode minmax and surrounding
> > DImode operations)
> >
> > > Still need help with the actual patterns for minmax and with what the
> > > splitters should look like.
> >
> > Please look at the attached patch. Maybe we can add memory_operand as
> > operand 1 and operand 2 predicate, but let's keep things simple for
> > now.
>
> Thanks.  The attached version cleans up the patch, and it survives
> "some" bare-bones testing.  It also touches the cost function to
> avoid being overly trigger-happy.  I've also ended up using
> ix86_cost->sse_op instead of COSTS_N_INSNS-based magic.  In
> particular we estimated a GPR reg-reg move as COSTS_N_INSNS (2) while
> move costs shouldn't be wrapped in COSTS_N_INSNS.
> IMHO we should probably disregard any reg-reg moves for costing pre-RA.
> At least with the current code every reg-reg move biases in favor of
> SSE...

This is currently a somewhat muddled area in x86 target support. HJ is
looking into this [1], and I hope Honza can review the patch.

> And we're simply adding move and non-move costs in 'gain', somewhat
> mixing apples and oranges?  We could separate those and require
> both to be a net positive win?
>
> Still using -mtune=bdverN exposes that some cost tables have xmm and gpr
> costs as apples and oranges... (so it never triggers for Bulldozer)
>
> I now run into
>
> /space/rguenther/src/svn/trunk-bisect/libgcc/libgcov-driver.c:509:1:
> error: unrecognizable insn:
> (insn 116 115 1511 8 (set (subreg:V2DI (reg/v:DI 87 [ run_max ]) 0)
>         (smax:V2DI (subreg:V2DI (reg/v:DI 87 [ run_max ]) 0)
>             (subreg:V2DI (reg:DI 349 [ MEM[base: _261, offset: 0B] ]) 0)))
> -1
>      (expr_list:REG_DEAD (reg:DI 349 [ MEM[base: _261, offset: 0B] ])
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> during RTL pass: stv
>
> where even with -mavx2 we do not have s{min,max}v2di3.  We do have
> an expander here but it seems only AVX512F has the DImode min/max
> ops.  I have adjusted dimode_scalar_to_vector_candidate_p
> accordingly.
>
> I'm considering renaming the
> dimode_{scalar_to_vector_candidate_p,remove_non_convertible_regs}
> functions to drop the dimode_ prefix - is that OK or do you
> prefer some other prefix?
>
> So - bootstrap with --with-arch=skylake in progress.
>
> It detects quite a few chains (unsurprisingly) so I guess we need
> to address compile-time issues in the pass before enabling this
> enhancement (maybe as a follow-up?).
>
> Further comments on the actual patch welcome, I consider it
> "finished" if testing reveals no issues.  ChangeLog still needs
> to be written and testcases to be added.

> +;; min/max patterns
> +
> +(define_code_attr smaxmin_rel [(smax "ge") (smin "le")])
> +
> +(define_insn_and_split "<code><mode>3"
> +  [(set (match_operand:SWI48 0 "register_operand")
> +       (smaxmin:SWI48 (match_operand:SWI48 1 "register_operand")
> +                      (match_operand:SWI48 2 "register_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_STV && TARGET_SSE4_1
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(set (reg:CCGC FLAGS_REG)
> +       (compare:CCGC (match_dup 1)(match_dup 2)))
> +   (set (match_dup 0)
> +       (if_then_else:SWI48
> +         (<smaxmin_rel> (reg:CCGC FLAGS_REG)(const_int 0))
> +         (match_dup 1)
> +         (match_dup 2)))])
> +
>  ;; Conditional addition patterns
>  (define_expand "add<mode>cc"
>    [(match_operand:SWI 0 "register_operand")

Please find attached an (untested) i386.md patch that defines signed and
unsigned min/max patterns.

[1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01542.html

Uros.

[-- Attachment #2: maxmin-md.diff.txt --]
[-- Type: text/plain, Size: 1117 bytes --]

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e19a591fa9d..8a492626103 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17721,6 +17721,30 @@
     std::swap (operands[4], operands[5]);
 })
 
+;; min/max patterns
+
+(define_code_attr maxmin_rel
+  [(smax "ge") (smin "le") (umax "geu") (umin "leu")])
+(define_code_attr maxmin_cmpmode
+  [(smax "CCGC") (smin "CCGC") (umax "CC") (umin "CC")])
+
+(define_insn_and_split "<code><mode>3"
+  [(set (match_operand:SWI48 0 "register_operand")
+	(maxmin:SWI48 (match_operand:SWI48 1 "register_operand")
+		      (match_operand:SWI48 2 "register_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_STV && TARGET_SSE4_1
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:<maxmin_cmpmode> FLAGS_REG)
+	(compare:<maxmin_cmpmode> (match_dup 1)(match_dup 2)))
+   (set (match_dup 0)
+	(if_then_else:SWI48
+	  (<maxmin_rel> (reg:<maxmin_cmpmode> FLAGS_REG)(const_int 0))
+	  (match_dup 1)
+	  (match_dup 2)))])
+
 ;; Conditional addition patterns
 (define_expand "add<mode>cc"
   [(match_operand:SWI 0 "register_operand")
