public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Uros Bizjak <ubizjak@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations.
Date: Sun, 5 Jun 2022 20:57:10 +0200	[thread overview]
Message-ID: <CAFULd4biUta4TOPT8UBEY7L0Hv1jGejzUp5vyxKp7XSJrn-_=w@mail.gmail.com> (raw)
In-Reply-To: <009201d878d2$2591ff60$70b5fe20$@nextmovesoftware.com>

On Sun, Jun 5, 2022 at 1:48 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Many thanks for your speedy review.  This revised patch implements
> all three of your recommended improvements; the use of
> ix86_binary_operator_ok with code UNKNOWN, the removal of
> "n" constraints from const_int_operand predicates, and the use
> of force_reg (for input operands, and REG_P for output operands)
> to ensure that it's always safe to call gen_lowpart/gen_highpart.
>
> [If we proceed with the recent proposal to split double word
> addition, subtraction and other operations before reload, then
> these new add/sub variants should be updated at the same time,
> but for now this patch keeps double word patterns consistent].
>
> This revised patch has been retested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, both with and without
> --target_board=unix{-m32} with no new failures.  Ok for mainline?

+(define_insn_and_split "*concat<mode><dwi>3_1"
+  [(set (match_operand:<DWI> 0 "register_operand" "=r")
+ (any_or_plus:<DWI>
+  (ashift:<DWI> (match_operand:<DWI> 1 "register_operand" "r")
+ (match_operand:<DWI> 2 "const_int_operand"))
+  (zero_extend:<DWI> (match_operand:DWIH 3 "register_operand" "r"))))]
+  "INTVAL (operands[2]) == <MODE_SIZE> * BITS_PER_UNIT
+   && REG_P (operands[0])
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 5) (match_dup 6))]
+{
+  operands[1] = force_reg (<DWI>mode, operands[1]);
+  operands[4] = gen_lowpart (<MODE>mode, operands[0]);
+  operands[5] = gen_highpart (<MODE>mode, operands[0]);
+  operands[6] = gen_lowpart (<MODE>mode, operands[1]);
+})

Hm, but in this particular case (and other) you can use
split_double_mode on operands[0], instead of manually splitting REG_P
constrained operands, and it will handle everything correctly. Please
note that split_double_mode has:

split_double_mode (machine_mode mode, rtx operands[],
           int num, rtx lo_half[], rtx hi_half[])

so with some care you can use:

"split_double_mode (<DWI>mode, &operands[0],1, &operands[4], &operands[5]);"

followed by:

operands[6] = simplify_gen_subreg (<MODE>mode, op, <DWI>mode, 0);

The above line is partially what split_double_mode does.

This is the approach other pre_reload doubleword splitters take, it
looks the safest (otherwise it would break left and right with
existing patterns ...), and the most effective to me.

Please also get approval for sse.md change from Hongtao, AVX512F stuff
is in a separate playground.

Uros.


>
> 2022-06-05  Roger Sayle  <roger@nextmovesoftware.com>
>             Uroš Bizjak  <ubizjak@gmail.com>
>
> gcc/ChangeLog
>         PR target/91681
>         * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
>         (*add<dwi>3_doubleword_zext): New define_insn_and_split.
>         (*sub<dwi>3_doubleword_zext): New define_insn_and_split.
>         (*concat<mode><dwi>3_1): New define_insn_and_split replacing
>         previous define_split for implementing DST = (HI<<32)|LO as
>         pair of move instructions, setting lopart and hipart.
>         (*concat<mode><dwi>3_2): Likewise.
>         (*concat<mode><dwi>3_3): Likewise, where HI is zero_extended.
>         (*concat<mode><dwi>3_4): Likewise, where HI is zero_extended.
>         * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
>         (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
>         (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
>         (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP unspec.
>         (vec_pack_trunc_<mode>): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/91681
>         * g++.target/i386/pr91681.C: New test case (from the PR).
>         * gcc.target/i386/pr91681-1.c: New int128 test case.
>         * gcc.target/i386/pr91681-2.c: Likewise.
>         * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.
>
>
> Thanks again,
> Roger
> --
>
> > -----Original Message-----
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: 03 June 2022 11:08
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more
> > optimizations.
> >
> > On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > >
> > > Technically, PR target/91681 has already been resolved; we now
> > > recognize the highpart multiplication at the tree-level, we no longer
> > > use the stack, and we currently generate the same number of
> > > instructions as LLVM.  However, it is still possible to do better, the
> > > current x86_64 code to generate a double word addition of a zero extended
> > operand, looks like:
> > >
> > >         xorl    %r11d, %r11d
> > >         addq    %r10, %rax
> > >         adcq    %r11, %rdx
> > >
> > > when it's possible (as LLVM does) to use an immediate constant:
> > >
> > >         addq    %r10, %rax
> > >         adcq    $0, %rdx
> > >
> > > To do this, the backend required one or two simple changes, that then
> > > themselves required one or two more obscure tweaks.
> > >
> > > The simple starting point is to define a zero_extendditi2 pattern, for
> > > zero extension from DImode to TImode on TARGET_64BIT that is split
> > > after reload.  Double word (TImode) addition/subtraction is split
> > > after reload, so that constrains when things should happen.
> > >
> > > With zero extension now visible to combine, we add two new
> > > define_insn_and_split that add/subtract a zero extended operand in
> > > double word mode.  These apply to both 32-bit and 64-bit code
> > > generation, to produce adc $0 and sbb $0.
> > >
> > > The first strange tweak is that these new patterns interfere with the
> > > optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair of
> > > register moves, or more accurately the combine splitter no longer
> > > triggers as we're now converting two instructions into two
> > > instructions (not three instructions into two instructions).  This is
> > > easily repaired (and extended to handle TImode) by changing from a
> > > pair of define_split (that handle operand commutativity) to a set of
> > > four define_insn_and_split (again to handle operand commutativity).
> > >
> > > The other/final strange tweak that the above splitters now interfere
> > > with AVX512's kunpckdq instruction which is defined as identical RTL,
> > > DW:DI = (HI:SI<<32)|zero_extend(LO:SI).  To distinguish this, and also
> > > avoid AVX512 mask registers being used by reload to perform SImode
> > > scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP) to the
> > > unpack mask operations, which matches what sse.md does for the other
> > > mask specific (logic) operations.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32},
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2022-06-03  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR target/91681
> > >         * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
> > >         (*add<dwi>3_doubleword_zext): New define_insn_and_split.
> > >         (*sub<dwi>3_doubleword_zext): New define_insn_and_split.
> > >         (*concat<mode><dwi>3_1): New define_insn_and_split replacing
> > >         previous define_split for implementing DST = (HI<<32)|LO as
> > >         pair of move instructions, setting lopart and hipart.
> > >         (*concat<mode><dwi>3_2): Likewise.
> > >         (*concat<mode><dwi>3_3): Likewise, where HI is zero_extended.
> > >         (*concat<mode><dwi>3_4): Likewise, where HI is zero_extended.
> > >         * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
> > >         (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
> > >         (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
> > >         (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP
> > > unspec.
> > >         (vec_pack_trunc_<mode>): Likewise.
> > >
> > > gcc/testsuite/ChangeLog
> > >         PR target/91681
> > >         * g++.target/i386/pr91681.C: New test case (from the PR).
> > >         * gcc.target/i386/pr91681-1.c: New int128 test case.
> > >         * gcc.target/i386/pr91681-2.c: Likewise.
> > >         * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.
> >
> > +  "MEM_P (operands[0]) ? rtx_equal_p (operands[0], operands[1]) &&
> > + !MEM_P (operands[2])
> > +       : !MEM_P (operands[1])"
> >
> > Can we use ix86_binary_operator_ok (UNKNOWN, ...mode..., operands) here
> > instead?
> >
> > (UNKNOWN RTX code is used to prevent unwanted optimization with
> > commutative operands).
> >
> > Uros.

      reply	other threads:[~2022-06-05 18:57 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-03  9:49 Roger Sayle
2022-06-03 10:08 ` Uros Bizjak
2022-06-03 10:15   ` Uros Bizjak
2022-06-05 11:48   ` Roger Sayle
2022-06-05 18:57     ` Uros Bizjak [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFULd4biUta4TOPT8UBEY7L0Hv1jGejzUp5vyxKp7XSJrn-_=w@mail.gmail.com' \
    --to=ubizjak@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).