From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ubizjak@gmail.com>
Received: from mail-qv1-xf35.google.com (mail-qv1-xf35.google.com
 [IPv6:2607:f8b0:4864:20::f35])
 by sourceware.org (Postfix) with ESMTPS id 8FDF23AA9C1E
 for <gcc-patches@gcc.gnu.org>; Fri,  3 Jun 2022 10:15:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8FDF23AA9C1E
Received: by mail-qv1-xf35.google.com with SMTP id a9so5261067qvt.6
 for <gcc-patches@gcc.gnu.org>; Fri, 03 Jun 2022 03:15:18 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=QsCrRhp4e0EP0w2ERDaXnxLVF4mHjN9j1zTkgG+5+IM=;
 b=ft/MCcaYbJGWovhNPIHZjNyJuyBZGNzJZ7d1gTP9lSSO8MHNj46yGfkcWcyBuxcKF8
 YBjKJPNSsueCK67ABga9J87YZnUyOponAf1x+b49Rh/hWpfGl4OLKdBfSAdLMnCzRdY2
 j9yWacpml4Uje3hWwMeH7Y5qROG00FdjMqWGQx4icGAJvMzV/XOqLFFxCs9oS2Z0xMOa
 cG2Bnwz5S2hJwtrUo/EST5NsxR2mta20s8dsPYu1XdwqH9LX7WYohys+ku+MBgkIjc9w
 NkKZIKnmrVbv0gcfabRIWzLFMg4UUIjrdomNcZr+B8L71R4OVHEebdRHRERu0wPLAd/n
 78sw==
X-Gm-Message-State: AOAM5327nlaiGE/Z2pDiA4JwC8ZEjxojvCHCRIihbDc9iXbm8XOMJyCO
 Hjrn54jGEmfPyh77ElMVImAKBSpf7B+cdMGN9cPoihtztZf1gg==
X-Google-Smtp-Source: ABdhPJwNUEfVsz7AUwbkDX2r7p2hEW/oEIBgj0b06nG355w/ixvPgBsnoUF3HA2GN8GtPGlHAwf1IsJfrI3fGDwJr5c=
X-Received: by 2002:a0c:fd6b:0:b0:462:5f5f:9ef with SMTP id
 k11-20020a0cfd6b000000b004625f5f09efmr40149727qvs.48.1654251317900; Fri, 03
 Jun 2022 03:15:17 -0700 (PDT)
MIME-Version: 1.0
References: <0af601d8772f$43ba22f0$cb2e68d0$@nextmovesoftware.com>
 <CAFULd4ZHSN_5+QOMo5FvSb7DzmN-FH=0OCX3+oWHApQRrXK7Qg@mail.gmail.com>
In-Reply-To: <CAFULd4ZHSN_5+QOMo5FvSb7DzmN-FH=0OCX3+oWHApQRrXK7Qg@mail.gmail.com>
From: Uros Bizjak <ubizjak@gmail.com>
Date: Fri, 3 Jun 2022 12:15:07 +0200
Message-ID: <CAFULd4ZMd-J_8HtuuJd1YL59hJpwChj0vimkB8RePsi=8wcmnQ@mail.gmail.com>
Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more
 optimizations.
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jun 2022 10:15:20 -0000

On Fri, Jun 3, 2022 at 12:08 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > Technically, PR target/91681 has already been resolved; we now recognize the
> > highpart multiplication at the tree-level, we no longer use the stack, and
> > we currently generate the same number of instructions as LLVM.  However, it
> > is still possible to do better: the current x86_64 code generated for a
> > double word addition of a zero extended operand looks like:
> >
> >         xorl    %r11d, %r11d
> >         addq    %r10, %rax
> >         adcq    %r11, %rdx
> >
> > when it's possible (as LLVM does) to use an immediate constant:
> >
> >         addq    %r10, %rax
> >         adcq    $0, %rdx
> >
> > To do this, the backend required one or two simple changes, which
> > in turn required one or two more obscure tweaks.
> >
> > The simple starting point is to define a zero_extendditi2 pattern,
> > for zero extension from DImode to TImode on TARGET_64BIT that is
> > split after reload.  Double word (TImode) addition/subtraction is
> > split after reload, so that constrains when things should happen.
> >
> > With zero extension now visible to combine, we add two new
> > define_insn_and_split that add/subtract a zero extended operand
> > in double word mode.  These apply to both 32-bit and 64-bit code
> > generation, to produce adc $0 and sbb $0.
> >
> > The first strange tweak is that these new patterns interfere with
> > the optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair
> > of register moves, or more accurately the combine splitter no longer
> > triggers as we're now converting two instructions into two instructions
> > (not three instructions into two instructions).  This is easily
> > repaired (and extended to handle TImode) by changing from a pair
> > of define_split (that handle operand commutativity) to a set of
> > four define_insn_and_split (again to handle operand commutativity).
> >
> > The other/final strange tweak is that the above splitters now interfere
> > with AVX512's kunpckdq instruction, which is defined as identical RTL,
> > DW:DI = (HI:SI<<32)|zero_extend(LO:SI).  To distinguish this, and
> > also avoid AVX512 mask registers being used by reload to perform
> > SImode scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP)
> > to the unpack mask operations, which matches what sse.md does for
> > the other mask specific (logic) operations.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-06-03  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/91681
> >         * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
> >         (*add<dwi>3_doubleword_zext): New define_insn_and_split.
> >         (*sub<dwi>3_doubleword_zext): New define_insn_and_split.
> >         (*concat<mode><dwi>3_1): New define_insn_and_split replacing
> >         previous define_split for implementing DST = (HI<<32)|LO as
> >         pair of move instructions, setting lopart and hipart.
> >         (*concat<mode><dwi>3_2): Likewise.
> >         (*concat<mode><dwi>3_3): Likewise, where HI is zero_extended.
> >         (*concat<mode><dwi>3_4): Likewise, where HI is zero_extended.
> >         * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
> >         (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
> >         (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
> >         (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP
> >         unspec.
> >         (vec_pack_trunc_<mode>): Likewise.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/91681
> >         * g++.target/i386/pr91681.C: New test case (from the PR).
> >         * gcc.target/i386/pr91681-1.c: New int128 test case.
> >         * gcc.target/i386/pr91681-2.c: Likewise.
> >         * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.
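
For reference, the double word addition of a zero extended operand
described in the quoted text can be modelled in plain C. This is only a
rough sketch of the arithmetic on 32-bit halves (the ia32 case), not
what the backend emits; the function names are made up for
illustration. The carry out of the low word is all that reaches the
high word, which is why "adc $0" suffices and no zeroed register is
needed.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: zero extend y to 64 bits, then add.  */
static uint64_t add_zext_ref(uint64_t x, uint32_t y)
{
    return x + (uint64_t)y;
}

/* Hand-split version on 32-bit halves, mirroring add + adc $0.
   Hypothetical helper, for illustration only.  */
static uint64_t add_zext_split(uint64_t x, uint32_t y)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t sum_lo = lo + y;        /* add: wraps modulo 2^32 */
    uint32_t carry = sum_lo < y;     /* carry out of the low word */
    uint32_t sum_hi = hi + carry;    /* adc $0: high word plus carry only */
    return ((uint64_t)sum_hi << 32) | sum_lo;
}

/* Small self-check comparing the two versions.  */
static void check_add_zext(void)
{
    assert(add_zext_split(0xFFFFFFFFu, 1u) == add_zext_ref(0xFFFFFFFFu, 1u));
    assert(add_zext_split(5u, 7u) == 12u);
}
```

Calling check_add_zext() from a test driver exercises the carry and
no-carry cases.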

+(define_insn_and_split "*concat<mode><dwi>3_1"
+  [(set (match_operand:<DWI> 0 "register_operand" "=r")
+ (any_or_plus:<DWI>
+  (ashift:<DWI> (match_operand:<DWI> 1 "register_operand" "r")
+ (match_operand:<DWI> 2 "const_int_operand" "n"))

You can remove the "n" constraint when we deal with non-commutative
(without %) operands.

+  (zero_extend:<DWI> (match_operand:DWIH 3 "register_operand" "r"))))]
+  "INTVAL (operands[2]) == <MODE_SIZE> * BITS_PER_UNIT
+  && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 5) (match_dup 6))]
 {
-  operands[3] = gen_highpart (SImode, operands[0]);
-  operands[4] = gen_lowpart (SImode, operands[1]);
-  operands[5] = gen_lowpart (SImode, operands[0]);
+  operands[4] = gen_lowpart (<MODE>mode, operands[0]);
+  operands[5] = gen_highpart (<MODE>mode, operands[0]);
+  operands[6] = gen_lowpart (<MODE>mode, operands[1]);

Pre-reload splitters need to force_reg their register_operands. You
can have a SUBREG here, and gen_lowpart/gen_highpart will fail on a SUBREG.
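
As a side note for readers of the hunk above, the identity that the
split relies on can be checked in plain C. This is a rough model with
made-up helper names, assuming 32-bit halves and assuming that
any_or_plus covers plus, ior and xor: since the shifted operand and the
zero extended operand have no bits in common, all three codes give the
same value, and the result is just LO in the low part and the low part
of the shifted operand in the high part.

```c
#include <assert.h>
#include <stdint.h>

/* DW = (HI << 32) op zero_extend(LO), for the three codes assumed
   to be covered by any_or_plus.  */
static uint64_t concat_ior(uint64_t hi, uint32_t lo)
{
    return (hi << 32) | (uint64_t)lo;
}

static uint64_t concat_plus(uint64_t hi, uint32_t lo)
{
    return (hi << 32) + (uint64_t)lo;
}

static uint64_t concat_xor(uint64_t hi, uint32_t lo)
{
    return (hi << 32) ^ (uint64_t)lo;
}

/* Plain C stand-ins for taking the low and high word of a value
   (hypothetical names, not the GCC functions).  */
static uint32_t lowpart(uint64_t x)
{
    return (uint32_t)x;
}

static uint32_t highpart(uint64_t x)
{
    return (uint32_t)(x >> 32);
}

/* The split writes LO to the low part of the destination and the low
   part of the shifted operand to the high part.  */
static void check_concat(void)
{
    uint64_t hi = 0x12345678u;
    uint32_t lo = 0x9ABCDEF0u;
    uint64_t dw = concat_ior(hi, lo);

    assert(dw == concat_plus(hi, lo));
    assert(dw == concat_xor(hi, lo));
    assert(lowpart(dw) == lo);
    assert(highpart(dw) == lowpart(hi));
}
```

Any bits above the low word of the shifted operand are shifted out,
which is why only its low part matters in the model.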

Uros.