From: Uros Bizjak
Date: Fri, 3 Jun 2022 12:08:15 +0200
In-Reply-To: <0af601d8772f$43ba22f0$cb2e68d0$@nextmovesoftware.com>
Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations.
To: Roger Sayle
Cc: GCC Patches

On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle wrote:
>
>
> Technically, PR target/91681 has already been resolved; we now recognize the
> highpart multiplication at the tree-level, we no longer use the stack, and
> we currently generate the same number of instructions as LLVM.  However, it
> is still possible to do better; the current x86_64 code to generate a double
> word addition of a zero extended operand looks like:
>
>         xorl    %r11d, %r11d
>         addq    %r10, %rax
>         adcq    %r11, %rdx
>
> when it's possible (as LLVM does) to use an immediate constant:
>
>         addq    %r10, %rax
>         adcq    $0, %rdx
>
> To do this, the backend required one or two simple changes, that
> then themselves required one or two more obscure tweaks.
>
> The simple starting point is to define a zero_extendditi2 pattern,
> for zero extension from DImode to TImode on TARGET_64BIT that is
> split after reload.  Double word (TImode) addition/subtraction is
> split after reload, so that constrains when things should happen.
>
> With zero extension now visible to combine, we add two new
> define_insn_and_split that add/subtract a zero extended operand
> in double word mode.  These apply to both 32-bit and 64-bit code
> generation, to produce adc $0 and sbb $0.
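
As a rough illustration of the addition discussed above, the C sketch below models a double-word value as two 64-bit halves. The type u128_parts and the function add_zext64 are hypothetical names chosen for this example; they are not taken from the patch or its test cases. Because the high half of a zero-extended addend is zero, the high-half step only has to add the carry, which is what adc $0 expresses.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Add a zero-extended 64-bit value to a 128-bit value.
   The addend's high half is known to be zero, so only the carry out of
   the low-half addition reaches the high half. */
static u128_parts add_zext64(u128_parts acc, uint64_t x)
{
    u128_parts r;
    r.lo = acc.lo + x;            /* low halves; wraps modulo 2^64 (the addq step) */
    uint64_t carry = r.lo < x;    /* 1 if the low-half addition wrapped */
    r.hi = acc.hi + 0 + carry;    /* high half plus zero plus carry (the adcq $0 step) */
    return r;
}
```

For example, adding 1 to a value whose low half is UINT64_MAX and whose high half is 0 wraps the low half to 0 and carries 1 into the high half.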
>
> The first strange tweak is that these new patterns interfere with
> the optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair
> of register moves, or more accurately the combine splitter no longer
> triggers as we're now converting two instructions into two instructions
> (not three instructions into two instructions).  This is easily
> repaired (and extended to handle TImode) by changing from a pair
> of define_split (that handle operand commutativity) to a set of
> four define_insn_and_split (again to handle operand commutativity).
>
> The other/final strange tweak is that the above splitters now interfere
> with AVX512's kunpckdq instruction, which is defined as identical RTL,
> DW:DI = (HI:SI<<32)|zero_extend(LO:SI).  To distinguish this, and
> also avoid AVX512 mask registers being used by reload to perform
> SImode scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP)
> to the unpack mask operations, which matches what sse.md does for
> the other mask specific (logic) operations.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-03  Roger Sayle
>
> gcc/ChangeLog
>         PR target/91681
>         * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
>         (*add3_doubleword_zext): New define_insn_and_split.
>         (*sub3_doubleword_zext): New define_insn_and_split.
>         (*concat3_1): New define_insn_and_split replacing
>         previous define_split for implementing DST = (HI<<32)|LO as
>         pair of move instructions, setting lopart and hipart.
>         (*concat3_2): Likewise.
>         (*concat3_3): Likewise, where HI is zero_extended.
>         (*concat3_4): Likewise, where HI is zero_extended.
>         * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
>         (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
>         (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
>         (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP
>         unspec.
>         (vec_pack_trunc_): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/91681
>         * g++.target/i386/pr91681.C: New test case (from the PR).
>         * gcc.target/i386/pr91681-1.c: New int128 test case.
>         * gcc.target/i386/pr91681-2.c: Likewise.
>         * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.

+  "MEM_P (operands[0]) ? rtx_equal_p (operands[0], operands[1])
+                         && !MEM_P (operands[2])
+                       : !MEM_P (operands[1])"

Can we use ix86_binary_operator_ok (UNKNOWN, ...mode..., operands) here instead?

(UNKNOWN RTX code is used to prevent unwanted optimization with
commutative operands).

Uros.
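
To make the quoted insn condition easier to read, here is a small C model of it. struct operand and quoted_condition_ok are hypothetical names used only for illustration. The sketch mirrors the ternary exactly as written in the patch hunk and makes no attempt to reproduce what ix86_binary_operator_ok itself checks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an RTL operand; only the two properties the
   quoted condition inspects are modelled. */
struct operand {
    bool is_mem;   /* models MEM_P (op) */
    int  id;       /* equal ids model rtx_equal_p (a, b) */
};

/* Model of the quoted condition:
     MEM_P (operands[0]) ? rtx_equal_p (operands[0], operands[1])
                           && !MEM_P (operands[2])
                         : !MEM_P (operands[1])  */
static bool quoted_condition_ok(const struct operand ops[3])
{
    if (ops[0].is_mem)
        /* Memory destination: the first source must be that same
           location, and the second source must not be memory. */
        return ops[0].id == ops[1].id && !ops[2].is_mem;

    /* Non-memory destination: the first source must not be memory. */
    return !ops[1].is_mem;
}
```

In this model, a memory destination that matches operand 1 is accepted only when operand 2 is not memory, while a non-memory destination is accepted whenever operand 1 is not memory, regardless of operand 2.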