From: Uros Bizjak
Date: Tue, 9 Aug 2022 10:43:33 +0200
Subject: Re: [x86_64 PATCH] Use PTEST to perform AND in TImode STV of (A & B) != 0.
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org

On Tue, Aug 9, 2022 at 10:16 AM Roger Sayle wrote:
>
>
> This x86_64 backend patch allows TImode STV to take advantage of the
> fact that the PTEST instruction performs an AND operation. Previously
> PTEST was (mostly) used for comparison against zero, by using the same
> operands. The benefits are demonstrated by the new test case:
>
> __int128 a,b;
> int foo()
> {
>   return (a & b) != 0;
> }
>
> Currently with -O2 -msse4 we generate:
>
>         movdqa  a(%rip), %xmm0
>         pand    b(%rip), %xmm0
>         xorl    %eax, %eax
>         ptest   %xmm0, %xmm0
>         setne   %al
>         ret
>
> with this patch we now generate:
>
>         movdqa  a(%rip), %xmm0
>         xorl    %eax, %eax
>         ptest   b(%rip), %xmm0
>         setne   %al
>         ret
>
> Technically, the magic happens using new define_insn_and_split patterns.
> Using two patterns allows this transformation to be performed independently
> of whether TImode STV is run before or after combine. The one tricky
> case is that immediate constant operands of the AND behave slightly
> differently between TImode and V1TImode: All V1TImode immediate operands
> become loads, but for TImode only values that are not hilo_operands
> need to be loaded. Hence the new *testti_doubleword accepts any
> general_operand, but internally during split calls force_reg whenever
> the second operand is not x86_64_hilo_general_operand.
> This required (benefits from) some tweaks to TImode STV to support
> CONST_WIDE_INT in more places, using CONST_SCALAR_INT_P instead of just
> CONST_INT_P.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures. Ok for mainline?
>
>
> 2022-08-09 Roger Sayle
>
> gcc/ChangeLog
>         * config/i386/i386-features.cc (scalar_chain::convert_compare):
>         Create new pseudos only when/if needed. Add support for TEST
>         (i.e. (COMPARE (AND x y) (const_int 0))), using UNSPEC_PTEST.
>         When broadcasting V2DImode and V4SImode use new pseudo register.
>         (timode_scalar_chain::convert_op): Do nothing if operand is
>         already V1TImode. Avoid generating useless SUBREG conversions,
>         i.e. (SUBREG:V1TImode (REG:V1TImode) 0). Handle CONST_WIDE_INT
>         in addition to CONST_INT by using CONST_SCALAR_INT_P.
>         (convertible_comparison_p): Use CONST_SCALAR_INT_P to match both
>         CONST_WIDE_INT and CONST_INT. Recognize new *testti_doubleword
>         pattern as an STV candidate.
>         (timode_scalar_to_vector_candidate_p): Allow CONST_SCALAR_INT_P
>         operands in binary logic operations.
>
>         * config/i386/i386.cc (ix86_rtx_costs) : Add costs
>         for UNSPEC_PTEST; a PTEST that performs an AND has the same cost
>         as regular PTEST, i.e. cost->sse_op.
>
>         * config/i386/i386.md (*testti_doubleword): New pre-reload
>         define_insn_and_split that recognizes comparison of TI mode AND
>         against zero.
>         * config/i386/sse.md (*ptest_and): New pre-reload
>         define_insn_and_split that recognizes UNSPEC_PTEST of identical
>         AND operands.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse4_1-stv-8.c: New test case.

OK.

BTW, does your patch also handle DImode and SImode comparisons? They
can be implemented with PTEST, and perhaps they could benefit from
embedded AND, too.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>
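
For reference, the property the patch relies on can be modelled in plain C. This is only a sketch under stated assumptions, not code from the patch: the helper names ptest_zf_model and foo_model are hypothetical, and it assumes a compiler that provides unsigned __int128 (GCC or Clang on a 64-bit target). PTEST sets ZF when the bitwise AND of its two 128-bit operands is all zeros, so (a & b) != 0 is simply "ZF clear", which is why the separate PAND before PTEST is redundant.

```c
#include <stdint.h>

/* Hypothetical model of the ZF result of PTEST: ZF is set when the
   bitwise AND of the two 128-bit operands is all zeros.  The operands
   are passed as 64-bit halves, so no SSE intrinsics are needed.  */
static int ptest_zf_model(uint64_t a_lo, uint64_t a_hi,
                          uint64_t b_lo, uint64_t b_hi)
{
  return ((a_lo & b_lo) | (a_hi & b_hi)) == 0;
}

/* Same result as the test case's foo(), i.e. (a & b) != 0, expressed
   through the model above (setne corresponds to ZF being clear).  */
static int foo_model(unsigned __int128 a, unsigned __int128 b)
{
  return !ptest_zf_model((uint64_t)a, (uint64_t)(a >> 64),
                         (uint64_t)b, (uint64_t)(b >> 64));
}
```

The model also shows why PTEST with identical operands acts as a comparison against zero: ANDing a value with itself leaves it unchanged, so ZF then reports whether the value itself is zero.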