From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 30C47385828E for ; Sun, 10 Jul 2022 18:36:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 30C47385828E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Transfer-Encoding:Content-Type: MIME-Version:Message-ID:Date:Subject:In-Reply-To:References:Cc:To:From:Sender :Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help: List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=OLITwUNeF6bm8gsCVVTWuRzJu5moc9ry24iA+BUdkEE=; b=juYyu02R1LbYqWX83qNndWnKtG JDKAemVFWZT8lNap7N0jABvs1uGZFSO968+iTNBhpMGNu4bnfAm8XtJ0Sn5jjIzMF8szQ22Pfx6RO DsUiKrkxFLU/Jbaxz3phsSrjZnOkvFckU0toh3HuAY/sCvhymVD4B8t/9lF0dIINh+R7Ud1ttJJyd AM41e+nH+y4YzAAriqc7cSA43lGWg7LMXecI5l1ycrOG/57CQpe/VdRGCstkCP3zlEj3OmEvRmANu 7uBppIXqQdC1D5+IuLkFHdtEpzbLaeOUPM35JynWYFi1Whl1bTUcCxYz/UL6b6E9gw/LYD2ZbUxso VaJvBQgQ==; Received: from host86-130-134-60.range86-130.btcentralplus.com ([86.130.134.60]:53686 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1oAbms-0006dG-HD; Sun, 10 Jul 2022 14:36:10 -0400 From: "Roger Sayle" To: "'Uros Bizjak'" Cc: , "'H. J. Lu'" References: <000901d8938d$ead4dc40$c07e94c0$@nextmovesoftware.com> In-Reply-To: Subject: RE: [x86_64 PATCH] Improved Scalar-To-Vector (STV) support for TImode to V1TImode. Date: Sun, 10 Jul 2022 19:36:07 +0100 Message-ID: <00f201d8948b$ec82a6e0$c587f4a0$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 16.0 Thread-Index: AQDLEU72DOSSolIGHYPmM3CDaDY5HQFm5hbmr4e6aZA= Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Jul 2022 18:36:12 -0000 Hi Uros, Yes, I agree. I think it makes sense to have a single STV pass (after combine and before reload). Let's hear what HJ thinks, but I'm happy to investigate a follow-up patch that unifies the STV passes. But it'll be easier to confirm there are no "code generation" changes if those modifications are pushed independently of these ones. Time to look into the (git) history of multiple STV passes... Thanks for the review. I'll wait for HJ's thoughts. Cheers, Roger -- > -----Original Message----- > From: Uros Bizjak > Sent: 10 July 2022 19:06 > To: Roger Sayle > Cc: gcc-patches@gcc.gnu.org; H. J. Lu > Subject: Re: [x86_64 PATCH] Improved Scalar-To-Vector (STV) support = for > TImode to V1TImode. >=20 > On Sat, Jul 9, 2022 at 2:17 PM Roger Sayle = > wrote: > > > > > > This patch upgrades x86_64's scalar-to-vector (STV) pass to more > > aggressively transform 128-bit scalar TImode operations into vector > > V1TImode operations performed on SSE registers. TImode = functionality > > already exists in STV, but only for move operations, this changes > > brings support for logical operations (AND, IOR, XOR, NOT and ANDN) > > and comparisons. > > > > The effect of these changes are conveniently demonstrated by the new > > sse4_1-stv-5.c test case: > > > > __int128 a[16]; > > __int128 b[16]; > > __int128 c[16]; > > > > void foo() > > { > > for (unsigned int i=3D0; i<16; i++) > > a[i] =3D b[i] & ~c[i]; > > } > > > > which when currently compiled on mainline wtih -O2 -msse4 produces: > > > > foo: xorl %eax, %eax > > .L2: movq c(%rax), %rsi > > movq c+8(%rax), %rdi > > addq $16, %rax > > notq %rsi > > notq %rdi > > andq b-16(%rax), %rsi > > andq b-8(%rax), %rdi > > movq %rsi, a-16(%rax) > > movq %rdi, a-8(%rax) > > cmpq $256, %rax > > jne .L2 > > ret > > > > but with this patch now produces: > > > > foo: xorl %eax, %eax > > .L2: movdqa c(%rax), %xmm0 > > pandn b(%rax), %xmm0 > > addq $16, %rax > > movaps %xmm0, a-16(%rax) > > cmpq $256, %rax > > jne .L2 > > ret > > > > Technically, the STV pass is implemented by three C++ classes, a > > common abstract base class "scalar_chain" that contains common > > functionality, and two derived classes: general_scalar_chain (which > > handles SI and DI modes) and timode_scalar_chain (which handles TI > > modes). As mentioned previously, because only TI mode moves were > > handled the two worker classes behaved significantly differently. > > These changes bring the functionality of these two classes closer > > together, which is reflected by refactoring more shared code from > > general_scalar_chain to the parent scalar_chain and reusing it from > > timode. There still remain significant differences (and > > simplifications) so the existing division of classes (as = specializations) continues > to make sense. >=20 > Please note that there are in fact two STV passes, one before combine = and the > other after combine. The TImode pass that previously handled only = loads and > stores is positioned before combine (there was a reason for this = decision, but I > don't remember the details - let's ask HJ...). However, DImode STV = pass > transforms much more instructions and the reason it was positioned = after the > combine pass was that STV pass transforms optimized insn stream where > forward propagation was already performed. >=20 > What is not clear to me from the above explanation is: is the new = TImode STV > pass positioned after the combine pass, and if this is the case, how = the change > affects current load/store TImode STV pass. I must admit, I don't like = two > separate STV passess, so if TImode is now similar to DImode, I suggest = we > abandon STV1 pass and do everything concerning TImode after the = combine > pass. HJ, what is your opinion on this? >=20 > Other than the above, the patch LGTM to me. >=20 > Uros. >=20 > > Obviously, there are more changes to come (shifts and rotates), and > > compute_convert_gain doesn't yet have its final (tuned) form, but is > > already an improvement over the "return 1;" used previously. > > > > This patch has been tested on x86_64-pc-linux-gnu with make boostrap > > and make -k check, both with and without --target_board=3Dunix{-m32} > > with no new failures. Ok for mainline? > > > > > > 2022-07-09 Roger Sayle > > > > gcc/ChangeLog > > * config/i386/i386-features.h (scalar_chain): Add fields > > insns_conv, n_sse_to_integer and n_integer_to_sse to this > > parent class, moved from general_scalar_chain. > > (scalar_chain::convert_compare): Protected method moved > > from general_scalar_chain. > > (mark_dual_mode_def): Make protected, not private virtual. > > (scalar_chain:convert_op): New private virtual method. > > > > (general_scalar_chain::general_scalar_chain): Simplify = constructor. > > (general_scalar_chain::~general_scalar_chain): Delete = destructor. > > (general_scalar_chain): Move insns_conv, n_sse_to_integer = and > > n_integer_to_sse fields to parent class, scalar_chain. > > (general_scalar_chain::mark_dual_mode_def): Delete = prototype. > > (general_scalar_chain::convert_compare): Delete prototype. > > > > (timode_scalar_chain::compute_convert_gain): Remove = simplistic > > implementation, convert to a method prototype. > > (timode_scalar_chain::mark_dual_mode_def): Delete prototype. > > (timode_scalar_chain::convert_op): Prototype new virtual = method. > > > > * config/i386/i386-features.cc (scalar_chain::scalar_chain): > > Allocate insns_conv and initialize n_sse_to_integer and > > n_integer_to_sse fields in constructor. > > (scalar_chain::scalar_chain): Free insns_conv in destructor. > > > > (general_scalar_chain::general_scalar_chain): Delete > > constructor, now defined in the class declaration. > > (general_scalar_chain::~general_scalar_chain): Delete = destructor. > > > > (scalar_chain::mark_dual_mode_def): Renamed from > > general_scalar_chain::mark_dual_mode_def. > > (timode_scalar_chain::mark_dual_mode_def): Delete. > > (scalar_chain::convert_compare): Renamed from > > general_scalar_chain::convert_compare. > > > > (timode_scalar_chain::compute_convert_gain): New method to > > determine the gain from converting a TImode chain to = V1TImode. > > (timode_scalar_chain::convert_op): New method to convert an > > operand from TImode to V1TImode. > > > > (timode_scalar_chain::convert_insn) : Only = PUT_MODE > > on REG_EQUAL notes that were originally TImode (not = CONST_INT). > > Handle AND, ANDN, XOR, IOR, NOT and COMPARE. > > (timode_mem_p): Helper predicate to check where operand is > > memory reference with sufficient alignment for TImode STV. > > (timode_scalar_to_vector_candidate_p): Use = convertible_comparison_p > > to check whether COMPARE is convertible. Handle SET_DESTs = that > > that are REG_P or MEM_P and SET_SRCs that are REG, = CONST_INT, > > CONST_WIDE_INT, MEM, AND, ANDN, IOR, XOR or NOT. > > > > gcc/testsuite/ChangeLog > > * gcc.target/i386/sse4_1-stv-2.c: New test case, pand. > > * gcc.target/i386/sse4_1-stv-3.c: New test case, por. > > * gcc.target/i386/sse4_1-stv-4.c: New test case, pxor. > > * gcc.target/i386/sse4_1-stv-5.c: New test case, pandn. > > * gcc.target/i386/sse4_1-stv-6.c: New test case, ptest. > > > > Roger > > -- > >