From: Uros Bizjak
Date: Mon, 6 Jun 2022 11:22:26 +0200
Subject: Re: [x86 PATCH] Double word implementation of and; cmp to not; test optimization.
To: Roger Sayle
Cc: GCC Patches

On Mon, Jun 6, 2022 at 10:23 AM Roger Sayle wrote:
>
> Hi Uros,
> > > The major theme of this patch is to generalize many of i386.md's
> > > *di3_doubleword patterns to become *_doubleword patterns, i.e.
> > > whenever there exists a "double word" optimization for DImode with
> > > -m32, there should be an equivalent TImode optimization on TARGET_64BIT.
> >
> > No, please do not mix two different themes in one patch.
> >
> > OTOH, the only TImode optimization that can be used with SSE registers is with
> > logic instructions and some constant shifts, but there is no TImode arithmetic. I
> > assume your end goal is to introduce STV for TImode on 64-bit targets, because
> > DImode patterns for x86_32 were introduced to avoid early decomposition by
> > middle end and to split instructions that STV didn't convert to vector instructions
> > after STV pass. So, let's start with basic V1TImode support before optimizations
> > are introduced.
>
> I'm not sure I understand. What basic V1TImode support do you/we want next?
>
> This testcase and worked example with this patch shows its benefits without STV
> nor using V1TI mode vectors.
> As explained in the subject, and;cmp can be turned
> into the cheaper not;cmp $0, for TImode (and DImode with -m32) in the same way
> as we currently do for SImode everywhere. Having double word modes visible to
> combine, allows it to work its magic. A recent patch ensured that double word
> compares were visible to combine, this optimization just required that double
> word logic (AND, IOR and XOR) are visible after combine, and in fact for -m32 DImode
> they already are, it's just that TImode is inconsistent, leading to missed optimizations.
> Likewise, STV can't choose between implementations before there are alternative
> implementations to choose from.

Let me clarify my statement: When double-word patterns are NOT present,
the middle-end splits double-word operations to word-mode at expansion
time, taking into account constant propagation on the split operations,
etc. The reason DImode patterns are present is STV on 32-bit targets,
which wants double-word operations to remain unsplit until the STV pass.
Unfortunately, this approach inhibits constant propagation, and the
missing functionality had to be implemented "manually" when operations
are later split to word-mode.

Without targeting STV, optimization opportunities are quite small (one of
them is the conversion you proposed above), so there is no pressing need
to introduce TImode operations. That said, by extending all DImode logic
patterns to also handle TImode on x86_64, we can also use them to
implement a TImode STV pass on x86_64. This is something that would have
a noticeable impact on the generated code.

Uros.

> As always I'm happy to do things in the order you want (modulo my 36 hour spin
> cycle), in fact the reason this is being done now is that you recommended it best
> to fix pr65105-5.c after the "double word comparison", which I fully agree with,
> as it leads to a better solution that doesn't require peephole2 (in your own words,
> "why isn't this being done in combine?").
>
> I'm also certainly misunderstanding. Which piece needs to be done next?
>
> Perhaps I should have used the term "the common theme" rather than
> "the major theme" that may have made it sound like there were unrelated
> or independent bits in this patch. But there are no V1TI changes in it.
>
> Thanks in advance, for any clarification.
>
> Cheers,
> Roger
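For readers following the thread, the identity behind the and;cmp to not;test rewrite can be checked with a small C sketch. It assumes the comparison being optimized has the form `(x & y) == y`, which holds exactly when `(~x & y) == 0`. The function names are hypothetical and are not GCC internals; `unsigned __int128` is a GCC extension available only on 64-bit targets. The split variant shows by hand how the same double-word test decomposes into two word-mode operations, roughly what happens when no double-word pattern keeps the operation intact.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration only; these helpers are not part of GCC.
   unsigned __int128 is a GCC extension (64-bit targets only). */
typedef unsigned __int128 u128;

/* Original form: and; cmp, comparing (x & y) against y. */
bool and_cmp(u128 x, u128 y)
{
    return (x & y) == y;
}

/* Rewritten form: not; test.  (x & y) == y holds exactly when no bit
   that is set in y is clear in x, i.e. when (~x & y) == 0. */
bool not_test(u128 x, u128 y)
{
    return (~x & y) == 0;
}

/* The same test split into word-mode (64-bit) halves, as if no
   double-word pattern were available: a double-word value is zero
   exactly when the OR of its two halves is zero. */
bool not_test_split(uint64_t x_hi, uint64_t x_lo,
                    uint64_t y_hi, uint64_t y_lo)
{
    return ((~x_hi & y_hi) | (~x_lo & y_lo)) == 0;
}

/* Exhaustively compare the two 128-bit forms over all 8-bit x and y
   values (zero-extended); returns true if they always agree. */
bool forms_agree_on_small_values(void)
{
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            if (and_cmp(x, y) != not_test(x, y))
                return false;
    return true;
}
```

Compiling `and_cmp` with `-O2` for x86_64 and inspecting the assembly is one way to see which form a given GCC version emits; the exact instruction sequence depends on the compiler version and options.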