From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by sourceware.org (Postfix) with ESMTPS id EAFAF3986406 for ; Sun, 20 Nov 2022 16:38:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EAFAF3986406 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pj1-x102c.google.com with SMTP id e7-20020a17090a77c700b00216928a3917so12370874pjs.4 for ; Sun, 20 Nov 2022 08:38:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=98A7NCpVnF3fphnjxZppjyxelDpcq2mJUyYzwEC1IwM=; b=ZTra1J5bhyDfCfp97z9arADX0UImpaiA+7HsxbuxlqkYu5sB7gq2cvJqSpM5ysNrLq RWTa3SulbWcNP8Wgr9a1wF6Cozt7gDe5WlQxgzy+IHQp0VWAds7NRAuuJsaYwucLj5R5 jPx1WWdjLVSWBJ7wLF5yCYKn8id9cyHpM7kVUboCsCBju+MLccTF+VIjMoKxhwJnfwkf M11ARHLovlHgPu0gOSZDQ2VyOFY6bmdplTKMDpIkiFw+4FzACPbIkPK+sqt8u5rOkLPh U/I6wqdw1zegYb4nJ9UnJOFawaNagew2Id6Fg2mqzQZ7qoGpX4xSGd9pNGb4BjcJ64h6 h+QA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=98A7NCpVnF3fphnjxZppjyxelDpcq2mJUyYzwEC1IwM=; b=V7+/v6WT2KNp5w79ZiGxAeAH4AiIYdFshIx3Rh0nYjLvGer28ewa8swpY/hlBrrEAG EYgmdhg2rBE1kWavfXwAm+dkm0XviYUQ7vYLXLvKic3Hbbz5fmZdQvlwWYNCOqpQsu4q KqYZ/5LGaslGmprgjUb4IHJPi5uWB/QVv3rfAgBmnZkAGheJDiCDoRa8gXIK4Chj3JN2 84f7YG5m8UcrEvpE5dFzFEUzNoV7yVPPeZaRVomGbgpVcyCXtr4YemNU/fu5DoVg7CIY C5mJeM05+cgWfJ+ccSr2dzvzdscnWlFmUnlYkndZ0ZAJJ5eYa+8CXKyYZW7kXD6ZFN4P MDfQ== X-Gm-Message-State: ANoB5plf7/OejDB7X5ff8VZJH4Ij8ZxcyJHUV4Okr8wTdQ/KYmqLzy9+ 2u6C4XVm7+y9q2XolXoB/qY= X-Google-Smtp-Source: AA0mqf5xHbmPmaEuVYAxJVIEVVKsGaLOeCBJQICOia2viE11+ujyTubA68aNO+jO3WI0xxRcFFICFQ== X-Received: by 2002:a17:902:ce04:b0:187:3a54:9b93 with SMTP id k4-20020a170902ce0400b001873a549b93mr3682172plg.2.1668962329347; Sun, 20 Nov 2022 08:38:49 -0800 (PST) Received: from ?IPV6:2601:681:8600:13d0::f0a? ([2601:681:8600:13d0::f0a]) by smtp.gmail.com with ESMTPSA id d10-20020a170902654a00b00188fcc4fc00sm5970333pln.79.2022.11.20.08.38.47 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 20 Nov 2022 08:38:48 -0800 (PST) Message-ID: <12bcc671-c96b-aa78-9a2d-9a832b389148@gmail.com> Date: Sun, 20 Nov 2022 09:38:45 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.3.1 Subject: Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split Content-Language: en-US To: Philipp Tomsich , gcc-patches@gcc.gnu.org Cc: Vineet Gupta , Kito Cheng , Jeff Law , Manolis Tsamis , Palmer Dabbelt , Christoph Muellner References: <20221109231006.3240799-1-philipp.tomsich@vrull.eu> From: Jeff Law In-Reply-To: <20221109231006.3240799-1-philipp.tomsich@vrull.eu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-8.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 11/9/22 16:10, Philipp Tomsich wrote: > The current method of treating shifts of extended values on RISC-V > frequently causes sequences of 3 shifts, despite the presence of the > 'zero_extendsidi2_shifted' pattern. > > Consider: > unsigned long f(unsigned int a, unsigned long b) > { > a = a << 1; > unsigned long c = (unsigned long) a; > c = b + (c<<4); > return c; > } > which will present at combine-time as: > Trying 7, 8 -> 9: > 7: r78:SI=r81:DI#0<<0x1 > REG_DEAD r81:DI > 8: r79:DI=zero_extend(r78:SI) > REG_DEAD r78:SI > 9: r72:DI=r79:DI<<0x4 > REG_DEAD r79:DI > Failed to match this instruction: > (set (reg:DI 72 [ _1 ]) > (and:DI (ashift:DI (reg:DI 81) > (const_int 5 [0x5])) > (const_int 68719476704 [0xfffffffe0]))) > and produce the following (optimized) assembly: > f: > slliw a5,a0,1 > slli a5,a5,32 > srli a5,a5,28 > add a0,a5,a1 > ret > > The current way of handling this (in 'zero_extendsidi2_shifted') > doesn't apply for two reasons: > - this is seen before reload, and > - (more importantly) the constant mask is not 0xfffffffful. > > To address this, we introduce a generalized version of shifting > zero-extended values that supports any mask of consecutive ones as > long as the number of training zeros is the inner shift-amount. > > With this new split, we generate the following assembly for the > aforementioned function: > f: > slli a0,a0,33 > srli a0,a0,28 > add a0,a0,a1 > ret > > Unfortunately, all of this causes some fallout (especially in how it > interacts with Zb* extensions and zero_extract expressions formed > during combine): this is addressed through additional instruction > splitting and handling of zero_extract. > > gcc/ChangeLog: > > * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed > as an and:DI. > (*andi_add.uw): New pattern. > (*slli_slli_uw): New pattern. > (*shift_then_shNadd.uw): New pattern. > (*slliuw): Rename to riscv_slli_uw. > (riscv_slli_uw): Renamed from *slliuw. > (*zeroextract2_highbits): New pattern. > (*zero_extract): New pattern, which will be split to > shift-left + shift-right. > * config/riscv/predicates.md (dimode_shift_operand): > * config/riscv/riscv.md (*zero_extract_lowbits): > (zero_extendsidi2_shifted): Rename. > (*zero_extendsidi2_shifted): Generalize. > (*shift_truthvalue): New pattern. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/shift-shift-6.c: New test. > * gcc.target/riscv/shift-shift-7.c: New test. > * gcc.target/riscv/shift-shift-8.c: New test. > * gcc.target/riscv/shift-shift-9.c: New test. > * gcc.target/riscv/snez.c: New test. > > Commit notes: > - Depends on a predicate posted in "RISC-V: Optimize branches testing > a bit-range or a shifted immediate". Depending on the order of > applying these, I'll take care to pull that part out of the other > patch if needed. > > Version-changes: 2 > - refactor > - optimise for additional corner cases and deal with fallout > > Version-changes: 3 > - removed the [WIP] from the commit message (no other changes) > > Signed-off-by: Philipp Tomsich > --- > > (no changes since v1) > > gcc/config/riscv/bitmanip.md | 142 ++++++++++++++---- > gcc/config/riscv/predicates.md | 5 + > gcc/config/riscv/riscv.md | 75 +++++++-- > .../gcc.target/riscv/shift-shift-6.c | 14 ++ > .../gcc.target/riscv/shift-shift-7.c | 16 ++ > .../gcc.target/riscv/shift-shift-8.c | 20 +++ > .../gcc.target/riscv/shift-shift-9.c | 15 ++ > gcc/testsuite/gcc.target/riscv/snez.c | 14 ++ > 8 files changed, 261 insertions(+), 40 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c > create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md > index 78fdf02c2ec..06126ac4819 100644 > --- a/gcc/config/riscv/bitmanip.md > +++ b/gcc/config/riscv/bitmanip.md > @@ -29,7 +29,20 @@ > [(set_attr "type" "bitmanip,load") > (set_attr "mode" "DI")]) > > -(define_insn "riscv_shNadd" > +;; We may end up forming a slli.uw with an immediate of 0 (while > +;; splitting through "*slli_slli_uw", below). > +;; Match this back to a zext.w > +(define_insn "*zext.w" > + [(set (match_operand:DI 0 "register_operand" "=r") > + (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r") > + (const_int 0)) > + (const_int 4294967295)))] > + "TARGET_64BIT && TARGET_ZBA" > + "zext.w\t%0,%1" > + [(set_attr "type" "bitmanip") > + (set_attr "mode" "DI")]) Would it be better to detect that we're going to create a shift count of zero  and a -1 mask and emit RTL for a SI->DI zero extension directly rather than having this pattern? It overall looks sensible -- I didn't check all the conditions and such, just the overall structure. Jeff