From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <8d19e9af-094f-aff3-027b-7f6bfa7c1324@gmail.com>
Date: Thu, 1 Jun 2023 12:52:07 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.9.1
Subject: Re: [PATCH] RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering
 optimization
Content-Language: en-US
To: juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, palmer@dabbelt.com,
 palmer@rivosinc.com, rdapp.gcc@gmail.com
References: <20230601034823.235258-1-juzhe.zhong@rivai.ai>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <20230601034823.235258-1-juzhe.zhong@rivai.ai>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: <gcc-patches.gcc.gnu.org>


On 5/31/23 21:48, juzhe.zhong@rivai.ai wrote:
> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
> 
> 1. This patch optimizes the codegen for the following auto-vectorized code:
> 
> void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n)
> {
>      for (int i = 0; i < n; i++)
>        c[i] = (int64_t)a[i] + b[i];
> }
> 
> It combines the instructions from:
> 
> ...
> vsext.vf2
> vadd.vv
> ...
> 
> into:
> 
> ...
> vwadd.wv
> ...
> 
> For a PLUS operation, GCC prefers the following RTL operand order when combining:
> 
> (plus: (sign_extend:..)
>         (reg:..))
> 
> instead of
> 
> (plus: (reg:..)
>         (sign_extend:..))
> 
> which differs from the MINUS pattern.
Right.  Canonicalization rules will have the sign_extend as the first 
operand when the opcode is commutative.
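
As a rough illustration of that ordering rule, here is a toy C model. It 
is only a sketch: the enum names, the two-level precedence and the helper 
functions are all hypothetical and are not GCC's actual data structures 
or code. It only shows the idea that, for an operator whose operands may 
be swapped (PLUS, but not MINUS), the more complex operand is moved to 
the first position.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy operand kinds and opcodes; hypothetical names, not GCC's rtx codes. */
enum kind { KIND_REG, KIND_SIGN_EXTEND };
enum op { OP_PLUS, OP_MINUS };

struct expr {
    enum op op;
    enum kind lhs, rhs;
};

/* Higher value sorts first: a complex operand outranks a plain register.
   This two-level ranking is an assumption made for the sketch. */
static int precedence(enum kind k)
{
    return k == KIND_SIGN_EXTEND ? 1 : 0;
}

/* Only PLUS allows its operands to be swapped in this model. */
static bool is_commutative(enum op op)
{
    return op == OP_PLUS;
}

/* Put the higher-precedence operand first when the operator allows it. */
static struct expr canonicalize(struct expr e)
{
    if (is_commutative(e.op) && precedence(e.lhs) < precedence(e.rhs)) {
        enum kind tmp = e.lhs;
        e.lhs = e.rhs;
        e.rhs = tmp;
    }
    return e;
}

/* Usage example: PLUS is reordered, MINUS keeps its operand order. */
static void self_check(void)
{
    struct expr plus =
        canonicalize((struct expr){ OP_PLUS, KIND_REG, KIND_SIGN_EXTEND });
    assert(plus.lhs == KIND_SIGN_EXTEND && plus.rhs == KIND_REG);

    struct expr minus =
        canonicalize((struct expr){ OP_MINUS, KIND_REG, KIND_SIGN_EXTEND });
    assert(minus.lhs == KIND_REG && minus.rhs == KIND_SIGN_EXTEND);
}
```

In this model a single pattern with the extended operand in the second 
position would never match the PLUS form, which is why separate add and 
sub patterns make sense.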
> 
> I therefore split the vwadd/vwsub patterns and add a dedicated pattern for each.
> 
> 2. This patch not only optimizes the case in (1) above, it also improves vwadd.vv/vwsub.vv
>     optimization for more complicated PLUS/MINUS code. Consider the following code:
>     
> __attribute__ ((noipa)) void
> vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
> 		      int16_t *__restrict dst3, int8_t *__restrict a,
> 		      int8_t *__restrict b, int8_t *__restrict a2,
> 		      int8_t *__restrict b2, int n)
> {
>    for (int i = 0; i < n; i++)
>      {
>        dst[i] = (int16_t) a[i] + (int16_t) b[i];
>        dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
>        dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
>      }
> }
> 
> Before this patch:
> ...
>          vsetvli zero,a6,e8,mf2,ta,ma
>          vle8.v  v2,0(a3)
>          vle8.v  v1,0(a4)
>          vsetvli t1,zero,e16,m1,ta,ma
>          vsext.vf2       v3,v2
>          vsext.vf2       v2,v1
>          vadd.vv v1,v2,v3
>          vsetvli zero,a6,e16,m1,ta,ma
>          vse16.v v1,0(a0)
>          vle8.v  v4,0(a5)
>          vsetvli t1,zero,e16,m1,ta,ma
>          vsext.vf2       v1,v4
>          vadd.vv v2,v1,v2
> ...
> 
> After this patch:
> ...
>          vsetvli	zero,a6,e8,mf2,ta,ma
> 	vle8.v	v3,0(a4)
> 	vle8.v	v1,0(a3)
> 	vsetvli	t4,zero,e8,mf2,ta,ma
> 	vwadd.vv	v2,v1,v3
> 	vsetvli	zero,a6,e16,m1,ta,ma
> 	vse16.v	v2,0(a0)
> 	vle8.v	v2,0(a5)
> 	vsetvli	t4,zero,e8,mf2,ta,ma
> 	vwadd.vv	v4,v3,v2
> 	vsetvli	zero,a6,e16,m1,ta,ma
> 	vse16.v	v4,0(a1)
> 	vsetvli	t4,zero,e8,mf2,ta,ma
> 	sub	a7,a7,a6
> 	vwadd.vv	v3,v2,v1
> 	vsetvli	zero,a6,e16,m1,ta,ma
> 	vse16.v	v3,0(a2)
> ...
> 
> Current upstream GCC cannot fully optimize this code to use vwadd because the combine pass
> needs an intermediate RTL pattern in which only one operand is extended (vwadd.wv). Starting from that
> intermediate pattern, it then extends the other operand to generate vwadd.vv.
> 
> So the vwadd.wv/vwsub.wv patterns also help vwadd.vv/vwsub.vv code optimization.
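
For readers following along, the relationship between the two forms can 
be sketched with scalar C loops. This is only a reference model under 
stated assumptions: the function names are hypothetical, the loops stand 
in for the vector instructions, and nothing here reflects how the RTL 
patterns are written. It shows that extending one operand first and then 
applying the "wide + narrow" form gives the same result as the "narrow + 
narrow" form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the vwadd.wv shape: wide + sign_extend(narrow).
   The sum is assumed to fit in int16_t for the inputs used here. */
static void widen_add_wv(int16_t *dst, const int16_t *wide,
                         const int8_t *narrow, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(wide[i] + (int16_t)narrow[i]);
}

/* Scalar model of the vwadd.vv shape: sign_extend(a) + sign_extend(b).
   Two int8_t values always sum to a value that fits in int16_t. */
static void widen_add_vv(int16_t *dst, const int8_t *a,
                         const int8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)((int16_t)a[i] + (int16_t)b[i]);
}

/* Two-step route: extend a on its own (like vsext.vf2), then use the
   wv form. This mirrors the intermediate step described above. */
static void widen_add_two_step(int16_t *dst, const int8_t *a,
                               const int8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)a[i];      /* extend the first operand */
    widen_add_wv(dst, dst, b, n);    /* then wide + narrow */
}

/* Usage example: both routes agree, including at the int8_t extremes. */
static void self_check(void)
{
    const int8_t a[3] = { 127, -128, 5 };
    const int8_t b[3] = { 127, -128, -7 };
    int16_t direct[3], stepped[3];

    widen_add_vv(direct, a, b, 3);
    widen_add_two_step(stepped, a, b, 3);

    for (size_t i = 0; i < 3; i++)
        assert(direct[i] == stepped[i]);
    assert(direct[0] == 254 && direct[1] == -256 && direct[2] == -2);
}
```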
>   
> gcc/ChangeLog:
> 
>          * config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv intrinsic API expander.
>          * config/riscv/vector.md (@pred_single_widen_<plus_minus:optab><any_extend:su><mode>): Remove it.
>          (@pred_single_widen_sub<any_extend:su><mode>): New pattern.
>          (@pred_single_widen_add<any_extend:su><mode>): New pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
>          * gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
>          * gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
>          * gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
>          * gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.
OK
jeff