Message-ID: <8d19e9af-094f-aff3-027b-7f6bfa7c1324@gmail.com>
Date: Thu, 1 Jun 2023 12:52:07 -0600
Subject: Re: [PATCH] RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimization
To: juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, palmer@dabbelt.com, palmer@rivosinc.com, rdapp.gcc@gmail.com
References: <20230601034823.235258-1-juzhe.zhong@rivai.ai>
From: Jeff Law
In-Reply-To: <20230601034823.235258-1-juzhe.zhong@rivai.ai>

On 5/31/23 21:48, juzhe.zhong@rivai.ai wrote:
> From: Juzhe-Zhong
>
> 1. This patch optimizes the codegen of the following auto-vectorization code:
>
> void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n)
> {
>   for (int i = 0; i < n; i++)
>     c[i] = (int64_t)a[i] + b[i];
> }
>
> Combine instructions from:
>
> ...
> vsext.vf2
> vadd.vv
> ...
>
> into:
>
> ...
> vwadd.wv
> ...
>
> Since for the PLUS operation, GCC prefers the following RTL operand order when combining:
>
> (plus: (sign_extend:..)
>        (reg:)
>
> instead of
>
> (plus: (reg:..)
>        (sign_extend:)
>
> which is different from the MINUS pattern.

Right. Canonicalization rules will have the sign_extend as the first
operand when the opcode is commutative.

>
> I split the patterns of vwadd/vwsub and add dedicated patterns for them.
>
> 2. This patch not only optimizes the case mentioned in (1) above, but also enhances the vwadd.vv/vwsub.vv
> optimization for complicated PLUS/MINUS code. Consider the following code:
>
> __attribute__ ((noipa)) void
> vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
>                       int16_t *__restrict dst3, int8_t *__restrict a,
>                       int8_t *__restrict b, int8_t *__restrict a2,
>                       int8_t *__restrict b2, int n)
> {
>   for (int i = 0; i < n; i++)
>     {
>       dst[i] = (int16_t) a[i] + (int16_t) b[i];
>       dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
>       dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
>     }
> }
>
> Before this patch:
> ...
>         vsetvli zero,a6,e8,mf2,ta,ma
>         vle8.v v2,0(a3)
>         vle8.v v1,0(a4)
>         vsetvli t1,zero,e16,m1,ta,ma
>         vsext.vf2 v3,v2
>         vsext.vf2 v2,v1
>         vadd.vv v1,v2,v3
>         vsetvli zero,a6,e16,m1,ta,ma
>         vse16.v v1,0(a0)
>         vle8.v v4,0(a5)
>         vsetvli t1,zero,e16,m1,ta,ma
>         vsext.vf2 v1,v4
>         vadd.vv v2,v1,v2
> ...
>
> After this patch:
> ...
>         vsetvli zero,a6,e8,mf2,ta,ma
>         vle8.v v3,0(a4)
>         vle8.v v1,0(a3)
>         vsetvli t4,zero,e8,mf2,ta,ma
>         vwadd.vv v2,v1,v3
>         vsetvli zero,a6,e16,m1,ta,ma
>         vse16.v v2,0(a0)
>         vle8.v v2,0(a5)
>         vsetvli t4,zero,e8,mf2,ta,ma
>         vwadd.vv v4,v3,v2
>         vsetvli zero,a6,e16,m1,ta,ma
>         vse16.v v4,0(a1)
>         vsetvli t4,zero,e8,mf2,ta,ma
>         sub a7,a7,a6
>         vwadd.vv v3,v2,v1
>         vsetvli zero,a6,e16,m1,ta,ma
>         vse16.v v3,0(a2)
> ...
>
> The reason current upstream GCC cannot fully optimize code using vwadd is that the combine pass
> needs an intermediate RTL IR (a pattern that extends one of the operands, i.e. vwadd.wv); based on this
> intermediate RTL IR, it then extends the other operand to generate vwadd.vv.
>
> So vwadd.wv/vwsub.wv definitely help the vwadd.vv/vwsub.vv code optimizations.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv intrinsic API expander
> * config/riscv/vector.md (@pred_single_widen_): Remove it.
> (@pred_single_widen_sub): New pattern.
> (@pred_single_widen_add): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.

OK

jeff
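
For anyone reading the archive who is less familiar with the widening
forms, here is a rough scalar sketch of what the two instruction
sequences from case (1) compute per element. The function names
(ext_then_add, widen_add_wv, lanes_match) and MAX_LANES are made up for
illustration; this only models the arithmetic and says nothing about how
the patterns in vector.md are written. It also assumes inputs that do
not overflow int64_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 64 /* hypothetical bound, for the scratch buffers only */

/* Models vsext.vf2 followed by vadd.vv: widen the narrow operand into a
   temporary, then do a same-width add.  */
static void
ext_then_add (const int32_t *a, const int64_t *b, int64_t *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int64_t widened = (int64_t) a[i]; /* vsext.vf2 */
      c[i] = widened + b[i];            /* vadd.vv */
    }
}

/* Models vwadd.wv: the wide operand plus the sign-extended narrow
   operand, done as a single step.  */
static void
widen_add_wv (const int32_t *a, const int64_t *b, int64_t *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    c[i] = b[i] + (int64_t) a[i];
}

/* Returns 1 if both models produce the same value in every lane.  */
static int
lanes_match (const int32_t *a, const int64_t *b, size_t n)
{
  int64_t x[MAX_LANES], y[MAX_LANES];

  assert (n <= MAX_LANES);
  ext_then_add (a, b, x, n);
  widen_add_wv (a, b, y, n);
  for (size_t i = 0; i < n; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

Calling lanes_match on a few small arrays should return 1, since both
models compute the same sum and differ only in whether the extension is
a separate step.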