From: Jeff Law
Date: Fri, 11 Aug 2023 17:02:03 -0600
Subject: Re: [PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i
To: Lehua Ding, gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, kito.cheng@gmail.com, rdapp.gcc@gmail.com, palmer@rivosinc.com
References: <20230811090121.1789446-1-lehua.ding@rivai.ai>
In-Reply-To: <20230811090121.1789446-1-lehua.ding@rivai.ai>

On 8/11/23 03:01, Lehua Ding wrote:
> Hi,
>
> This patch reverts the conversion from vmv.s.x to vmv.v.i and adds a new
> pattern to optimize the special case where the scalar operand is zero.
>
> Currently, a broadcast pattern whose scalar operand is an imm is
> converted from vmv.s.x to vmv.v.i, and the mask operand is converted
> from 00..01 to 11..11.
> There are advantages and disadvantages both before and after the
> conversion. After discussing them with Juzhe offline, we chose not to
> do this transform.
>
> Before:
>
> Advantages: The vsetvli info required by vmv.s.x has better compatibility,
> since vmv.s.x only requires SEW and whether VL is zero or non-zero. That
> means there are more opportunities to combine it with other vsetvl infos
> in the vsetvl pass.
>
> Disadvantages: For a non-zero scalar imm, one more `li rd, imm`
> instruction is needed.
>
> After:
>
> Advantages: No `li rd, imm` instruction is needed, since vmv.v.i supports
> an imm operand.
>
> Disadvantages: The mirror image of the advantages listed under "Before":
> worse compatibility means more vsetvl instructions are needed.
>
> Consider the C code below and its asm after autovec. There is an extra
> insn (vsetivli zero, 1, e32, m1, ta, ma) after converting vmv.s.x to
> vmv.v.i.
>
> ```
> int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
>   int sum = 0;
>   for (int i = 0; i < n; i++)
>     sum += a[i] * b[i];
>
>   return sum;
> }
> ```
>
> asm (Before):
>
> ```
> foo1:
>         ble a3,zero,.L7
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.v.i v1,0
> .L6:
>         vsetvli a5,a3,e32,m1,tu,ma
>         slli a4,a5,2
>         sub a3,a3,a5
>         vle32.v v2,0(a0)
>         vle32.v v3,0(a1)
>         add a0,a0,a4
>         add a1,a1,a4
>         vmacc.vv v1,v3,v2
>         bne a3,zero,.L6
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.s.x v2,zero
>         vredsum.vs v1,v1,v2
>         vmv.x.s a0,v1
>         ret
> .L7:
>         li a0,0
>         ret
> ```
>
> asm (After):
>
> ```
> foo1:
>         ble a3,zero,.L4
>         vsetvli a2,zero,e32,m1,ta,ma
>         vmv.v.i v1,0
> .L3:
>         vsetvli a5,a3,e32,m1,tu,ma
>         slli a4,a5,2
>         sub a3,a3,a5
>         vle32.v v2,0(a0)
>         vle32.v v3,0(a1)
>         add a0,a0,a4
>         add a1,a1,a4
>         vmacc.vv v1,v3,v2
>         bne a3,zero,.L3
>         vsetivli zero,1,e32,m1,ta,ma
>         vmv.v.i v2,0
>         vsetvli a2,zero,e32,m1,ta,ma
>         vredsum.vs v1,v1,v2
>         vmv.x.s a0,v1
>         ret
> .L4:
>         li a0,0
>         ret
> ```
>
> Best,
> Lehua
>
> Co-Authored-By: Ju-Zhe Zhong
>
> gcc/ChangeLog:
>
>     * config/riscv/predicates.md (vector_const_0_operand): New.
>     * config/riscv/vector.md (*pred_broadcast_zero): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/riscv/rvv/base/scalar_move-5.c: Update.
>     * gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.

If we encounter a uarch where the other sequence is better, then I think we
can do something like query costs or the like and select between the
approaches -- but no need to do that now.

So OK for the trunk.

jeff
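
For anyone following the compatibility argument in the quoted description, a toy C model may help. This is only a sketch under stated assumptions, not how GCC's vsetvl pass is implemented: the types and the names `vdemand`, `compatible`, `merge` and `count_vsetvl` are hypothetical, and the model tracks only SEW and VL, ignoring LMUL and the tail/mask policy. It assumes, as the two asm listings suggest, that vmv.s.x is satisfied by any non-zero VL, that the converted vmv.v.i wants VL = 1 exactly, and that vredsum.vs wants VLMAX.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. These types and names are hypothetical and are NOT
   the data structures of GCC's vsetvl pass. */
#define VL_MAX (-1)  /* stands in for "VL = VLMAX", which is non-zero */

enum vl_kind { VL_NONZERO, VL_EXACT };

struct vdemand {
    int sew;            /* element width in bits */
    enum vl_kind kind;  /* VL_NONZERO: any non-zero VL is acceptable */
    int vl;             /* meaningful only when kind == VL_EXACT */
};

/* Can one vsetvl configuration satisfy both demands? */
static bool compatible(struct vdemand a, struct vdemand b)
{
    if (a.sew != b.sew)
        return false;
    if (a.kind == VL_NONZERO)
        return b.kind == VL_NONZERO || b.vl != 0;
    if (b.kind == VL_NONZERO)
        return a.vl != 0;
    return a.vl == b.vl;
}

/* Merge two compatible demands, keeping the stricter VL constraint. */
static struct vdemand merge(struct vdemand a, struct vdemand b)
{
    return a.kind == VL_EXACT ? a : b;
}

/* Count the vsetvl instructions a straight-line sequence needs when
   adjacent compatible demands are fused greedily. */
static int count_vsetvl(const struct vdemand *seq, int n)
{
    assert(n >= 0);
    int count = 0;
    struct vdemand cur = {0, VL_NONZERO, 0};
    for (int i = 0; i < n; i++) {
        if (i > 0 && compatible(cur, seq[i])) {
            cur = merge(cur, seq[i]);
        } else {
            cur = seq[i];  /* incompatible: a new vsetvl is required */
            count++;
        }
    }
    return count;
}
```

Modeling the tail of foo1 as {vmv.s.x, vredsum.vs} gives a count of 1, while {vmv.v.i with VL = 1, vredsum.vs} gives 2, which lines up with the extra vsetivli in the second listing.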