Date: Fri, 17 Nov 2023 09:27:40 +0100
Cc: rdapp.gcc@gmail.com, kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com
Subject: Re: [PATCH] RISC-V: Optimize VLA SLP with duplicate VLA shuffle indice
To: Juzhe-Zhong, gcc-patches@gcc.gnu.org
References: <20231117044319.3912782-1-juzhe.zhong@rivai.ai>
From: Robin Dapp
In-Reply-To: <20231117044319.3912782-1-juzhe.zhong@rivai.ai>

Hi Juzhe,

> csrr a4,vlenb
> csrr a5,vlenb

Totally unrelated to this patch but this looks odd. I don't remember if
we had a patch for this already at some point.

In general the idea for the patch is to use the largest vector element
mode for the indices and compress several of those in one vector element.
We are limited to XLEN because the computation is done in scalar
registers. Is that right?

I would find it easier to understand what's happening with the
explanation of the bigger picture upfront. It's all a bit implicit right
now, so a few more comments would help.

> +void
> +rvv_builder::merge_pattern (rtx src)

It's more like a compress? Actually both merge and compress are
misleading because we have dedicated instructions with that name.

> +  unsigned int ele_num = GET_MODE_BITSIZE (Pmode) / this->inner_bits_size ();

Rename to elems_per_scalar.

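
To illustrate the packing idea described above, here is a minimal
standalone sketch. It is not the patch's code: `pack_indices` is a
hypothetical helper, it works on plain integers rather than RTL, it
assumes a 64-bit scalar register by default (XLEN = 64), and it packs
consecutive elements, which simplifies the element order used in the
patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (assumption, not from the patch): pack narrow index
// elements into scalar-sized words, lowest element in the lowest bits.
std::vector<uint64_t>
pack_indices (const std::vector<uint64_t> &elems, unsigned inner_bits,
              unsigned scalar_bits = 64)
{
  // Number of narrow elements that fit into one scalar register.
  unsigned elems_per_scalar = scalar_bits / inner_bits;
  // Mask that keeps only the low inner_bits of an element.
  uint64_t mask = inner_bits >= 64 ? ~uint64_t (0)
                                   : (uint64_t (1) << inner_bits) - 1;
  std::vector<uint64_t> packed;
  for (std::size_t j = 0; j < elems.size (); j += elems_per_scalar)
    {
      uint64_t val = 0;
      for (unsigned k = 0; k < elems_per_scalar && j + k < elems.size (); k++)
        {
          uint64_t e = elems[j + k] & mask; // AND with the element mask
          e <<= inner_bits * k;             // ASHIFT into slot k
          val |= e;                         // IOR into the accumulated word
        }
      packed.push_back (val);
    }
  return packed;
}
```

With 8-bit elements, eight indices fit into one 64-bit word, so a
vector of QImode indices would need only one eighth as many DImode
elements.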
> +  if (!get_vector_mode (Pmode, nunits).exists (&mode))
> +    return;

There is no return value; what happens if we fail here?

> +	= gen_lowpart (Pmode, CONST_VECTOR_ELT (src, k + j * ele_num));
> +      e = expand_simple_binop (Pmode, AND, e, imm, NULL_RTX, false,
> +			       OPTAB_DIRECT);
> +      e = expand_simple_binop (
> +	Pmode, ASHIFT, e,
> +	gen_int_mode (this->inner_bits_size () * k, Pmode), NULL_RTX,
> +	false, OPTAB_DIRECT);
> +      val = expand_simple_binop (Pmode, IOR, e, val, NULL_RTX, false,
> +				 OPTAB_DIRECT);

Did you try doing the same with everything in vector registers? I.e.
some reinterpretation with a larger element size. Maybe it doesn't make
sense; I didn't check, just curious.

> +  /* We don't apply such approach for LMUL = 8 since vrgather.vv doesn't
> +     allow dest overlap with any source register and VLA repeating vector
> +     always by a addition. So, it such VLA constant vector will consume
> +     32 registers if LMUL = 8 which cause serious high register pressure. */
> +  else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && npatterns > 2)
> +    {

Don't we also need to check that MAX_EEW <= Pmode? Like for
rv64gc_zve32x. We would probably fail to find a vector mode in
merge_pattern, but without a return value nobody would notice. I would
prefer to check that here as well; maybe we can even ensure that
merge_pattern cannot fail?

Wouldn't hurt to add a test case for zve32x as well.

Regards
 Robin