From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <eaa16783-c7e0-4684-a97f-022bc760ae3b@gmail.com>
Date: Fri, 17 Nov 2023 09:27:40 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Cc: rdapp.gcc@gmail.com, kito.cheng@gmail.com, kito.cheng@sifive.com,
 jeffreyalaw@gmail.com
Subject: Re: [PATCH] RISC-V: Optimize VLA SLP with duplicate VLA shuffle
 indice
To: Juzhe-Zhong <juzhe.zhong@rivai.ai>, gcc-patches@gcc.gnu.org
References: <20231117044319.3912782-1-juzhe.zhong@rivai.ai>
Content-Language: en-US
From: Robin Dapp <rdapp.gcc@gmail.com>
In-Reply-To: <20231117044319.3912782-1-juzhe.zhong@rivai.ai>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
List-Id: <gcc-patches.gcc.gnu.org>

Hi Juzhe,

> 	csrr	a4,vlenb
> 	csrr	a5,vlenb

Totally unrelated to this patch, but reading vlenb twice in a row
looks odd.  I don't remember whether we already had a patch for
this at some point.

In general the idea for the patch is to use the largest vector
element mode for the indices and compress several of those
in one vector element.  We are limited to XLEN because the
computation is done in scalar registers.  Is that right?
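
To check my understanding, here is a minimal standalone sketch of the
packing as I read it.  It is plain C++ rather than RTL, pack_indices
is a hypothetical helper that is not in the patch, and a 64-bit scalar
stands in for Pmode on rv64:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the patch: pack consecutive narrow
// indices into 64-bit scalars (assumes Pmode is 64 bits wide).
std::vector<uint64_t>
pack_indices (const std::vector<uint64_t> &indices, unsigned inner_bits)
{
  // The narrow element size must evenly divide the scalar size.
  assert (inner_bits > 0 && inner_bits < 64 && 64 % inner_bits == 0);
  const unsigned elems_per_scalar = 64 / inner_bits;
  const uint64_t mask = (uint64_t{1} << inner_bits) - 1;

  std::vector<uint64_t> packed;
  for (std::size_t i = 0; i < indices.size (); i += elems_per_scalar)
    {
      uint64_t val = 0;
      // Element k lands at bit offset inner_bits * k: AND, shift, IOR.
      for (unsigned k = 0;
           k < elems_per_scalar && i + k < indices.size (); k++)
        val |= (indices[i + k] & mask) << (inner_bits * k);
      packed.push_back (val);
    }
  return packed;
}
```

With the 8-bit indices 0..7 this yields the single 64-bit value
0x0706050403020100, i.e. eight indices per scalar.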

The patch would be easier to follow with an explanation of the
bigger picture up front.  It's all a bit implicit right now, so a
few more comments would help.

> +void
> +rvv_builder::merge_pattern (rtx src)

Isn't this more like a compress?  Actually, both merge and compress
are misleading because we have dedicated instructions with those names.

> +  unsigned int ele_num = GET_MODE_BITSIZE (Pmode) / this->inner_bits_size ();
Please rename ele_num to elems_per_scalar.

> +  if (!get_vector_mode (Pmode, nunits).exists (&mode))
> +    return;

There is no return value.  What happens if we fail here?

> +		= gen_lowpart (Pmode, CONST_VECTOR_ELT (src, k + j * ele_num));
> +	      e = expand_simple_binop (Pmode, AND, e, imm, NULL_RTX, false,
> +				       OPTAB_DIRECT);
> +	      e = expand_simple_binop (
> +		Pmode, ASHIFT, e,
> +		gen_int_mode (this->inner_bits_size () * k, Pmode), NULL_RTX,
> +		false, OPTAB_DIRECT);
> +	      val = expand_simple_binop (Pmode, IOR, e, val, NULL_RTX, false,
> +					 OPTAB_DIRECT);

Did you try doing the same thing entirely in vector registers,
i.e. via some reinterpretation with a larger element size?  Maybe
it doesn't make sense.  I didn't check, I'm just curious.

> +      /* We don't apply such approach for LMUL = 8 since vrgather.vv doesn't
> +	 allow dest overlap with any source register and VLA repeating vector
> +	 always by a addition.  So, it such VLA constant vector will consume
> +	 32 registers if LMUL = 8 which cause serious high register pressure. */
> +      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT && npatterns > 2)
> +	{

Don't we also need to check that Pmode is not wider than the
maximum EEW?  Consider rv64gc_zve32x, where Pmode is 64 bits wide
but the maximum EEW is only 32.  We would probably fail to find a
vector mode in merge_pattern, and without a return value the caller
would not notice.  I would prefer to check that here as well.  Maybe
we can even ensure that merge_pattern cannot fail?  It wouldn't hurt
to add a test case for zve32x either.
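
Something along these lines is what I have in mind.  This is a
standalone C++ sketch, not GCC code: packed_elems_per_scalar and its
parameters are hypothetical names, and the point is only that failure
becomes visible to the caller so it can fall back to the existing
expansion:

```cpp
#include <optional>

// Hypothetical precondition check, not part of the patch.  Returns the
// number of narrow indices that fit into one scalar, or std::nullopt
// when packing is not possible and the caller has to fall back.
std::optional<unsigned>
packed_elems_per_scalar (unsigned inner_bits, unsigned scalar_bits,
                         unsigned max_eew)
{
  // No vector mode with scalar-sized elements, e.g. a 64-bit scalar
  // with a maximum EEW of 32.
  if (scalar_bits > max_eew)
    return std::nullopt;
  // Nothing to gain unless several narrow elements fit evenly.
  if (inner_bits == 0 || inner_bits >= scalar_bits
      || scalar_bits % inner_bits != 0)
    return std::nullopt;
  return scalar_bits / inner_bits;
}
```

For 8-bit indices and a 64-bit scalar this gives 8 when the maximum
EEW is 64 and no value when it is 32.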

Regards
 Robin