From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Sandiford
To: Prathamesh Kulkarni
Mail-Followup-To: Prathamesh Kulkarni, gcc Patches, Richard Biener, richard.sandiford@arm.com
Cc: gcc Patches, Richard Biener
Subject: Re: PR111754
Date: Wed, 25 Oct 2023 09:42:45 +0100
In-Reply-To: (Richard Sandiford's message of "Tue, 24 Oct 2023 22:28:48 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Sigh, I knew I should have waited until the morning to proof-read
and send this.

Richard Sandiford writes:
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 40767736389..00fce4945a7 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10743,27 +10743,37 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
>    unsigned res_npatterns, res_nelts_per_pattern;
>    unsigned HOST_WIDE_INT res_nelts;
>
> -  /* (1) If SEL is a suitable mask as determined by
> -     valid_mask_for_fold_vec_perm_cst_p, then:
> -     res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> -     res_nelts_per_pattern = max of nelts_per_pattern between
> -                             ARG0, ARG1 and SEL.
> -     (2) If SEL is not a suitable mask, and TYPE is VLS then:
> -     res_npatterns = nelts in result vector.
> -     res_nelts_per_pattern = 1.
> -     This exception is made so that VLS ARG0, ARG1 and SEL work as before.  */
> -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> -    {
> -      res_npatterns
> -        = std::max (VECTOR_CST_NPATTERNS (arg0),
> -                    std::max (VECTOR_CST_NPATTERNS (arg1),
> -                              sel.encoding ().npatterns ()));
> +  /* First try to implement the fold in a VLA-friendly way.
> +
> +     (1) If the selector is simply a duplication of N elements, the
> +         result is likewise a duplication of N elements.
> +
> +     (2) If the selector is N elements followed by a duplication
> +         of N elements, the result is too.
>
> -      res_nelts_per_pattern
> -        = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> -                    std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> -                              sel.encoding ().nelts_per_pattern ()));
> +     (3) If the selector is N elements followed by an interleaving
> +         of N linear series, the situation is more complex.
>
> +         valid_mask_for_fold_vec_perm_cst_p detects whether we
> +         can handle this case.  If we can, then each of the N linear
> +         series either (a) selects the same element each time or
> +         (b) selects a linear series from one of the input patterns.
> +
> +         If (b) holds for one of the linear series, the result
> +         will contain a linear series, and so the result will have
> +         the same shape as the selector.  If (a) holds for all of
> +         the lienar series, the result will be the same as (2) above.

linear

> +
> +         (b) can only hold if one of the inputs pattern has a

input patterns

Sorry for the typos.

Richard

> +         stepped encoding.  */
> +  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> +    {
> +      res_npatterns = sel.encoding ().npatterns ();
> +      res_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
> +      if (res_nelts_per_pattern == 3
> +          && VECTOR_CST_NELTS_PER_PATTERN (arg0) < 3
> +          && VECTOR_CST_NELTS_PER_PATTERN (arg1) < 3)
> +        res_nelts_per_pattern = 2;
>        res_nelts = res_npatterns * res_nelts_per_pattern;
>      }
>    else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
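
For readers who want to try the shape rule from the new comment outside GCC, here is a minimal standalone sketch.  It models only the branch taken when valid_mask_for_fold_vec_perm_cst_p succeeds.  The names Shape, result_shape and encoded_nelts are hypothetical and are not part of GCC; the real code works on trees and vec_perm_indices.

```cpp
// Hypothetical stand-in for the (npatterns, nelts_per_pattern) pair of a
// VECTOR_CST or selector encoding.  This is not a GCC type.
struct Shape
{
  unsigned npatterns;
  unsigned nelts_per_pattern;
};

// Sketch of the rule in the patch, assuming the selector has already been
// accepted as a valid mask: the result takes the selector's shape, except
// that a stepped selector (3 elements per pattern) over two inputs that
// are not stepped can only produce case (2), i.e. 2 elements per pattern.
constexpr Shape
result_shape (Shape sel, unsigned arg0_nelts_per_pattern,
              unsigned arg1_nelts_per_pattern)
{
  return (sel.nelts_per_pattern == 3
          && arg0_nelts_per_pattern < 3
          && arg1_nelts_per_pattern < 3)
         ? Shape{sel.npatterns, 2u}
         : sel;
}

// Number of elements that would be encoded for a given shape.
constexpr unsigned
encoded_nelts (Shape s)
{
  return s.npatterns * s.nelts_per_pattern;
}

// Stepped selector, neither input stepped: falls back to 2 per pattern.
static_assert (encoded_nelts (result_shape (Shape{4u, 3u}, 2u, 1u)) == 8u,
               "stepped selector over unstepped inputs");
// Stepped selector, one stepped input: the selector's shape is kept.
static_assert (encoded_nelts (result_shape (Shape{4u, 3u}, 3u, 1u)) == 12u,
               "stepped selector over a stepped input");
```

The checks are static_asserts, so the sketch can be compiled on its own as a translation unit without a main function.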