From: Richard Sandiford
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access
References: <20240206113819.2335F3858402@sourceware.org> <75s55n4q-2768-6o3s-0q19-n0r55349423n@fhfr.qr>
Date: Thu, 15 Feb 2024 09:46:31 +0000
In-Reply-To: (Richard Biener's message of "Thu, 15 Feb 2024 09:12:47 +0100 (CET)")

Richard Biener writes:
> On Wed, 14 Feb 2024, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Wed, 14 Feb 2024, Richard Sandiford wrote:
>> >
>> >> Richard Biener writes:
>> >> > The following avoids accessing out-of-bound vector elements when
>> >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision
>> >> > elements.  The error was basing the number of elements to extract
>> >> > on the rounded-up total byte size involved; the patch bases
>> >> > everything on the total number of elements to extract instead.
>> >>
>> >> It's too long ago to be certain, but I think this was a deliberate
>> >> choice.  The point of the new vector constant encoding is that it
>> >> can give an allegedly sensible value for any given index, even
>> >> out-of-range ones.
>> >>
>> >> Since the padding bits are undefined, we should in principle have a
>> >> free choice of what to use.  And for VLA, it's often better to
>> >> continue the existing pattern rather than force it to zero.
>> >>
>> >> I don't strongly object to changing it.  I think we should be
>> >> careful about relying on zeroing for correctness, though.  The bits
>> >> are in principle undefined and we can't rely on reading zeros from
>> >> equivalent memory or register values.
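To make "continue the existing pattern" concrete, here is a rough, self-contained sketch of how a pattern-based encoding can extrapolate a value for any index, in range or not.  It is a toy model only; the names mirror the npatterns/nelts_per_pattern terminology but this is not the real vector_cst_elt in tree.cc:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a pattern-encoded vector constant: `encoded` holds
// npatterns * nelts_per_pattern values, interleaved by pattern.
// Indices past the encoded elements are extrapolated by continuing
// each pattern's final step (or by repeating its final value).
int64_t elt (const std::vector<int64_t> &encoded,
             unsigned npatterns, unsigned nelts_per_pattern, unsigned i)
{
  if (i < encoded.size ())
    return encoded[i];
  unsigned pattern = i % npatterns;
  unsigned pos = i / npatterns;  // position within this pattern's sequence
  int64_t last = encoded[(nelts_per_pattern - 1) * npatterns + pattern];
  if (nelts_per_pattern < 3)
    return last;                 // non-stepped: repeat the final value
  int64_t prev = encoded[(nelts_per_pattern - 2) * npatterns + pattern];
  int64_t step = last - prev;
  return last + int64_t (pos - (nelts_per_pattern - 1)) * step;
}
```

E.g. a reversed V8 index vector { 7, 6, 5, ... } is encoded as the three elements { 7, 6, 5 }: elt (..., 1, 3, 5) extrapolates to 2, and even the out-of-range elt (..., 1, 3, 9) yields the "allegedly sensible" -2.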
>> >
>> > The main motivation for a change here is to allow catching
>> > out-of-bound indices again for VECTOR_CST_ELT, at least for constant
>> > nunits, because it might be a programming error like fat-fingering
>> > the index.  I do think it's a regression that we no longer catch
>> > those.
>> >
>> > It's probably also a bit non-obvious how an encoding continues, and
>> > there might be DImode masks that can be represented by a
>> > zero-extended QImode immediate but that, "continued", would require
>> > a larger immediate.
>> >
>> > The change also effectively only changes something for 1-byte
>> > encodings, since nunits is a power of two and so is the element
>> > size in bits.
>>
>> Yeah, but even there, there's an argument that all-1s (0xff) is a more
>> obvious value for an all-1s mask.
>>
>> > A patch restoring the VECTOR_CST_ELT checking might be the
>> > following:
>> >
>> > diff --git a/gcc/tree.cc b/gcc/tree.cc
>> > index 046a558d1b0..4c9b05167fd 100644
>> > --- a/gcc/tree.cc
>> > +++ b/gcc/tree.cc
>> > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i)
>> >    if (i < encoded_nelts)
>> >      return VECTOR_CST_ENCODED_ELT (t, i);
>> >
>> > +  /* Catch out-of-bound element accesses.  */
>> > +  gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i));
>> > +
>> >    /* If there are no steps, the final encoded value is the right
>> >       one.  */
>> >    if (!VECTOR_CST_STEPPED_P (t))
>> >      {
>> >
>> > but it triggers quite a bit via const_binop, for example
>> >
>> > #2  0x00000000011c1506 in const_binop (code=PLUS_EXPR, arg1=, arg2=)
>> > (gdb) p debug_generic_expr (arg1)
>> > { 12, 13, 14, 15 }
>> > $5 = void
>> > (gdb) p debug_generic_expr (arg2)
>> > { -2, -2, -2, -3 }
>> > (gdb) p count
>> > $4 = 6
>> > (gdb) l
>> > 1711          if (!elts.new_binary_operation (type, arg1, arg2, step_ok_p))
>> > 1712            return NULL_TREE;
>> > 1713          unsigned int count = elts.encoded_nelts ();
>> > 1714          for (unsigned int i = 0; i < count; ++i)
>> > 1715            {
>> > 1716              tree elem1 = VECTOR_CST_ELT (arg1, i);
>> > 1717              tree elem2 = VECTOR_CST_ELT (arg2, i);
>> > 1718
>> > 1719              tree elt = const_binop (code, elem1, elem2);
>> >
>> > This seems like an error to me - why would we, for fixed-size
>> > vectors and for PLUS, ever create a vector encoding with 6
>> > elements?!  That seems at least inefficient to me.
>>
>> It's a case of picking your poison.  On the other side, operating
>> individually on each element of a V64QI is inefficient when the
>> representation says up-front that all elements are equal.
>
> True, though I wonder why for VLS vectors new_binary_operation
> doesn't cap the number of encoded elts at the fixed vector size,
> like doing
>
>   encoded_elts = ordered_min (TYPE_VECTOR_SUBPARTS (..), encoded_elts);
>
> or, if there's no good way to write it so that it applies to both VLA
> and VLS, do it only when TYPE_VECTOR_SUBPARTS is constant.

ordered_min can't be used because there's no guarantee that encoded_elts
and TYPE_VECTOR_SUBPARTS are well-ordered for the VLA case.  E.g. for a
stepped (3-element) encoding and a length of 2+2X, the stepped encoding
is longer for X==0 and the vector is longer for X>0.

But yeah, in general, trying to enforce this for VLS would probably lead
to a proliferation of more "if VLA do one thing, if VLS do some other
thing".  The aim was to avoid that where it didn't seem strictly
necessary.
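The missing ordering can be shown with a toy poly-style length (hypothetical helpers; this is not GCC's real poly_int API).  For a stepped 3-element encoding against a 2+2X-element vector, neither side is known to be the smaller one for all X, so no ordered minimum exists:

```cpp
#include <cassert>

// Toy model of a length of the form a + b*X, where X is the runtime
// vector-length parameter (X >= 0) and b >= 0.
struct poly_len { long a, b; };

long eval (poly_len p, long x) { return p.a + p.b * x; }

// "known_" comparisons must hold for every X >= 0; "maybe_" comparisons
// need only hold for some X >= 0.
bool known_ge (poly_len p, long c) { return p.a >= c; }
bool maybe_gt (poly_len p, long c) { return p.b > 0 || p.a > c; }
```

For the 2+2X example: eval ({2, 2}, 0) == 2 is below 3 but eval ({2, 2}, 1) == 4 is above it, so maybe_gt ({2, 2}, 3) is true while known_ge ({2, 2}, 3) is false, which is exactly the case where a minimum can't be chosen without knowing X.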
>> Fundamentally, operations on VLA vectors are treated as functions that
>> map patterns to patterns.  The number of elements that are consumed
>> isn't really relevant to the function itself.  The VLA folders
>> therefore rely on being able to read an element from a pattern even if
>> the index is outside TYPE_VECTOR_SUBPARTS.
>>
>> There were two reasons for using VLA paths for VLS vectors.  One I
>> mentioned above: it saves time when all elements are equal, or have a
>> similarly compact representation.  The other is that it makes VLA less
>> special and ensures that the code gets more testing.
>>
>> Maybe one compromise between that and the assert would be:
>>
>> (1) enforce the assert only for VLS and
>
> that's what I did by using maybe_gt?
>
>> (2) add new checks to ensure that a VLA-friendly operation will never
>>     read out-of-bounds for VLS vectors
>>
>> But I think this would be awkward.  E.g. we now build reversed vectors
>> as a 3-element series N-1, N-2, N-3.  It would be nice not to have to
>> special-case N==2 by suppressing N-3.  And the condition for (2) might
>> not always be obvious.
>
> If we know only N <= 2 is problematic then having must_eq ()
> special cases there makes sense IMO.

I'm not sure it's just that.  E.g. a zip is { 0, N, 1, N+1, 2, N+2, ... },
which would need to be curtailed for N==2 or N==4.

>> Another option would be to have special accessors that are allowed to
>> read out of bounds, and add the assert (for both VLA & VLS) to
>> VECTOR_CST_ELT.  It might take a while to root out all the places that
>> need to change though.
>
> Yes, that's another possibility.
>
> The point is (and we're seeing it now with the behavior of encoding
> QImode bitmasks) that the behavior isn't entirely obvious when VLS
> vectors are involved.  There's a memset zero of the QImode so anything
> other than zero-padding is odd.

Which memset do you mean?  The one in:

  /* Zero the buffer and then set bits later where necessary.  */
  int extract_bytes = MIN (len, total_bytes - off);
  if (ptr)
    memset (ptr, 0, extract_bytes);

?  If so, that's there for elt_bits > 1.  It means that the canonical
form of a 2-bit active mask element is 01 rather than 11.  (And that's
something that might end up needing to be target-configurable.  The
current behaviour is guided by SVE, although it shouldn't matter much in
practice there.)

> It's also not clear to me we'd get reliable sign extension
> (we probably don't?).

For which case?  The idea isn't that we always sign-extend or always
zero-extend, but that we repeat the pattern.  So an all-1s mask would
give an all-1s QImode, a 1010 mask would give a 10101010 QI, etc.

Thanks,
Richard
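P.S. A toy sketch of the repeat-the-pattern behaviour for the QImode case (a hypothetical helper, not the actual native-encode code): the sub-byte mask pattern is tiled across the byte, rather than zero- or sign-extended:

```cpp
#include <cassert>
#include <cstdint>

// Tile an nbits-wide mask pattern (LSB-first) across an 8-bit value.
// An all-1s 4-bit pattern therefore yields 0xff, and 1010 yields
// 10101010, unlike zero-extension (0x0f) or sign-extension.
uint8_t repeat_mask_pattern (uint8_t pattern, unsigned nbits)
{
  uint8_t out = 0;
  for (unsigned i = 0; i < 8; ++i)
    if ((pattern >> (i % nbits)) & 1)
      out |= uint8_t (1u << i);
  return out;
}
```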