From: Richard Sandiford
To: Richard Biener
Cc: Richard Biener via Gcc-patches
Subject: Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts
References: <20230731140207.C8FC63858C39@sourceware.org>
Date: Wed, 02 Aug 2023 14:47:48 +0100
In-Reply-To: (Richard Biener's message of "Wed, 2 Aug 2023 12:31:46 +0000 (UTC)")

Richard Biener writes:
> On Tue, 1 Aug 2023, Richard Sandiford wrote:
>
>> Richard Sandiford writes:
>> > Richard Biener via Gcc-patches writes:
>> >> The following makes sure to limit the shift operand when vectorizing
>> >> (short)((int)x >> 31) via (short)x >> 31, as the out-of-bounds shift
>> >> operand otherwise invokes undefined behavior.  When we determine
>> >> whether we can demote the operand, we know we at most shift in the
>> >> sign bit, so we can adjust the shift amount.
>> >>
>> >> Note this has the possibility of un-CSEing common shift operands,
>> >> as there's no good way to share pattern stmts between patterns.
>> >> We'd have to separately pattern-recognize the definition.
>> >>
>> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> >>
>> >> Not sure about LSHIFT_EXPR; it probably has the same issue, but
>> >> the fallback optimistic zero for out-of-range shifts is at least
>> >> "correct".  Not sure we ever try to demote rotates (probably not).
>> >
>> > I guess you mean "correct" for x86?  But that's just a quirk of x86.
>> > IMO the behaviour is equally wrong for LSHIFT_EXPR.
>
> I meant "correct" for the constant folding that evaluates out-of-bound
> shifts as zero.
>
>> Sorry for the multiple messages.  Wanted to get something out quickly
>> because I wasn't sure how long it would take me to write this...
>>
>> On rotates, for:
>>
>> void
>> foo (unsigned short *restrict ptr)
>> {
>>   for (int i = 0; i < 200; ++i)
>>     {
>>       unsigned int x = ptr[i] & 0xff0;
>>       ptr[i] = (x << 1) | (x >> 31);
>>     }
>> }
>>
>> we do get:
>>
>> can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;
>>
>> although aarch64 doesn't provide rrotate patterns, so nothing actually
>> comes of it.
>
> I think it's still correct that we only need unsigned:13 for the input,
> we know other bits are zero.
> But of course when actually applying
> this as documented
>
>   /* Record that STMT_INFO could be changed from operating on TYPE to
>      operating on a type with the precision and sign given by PRECISION
>      and SIGN respectively.
>
> the operation itself has to be altered (the above doesn't suggest
> promoting/demoting the operands to TYPE is the only thing to do).
>
> So it seems the burden is on the consumers of the information?

Yeah, textually that seems fair.  Not sure I was thinking of it in
those terms at the time though. :)

>> I think the handling of variable shifts is flawed for other reasons.
>> Given:
>>
>> void
>> uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
>> {
>>   for (int i = 0; i < 200; ++i)
>>     ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>>
>> void
>> us (unsigned short *restrict ptr1, short *restrict ptr2)
>> {
>>   for (int i = 0; i < 200; ++i)
>>     ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>>
>> void
>> su (short *restrict ptr1, unsigned short *restrict ptr2)
>> {
>>   for (int i = 0; i < 200; ++i)
>>     ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>>
>> void
>> ss (short *restrict ptr1, short *restrict ptr2)
>> {
>>   for (int i = 0; i < 200; ++i)
>>     ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>>
>> we only narrow uu and ss, due to:
>>
>>   /* Ignore codes that don't take uniform arguments.  */
>>   if (!types_compatible_p (TREE_TYPE (op), type))
>>     return;
>
> I suppose that's because we care about the shift operand at all here.
> We could possibly use [0 .. precision-1] as the known range for it
> and only if that doesn't fit 'type' give up (and otherwise simply
> ignore the input range of the shift operands here).
>
>> in vect_determine_precisions_from_range.  Maybe we should drop
>> the shift handling from there and instead rely on
>> vect_determine_precisions_from_users, extending:
>>
>>   if (TREE_CODE (shift) != INTEGER_CST
>>       || !wi::ltu_p (wi::to_widest (shift), precision))
>>     return;
>>
>> to handle ranges where the max is known to be < precision.
>>
>> There again, if masking is enough for right shifts and right rotates,
>> maybe we should keep the current handling for them (with your fix)
>> and skip the types_compatible_p check for those cases.
>
> I think it should be enough for left-shifts as well?  If we lshift
> out like 0x100 << 9 so the lhs range is [0,0], the input range from
> op0 will still make us use HImode.  I think we only ever get overly
> conservative answers for left-shifts from this function?

But if we have:

  short x, y;
  int z = (int) x << (int) y;

and at runtime x == 1, y == 16, then (short) z should be 0 (no UB),
whereas x << y would invoke UB and x << (y & 15) would be 1.
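To put that case in a runnable form, here is a standalone sketch
(illustrative only, not from the patch or the testsuite); the masking of
the count is just what a narrowed 16-bit shift would effectively rely on:

/* Sketch: for x == 1, y == 16 the reference value of
   (short)((int) x << (int) y) is 0, but a 16-bit shift with the count
   masked into range yields 1, so masking the count is not a correct
   fallback when narrowing left shifts.  */
#include <stdio.h>

int
main (void)
{
  int x = 1, y = 16;
  int reference = (x << y) & 0xffff;                    /* 32-bit shift, truncated: 0 */
  int narrowed = ((x & 0xffff) << (y & 15)) & 0xffff;   /* 16-bit shift, masked count: 1 */
  printf ("reference = %d, narrowed = %d\n", reference, narrowed);
  return 0;
}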
> Whatever works for RROTATE should also work for LROTATE.

I think the same problem affects LROTATE.

>> So:
>>
>> - restrict shift handling in vect_determine_precisions_from_range to
>>   RSHIFT_EXPR and RROTATE_EXPR
>>
>> - remove the types_compatible_p restriction for those cases
>>
>> - extend vect_determine_precisions_from_users shift handling to check
>>   for ranges on the shift amount
>>
>> Does that sound right?
>
> I'm not sure.  This all felt somewhat fragile when looking closer
> (I was hoping you would eventually tackle it from the older
> referenced bug) ... so my main idea was to perform incremental changes
> where I have test coverage (as with the new wrong-code bug).

Fair :)

> Originally I completely disabled shift support, but that regressed
> the over-widen testcases a lot, which at least have widened shifts
> by constants a lot.
>
> x86 has vector rotates only for AMD XOP (which is dead) plus
> some for V1TImode AFAICS, but I think we pattern-match rotates
> to shifts, so maybe the precision stuff is interesting for the
> case where we match the pattern rotate sequence for widenings?
>
> So for the types_compatible_p issue, something along
> the following?  We could also exempt the shift operand from
> being covered by min_precision, so the consumer would have
> to make sure it can be represented (I think that's never going
> to be an issue in practice until we get 256-bit integers vectorized).
> It will have to fix up the shift operands anyway.
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index e4ab8c2d65b..cdeeaf98a47 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
>        }
>      else if (TREE_CODE (op) == SSA_NAME)
>        {
> -        /* Ignore codes that don't take uniform arguments.  */
> -        if (!types_compatible_p (TREE_TYPE (op), type))
> +        /* Ignore codes that don't take uniform arguments.  For shifts
> +           the shift amount is known to be in-range.  */

I guess it's more "we can assume that the amount is in range"?

> +        if (code == LSHIFT_EXPR
> +            || code == RSHIFT_EXPR
> +            || code == LROTATE_EXPR
> +            || code == RROTATE_EXPR)
> +          {
> +            min_value = wi::min (min_value, 0, sign);
> +            max_value = wi::max (max_value, TYPE_PRECISION (type), sign);

LGTM for shifts right.  Because of the above lshift thing, I think
we need something like:

  if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
    {
      wide_int op_min_value, op_max_value;
      if (!vect_get_range_info (op, &op_min_value, &op_max_value))
        return;

      /* We can ignore left shifts by negative amounts, which are UB.  */
      min_value = wi::min (min_value, 0, sign);

      /* Make sure the highest non-UB shift amount doesn't become UB.  */
      op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
      auto mask = wi::mask (TYPE_PRECISION (type), false,
                            op_max_value.to_uhwi ());
      max_value = wi::max (max_value, mask, sign);
    }

Does that look right?

Thanks, and sorry for the scope creep.

Richard

> +          }
> +        else if (types_compatible_p (TREE_TYPE (op), type))
> +          {
> +            wide_int op_min_value, op_max_value;
> +            if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> +              return;
> +            min_value = wi::min (min_value, op_min_value, sign);
> +            max_value = wi::max (max_value, op_max_value, sign);
> +          }
> +        else
>            return;
> -
> -        wide_int op_min_value, op_max_value;
> -        if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> -          return;
> -
> -        min_value = wi::min (min_value, op_min_value, sign);
> -        max_value = wi::max (max_value, op_max_value, sign);
>        }
>      else
>        return;
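For reference, here is a standalone sanity check (illustrative only, not
part of the patch or the testsuite) of the property the original patch
relies on for the right-shift case: once the operand is demoted to 16
bits, clamping the shift count from 31 to 15 still shifts in only the
sign bit, so the truncated result is unchanged for every 16-bit input.

/* Exhaustive check: (short)((int) x >> 31) equals a 16-bit arithmetic
   shift of x by the clamped count 15 for all 16-bit x.  Relies on GCC
   defining >> on negative values as an arithmetic shift.  */
#include <assert.h>
#include <stdint.h>

int
main (void)
{
  for (int32_t i = -32768; i <= 32767; ++i)
    {
      int16_t x = (int16_t) i;
      int16_t wide = (int16_t) ((int32_t) x >> 31);  /* original computation */
      int16_t narrow = (int16_t) (x >> 15);          /* demoted, clamped count */
      assert (wide == narrow);
    }
  return 0;
}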