From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
Date: Tue, 31 Jan 2023 12:03:19 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: tnfchris at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #13 from rsandifo at gcc dot gnu.org ---
OK, hopefully I understand now.  Sorry for being slow.  But what specific
constraints do we want to apply to the optimisation?

(In reply to Tamar Christina from comment #3)
> Right, so this is because in the expansion we don't have enough context to
> decide how to optimize the division.
>
> This optimization is only possible when the input is widened because you
> need an additional free bit so that the second addition can't overflow.
>
> The vectorizer has this context but since we didn't want a new IFN the
> context should instead be derivable in
> targetm.vectorize.can_special_div_by_const hook.
>
> So my proposal to fix this and keep
> targetm.vectorize.can_special_div_by_const as a general divide optimization
> hook is to pass the actual `tree` operand0 as well to the hook such that the
> hook has a bit more context.
>
> I was hoping to use get_nonzero_bits to get what the actual range of the
> operand is.  But it looks like for widening operations it still reports -1.

The original motivating example was:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

where we then do a 16-bit division.  But since level and pixel[i] are
unconstrained, the maximum value of pixel[i] * level is 0xfe01.  There's
no free bit in that sense.

It seems more that the range of the dividend is [0, N*N] for a divisor of N.
If that's the condition we want to test for, it seems like something
we need to check in the vectoriser rather than the hook.  And it's not
something we can easily do in the vector form, since we don't track
ranges for vectors (AFAIK).
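
To make the range argument concrete, here is a small scalar sketch.  It
assumes the optimisation replaces x / 0xff with the add/shift sequence
(x + ((x + 257) >> 8)) >> 8, with every intermediate truncated to 16 bits.
That sequence is an assumption for illustration, as the thread does not
spell it out, and the helper names div255_u16 and first_mismatch are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: x / 0xff computed with 16-bit intermediates only.
   The add/shift sequence is an assumed form of the optimisation, not
   something quoted from the bug report.  */
static uint16_t div255_u16(uint16_t x)
{
    uint16_t t = (uint16_t)(x + 257u);   /* first addition, wraps at 16 bits */
    t = (uint16_t)(t >> 8);
    uint16_t s = (uint16_t)(x + t);      /* second addition, wraps at 16 bits */
    return (uint16_t)(s >> 8);
}

/* Hypothetical helper: smallest 16-bit dividend for which the sequence
   disagrees with a real division by 0xff, or -1 if there is none.  */
static long first_mismatch(void)
{
    for (uint32_t x = 0; x <= 0xffffu; x++)
        if (div255_u16((uint16_t)x) != x / 0xffu)
            return (long)x;
    return -1;
}
```

Under that assumption, first_mismatch() returns 0xfeff, which is above
0xff * 0xff = 0xfe01.  So the sequence is exact for every product of two
uint8_t values even though bit 15 of the dividend can be set, which fits
the suggestion that the condition is a bound on the dividend's range
rather than a spare top bit.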