From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
Date: Wed, 01 Feb 2023 07:29:47 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #19 from rguenther at suse dot de ---
On Tue, 31 Jan 2023, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583
>
> --- Comment #18 from Tamar Christina ---
> > > Ack, that also tracks with what I tried before; we indeed don't
> > > track ranges for vector ops.  The general case can still be handled
> > > slightly better (I think), but it doesn't become as clear a win as
> > > this one.
> > >
> > > > You probably did so elsewhere some time ago, but what exactly are
> > > > those four instructions?  (pointers to specifications appreciated)
> > >
> > > For NEON we use:
> > > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-
> >
> > so that's an add + pack high
> >
>
> Yes, though with no overflow, since the addition is done in twice the
> precision of the original type.  So it's more a widening add + pack
> high, which narrows it back and zero-extends.
>
> > > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-
> >
> > and that unpacks (zero-extends) the high/low part of one operand of
> > an add.
> >
> > I wonder, if we open-coded the pack / unpack and used a regular add,
> > whether combine could synthesize uaddw and addhn?  The pack and
> > unpack would be vec_perms on GIMPLE (plus V_C_E).
>
> I don't think so for addhn, because it wouldn't truncate the top bits;
> it truncates the bottom bits.
>
> The instruction does
>   element1 = Elem[operand1, e, 2*esize];
>   element2 = Elem[operand2, e, 2*esize];
>
> so it widens on input.

OK, so that's an ADD_HIGHPART_EXPR then?  Though the high part of an add
is only a single bit, isn't it?  For scalar code you'd use the carry bit
here and instructions like adc to consume it.  Is addhn doing such a
thing on vectors?

When writing generic vector code, is combine able to synthesize addhn
from widen, plus, pack-high?
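That is, something like the following open-coded form, as an untested
GNU C generic-vector sketch (the function and type names are mine, the
lane numbering assumes little-endian, and only the low half of the
returned vector is the actual narrow result):

  typedef unsigned short v8hi __attribute__ ((vector_size (16)));
  typedef unsigned int   v4si __attribute__ ((vector_size (16)));

  /* Open-coded ADDHN on two V4SI inputs: a regular vector add followed
     by a vec_perm (after a V_C_E to 16-bit lanes) that keeps only the
     upper 16 bits of every 32-bit sum.  */
  v8hi
  addhn_open_coded (v4si a, v4si b)
  {
    v4si sum = a + b;
    v8hi s = (v8hi) sum;  /* V_C_E: view the sums as eight 16-bit lanes.  */
    /* Lanes 1, 3, 5, 7 hold the high halves on little-endian; the
       repeated indices just pad the permute to a full vector.  */
    return __builtin_shuffle (s, (v8hi) { 1, 3, 5, 7, 1, 3, 5, 7 });
  }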
As said in the old discussion, I'm not opposed to adding new IFNs, but
I'd like to see useful building blocks (that ideally map to ISAs)
instead of an IFN-for-complex-pattern-X.

The alternative way was to improve division expansion in general, which
is the can_div_special_by_const_p thing, but we do not seem to be able
to capture the requirements correctly here.

> > So the difficulty here will be to decide whether that's in the end
> > better than what the pattern handling code does now, right?  Because
> > I think most targets will be able to do the above, but lacking the
> > special adds it will be slower because of the extra
> > packing/unpacking?
> >
> > That said, can we possibly do just that costing (it would be a first
> > in the pattern code, I guess) with a target hook?  Or add optabs for
> > the addh operations so we can query support?
>
> We could, but the alternative wouldn't be correct for costing, I
> think: if we generate *+, vec_perm that's going to be more expensive.

Well, the target cost model can always detect such patterns ...  But
sure, using the actual ISA is preferable for costing and also to avoid
"breaking" the combination by later "optimization".

OTOH at least some basic constant folding for all such ISA IFNs is
required to avoid regressing cases where complete unrolling later
allows constant evaluation but vectorizing first would break that.
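The per-lane folding itself would be simple; as an untested scalar
sketch of the lane semantics (the helper names are mine, not the
arm_neon.h intrinsics):

  /* One ADDHN lane, 32-bit inputs -> 16-bit result: the upper half of
     the full-width sum, not a carry bit.  */
  static inline unsigned short
  addhn_lane (unsigned int a, unsigned int b)
  {
    return (unsigned short) ((a + b) >> 16);
  }

  /* One UADDW lane: the narrow operand is zero-extended and added to
     the wide operand; the result stays wide.  */
  static inline unsigned int
  uaddw_lane (unsigned int a, unsigned short b)
  {
    return a + (unsigned int) b;
  }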