From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
 by sourceware.org (Postfix) with ESMTPS id C4FAE385C331
 for <gcc-patches@gcc.gnu.org>; Tue, 14 Jun 2022 16:09:45 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C4FAE385C331
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
 (No client certificate requested)
 by smtp-out1.suse.de (Postfix) with ESMTPS id 088F321B8C;
 Tue, 14 Jun 2022 16:09:44 +0000 (UTC)
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
 (No client certificate requested)
 by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id F1A28139EC;
 Tue, 14 Jun 2022 16:09:43 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
 by imap2.suse-dmz.suse.de with ESMTPSA id 3x31OseyqGLjcAAAMHmgww
 (envelope-from <rguenther@suse.de>); Tue, 14 Jun 2022 16:09:43 +0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
From: Richard Biener <rguenther@suse.de>
Mime-Version: 1.0 (1.0)
Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask 
Date: Tue, 14 Jun 2022 18:09:43 +0200
Message-Id: <540D4924-1B10-457C-B50D-18C76E81BC13@suse.de>
References: <VI1PR08MB53256C0BAB19A02DEF1700E9FFAA9@VI1PR08MB5325.eurprd08.prod.outlook.com>
Cc: Richard Sandiford <richard.sandiford@arm.com>, nd <nd@arm.com>
In-Reply-To: <VI1PR08MB53256C0BAB19A02DEF1700E9FFAA9@VI1PR08MB5325.eurprd08.prod.outlook.com>
To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
X-Mailer: iPhone Mail (19F77)
X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Jun 2022 16:09:50 -0000


> Am 14.06.2022 um 17:58 schrieb Tamar Christina via Gcc-patches <gcc-patche=
s@gcc.gnu.org>:
>=20
> =EF=BB=BF
>=20
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, June 14, 2022 2:43 PM
>> To: Richard Biener <rguenther@suse.de>
>> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org;
>> nd <nd@arm.com>
>> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
>> bitmask
>>=20
>> Richard Biener <rguenther@suse.de> writes:
>>>> On Mon, 13 Jun 2022, Tamar Christina wrote:
>>>=20
>>>>> -----Original Message-----
>>>>> From: Richard Biener <rguenther@suse.de>
>>>>> Sent: Monday, June 13, 2022 12:48 PM
>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
>>>>> <Richard.Sandiford@arm.com>
>>>>> Subject: RE: [PATCH 1/2]middle-end Support optimized division by
>>>>> pow2 bitmask
>>>>>=20
>>>>> On Mon, 13 Jun 2022, Tamar Christina wrote:
>>>>>=20
>>>>>>> -----Original Message-----
>>>>>>> From: Richard Biener <rguenther@suse.de>
>>>>>>> Sent: Monday, June 13, 2022 10:39 AM
>>>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>>>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
>>>>>>> <Richard.Sandiford@arm.com>
>>>>>>> Subject: Re: [PATCH 1/2]middle-end Support optimized division
>>>>>>> by
>>>>>>> pow2 bitmask
>>>>>>>=20
>>>>>>> On Mon, 13 Jun 2022, Richard Biener wrote:
>>>>>>>=20
>>>>>>>> On Thu, 9 Jun 2022, Tamar Christina wrote:
>>>>>>>>=20
>>>>>>>>> Hi All,
>>>>>>>>>=20
>>>>>>>>> In plenty of image and video processing code it's common to
>>>>>>>>> modify pixel values by a widening operation and then scale
>>>>>>>>> them back into range
>>>>>>> by dividing by 255.
>>>>>>>>>=20
>>>>>>>>> This patch adds an optab to allow us to emit an optimized
>>>>>>>>> sequence when doing an unsigned division that is equivalent to:
>>>>>>>>>=20
>>>>>>>>>   x =3D y / (2 ^ (bitsize (y)/2)-1
>>>>>>>>>=20
>>>>>>>>> Bootstrapped Regtested on aarch64-none-linux-gnu,
>>>>>>>>> x86_64-pc-linux-gnu and no issues.
>>>>>>>>>=20
>>>>>>>>> Ok for master?
>>>>>>>>=20
>>>>>>>> Looking at 2/2 it seems that this is the wrong way to attack
>>>>>>>> the problem.  The ISA doesn't have such instruction so adding
>>>>>>>> an optab looks premature.  I suppose that there's no unsigned
>>>>>>>> vector integer division and thus we open-code that in a different
>> way?
>>>>>>>> Isn't the correct thing then to fixup that open-coding if it
>>>>>>>> is more
>>>>> efficient?
>>>>>>>=20
>>>>>>=20
>>>>>> The problem is that even if you fixup the open-coding it would
>>>>>> need to be something target specific? The sequence of
>>>>>> instructions we generate don't have a GIMPLE representation.  So
>>>>>> whatever is generated I'd have to fixup in RTL then.
>>>>>=20
>>>>> What's the operation that doesn't have a GIMPLE representation?
>>>>=20
>>>> For NEON use two operations:
>>>> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(=
a)/2
>>>>    Where the + widens and the >> narrows.  So you give it two
>>>> shorts, get a byte 2. Add widening add of lowpart so basically
>>>> lowpart (a +w b)
>>>>=20
>>>> For SVE2 we use a different sequence, we use two back-to-back
>> sequences of:
>>>> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructio=
ns
>> select
>>>>   Even and odd elements of the vector rather than "top half" and "botto=
m
>> half".
>>>>=20
>>>>   So this instruction does : Add each vector element of the first sourc=
e
>> vector to the
>>>>   corresponding vector element of the second source vector, and place
>> the most
>>>>    significant half of the result in the even-numbered half-width
>> destination elements,
>>>>    while setting the odd-numbered elements to zero.
>>>>=20
>>>> So there's an explicit permute in there. The instructions are
>>>> sufficiently different that there wouldn't be a single GIMPLE
>> representation.
>>>=20
>>> I see.  Are these also useful to express scalar integer division?
>>>=20
>>> I'll defer to others to ack the special udiv_pow2_bitmask optab or
>>> suggest some piecemail things other targets might be able to do as
>>> well.  It does look very special.  I'd also bikeshed it to
>>> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
>>> interpreted 'bitmask' correctly ;)).  It seems to be even less general
>>> since it is an unary op and the actual divisor is constrained by the
>>> mode itself?
>>=20
>> Yeah, those were my concerns as well.  For n-bit numbers, the same kind o=
f
>> arithmetic transformation can be used for any 2^m-1 for m in [n/2, n), so=

>> from a target-independent point of view, m=3D=3Dn/2 isn't particularly sp=
ecial.
>> Hard-coding one value of m would make sense if there was an underlying
>> instruction that did exactly this, but like you say, there isn't.
>>=20
>> Would a compromise be to define an optab for ADDHN and then add a vector
>> pattern for this division that (at least initially) prefers ADDHN over th=
e
>> current approach whenever ADDHN is available?  We could then adapt the
>> conditions on the pattern if other targets also provide ADDHN but don't w=
ant
>> this transform.  (I think the other instructions in the pattern already h=
ave
>> optabs.)
>>=20
>> That still leaves open the question about what to do about SVE2, but the
>> underlying problem there is that the vectoriser doesn't know about the B/=
T
>> layout.
>=20
> Wouldn't it be better to just generalize the optab and to pass on the mask=
?

You could implement udivvhiN3 as well, but we=E2=80=99d need to make sure to=
 test predicates which should make sure only supported constants are let thr=
ough.

> I'd prefer to do that than teach the vectorizer about ADDHN (which can't b=
e
> easily done now) let alone teaching it about B/T.   It also seems somewhat=

> unnecessary to diverge the implementation here in the mid-end. After all,
> you can generate better SSE code here as well, so focusing on generating I=
SA
> specific code from here for each ISA seems like the wrong approach to me.
>=20
> Thanks,
> Tamar
>=20
>>=20
>> Thanks,
>> Richard