From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Sandiford <Richard.Sandiford@arm.com>,
Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>
Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
Date: Wed, 22 Jun 2022 00:34:53 +0000 [thread overview]
Message-ID: <VI1PR08MB53259CC9F66E245E38CBC5DAFFB29@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <VI1PR08MB53256C0BAB19A02DEF1700E9FFAA9@VI1PR08MB5325.eurprd08.prod.outlook.com>
> -----Original Message-----
> From: Tamar Christina
> Sent: Tuesday, June 14, 2022 4:58 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; Richard Biener
> <rguenther@suse.de>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
> bitmask
>
>
>
> > -----Original Message-----
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Tuesday, June 14, 2022 2:43 PM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: Tamar Christina <Tamar.Christina@arm.com>;
> > gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
> > bitmask
> >
> > Richard Biener <rguenther@suse.de> writes:
> > > On Mon, 13 Jun 2022, Tamar Christina wrote:
> > >
> > >> > -----Original Message-----
> > >> > From: Richard Biener <rguenther@suse.de>
> > >> > Sent: Monday, June 13, 2022 12:48 PM
> > >> > To: Tamar Christina <Tamar.Christina@arm.com>
> > >> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> > >> > <Richard.Sandiford@arm.com>
> > >> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by
> > >> > pow2 bitmask
> > >> >
> > >> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> > >> >
> > >> > > > -----Original Message-----
> > >> > > > From: Richard Biener <rguenther@suse.de>
> > >> > > > Sent: Monday, June 13, 2022 10:39 AM
> > >> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > >> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard
> > >> > > > Sandiford <Richard.Sandiford@arm.com>
> > >> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division
> > >> > > > by
> > >> > > > pow2 bitmask
> > >> > > >
> > >> > > > On Mon, 13 Jun 2022, Richard Biener wrote:
> > >> > > >
> > >> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> > >> > > > >
> > >> > > > > > Hi All,
> > >> > > > > >
> > >> > > > > > In plenty of image and video processing code it's common
> > >> > > > > > to modify pixel values by a widening operation and then
> > >> > > > > > scale them back into range
> > >> > > > by dividing by 255.
> > >> > > > > >
> > >> > > > > > This patch adds an optab to allow us to emit an optimized
> > >> > > > > > sequence when doing an unsigned division that is equivalent
> to:
> > >> > > > > >
> > >> > > > > > x = y / (2 ^ (bitsize (y)/2)-1
> > >> > > > > >
> > >> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > >> > > > > > x86_64-pc-linux-gnu and no issues.
> > >> > > > > >
> > >> > > > > > Ok for master?
> > >> > > > >
> > >> > > > > Looking at 2/2 it seems that this is the wrong way to
> > >> > > > > attack the problem. The ISA doesn't have such instruction
> > >> > > > > so adding an optab looks premature. I suppose that there's
> > >> > > > > no unsigned vector integer division and thus we open-code
> > >> > > > > that in a different
> > way?
> > >> > > > > Isn't the correct thing then to fixup that open-coding if
> > >> > > > > it is more
> > >> > efficient?
> > >> > > >
> > >> > >
> > >> > > The problem is that even if you fixup the open-coding it would
> > >> > > need to be something target specific? The sequence of
> > >> > > instructions we generate don't have a GIMPLE representation.
> > >> > > So whatever is generated I'd have to fixup in RTL then.
> > >> >
> > >> > What's the operation that doesn't have a GIMPLE representation?
> > >>
> > >> For NEON use two operations:
> > >> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n
> bitsize(a)/2
> > >> Where the + widens and the >> narrows. So you give it two
> > >> shorts, get a byte 2. Add widening add of lowpart so basically
> > >> lowpart (a +w b)
> > >>
> > >> For SVE2 we use a different sequence, we use two back-to-back
> > sequences of:
> > >> 1. Add narrow high part (bottom). In SVE the Top and Bottom
> > >> instructions
> > select
> > >> Even and odd elements of the vector rather than "top half" and
> > >> "bottom
> > half".
> > >>
> > >> So this instruction does : Add each vector element of the first
> > >> source
> > vector to the
> > >> corresponding vector element of the second source vector, and
> > >> place
> > the most
> > >> significant half of the result in the even-numbered half-width
> > destination elements,
> > >> while setting the odd-numbered elements to zero.
> > >>
> > >> So there's an explicit permute in there. The instructions are
> > >> sufficiently different that there wouldn't be a single GIMPLE
> > representation.
> > >
> > > I see. Are these also useful to express scalar integer division?
> > >
> > > I'll defer to others to ack the special udiv_pow2_bitmask optab or
> > > suggest some piecemail things other targets might be able to do as
> > > well. It does look very special. I'd also bikeshed it to
> > > udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
> > > interpreted 'bitmask' correctly ;)). It seems to be even less
> > > general since it is an unary op and the actual divisor is
> > > constrained by the mode itself?
> >
> > Yeah, those were my concerns as well. For n-bit numbers, the same
> > kind of arithmetic transformation can be used for any 2^m-1 for m in
> > [n/2, n), so from a target-independent point of view, m==n/2 isn't
> particularly special.
> > Hard-coding one value of m would make sense if there was an underlying
> > instruction that did exactly this, but like you say, there isn't.
> >
> > Would a compromise be to define an optab for ADDHN and then add a
> > vector pattern for this division that (at least initially) prefers
> > ADDHN over the current approach whenever ADDHN is available? We
> could
> > then adapt the conditions on the pattern if other targets also provide
> > ADDHN but don't want this transform. (I think the other instructions
> > in the pattern already have
> > optabs.)
> >
> > That still leaves open the question about what to do about SVE2, but
> > the underlying problem there is that the vectoriser doesn't know about
> > the B/T layout.
>
> Wouldn't it be better to just generalize the optab and to pass on the mask?
> I'd prefer to do that than teach the vectorizer about ADDHN (which can't be
> easily done now) let alone teaching it about B/T. It also seems somewhat
> unnecessary to diverge the implementation here in the mid-end. After all,
> you can generate better SSE code here as well, so focusing on generating ISA
> specific code from here for each ISA seems like the wrong approach to me.
Ping, is there any consensus here?
Thanks,
Tamar
>
> Thanks,
> Tamar
>
> >
> > Thanks,
> > Richard
next prev parent reply other threads:[~2022-06-22 0:35 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-09 4:39 Tamar Christina
2022-06-09 4:40 ` [PATCH 2/2]AArch64 aarch64: Add implementation for pow2 bitmask division Tamar Christina
2022-06-13 9:24 ` [PATCH 1/2]middle-end Support optimized division by pow2 bitmask Richard Biener
2022-06-13 9:39 ` Richard Biener
2022-06-13 10:09 ` Tamar Christina
2022-06-13 11:47 ` Richard Biener
2022-06-13 14:37 ` Tamar Christina
2022-06-14 13:18 ` Richard Biener
2022-06-14 13:38 ` Tamar Christina
2022-06-14 13:42 ` Richard Sandiford
2022-06-14 15:57 ` Tamar Christina
2022-06-14 16:09 ` Richard Biener
2022-06-22 0:34 ` Tamar Christina [this message]
2022-06-26 19:55 ` Jeff Law
2022-09-23 9:33 ` [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization Tamar Christina
2022-09-23 9:33 ` [PATCH 2/4]AArch64 Add implementation for pow2 bitmask division Tamar Christina
2022-10-31 11:34 ` Tamar Christina
2022-11-09 8:33 ` Tamar Christina
2022-11-09 16:02 ` Kyrylo Tkachov
2022-09-23 9:33 ` [PATCH 3/4]AArch64 Add SVE2 " Tamar Christina
2022-10-31 11:34 ` Tamar Christina
2022-11-09 8:33 ` Tamar Christina
2022-11-12 12:17 ` Richard Sandiford
2022-09-23 9:34 ` [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT Tamar Christina
2022-10-31 11:34 ` Tamar Christina
2022-11-09 8:33 ` Tamar Christina
2022-11-12 12:25 ` Richard Sandiford
2022-11-12 12:33 ` Richard Sandiford
2022-09-26 10:39 ` [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization Richard Biener
2022-10-31 11:34 ` Tamar Christina
2022-10-31 17:12 ` Jeff Law
2022-11-08 17:36 ` Tamar Christina
2022-11-09 8:01 ` Richard Biener
2022-11-09 8:26 ` Tamar Christina
2022-11-09 10:37 ` Kyrylo Tkachov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=VI1PR08MB53259CC9F66E245E38CBC5DAFFB29@VI1PR08MB5325.eurprd08.prod.outlook.com \
--to=tamar.christina@arm.com \
--cc=Richard.Sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=nd@arm.com \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).