public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jeff Law <jeffreyalaw@gmail.com>
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
Date: Sun, 26 Jun 2022 13:55:06 -0600	[thread overview]
Message-ID: <6e9bf9a4-8f9c-e839-84c1-b20dee05ea6d@gmail.com> (raw)
In-Reply-To: <VI1PR08MB53259CC9F66E245E38CBC5DAFFB29@VI1PR08MB5325.eurprd08.prod.outlook.com>



On 6/21/2022 6:34 PM, Tamar Christina via Gcc-patches wrote:
>> -----Original Message-----
>> From: Tamar Christina
>> Sent: Tuesday, June 14, 2022 4:58 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; Richard Biener
>> <rguenther@suse.de>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
>> Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
>> bitmask
>>
>>
>>
>>> -----Original Message-----
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Tuesday, June 14, 2022 2:43 PM
>>> To: Richard Biener <rguenther@suse.de>
>>> Cc: Tamar Christina <Tamar.Christina@arm.com>;
>>> gcc-patches@gcc.gnu.org; nd <nd@arm.com>
>>> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
>>> bitmask
>>>
>>> Richard Biener <rguenther@suse.de> writes:
>>>> On Mon, 13 Jun 2022, Tamar Christina wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Richard Biener <rguenther@suse.de>
>>>>>> Sent: Monday, June 13, 2022 12:48 PM
>>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
>>>>>> <Richard.Sandiford@arm.com>
>>>>>> Subject: RE: [PATCH 1/2]middle-end Support optimized division by
>>>>>> pow2 bitmask
>>>>>>
>>>>>> On Mon, 13 Jun 2022, Tamar Christina wrote:
>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Richard Biener <rguenther@suse.de>
>>>>>>>> Sent: Monday, June 13, 2022 10:39 AM
>>>>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>>>>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard
>>>>>>>> Sandiford <Richard.Sandiford@arm.com>
>>>>>>>> Subject: Re: [PATCH 1/2]middle-end Support optimized division
>>>>>>>> by
>>>>>>>> pow2 bitmask
>>>>>>>>
>>>>>>>> On Mon, 13 Jun 2022, Richard Biener wrote:
>>>>>>>>
>>>>>>>>> On Thu, 9 Jun 2022, Tamar Christina wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> In plenty of image and video processing code it's common
>>>>>>>>>> to modify pixel values by a widening operation and then
>>>>>>>>>> scale them back into range
>>>>>>>> by dividing by 255.
>>>>>>>>>> This patch adds an optab to allow us to emit an optimized
>>>>>>>>>> sequence when doing an unsigned division that is equivalent
>> to:
>>>>>>>>>>     x = y / (2 ^ (bitsize (y)/2)-1
>>>>>>>>>>
>>>>>>>>>> Bootstrapped Regtested on aarch64-none-linux-gnu,
>>>>>>>>>> x86_64-pc-linux-gnu and no issues.
>>>>>>>>>>
>>>>>>>>>> Ok for master?
>>>>>>>>> Looking at 2/2 it seems that this is the wrong way to
>>>>>>>>> attack the problem.  The ISA doesn't have such instruction
>>>>>>>>> so adding an optab looks premature.  I suppose that there's
>>>>>>>>> no unsigned vector integer division and thus we open-code
>>>>>>>>> that in a different
>>> way?
>>>>>>>>> Isn't the correct thing then to fixup that open-coding if
>>>>>>>>> it is more
>>>>>> efficient?
>>>>>>> The problem is that even if you fixup the open-coding it would
>>>>>>> need to be something target specific? The sequence of
>>>>>>> instructions we generate don't have a GIMPLE representation.
>>>>>>> So whatever is generated I'd have to fixup in RTL then.
>>>>>> What's the operation that doesn't have a GIMPLE representation?
>>>>> For NEON use two operations:
>>>>> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n
>> bitsize(a)/2
>>>>>      Where the + widens and the >> narrows.  So you give it two
>>>>> shorts, get a byte 2. Add widening add of lowpart so basically
>>>>> lowpart (a +w b)
>>>>>
>>>>> For SVE2 we use a different sequence, we use two back-to-back
>>> sequences of:
>>>>> 1. Add narrow high part (bottom).  In SVE the Top and Bottom
>>>>> instructions
>>> select
>>>>>     Even and odd elements of the vector rather than "top half" and
>>>>> "bottom
>>> half".
>>>>>     So this instruction does : Add each vector element of the first
>>>>> source
>>> vector to the
>>>>>     corresponding vector element of the second source vector, and
>>>>> place
>>> the most
>>>>>      significant half of the result in the even-numbered half-width
>>> destination elements,
>>>>>      while setting the odd-numbered elements to zero.
>>>>>
>>>>> So there's an explicit permute in there. The instructions are
>>>>> sufficiently different that there wouldn't be a single GIMPLE
>>> representation.
>>>> I see.  Are these also useful to express scalar integer division?
>>>>
>>>> I'll defer to others to ack the special udiv_pow2_bitmask optab or
>>>> suggest some piecemail things other targets might be able to do as
>>>> well.  It does look very special.  I'd also bikeshed it to
>>>> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
>>>> interpreted 'bitmask' correctly ;)).  It seems to be even less
>>>> general since it is an unary op and the actual divisor is
>>>> constrained by the mode itself?
>>> Yeah, those were my concerns as well.  For n-bit numbers, the same
>>> kind of arithmetic transformation can be used for any 2^m-1 for m in
>>> [n/2, n), so from a target-independent point of view, m==n/2 isn't
>> particularly special.
>>> Hard-coding one value of m would make sense if there was an underlying
>>> instruction that did exactly this, but like you say, there isn't.
>>>
>>> Would a compromise be to define an optab for ADDHN and then add a
>>> vector pattern for this division that (at least initially) prefers
>>> ADDHN over the current approach whenever ADDHN is available?  We
>> could
>>> then adapt the conditions on the pattern if other targets also provide
>>> ADDHN but don't want this transform.  (I think the other instructions
>>> in the pattern already have
>>> optabs.)
>>>
>>> That still leaves open the question about what to do about SVE2, but
>>> the underlying problem there is that the vectoriser doesn't know about
>>> the B/T layout.
>> Wouldn't it be better to just generalize the optab and to pass on the mask?
>> I'd prefer to do that than teach the vectorizer about ADDHN (which can't be
>> easily done now) let alone teaching it about B/T.   It also seems somewhat
>> unnecessary to diverge the implementation here in the mid-end. After all,
>> you can generate better SSE code here as well, so focusing on generating ISA
>> specific code from here for each ISA seems like the wrong approach to me.
> Ping, is there any consensus here?
Not that I've seen.  The ongoing discussion has clarified a few things 
in my mind, but I'm still wrapping my brain around what you're doing here.

jeff


  reply	other threads:[~2022-06-26 19:55 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-09  4:39 Tamar Christina
2022-06-09  4:40 ` [PATCH 2/2]AArch64 aarch64: Add implementation for pow2 bitmask division Tamar Christina
2022-06-13  9:24 ` [PATCH 1/2]middle-end Support optimized division by pow2 bitmask Richard Biener
2022-06-13  9:39   ` Richard Biener
2022-06-13 10:09     ` Tamar Christina
2022-06-13 11:47       ` Richard Biener
2022-06-13 14:37         ` Tamar Christina
2022-06-14 13:18           ` Richard Biener
2022-06-14 13:38             ` Tamar Christina
2022-06-14 13:42             ` Richard Sandiford
2022-06-14 15:57               ` Tamar Christina
2022-06-14 16:09                 ` Richard Biener
2022-06-22  0:34                 ` Tamar Christina
2022-06-26 19:55                   ` Jeff Law [this message]
2022-09-23  9:33 ` [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization Tamar Christina
2022-09-23  9:33 ` [PATCH 2/4]AArch64 Add implementation for pow2 bitmask division Tamar Christina
2022-10-31 11:34   ` Tamar Christina
2022-11-09  8:33     ` Tamar Christina
2022-11-09 16:02     ` Kyrylo Tkachov
2022-09-23  9:33 ` [PATCH 3/4]AArch64 Add SVE2 " Tamar Christina
2022-10-31 11:34   ` Tamar Christina
2022-11-09  8:33     ` Tamar Christina
2022-11-12 12:17   ` Richard Sandiford
2022-09-23  9:34 ` [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT Tamar Christina
2022-10-31 11:34   ` Tamar Christina
2022-11-09  8:33     ` Tamar Christina
2022-11-12 12:25   ` Richard Sandiford
2022-11-12 12:33     ` Richard Sandiford
2022-09-26 10:39 ` [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization Richard Biener
2022-10-31 11:34   ` Tamar Christina
2022-10-31 17:12     ` Jeff Law
2022-11-08 17:36     ` Tamar Christina
2022-11-09  8:01       ` Richard Biener
2022-11-09  8:26         ` Tamar Christina
2022-11-09 10:37 ` Kyrylo Tkachov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6e9bf9a4-8f9c-e839-84c1-b20dee05ea6d@gmail.com \
    --to=jeffreyalaw@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).