From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jeffreyalaw@gmail.com>
Received: from mail-il1-x132.google.com (mail-il1-x132.google.com
 [IPv6:2607:f8b0:4864:20::132])
 by sourceware.org (Postfix) with ESMTPS id 41FAB3858281
 for <gcc-patches@gcc.gnu.org>; Sun, 26 Jun 2022 19:55:10 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 41FAB3858281
Received: by mail-il1-x132.google.com with SMTP id k6so4770090ilq.2
 for <gcc-patches@gcc.gnu.org>; Sun, 26 Jun 2022 12:55:10 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:message-id:date:mime-version:user-agent:subject
 :content-language:to:references:from:in-reply-to
 :content-transfer-encoding;
 bh=fkgE7YL9cXBiPLo8ZChU1LXYbO0GKRkdeU15xta0o+U=;
 b=oG6kTxrGHjeBlYshMFktDRalrTzsJUQXiUQb7Hve7Vjn4LBL94Xw9HKEh7AaX4jOX2
 1xw6N6k8MFbUeCPPB3x/FfH9AevYOYgC84T7gulatz/QDqN7+ajuQoRXXaJOXREPt8wX
 1K9pmDX9EXbKGeWUpsVz0hD4M7tKaD+9DIxIIIJeWs2dis10+gxb+BMK2uH21w1Vgwth
 9AeD4soZYcCgiXcrUyhmxvVccA3yP9SDcb4cxpK+aH9jyjZ+FI+i/cq9kfxSddDoEk1j
 F6hEQCt3WfGK9K1hACOifcK3QTwxBSX/AoGCdFeBjE0I/wc+j2ih6zAHt3cvelePMRDF
 7XfA==
X-Gm-Message-State: AJIora/B/a+oYi6GTyI+KhH6UNvN/XuBirGwt/b4vAlJZXqPdJxC1Kof
 icePEm5NWein1NfE2m4k+HPRFFKt37c=
X-Google-Smtp-Source: AGRyM1trJG14JwzumN5PuPmt/mmcvW2v+jwBYGkHovR2SP5fmlGbOjVKlUACHlxBnNkNvlFxcCxlug==
X-Received: by 2002:a92:c567:0:b0:2d1:6268:2fd5 with SMTP id
 b7-20020a92c567000000b002d162682fd5mr5235967ilj.255.1656273309104; 
 Sun, 26 Jun 2022 12:55:09 -0700 (PDT)
Received: from [192.168.43.42] ([172.58.38.202])
 by smtp.gmail.com with ESMTPSA id
 j7-20020a056e02014700b002d16d5165a5sm3733483ilr.45.2022.06.26.12.55.08
 for <gcc-patches@gcc.gnu.org>
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Sun, 26 Jun 2022 12:55:08 -0700 (PDT)
Message-ID: <6e9bf9a4-8f9c-e839-84c1-b20dee05ea6d@gmail.com>
Date: Sun, 26 Jun 2022 13:55:06 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.10.0
Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask
Content-Language: en-US
To: gcc-patches@gcc.gnu.org
References: <patch-15779-tamar@arm.com>
 <n996n0p1-7137-86rn-o8or-sp37s23s87pp@fhfr.qr>
 <q77434s9-7825-n2r-p664-7ssqrn6no5n8@fhfr.qr>
 <VI1PR08MB532574ACE2174CE64460E480FFAB9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <sn5qn872-834o-8149-ns0-46996nsp41q1@fhfr.qr>
 <VI1PR08MB5325B5B8B2AB0A77E9424A59FFAB9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <2p382n54-427o-8q82-6o45-p2nn6869opr5@fhfr.qr> <mptv8t3cqej.fsf@arm.com>
 <VI1PR08MB53256C0BAB19A02DEF1700E9FFAA9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <VI1PR08MB53259CC9F66E245E38CBC5DAFFB29@VI1PR08MB5325.eurprd08.prod.outlook.com>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <VI1PR08MB53259CC9F66E245E38CBC5DAFFB29@VI1PR08MB5325.eurprd08.prod.outlook.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, NICE_REPLY_A,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Jun 2022 19:55:12 -0000


On 6/21/2022 6:34 PM, Tamar Christina via Gcc-patches wrote:
>> -----Original Message-----
>> From: Tamar Christina
>> Sent: Tuesday, June 14, 2022 4:58 PM
>> To: Richard Sandiford <richard.sandiford@arm.com>; Richard Biener
>> <rguenther@suse.de>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
>> Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
>> bitmask
>>
>>
>>
>>> -----Original Message-----
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Tuesday, June 14, 2022 2:43 PM
>>> To: Richard Biener <rguenther@suse.de>
>>> Cc: Tamar Christina <Tamar.Christina@arm.com>;
>>> gcc-patches@gcc.gnu.org; nd <nd@arm.com>
>>> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
>>> bitmask
>>>
>>> Richard Biener <rguenther@suse.de> writes:
>>>> On Mon, 13 Jun 2022, Tamar Christina wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Richard Biener <rguenther@suse.de>
>>>>>> Sent: Monday, June 13, 2022 12:48 PM
>>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
>>>>>> <Richard.Sandiford@arm.com>
>>>>>> Subject: RE: [PATCH 1/2]middle-end Support optimized division by
>>>>>> pow2 bitmask
>>>>>>
>>>>>> On Mon, 13 Jun 2022, Tamar Christina wrote:
>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Richard Biener <rguenther@suse.de>
>>>>>>>> Sent: Monday, June 13, 2022 10:39 AM
>>>>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>>>>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard
>>>>>>>> Sandiford <Richard.Sandiford@arm.com>
>>>>>>>> Subject: Re: [PATCH 1/2]middle-end Support optimized division
>>>>>>>> by
>>>>>>>> pow2 bitmask
>>>>>>>>
>>>>>>>> On Mon, 13 Jun 2022, Richard Biener wrote:
>>>>>>>>
>>>>>>>>> On Thu, 9 Jun 2022, Tamar Christina wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> In plenty of image and video processing code it's common
>>>>>>>>>> to modify pixel values by a widening operation and then
>>>>>>>>>> scale them back into range
>>>>>>>> by dividing by 255.
>>>>>>>>>> This patch adds an optab to allow us to emit an optimized
>>>>>>>>>> sequence when doing an unsigned division that is equivalent
>> to:
>>>>>>>>>>     x = y / (2 ^ (bitsize (y)/2)-1
>>>>>>>>>>
>>>>>>>>>> Bootstrapped Regtested on aarch64-none-linux-gnu,
>>>>>>>>>> x86_64-pc-linux-gnu and no issues.
>>>>>>>>>>
>>>>>>>>>> Ok for master?
>>>>>>>>> Looking at 2/2 it seems that this is the wrong way to
>>>>>>>>> attack the problem.  The ISA doesn't have such instruction
>>>>>>>>> so adding an optab looks premature.  I suppose that there's
>>>>>>>>> no unsigned vector integer division and thus we open-code
>>>>>>>>> that in a different
>>> way?
>>>>>>>>> Isn't the correct thing then to fixup that open-coding if
>>>>>>>>> it is more
>>>>>> efficient?
>>>>>>> The problem is that even if you fixup the open-coding it would
>>>>>>> need to be something target specific? The sequence of
>>>>>>> instructions we generate don't have a GIMPLE representation.
>>>>>>> So whatever is generated I'd have to fixup in RTL then.
>>>>>> What's the operation that doesn't have a GIMPLE representation?
>>>>> For NEON use two operations:
>>>>> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n
>> bitsize(a)/2
>>>>>      Where the + widens and the >> narrows.  So you give it two
>>>>> shorts, get a byte 2. Add widening add of lowpart so basically
>>>>> lowpart (a +w b)
>>>>>
>>>>> For SVE2 we use a different sequence, we use two back-to-back
>>> sequences of:
>>>>> 1. Add narrow high part (bottom).  In SVE the Top and Bottom
>>>>> instructions
>>> select
>>>>>     Even and odd elements of the vector rather than "top half" and
>>>>> "bottom
>>> half".
>>>>>     So this instruction does : Add each vector element of the first
>>>>> source
>>> vector to the
>>>>>     corresponding vector element of the second source vector, and
>>>>> place
>>> the most
>>>>>      significant half of the result in the even-numbered half-width
>>> destination elements,
>>>>>      while setting the odd-numbered elements to zero.
>>>>>
>>>>> So there's an explicit permute in there. The instructions are
>>>>> sufficiently different that there wouldn't be a single GIMPLE
>>> representation.
>>>> I see.  Are these also useful to express scalar integer division?
>>>>
>>>> I'll defer to others to ack the special udiv_pow2_bitmask optab or
>>>> suggest some piecemail things other targets might be able to do as
>>>> well.  It does look very special.  I'd also bikeshed it to
>>>> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
>>>> interpreted 'bitmask' correctly ;)).  It seems to be even less
>>>> general since it is an unary op and the actual divisor is
>>>> constrained by the mode itself?
>>> Yeah, those were my concerns as well.  For n-bit numbers, the same
>>> kind of arithmetic transformation can be used for any 2^m-1 for m in
>>> [n/2, n), so from a target-independent point of view, m==n/2 isn't
>> particularly special.
>>> Hard-coding one value of m would make sense if there was an underlying
>>> instruction that did exactly this, but like you say, there isn't.
>>>
>>> Would a compromise be to define an optab for ADDHN and then add a
>>> vector pattern for this division that (at least initially) prefers
>>> ADDHN over the current approach whenever ADDHN is available?  We
>> could
>>> then adapt the conditions on the pattern if other targets also provide
>>> ADDHN but don't want this transform.  (I think the other instructions
>>> in the pattern already have
>>> optabs.)
>>>
>>> That still leaves open the question about what to do about SVE2, but
>>> the underlying problem there is that the vectoriser doesn't know about
>>> the B/T layout.
>> Wouldn't it be better to just generalize the optab and to pass on the mask?
>> I'd prefer to do that than teach the vectorizer about ADDHN (which can't be
>> easily done now) let alone teaching it about B/T.   It also seems somewhat
>> unnecessary to diverge the implementation here in the mid-end. After all,
>> you can generate better SSE code here as well, so focusing on generating ISA
>> specific code from here for each ISA seems like the wrong approach to me.
> Ping, is there any consensus here?
Not that I've seen.  The ongoing discussion has clarified a few things 
in my mind, but I'm still wrapping my brain around what you're doing here.

jeff