From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-il1-x132.google.com (mail-il1-x132.google.com [IPv6:2607:f8b0:4864:20::132]) by sourceware.org (Postfix) with ESMTPS id 41FAB3858281 for ; Sun, 26 Jun 2022 19:55:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 41FAB3858281 Received: by mail-il1-x132.google.com with SMTP id k6so4770090ilq.2 for ; Sun, 26 Jun 2022 12:55:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:in-reply-to :content-transfer-encoding; bh=fkgE7YL9cXBiPLo8ZChU1LXYbO0GKRkdeU15xta0o+U=; b=oG6kTxrGHjeBlYshMFktDRalrTzsJUQXiUQb7Hve7Vjn4LBL94Xw9HKEh7AaX4jOX2 1xw6N6k8MFbUeCPPB3x/FfH9AevYOYgC84T7gulatz/QDqN7+ajuQoRXXaJOXREPt8wX 1K9pmDX9EXbKGeWUpsVz0hD4M7tKaD+9DIxIIIJeWs2dis10+gxb+BMK2uH21w1Vgwth 9AeD4soZYcCgiXcrUyhmxvVccA3yP9SDcb4cxpK+aH9jyjZ+FI+i/cq9kfxSddDoEk1j F6hEQCt3WfGK9K1hACOifcK3QTwxBSX/AoGCdFeBjE0I/wc+j2ih6zAHt3cvelePMRDF 7XfA== X-Gm-Message-State: AJIora/B/a+oYi6GTyI+KhH6UNvN/XuBirGwt/b4vAlJZXqPdJxC1Kof icePEm5NWein1NfE2m4k+HPRFFKt37c= X-Google-Smtp-Source: AGRyM1trJG14JwzumN5PuPmt/mmcvW2v+jwBYGkHovR2SP5fmlGbOjVKlUACHlxBnNkNvlFxcCxlug== X-Received: by 2002:a92:c567:0:b0:2d1:6268:2fd5 with SMTP id b7-20020a92c567000000b002d162682fd5mr5235967ilj.255.1656273309104; Sun, 26 Jun 2022 12:55:09 -0700 (PDT) Received: from [192.168.43.42] ([172.58.38.202]) by smtp.gmail.com with ESMTPSA id j7-20020a056e02014700b002d16d5165a5sm3733483ilr.45.2022.06.26.12.55.08 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 26 Jun 2022 12:55:08 -0700 (PDT) Message-ID: <6e9bf9a4-8f9c-e839-84c1-b20dee05ea6d@gmail.com> Date: Sun, 26 Jun 2022 13:55:06 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.10.0 Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask Content-Language: en-US To: gcc-patches@gcc.gnu.org References: <2p382n54-427o-8q82-6o45-p2nn6869opr5@fhfr.qr> From: Jeff Law In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jun 2022 19:55:12 -0000 On 6/21/2022 6:34 PM, Tamar Christina via Gcc-patches wrote: >> -----Original Message----- >> From: Tamar Christina >> Sent: Tuesday, June 14, 2022 4:58 PM >> To: Richard Sandiford ; Richard Biener >> >> Cc: gcc-patches@gcc.gnu.org; nd >> Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2 >> bitmask >> >> >> >>> -----Original Message----- >>> From: Richard Sandiford >>> Sent: Tuesday, June 14, 2022 2:43 PM >>> To: Richard Biener >>> Cc: Tamar Christina ; >>> gcc-patches@gcc.gnu.org; nd >>> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2 >>> bitmask >>> >>> Richard Biener writes: >>>> On Mon, 13 Jun 2022, Tamar Christina wrote: >>>> >>>>>> -----Original Message----- >>>>>> From: Richard Biener >>>>>> Sent: Monday, June 13, 2022 12:48 PM >>>>>> To: Tamar Christina >>>>>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford >>>>>> >>>>>> Subject: RE: [PATCH 1/2]middle-end Support optimized division by >>>>>> pow2 bitmask >>>>>> >>>>>> On Mon, 13 Jun 2022, Tamar Christina wrote: >>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Richard Biener >>>>>>>> Sent: Monday, June 13, 2022 10:39 AM >>>>>>>> To: Tamar Christina >>>>>>>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard >>>>>>>> Sandiford >>>>>>>> Subject: Re: [PATCH 1/2]middle-end Support optimized division >>>>>>>> by >>>>>>>> pow2 bitmask >>>>>>>> >>>>>>>> On Mon, 13 Jun 2022, Richard Biener wrote: >>>>>>>> >>>>>>>>> On Thu, 9 Jun 2022, Tamar Christina wrote: >>>>>>>>> >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> In plenty of image and video processing code it's common >>>>>>>>>> to modify pixel values by a widening operation and then >>>>>>>>>> scale them back into range >>>>>>>> by dividing by 255. >>>>>>>>>> This patch adds an optab to allow us to emit an optimized >>>>>>>>>> sequence when doing an unsigned division that is equivalent >> to: >>>>>>>>>> x = y / (2 ^ (bitsize (y)/2)-1 >>>>>>>>>> >>>>>>>>>> Bootstrapped Regtested on aarch64-none-linux-gnu, >>>>>>>>>> x86_64-pc-linux-gnu and no issues. >>>>>>>>>> >>>>>>>>>> Ok for master? >>>>>>>>> Looking at 2/2 it seems that this is the wrong way to >>>>>>>>> attack the problem. The ISA doesn't have such instruction >>>>>>>>> so adding an optab looks premature. I suppose that there's >>>>>>>>> no unsigned vector integer division and thus we open-code >>>>>>>>> that in a different >>> way? >>>>>>>>> Isn't the correct thing then to fixup that open-coding if >>>>>>>>> it is more >>>>>> efficient? >>>>>>> The problem is that even if you fixup the open-coding it would >>>>>>> need to be something target specific? The sequence of >>>>>>> instructions we generate don't have a GIMPLE representation. >>>>>>> So whatever is generated I'd have to fixup in RTL then. >>>>>> What's the operation that doesn't have a GIMPLE representation? >>>>> For NEON use two operations: >>>>> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n >> bitsize(a)/2 >>>>> Where the + widens and the >> narrows. So you give it two >>>>> shorts, get a byte 2. Add widening add of lowpart so basically >>>>> lowpart (a +w b) >>>>> >>>>> For SVE2 we use a different sequence, we use two back-to-back >>> sequences of: >>>>> 1. Add narrow high part (bottom). In SVE the Top and Bottom >>>>> instructions >>> select >>>>> Even and odd elements of the vector rather than "top half" and >>>>> "bottom >>> half". >>>>> So this instruction does : Add each vector element of the first >>>>> source >>> vector to the >>>>> corresponding vector element of the second source vector, and >>>>> place >>> the most >>>>> significant half of the result in the even-numbered half-width >>> destination elements, >>>>> while setting the odd-numbered elements to zero. >>>>> >>>>> So there's an explicit permute in there. The instructions are >>>>> sufficiently different that there wouldn't be a single GIMPLE >>> representation. >>>> I see. Are these also useful to express scalar integer division? >>>> >>>> I'll defer to others to ack the special udiv_pow2_bitmask optab or >>>> suggest some piecemail things other targets might be able to do as >>>> well. It does look very special. I'd also bikeshed it to >>>> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I >>>> interpreted 'bitmask' correctly ;)). It seems to be even less >>>> general since it is an unary op and the actual divisor is >>>> constrained by the mode itself? >>> Yeah, those were my concerns as well. For n-bit numbers, the same >>> kind of arithmetic transformation can be used for any 2^m-1 for m in >>> [n/2, n), so from a target-independent point of view, m==n/2 isn't >> particularly special. >>> Hard-coding one value of m would make sense if there was an underlying >>> instruction that did exactly this, but like you say, there isn't. >>> >>> Would a compromise be to define an optab for ADDHN and then add a >>> vector pattern for this division that (at least initially) prefers >>> ADDHN over the current approach whenever ADDHN is available? We >> could >>> then adapt the conditions on the pattern if other targets also provide >>> ADDHN but don't want this transform. (I think the other instructions >>> in the pattern already have >>> optabs.) >>> >>> That still leaves open the question about what to do about SVE2, but >>> the underlying problem there is that the vectoriser doesn't know about >>> the B/T layout. >> Wouldn't it be better to just generalize the optab and to pass on the mask? >> I'd prefer to do that than teach the vectorizer about ADDHN (which can't be >> easily done now) let alone teaching it about B/T. It also seems somewhat >> unnecessary to diverge the implementation here in the mid-end. After all, >> you can generate better SSE code here as well, so focusing on generating ISA >> specific code from here for each ISA seems like the wrong approach to me. > Ping, is there any consensus here? Not that I've seen.  The ongoing discussion has clarified a few things in my mind, but I'm still wrapping my brain around what you're doing here. jeff