Subject: Re: [PATCH] Optimize divmod expansion (PR middle-end/79665)
From: Georg-Johann Lay
To: Jeff Law, Jakub Jelinek
Cc: gcc-patches@gcc.gnu.org
Date: Wed, 31 May 2017 08:15:00 -0000

On 23.02.2017 06:59, Jeff Law wrote:
> On 02/22/2017 02:40 PM, Jakub Jelinek wrote:
>> Hi!
>>
>> If both arguments of integer division or modulo are known to be
>> non-negative in the corresponding signed type, then signed as well as
>> unsigned division/modulo shall have the exact same result, and therefore
>> we can choose between those two depending on which one is faster (or
>> shorter for -Os), which varies a lot depending on the target and, for
>> constant divisors, especially on the exact divisor. expand_divmod itself
>> is too complicated, and we don't even have the ability to ask about
>> costs (e.g. for highpart multiplication) without actually expanding it,
>> so in that case this patch just tries both sequences, computes their
>> costs and uses the cheaper one (for equal cost it honors the actual
>> original signedness of the operation).
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> 2017-02-22  Jakub Jelinek
>>
>> 	PR middle-end/79665
>> 	* internal-fn.c (get_range_pos_neg): Moved to ...
>> 	* tree.c (get_range_pos_neg): ... here.  No longer static.
>> 	* tree.h (get_range_pos_neg): New prototype.
>> 	* expr.c (expand_expr_real_2) : If both arguments
>> 	are known to be in between 0 and signed maximum inclusive, try to
>> 	expand both unsigned and signed divmod and use the cheaper one from
>> 	those.
>
> OK.
> jeff

Hi,

this causes a performance degradation for avr.  When optimizing for
speed with a known denominator, v6 uses the s/umulMM3_highpart insn to
avoid the division, because no div instruction is available.
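To illustrate what such a highpart sequence computes, here is a small C sketch that divides a 16-bit unsigned value by the constant 255 using only one widening multiply, extraction of the high half, and a shift. It is not GCC's actual avr output: the helper name udiv255_highpart and the magic constant 32897 (= ceil(2^23 / 255)) are illustrative assumptions, and it assumes a 16-bit unsigned type as on avr.

```c
#include <stdint.h>

/* Hypothetical helper (not GCC code): x / 255 for 16-bit unsigned x,
   computed without a divide instruction.
   Magic constant 32897 = ceil(2^23 / 255); exact for all 16-bit x. */
static uint16_t udiv255_highpart(uint16_t x)
{
    /* 16x16->32 multiply; the largest product (65535 * 32897) fits in 32 bits. */
    uint32_t product = (uint32_t)x * 32897u;
    /* Keep the high 16 bits, which is what a umul..._highpart pattern yields. */
    uint16_t high = (uint16_t)(product >> 16);
    /* Remaining shift: total shift is 16 + 7 = 23 bits. */
    return (uint16_t)(high >> 7);
}
```

Checking every one of the 65536 inputs against x / 255 confirms the constant is exact for 16-bit operands. On a target without a divide instruction this is far cheaper than calling a division routine such as __divmodhi4.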
    unsigned scale256 (unsigned val)
    {
        return val / 255;
    }

With this patch, v7 now uses __divmodhi4, which is very expensive, but
its cost is not computed because rtlanal.c:seq_cost assumes a cost of
ONE for any insn that is not a single_set:

      for (; seq; seq = NEXT_INSN (seq))
        {
          set = single_set (seq);
          if (set)
            cost += set_rtx_cost (set, speed);
          else
            cost++;
        }

and the divmod insn is not a single_set:

    (gdb) p seq
    $10 = (const rtx_insn *) 0x7ffff730d500
    (gdb) pr
    warning: Expression is not an assignment (and might have no effect)
    (insn 14 13 0 (parallel [
                (set (reg:HI 52)
                    (div:HI (reg:HI 47)
                        (reg:HI 54)))
                (set (reg:HI 53)
                    (mod:HI (reg:HI 47)
                        (reg:HI 54)))
                (clobber (reg:QI 21 r21))
                (clobber (reg:HI 22 r22))
                (clobber (reg:HI 24 r24))
                (clobber (reg:HI 26 r26))
            ]) "scale.c":7 -1
         (nil))
    (gdb)

Hence the divmod appears to be much cheaper than the unsigned variant,
whose mult_highpart sequence does get its costs computed.

Johann
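The costing problem can be modeled in a few lines of C. This is a hypothetical simplification, not GCC code: struct model_insn, model_seq_cost and pick_unsigned are invented names, and the per-insn costs used below are made-up numbers. model_seq_cost mirrors the loop quoted from rtlanal.c:seq_cost, and pick_unsigned mirrors the selection rule described in the patch (the cheaper sequence wins; on a tie the original signedness is kept).

```c
#include <stdbool.h>

/* Hypothetical stand-in for an RTL insn (not GCC's rtx_insn). */
struct model_insn {
    bool single_set;  /* true if the pattern is a single SET */
    int set_cost;     /* assumed value set_rtx_cost would report */
};

/* Mirrors the seq_cost loop: single_set insns are costed properly,
   anything else (e.g. a PARALLEL with two SETs) counts as just 1. */
static int model_seq_cost(const struct model_insn *seq, int n)
{
    int cost = 0;
    for (int i = 0; i < n; i++) {
        if (seq[i].single_set)
            cost += seq[i].set_cost;
        else
            cost++;
    }
    return cost;
}

/* Selection rule from the patch description: pick the cheaper sequence;
   on equal cost keep the original signedness. Returns true for unsigned. */
static bool pick_unsigned(int unsigned_cost, int signed_cost, bool orig_unsigned)
{
    if (unsigned_cost != signed_cost)
        return unsigned_cost < signed_cost;
    return orig_unsigned;
}
```

With assumed costs of 8, 4 and 4 for a three-insn highpart-multiply sequence, model_seq_cost returns 16, while the single divmod PARALLEL (not a single_set) returns 1. pick_unsigned(16, 1, true) therefore returns false and the signed libcall is chosen, even though it is the far more expensive sequence in reality.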