From: Richard Sandiford
To: Richard Biener
Cc: Kenneth Zadeck, Mike Stump, gcc-patches, Lawrence Crowl
Subject: Re: patch to fix constant math - 4th patch - the wide-int class.
Date: Wed, 31 Oct 2012 12:23:00 -0000

Richard Biener writes:
> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford wrote:
>> Richard Biener writes:
>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford wrote:
>>>> Richard Biener writes:
>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck wrote:
>>>>>>
>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>
>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump wrote:
>>>>>>>>
>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck wrote:
>>>>>>>>>>
>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>
>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>
>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max
>>>>>>>>>>> byte mode size 4 on a 64-bit host.
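
To make the rounding concern concrete, here is a minimal, self-contained
sketch (illustration only; the 20-bit maximum mode and the HOST_BITS /
MAX_MODE_BITS names are hypothetical stand-ins, not GCC macros).  With
truncating division the val[] array can come out too small, while the
round-up form suggested below always allocates enough HWIs.

  #include <cassert>

  /* Assumed parameters for the illustration.  */
  static const unsigned int HOST_BITS = 64;      /* stands in for HOST_BITS_PER_WIDE_INT */
  static const unsigned int MAX_MODE_BITS = 20;  /* hypothetical partial-int mode size */

  int
  main ()
  {
    /* Truncating division, as in the declaration quoted above:
       2 * 20 / 64 == 0, i.e. no storage at all.  */
    unsigned int truncated = 2 * MAX_MODE_BITS / HOST_BITS;

    /* Round-up (ceiling) division: (40 + 63) / 64 == 1 element.  */
    unsigned int rounded = (2 * MAX_MODE_BITS + HOST_BITS - 1) / HOST_BITS;

    assert (truncated == 0);
    assert (rounded == 1);
    return 0;
  }
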
>>>>>>>>>>
>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>> includes all modes up to TI mode, so by default we already go
>>>>>>>>>> up to 128 bits.
>>>>>>>>>
>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>
>>>>>>>> Actually, no, they are not.  Partial int modes can have bit
>>>>>>>> sizes that are not a power of two, and, if there isn't an int
>>>>>>>> mode that is bigger, we'd want to round up the partial int bit
>>>>>>>> size.  Something like (2 * MAX_BITSIZE_MODE_ANY_INT +
>>>>>>>> HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should
>>>>>>>> do it.
>>>>>>>>
>>>>>>>>>>> I still would like to have the ability to provide
>>>>>>>>>>> specializations of wide_int for "small" sizes, thus ideally
>>>>>>>>>>> wide_int would be a template templated on the number of HWIs
>>>>>>>>>>> in val.  Interface-wise wide_int<2> should be identical to
>>>>>>>>>>> double_int, thus we should be able to do
>>>>>>>>>>>
>>>>>>>>>>>   typedef wide_int<2> double_int;
>>>>>>>>>>
>>>>>>>>>> If you want to go down this path after the patches get in, go
>>>>>>>>>> for it.  I see no use at all for this.  This was not meant to
>>>>>>>>>> be a plug-in replacement for double_int.  The goal of this
>>>>>>>>>> patch is to get the compiler to do the constant math the way
>>>>>>>>>> that the target does it.  Any such instantiation is by
>>>>>>>>>> definition placing some predefined limit that some target may
>>>>>>>>>> not want.
>>>>>>>>>
>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>> implementations of functions that perform integer math on
>>>>>>>>> two-HWI sized integers.  What I also don't like too much is
>>>>>>>>> that we have two different interfaces to operate on them!
>>>>>>>>> Can't you see how I come to not liking this?  Especially the
>>>>>>>>> latter …
>>>>>>>>
>>>>>>>> double_int is logically dead.  Refactoring wide-int and
>>>>>>>> double-int is a waste of time, as the time is better spent
>>>>>>>> removing double-int from the compiler.  All the necessary
>>>>>>>> semantics and code of double-int _has_ been refactored into
>>>>>>>> wide-int already.  Changing wide-int in any way to vend anything
>>>>>>>> to double-int is wrong, as once double-int is removed, all the
>>>>>>>> API changes made so that double-int can share from wide-int are
>>>>>>>> wasted and must then be removed.  The path forward is the
>>>>>>>> complete removal of double-int; it is wrong, has been wrong and
>>>>>>>> always will be wrong, nothing can change that.
>>>>>>>
>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we
>>>>>>> will get rid of it - you will make compile-time math a _lot_
>>>>>>> slower.  Just profile when you for example change
>>>>>>> get_inner_reference to use wide_ints.
>>>>>>>
>>>>>>> To be able to remove double_int in favor of wide_int requires
>>>>>>> _at least_ templating wide_int on 'len' and providing
>>>>>>> specializations for 1 and 2.
>>>>>>>
>>>>>>> It might be a non-issue for math that operates on trees or RTXen
>>>>>>> due to the allocation overhead we pay, but in recent years we
>>>>>>> transitioned important paths away from using tree math to using
>>>>>>> double_ints _for speed reasons_.
>>>>>>>
>>>>>>> Richard.
>>>>>>
>>>>>> I do not know why you believe this about the speed.  double_int
>>>>>> always does synthetic math, since you do everything at 128-bit
>>>>>> precision.
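
As an aside, here is a minimal sketch of the two styles being compared
(illustration only; the two_hwi type and the function names are made up
and are not the actual double_int or wide-int code).  A fixed 128-bit
addition always works on both halves and propagates a carry, whereas a
precision-aware scheme can use a single native add when the operands
fit in one HWI.

  #include <cassert>
  #include <cstdint>

  /* A "synthetic" 128-bit value and addition: both halves are always
     touched and a carry is propagated, regardless of the values.  */
  struct two_hwi
  {
    uint64_t low, high;
  };

  static two_hwi
  add_synthetic (two_hwi a, two_hwi b)
  {
    two_hwi r;
    r.low = a.low + b.low;
    r.high = a.high + b.high + (r.low < a.low);  /* carry out of the low half */
    return r;
  }

  /* In the spirit of the precision-based argument: when the operands are
     known to fit in a single HWI, one native add suffices.  */
  static uint64_t
  add_native (uint64_t a, uint64_t b)
  {
    return a + b;
  }

  int
  main ()
  {
    two_hwi a = { 5, 0 }, b = { 7, 0 };
    assert (add_synthetic (a, b).low == add_native (5, 7));
    return 0;
  }
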
>>>>>>
>>>>>> The thing about wide_int is that, since it does math to the
>>>>>> precision's size, it almost never uses synthetic operations: the
>>>>>> sizes for almost every instance can be handled using the native
>>>>>> math on the machine.  Almost every call has a check to see if the
>>>>>> operation can be done natively.  I seriously doubt that you are
>>>>>> going to do TI mode math much faster than I do it, and if you do,
>>>>>> who cares.
>>>>>>
>>>>>> The number of calls does not affect the performance in any
>>>>>> negative way, and in fact it is more efficient, since common
>>>>>> things that require more than one operation in double_int are
>>>>>> typically done in a single operation.
>>>>>
>>>>> Simple double-int operations like
>>>>>
>>>>> inline double_int
>>>>> double_int::and_not (double_int b) const
>>>>> {
>>>>>   double_int result;
>>>>>   result.low = low & ~b.low;
>>>>>   result.high = high & ~b.high;
>>>>>   return result;
>>>>> }
>>>>>
>>>>> are always going to be faster than conditionally executing only one
>>>>> operation (but inside an offline function).
>>>>
>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>> appropriate here.
>>>>
>>>> It's interesting that you gave this example, since before you were
>>>> complaining about too many fused ops.  Clearly this one could be
>>>> removed in favour of separate and() and not() operations, but why
>>>> not provide a fused one if there are clients who'll make use of it?
>>>
>>> I was more concerned about fused operations that use precision
>>> or bitsize as input.  That is for example
>>>
>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>> +  bool only_sign_bit_p () const;
>>>
>>> The first is: construct a wide-int with precision prec (and sign- or
>>> zero-extend it) and then call only_sign_bit_p on it.  Such a function
>>> should not be necessary, and existing callers should be questioned
>>> instead of introducing it.
>>>
>>> In fact wide-int seems to have so many "fused" operations that we run
>>> out of sensible, recognizable names for them.  Which results in a lot
>>> of confusion about what the functions actually do (at least for me).
>>
>> Well, I suppose I can't really say anything useful either way on
>> that one, since I'm not writing the patch and I'm not reviewing it :-)
>>
>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>> There doesn't seem to be anything sacrosanct about the current
>>>> choice of what's fused and what isn't.
>>>
>>> Maybe.  I'd rather have seen an initial small wide-int API and fused
>>> operations introduced separately together with the places they are
>>> used.  In the current form it's way too tedious to go over all of
>>> them, match them with callers, look up enough context and then make
>>> up my mind on whether the caller should do something different or
>>> not.
>>>
>>> Thus, consider the big initial API a reason that all this review
>>> takes so long ...
>>>
>>>> The speed problem we had using trees for internal arithmetic isn't
>>>> IMO a good argument for keeping double_int in preference to
>>>> wide_int.  Allocating and constructing tree objects to hold
>>>> temporary values, storing an integer representation in them, then
>>>> calling tree arithmetic routines that pull out the integer
>>>> representation again and create a tree to hold the result, is going
>>>> to be much slower than using either double_int or wide_int.  I'd be
>>>> very surprised if we notice any measurable difference between
>>>> double_int and wide_int here.
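
For concreteness, here is one way to read the precision-taking variant
quoted above (illustration only, on a plain 64-bit value; this is not
the wide-int implementation, just the "extend to prec, then test"
equivalence described above, with a hypothetical function name):

  #include <cassert>
  #include <cstdint>

  /* Treat VAL as a PREC-bit quantity and check that exactly the sign
     bit of that precision is set.  */
  static bool
  only_sign_bit_p_example (uint64_t val, unsigned int prec)
  {
    uint64_t mask = (prec >= 64) ? ~UINT64_C (0)
                                 : (UINT64_C (1) << prec) - 1;
    return (val & mask) == (UINT64_C (1) << (prec - 1));
  }

  int
  main ()
  {
    assert (only_sign_bit_p_example (0x80, 8));    /* 0x80 is the 8-bit sign bit */
    assert (!only_sign_bit_p_example (0x40, 8));   /* not the sign bit */
    assert (!only_sign_bit_p_example (0x81, 8));   /* extra low bit set */
    return 0;
  }
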
>>>>
>>>> I still see no reason to keep double_int around.  The width of a
>>>> host wide integer really shouldn't have any significance.
>>>>
>>>> Your main complaint seems to be that the wide_int API is different
>>>> from the double_int one, but we can't literally use the same API,
>>>> since double_int has an implicit precision and bitsize, and wide_int
>>>> doesn't.  Having a precision that is separate from the underlying
>>>> representation is IMO the most important feature of wide_int, so:
>>>>
>>>>   typedef wide_int<2> double_int;
>>>>
>>>> is never going to be a drop-in, API-compatible replacement for
>>>> double_int.
>>>
>>> My reasoning was that if you strip wide-int of precision and bitsize
>>> you have a double_int class.
>>
>> But you don't!  Because...
>>
>>> Thus wide-int should have a base of that kind and just add precision
>>> / bitsize on top of that.  It wouldn't be a step forward if we end up
>>> replacing double_int uses with wide_int uses with a precision of
>>> 2 * HOST_BITS_PER_WIDE_INT, would it?
>>
>> ...the precision and bitsize aren't an optional extra, either
>> conceptually or in implementation.  wide_int happens to use N
>> HOST_WIDE_INTs under the hood, but the value of N is an internal
>> implementation detail.  No operations are done on N HWIs; they're done
>> on the number of bits in the operands.  Whereas a double_int class
>> does everything on N HWIs.
>
> If that's the only effect then either bitsize or precision is
> redundant (and we also have len ...).  Note I did not mention len
> above, thus the base class would retain 'len' and double-int would
> simply use 2 for it (if you don't template it but make it variable).

But that means that wide_int has to model a P-bit operation as a
"normal" len*HOST_WIDE_INT operation and then fix up the result after
the fact, which seems unnecessarily convoluted.  I still don't see why
a full-precision 2*HOST_WIDE_INT operation (or a full-precision
X*HOST_WIDE_INT operation for any X) has any special meaning.

Richard
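
A conceptual sketch of the distinction being argued above (illustration
only; the function names are made up and this is not the proposed
wide-int code): a double_int-style operation always touches a fixed two
HWIs, while a precision-aware operation touches only the blocks implied
by the operands' precision and clears the bits above that precision in
the top block.

  #include <cassert>
  #include <cstdint>

  static const unsigned int HOST_BITS = 64;  /* stands in for HOST_BITS_PER_WIDE_INT */

  /* Fixed-width style: always process exactly two HWIs.  */
  static void
  and_fixed_2hwi (const uint64_t a[2], const uint64_t b[2], uint64_t r[2])
  {
    r[0] = a[0] & b[0];
    r[1] = a[1] & b[1];
  }

  /* Precision-driven style: process only ceil (prec / HOST_BITS) blocks
     and keep exactly PREC meaningful bits in the result.  */
  static unsigned int
  and_precision (const uint64_t *a, const uint64_t *b, uint64_t *r,
                 unsigned int prec)
  {
    unsigned int len = (prec + HOST_BITS - 1) / HOST_BITS;
    for (unsigned int i = 0; i < len; i++)
      r[i] = a[i] & b[i];
    unsigned int excess = len * HOST_BITS - prec;
    if (excess)
      r[len - 1] &= ~UINT64_C (0) >> excess;  /* clear bits beyond PREC */
    return len;
  }

  int
  main ()
  {
    uint64_t a[2] = { 0xff, 0 }, b[2] = { 0x0f, 0 }, r2[2], rp[2];
    and_fixed_2hwi (a, b, r2);
    unsigned int len = and_precision (a, b, rp, 32);  /* a 32-bit operation */
    assert (len == 1 && rp[0] == 0x0f && r2[0] == 0x0f);
    return 0;
  }
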