Date: Wed, 31 Oct 2012 12:50:00 -0000
From: Richard Biener
To: Richard Biener, Kenneth Zadeck, Mike Stump, gcc-patches, Lawrence Crowl,
 rdsandiford@googlemail.com
Subject: Re: patch to fix constant math - 4th patch - the wide-int class.
In-Reply-To: <87hapa3k7a.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com>

On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford wrote:
> Richard Biener writes:
>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford wrote:
>>> Richard Biener writes:
>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford wrote:
>>>>> Richard Biener writes:
>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck wrote:
>>>>>>>
>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump wrote:
>>>>>>>>>
>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +   HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>>>                       HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>
>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max
>>>>>>>>>>>> byte mode size 4 on a 64bit host.
>>>>>>>>>>>
>>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>>> includes all modes up to TI mode, so by default we are already
>>>>>>>>>>> up to 128 bits.
>>>>>>>>>>
>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>
>>>>>>>>> Actually, no, they are not.
>>>>>>>>> Partial int modes can have bit sizes that are not a power of two,
>>>>>>>>> and, if there isn't an int mode that is bigger, we'd want to round
>>>>>>>>> up the partial int bit size.  Something like ((2 *
>>>>>>>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>>>>> HOST_BITS_PER_WIDE_INT) should do it.
>>>>>>>>>
>>>>>>>>>>>> I still would like to have the ability to provide specializations
>>>>>>>>>>>> of wide_int for "small" sizes, thus ideally wide_int would be a
>>>>>>>>>>>> template templated on the number of HWIs in val.  Interface-wise
>>>>>>>>>>>> wide_int<2> should be identical to double_int, thus we should be
>>>>>>>>>>>> able to do
>>>>>>>>>>>>
>>>>>>>>>>>>   typedef wide_int<2> double_int;
>>>>>>>>>>>
>>>>>>>>>>> If you want to go down this path after the patches get in, go for
>>>>>>>>>>> it.  I see no use at all for this.
>>>>>>>>>>> This was not meant to be a plug-in replacement for double_int.
>>>>>>>>>>> The goal of this patch is to get the compiler to do the constant
>>>>>>>>>>> math the way that the target does it.  Any such instantiation is
>>>>>>>>>>> by definition placing some predefined limit that some target may
>>>>>>>>>>> not want.
>>>>>>>>>>
>>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>>> implementations of functions that perform integer math on two-HWI
>>>>>>>>>> sized integers.  What I also don't like too much is that we have
>>>>>>>>>> two different interfaces to operate on them!  Can't you see how I
>>>>>>>>>> come to not liking this?  Especially the latter ...
>>>>>>>>>
>>>>>>>>> double_int is logically dead.  Refactoring wide-int and double-int
>>>>>>>>> is a waste of time, as the time is better spent removing double-int
>>>>>>>>> from the compiler.  All the necessary semantics and code of
>>>>>>>>> double-int _has_ been refactored into wide-int already.  Changing
>>>>>>>>> wide-int in any way to vend anything to double-int is wrong, as
>>>>>>>>> once double-int is removed, then all the API changes to make
>>>>>>>>> double-int share from wide-int are wasted and must then be removed.
>>>>>>>>> The path forward is the complete removal of double-int; it is
>>>>>>>>> wrong, has been wrong and always will be wrong, nothing can change
>>>>>>>>> that.
>>>>>>>>
>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>>>>>>>> get rid of it - you will make compile-time math a _lot_ slower.
>>>>>>>> Just profile when you for example change get_inner_reference to use
>>>>>>>> wide_ints.
>>>>>>>>
>>>>>>>> To be able to remove double_int in favor of wide_int requires _at
>>>>>>>> least_ templating wide_int on 'len' and providing specializations
>>>>>>>> for 1 and 2.
>>>>>>>>
>>>>>>>> It might be a non-issue for math that operates on trees or RTXen due
>>>>>>>> to the allocation overhead we pay, but in recent years we
>>>>>>>> transitioned important paths away from using tree math to using
>>>>>>>> double_ints _for speed reasons_.
>>>>>>>>
>>>>>>>> Richard.
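[ A rough, purely illustrative sketch of the 'len'-templated class being
  debated above.  The names, the local HOST_WIDE_INT stand-in and the single
  operation shown are made up for this example and are not taken from the
  posted patch: ]

  typedef long long HOST_WIDE_INT;        /* stand-in for GCC's host wide int */

  /* A fixed-length integer templated on the number of HWIs, so that a
     two-HWI instantiation could play the role of double_int.  */
  template <unsigned int N>
  struct fixed_wide_int
  {
    HOST_WIDE_INT val[N];                 /* least-significant HWI first */

    fixed_wide_int and_not (const fixed_wide_int &b) const
    {
      fixed_wide_int r;
      for (unsigned int i = 0; i < N; i++)  /* trivially unrolled for N == 2 */
        r.val[i] = val[i] & ~b.val[i];
      return r;
    }
  };

  typedef fixed_wide_int<2> double_int_like;

[ With N fixed at compile time the loop unrolls, so the N == 2 case should
  compile to the same two-HWI sequence that double_int::and_not produces. ]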
>>>>>>>
>>>>>>> I do not know why you believe this about the speed.  double_int
>>>>>>> always does synthetic math, since you do everything at 128 bit
>>>>>>> precision.
>>>>>>>
>>>>>>> The thing about wide_int is that, since it does math to the
>>>>>>> precision's size, it almost never uses synthetic operations: the
>>>>>>> sizes for almost every instance can be handled using the native math
>>>>>>> on the machine.  Almost every call has a check to see if the
>>>>>>> operation can be done natively.  I seriously doubt that you are
>>>>>>> going to do TI mode math much faster than I do it, and if you do,
>>>>>>> who cares.
>>>>>>>
>>>>>>> The number of calls does not affect the performance in any negative
>>>>>>> way, and in fact it is more efficient, since common things that
>>>>>>> require more than one operation in double_int are typically done in
>>>>>>> a single operation.
>>>>>>
>>>>>> Simple double-int operations like
>>>>>>
>>>>>>   inline double_int
>>>>>>   double_int::and_not (double_int b) const
>>>>>>   {
>>>>>>     double_int result;
>>>>>>     result.low = low & ~b.low;
>>>>>>     result.high = high & ~b.high;
>>>>>>     return result;
>>>>>>   }
>>>>>>
>>>>>> are always going to be faster than conditionally executing only one
>>>>>> operation (but inside an offline function).
>>>>>
>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>> appropriate here.
>>>>>
>>>>> It's interesting that you gave this example, since before you were
>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>> removed in favour of separate and() and not() operations, but why
>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>
>>>> I was more concerned about fused operations that use precision
>>>> or bitsize as input.  That is for example
>>>>
>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>> +  bool only_sign_bit_p () const;
>>>>
>>>> The first is: construct a wide-int with precision prec (and sign- or
>>>> zero-extend it) and then call only_sign_bit_p on it.  Such a function
>>>> should not be necessary, and existing callers should be questioned
>>>> instead of introducing it.
>>>>
>>>> In fact wide-int seems to have so many "fused" operations that we run
>>>> out of sensible, recognizable names for them, which results in a lot
>>>> of confusion about what the functions actually do (at least for me).
>>>
>>> Well, I suppose I can't really say anything useful either way on
>>> that one, since I'm not writing the patch and I'm not reviewing it :-)
>>>
>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>> There doesn't seem to be anything sacrosanct about the current choice
>>>>> of what's fused and what isn't.
>>>>
>>>> Maybe.  I'd rather have seen an initial small wide-int API and fused
>>>> operations introduced separately together with the places they are
>>>> used.  In the current way it's way too tedious to go over all of them
>>>> and match them with callers, look up enough context and then make up
>>>> my mind on whether the caller should do something different or not.
>>>>
>>>> Thus, consider the big initial API a reason that all this review takes
>>>> so long ...
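[ For illustration of the fused-vs-separate trade-off discussed above, using
  a toy two-HWI struct rather than GCC's double_int - all names here are
  invented for the example: ]

  struct di { unsigned long long low, high; };

  /* Fused form: one pass over the two HWIs, no temporary.  */
  static inline di and_not_fused (di a, di b)
  { return di { a.low & ~b.low, a.high & ~b.high }; }

  /* Composed from separate primitives: ~b is materialized first.  */
  static inline di not_di (di a)       { return di { ~a.low, ~a.high }; }
  static inline di and_di (di a, di b) { return di { a.low & b.low, a.high & b.high }; }
  static inline di and_not_composed (di a, di b) { return and_di (a, not_di (b)); }

[ Once everything is inlined the two forms should generate the same code;
  the cost of the fused variant is mostly one more name in the API. ]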
>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>> IMO a good argument for keeping double_int in preference to wide_int.
>>>>> Allocating and constructing tree objects to hold temporary values,
>>>>> storing an integer representation in them, then calling tree
>>>>> arithmetic routines that pull out the integer representation again
>>>>> and create a tree to hold the result, is going to be much slower than
>>>>> using either double_int or wide_int.  I'd be very surprised if we
>>>>> noticed any measurable difference between double_int and wide_int
>>>>> here.
>>>>>
>>>>> I still see no reason to keep double_int around.  The width of a host
>>>>> wide integer really shouldn't have any significance.
>>>>>
>>>>> Your main complaint seems to be that the wide_int API is different
>>>>> from the double_int one, but we can't literally use the same API,
>>>>> since double_int has an implicit precision and bitsize, and wide_int
>>>>> doesn't.  Having a precision that is separate from the underlying
>>>>> representation is IMO the most important feature of wide_int, so:
>>>>>
>>>>>   typedef wide_int<2> double_int;
>>>>>
>>>>> is never going to be a drop-in, API-compatible replacement for
>>>>> double_int.
>>>>
>>>> My reasoning was that if you strip wide-int of precision and bitsize
>>>> you have a double_int class.
>>>
>>> But you don't!  Because...
>>>
>>>> Thus wide-int should have a base of that kind and just add precision /
>>>> bitsize on top of that.  It wouldn't be a step forward if we end up
>>>> replacing double_int uses with wide_int uses with a precision of
>>>> 2 * HOST_BITS_PER_WIDE_INT, would it?
>>>
>>> ...the precision and bitsize aren't an optional extra, either
>>> conceptually or in implementation.  wide_int happens to use N
>>> HOST_WIDE_INTs under the hood, but the value of N is an internal
>>> implementation detail.  No operations are done on N HWIs; they're done
>>> on the number of bits in the operands.  Whereas a double_int class does
>>> everything on N HWIs.
>>
>> If that's the only effect then either bitsize or precision is redundant
>> (and we also have len ...).  Note I did not mention len above, thus the
>> base class would retain 'len' and double-int would simply use 2 for it
>> (if you don't template it but make it variable).
>
> But that means that wide_int has to model a P-bit operation as a
> "normal" len*HOST_WIDE_INT operation and then fix up the result
> after the fact, which seems unnecessarily convoluted.

It does that right now.  The operations are carried out in a loop over
len HOST_WIDE_INT parts; the last HWI is then special-treated to account
for precision/size.  (Yes, 'len' is also used as an optimization - the
fact that len ends up being mutable is another thing I dislike about
wide-int.  If wide-ints are cheap then all ops should be non-mutating,
at least with respect to 'len'.)

> I still don't
> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
> X*HOST_WIDE_INT operation for any X) has any special meaning.

Well, for the same reason a HOST_WIDE_INT variable has a meaning.  We use
it to constrain what we (efficiently) want to work on.  For example CCP
might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when doing
bit-constant-propagation in loops (for TImode integers on an x86_64 host).

Oh, and I don't necessarily see a use for double_int in its current form,
except as an integer representation on the host that is efficient for
manipulating integer constants of a target-dependent size.  For example,
the target detail that we have partial integer modes with bitsize >
precision, and that the bits > precision apparently have a meaning when
looking at the bit representation of a constant, should not be part of
the base class of wide-int (I doubt it belongs in wide-int at all, but I
guess you know more about the reason we track bitsize in addition to
precision - I think it's abstraction at the wrong level; the tree level
does fine without knowing about bitsize).

Richard.

> Richard
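[ Finally, a minimal sketch of the evaluation scheme described in the reply
  above - a loop over 'len' host wide integers, with the last, possibly
  partial HWI fixed up to the precision afterwards.  The toy type, the HWI
  stand-in and the helper below are invented for illustration and are not
  the interface of the posted wide-int patch: ]

  #include <vector>

  typedef unsigned long long HWI;                 /* host-wide-int stand-in */
  static const unsigned int BITS_PER_HWI = 64;

  struct toy_wide_int
  {
    unsigned int precision;                       /* significant bits */
    std::vector<HWI> val;                         /* least-significant first */

    explicit toy_wide_int (unsigned int prec)
      /* Round the storage up to a whole number of HWIs (the same ceiling
         division discussed for MAX_BITSIZE_MODE_ANY_INT at the top).  */
      : precision (prec), val ((prec + BITS_PER_HWI - 1) / BITS_PER_HWI, 0) {}

    /* Addition of two values of the same precision: the carry is propagated
       across the len HWI parts, then the last HWI is special-treated so
       that bits above the precision stay clear.  */
    toy_wide_int add (const toy_wide_int &b) const
    {
      toy_wide_int r (precision);
      unsigned int len = val.size ();
      HWI carry = 0;
      for (unsigned int i = 0; i < len; i++)
        {
          HWI s = val[i] + b.val[i];
          HWI c = s < val[i];                     /* carry out of a + b */
          r.val[i] = s + carry;
          carry = c + (r.val[i] < s);             /* carry out of + carry */
        }
      unsigned int excess = len * BITS_PER_HWI - precision;
      if (excess > 0)
        r.val[len - 1] &= ~(HWI) 0 >> excess;     /* mask to the precision */
      return r;
    }
  };

[ Whether that precision should be an explicit part of the value (wide_int)
  or implicit in the type (double_int, always 2 * HOST_BITS_PER_WIDE_INT) is
  exactly the disagreement in the thread; the sketch only shows the mechanics
  of the per-HWI loop and the final fix-up. ]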