Date: Wed, 31 Oct 2012 12:50:00 -0000
From: Richard Biener
To: Richard Biener, Kenneth Zadeck, Mike Stump, gcc-patches, Lawrence Crowl,
 rdsandiford@googlemail.com
Subject: Re: patch to fix constant math - 4th patch - the wide-int class.
In-Reply-To: <87hapa3k7a.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com>

On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford wrote:
> Richard Biener writes:
>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford wrote:
>>> Richard Biener writes:
>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford wrote:
>>>>> Richard Biener writes:
>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck wrote:
>>>>>>>
>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump wrote:
>>>>>>>>>
>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +   HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>>>                       HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>
>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max
>>>>>>>>>>>> byte mode size 4 on a 64bit host.
>>>>>>>>>>>
>>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>>> includes all modes up to TI mode, so by default we are already
>>>>>>>>>>> up to 128 bits.
>>>>>>>>>>
>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>
>>>>>>>>> Actually, no, they are not.
>>>>>>>>> Partial int modes can have bit sizes that are not a power of two,
>>>>>>>>> and, if there isn't an int mode that is bigger, we'd want to round
>>>>>>>>> up the partial int bit size.  Something like ((2 *
>>>>>>>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>>>>> HOST_BITS_PER_WIDE_INT) should do it.
>>>>>>>>>
>>>>>>>>>>>> I still would like to have the ability to provide specializations
>>>>>>>>>>>> of wide_int for "small" sizes, thus ideally wide_int would be a
>>>>>>>>>>>> template templated on the number of HWIs in val.  Interface-wise
>>>>>>>>>>>> wide_int<2> should be identical to double_int, thus we should be
>>>>>>>>>>>> able to do
>>>>>>>>>>>>
>>>>>>>>>>>>   typedef wide_int<2> double_int;
>>>>>>>>>>>
>>>>>>>>>>> If you want to go down this path after the patches get in, go for
>>>>>>>>>>> it.  I see no use at all for this.
>>>>>>>>>>> This was not meant to be a plug-in replacement for double_int.
>>>>>>>>>>> The goal of this patch is to get the compiler to do the constant
>>>>>>>>>>> math the way that the target does it.  Any such instantiation is
>>>>>>>>>>> by definition placing some predefined limit that some target may
>>>>>>>>>>> not want.
>>>>>>>>>>
>>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>>> implementations of functions that perform integer math on two-HWI
>>>>>>>>>> sized integers.  What I also don't like too much is that we have
>>>>>>>>>> two different interfaces to operate on them!  Can't you see how I
>>>>>>>>>> come to not liking this?  Especially the latter ...
>>>>>>>>>
>>>>>>>>> double_int is logically dead.  Refactoring wide-int and double-int
>>>>>>>>> is a waste of time, as the time is better spent removing double-int
>>>>>>>>> from the compiler.  All the necessary semantics and code of
>>>>>>>>> double-int _has_ been refactored into wide-int already.  Changing
>>>>>>>>> wide-int in any way to vend anything to double-int is wrong, as
>>>>>>>>> once double-int is removed, then all the API changes to make
>>>>>>>>> double-int share from wide-int are wasted and must then be removed.
>>>>>>>>> The path forward is the complete removal of double-int; it is
>>>>>>>>> wrong, has been wrong and always will be wrong, nothing can change
>>>>>>>>> that.
>>>>>>>>
>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>>>>>>>> get rid of it - you will make compile-time math a _lot_ slower.
>>>>>>>> Just profile when you for example change get_inner_reference to use
>>>>>>>> wide_ints.
>>>>>>>>
>>>>>>>> To be able to remove double_int in favor of wide_int requires _at
>>>>>>>> least_ templating wide_int on 'len' and providing specializations
>>>>>>>> for 1 and 2.
>>>>>>>>
>>>>>>>> It might be a non-issue for math that operates on trees or RTXen due
>>>>>>>> to the allocation overhead we pay, but in recent years we
>>>>>>>> transitioned important paths away from using tree math to using
>>>>>>>> double_ints _for speed reasons_.
>>>>>>>>
>>>>>>>> Richard.
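[ A rough, purely illustrative sketch of the 'len'-templated class being
  debated above.  The names, the local HOST_WIDE_INT stand-in and the single
  operation shown are made up for this example and are not taken from the
  posted patch: ]

  typedef long long HOST_WIDE_INT;        /* stand-in for GCC's host wide int */

  /* A fixed-length integer templated on the number of HWIs, so that a
     two-HWI instantiation could play the role of double_int.  */
  template <unsigned int N>
  struct fixed_wide_int
  {
    HOST_WIDE_INT val[N];                 /* least-significant HWI first */

    fixed_wide_int and_not (const fixed_wide_int &b) const
    {
      fixed_wide_int r;
      for (unsigned int i = 0; i < N; i++)  /* trivially unrolled for N == 2 */
        r.val[i] = val[i] & ~b.val[i];
      return r;
    }
  };

  typedef fixed_wide_int<2> double_int_like;

[ With N fixed at compile time the loop unrolls, so the N == 2 case should
  compile to the same two-HWI sequence that double_int::and_not produces. ]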
>>>>>>>
>>>>>>> I do not know why you believe this about the speed.  double_int
>>>>>>> always does synthetic math, since you do everything at 128 bit
>>>>>>> precision.
>>>>>>>
>>>>>>> The thing about wide_int is that, since it does math to the
>>>>>>> precision's size, it almost never uses synthetic operations: the
>>>>>>> sizes for almost every instance can be handled using the native math
>>>>>>> on the machine.  Almost every call has a check to see if the
>>>>>>> operation can be done natively.  I seriously doubt that you are
>>>>>>> going to do TI mode math much faster than I do it, and if you do,
>>>>>>> who cares.
>>>>>>>
>>>>>>> The number of calls does not affect the performance in any negative
>>>>>>> way, and in fact it is more efficient, since common things that
>>>>>>> require more than one operation in double_int are typically done in
>>>>>>> a single operation.
>>>>>>
>>>>>> Simple double-int operations like
>>>>>>
>>>>>>   inline double_int
>>>>>>   double_int::and_not (double_int b) const
>>>>>>   {
>>>>>>     double_int result;
>>>>>>     result.low = low & ~b.low;
>>>>>>     result.high = high & ~b.high;
>>>>>>     return result;
>>>>>>   }
>>>>>>
>>>>>> are always going to be faster than conditionally executing only one
>>>>>> operation (but inside an offline function).
>>>>>
>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>> appropriate here.
>>>>>
>>>>> It's interesting that you gave this example, since before you were
>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>> removed in favour of separate and() and not() operations, but why
>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>
>>>> I was more concerned about fused operations that use precision
>>>> or bitsize as input.  That is for example
>>>>
>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>> +  bool only_sign_bit_p () const;
>>>>
>>>> The first is: construct a wide-int with precision prec (and sign- or
>>>> zero-extend it) and then call only_sign_bit_p on it.  Such a function
>>>> should not be necessary, and existing callers should be questioned
>>>> instead of introducing it.
>>>>
>>>> In fact wide-int seems to have so many "fused" operations that we run
>>>> out of sensible, recognizable names for them, which results in a lot
>>>> of confusion about what the functions actually do (at least for me).
>>>
>>> Well, I suppose I can't really say anything useful either way on
>>> that one, since I'm not writing the patch and I'm not reviewing it :-)
>>>
>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>> There doesn't seem to be anything sacrosanct about the current choice
>>>>> of what's fused and what isn't.
>>>>
>>>> Maybe.  I'd rather have seen an initial small wide-int API and fused
>>>> operations introduced separately together with the places they are
>>>> used.  In the current way it's way too tedious to go over all of them
>>>> and match them with callers, look up enough context and then make up
>>>> my mind on whether the caller should do something different or not.
>>>>
>>>> Thus, consider the big initial API a reason that all this review takes
>>>> so long ...
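[ For illustration of the fused-vs-separate trade-off discussed above, using
  a toy two-HWI struct rather than GCC's double_int - all names here are
  invented for the example: ]

  struct di { unsigned long long low, high; };

  /* Fused form: one pass over the two HWIs, no temporary.  */
  static inline di and_not_fused (di a, di b)
  { return di { a.low & ~b.low, a.high & ~b.high }; }

  /* Composed from separate primitives: ~b is materialized first.  */
  static inline di not_di (di a)       { return di { ~a.low, ~a.high }; }
  static inline di and_di (di a, di b) { return di { a.low & b.low, a.high & b.high }; }
  static inline di and_not_composed (di a, di b) { return and_di (a, not_di (b)); }

[ Once everything is inlined the two forms should generate the same code;
  the cost of the fused variant is mostly one more name in the API. ]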
>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>> IMO a good argument for keeping double_int in preference to wide_int.
>>>>> Allocating and constructing tree objects to hold temporary values,
>>>>> storing an integer representation in them, then calling tree
>>>>> arithmetic routines that pull out the integer representation again
>>>>> and create a tree to hold the result, is going to be much slower than
>>>>> using either double_int or wide_int.  I'd be very surprised if we
>>>>> noticed any measurable difference between double_int and wide_int
>>>>> here.
>>>>>
>>>>> I still see no reason to keep double_int around.  The width of a host
>>>>> wide integer really shouldn't have any significance.
>>>>>
>>>>> Your main complaint seems to be that the wide_int API is different
>>>>> from the double_int one, but we can't literally use the same API,
>>>>> since double_int has an implicit precision and bitsize, and wide_int
>>>>> doesn't.  Having a precision that is separate from the underlying
>>>>> representation is IMO the most important feature of wide_int, so:
>>>>>
>>>>>   typedef wide_int<2> double_int;
>>>>>
>>>>> is never going to be a drop-in, API-compatible replacement for
>>>>> double_int.
>>>>
>>>> My reasoning was that if you strip wide-int of precision and bitsize
>>>> you have a double_int class.
>>>
>>> But you don't!  Because...
>>>
>>>> Thus wide-int should have a base of that kind and just add precision /
>>>> bitsize on top of that.  It wouldn't be a step forward if we end up
>>>> replacing double_int uses with wide_int uses with a precision of
>>>> 2 * HOST_BITS_PER_WIDE_INT, would it?
>>>
>>> ...the precision and bitsize aren't an optional extra, either
>>> conceptually or in implementation.  wide_int happens to use N
>>> HOST_WIDE_INTs under the hood, but the value of N is an internal
>>> implementation detail.  No operations are done on N HWIs; they're done
>>> on the number of bits in the operands.  Whereas a double_int class does
>>> everything on N HWIs.
>>
>> If that's the only effect then either bitsize or precision is redundant
>> (and we also have len ...).  Note I did not mention len above, thus the
>> base class would retain 'len' and double-int would simply use 2 for it
>> (if you don't template it but make it variable).
>
> But that means that wide_int has to model a P-bit operation as a
> "normal" len*HOST_WIDE_INT operation and then fix up the result
> after the fact, which seems unnecessarily convoluted.

It does that right now.  The operations are carried out in a loop over
len HOST_WIDE_INT parts; the last HWI is then special-treated to account
for precision/size.  (Yes, 'len' is also used as an optimization - the
fact that len ends up being mutable is another thing I dislike about
wide-int.  If wide-ints are cheap then all ops should be non-mutating,
at least with respect to 'len'.)

> I still don't
> see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
> X*HOST_WIDE_INT operation for any X) has any special meaning.

Well, for the same reason a HOST_WIDE_INT variable has a meaning.  We use
it to constrain what we (efficiently) want to work on.  For example CCP
might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when doing
bit-constant-propagation in loops (for TImode integers on an x86_64 host).

Oh, and I don't necessarily see a use for double_int in its current form,
except as an integer representation on the host that is efficient for
manipulating integer constants of a target-dependent size.  For example,
the target detail that we have partial integer modes with bitsize >
precision, and that the bits > precision apparently have a meaning when
looking at the bit representation of a constant, should not be part of
the base class of wide-int (I doubt it belongs in wide-int at all, but I
guess you know more about the reason we track bitsize in addition to
precision - I think it's abstraction at the wrong level; the tree level
does fine without knowing about bitsize).

Richard.

> Richard
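[ Finally, a minimal sketch of the evaluation scheme described in the reply
  above - a loop over 'len' host wide integers, with the last, possibly
  partial HWI fixed up to the precision afterwards.  The toy type, the HWI
  stand-in and the helper below are invented for illustration and are not
  the interface of the posted wide-int patch: ]

  #include <vector>

  typedef unsigned long long HWI;                 /* host-wide-int stand-in */
  static const unsigned int BITS_PER_HWI = 64;

  struct toy_wide_int
  {
    unsigned int precision;                       /* significant bits */
    std::vector<HWI> val;                         /* least-significant first */

    explicit toy_wide_int (unsigned int prec)
      /* Round the storage up to a whole number of HWIs (the same ceiling
         division discussed for MAX_BITSIZE_MODE_ANY_INT at the top).  */
      : precision (prec), val ((prec + BITS_PER_HWI - 1) / BITS_PER_HWI, 0) {}

    /* Addition of two values of the same precision: the carry is propagated
       across the len HWI parts, then the last HWI is special-treated so
       that bits above the precision stay clear.  */
    toy_wide_int add (const toy_wide_int &b) const
    {
      toy_wide_int r (precision);
      unsigned int len = val.size ();
      HWI carry = 0;
      for (unsigned int i = 0; i < len; i++)
        {
          HWI s = val[i] + b.val[i];
          HWI c = s < val[i];                     /* carry out of a + b */
          r.val[i] = s + carry;
          carry = c + (r.val[i] < s);             /* carry out of + carry */
        }
      unsigned int excess = len * BITS_PER_HWI - precision;
      if (excess > 0)
        r.val[len - 1] &= ~(HWI) 0 >> excess;     /* mask to the precision */
      return r;
    }
  };

[ Whether that precision should be an explicit part of the value (wide_int)
  or implicit in the type (double_int, always 2 * HOST_BITS_PER_WIDE_INT) is
  exactly the disagreement in the thread; the sketch only shows the mechanics
  of the per-HWI loop and the final fix-up. ]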