From: Richard Sandiford
To: Richard Biener
Cc: Kenneth Zadeck, Mike Stump, gcc-patches, Lawrence Crowl
Subject: Re: patch to fix constant math - 4th patch - the wide-int class.
Date: Wed, 31 Oct 2012 12:23:00 -0000

Richard Biener writes:
> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford wrote:
>> Richard Biener writes:
>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford wrote:
>>>> Richard Biener writes:
>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck wrote:
>>>>>>
>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>
>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump wrote:
>>>>>>>>
>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck wrote:
>>>>>>>>>>
>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>
>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>
>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max
>>>>>>>>>>> byte mode size 4 on a 64-bit host.
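
To make the rounding concern concrete, here is a minimal, self-contained
sketch (illustration only; the 20-bit maximum mode and the HOST_BITS /
MAX_MODE_BITS names are hypothetical stand-ins, not GCC macros).  With
truncating division the val[] array can come out too small, while the
round-up form suggested below always allocates enough HWIs.

  #include <cassert>

  /* Assumed parameters for the illustration.  */
  static const unsigned int HOST_BITS = 64;      /* stands in for HOST_BITS_PER_WIDE_INT */
  static const unsigned int MAX_MODE_BITS = 20;  /* hypothetical partial-int mode size */

  int
  main ()
  {
    /* Truncating division, as in the declaration quoted above:
       2 * 20 / 64 == 0, i.e. no storage at all.  */
    unsigned int truncated = 2 * MAX_MODE_BITS / HOST_BITS;

    /* Round-up (ceiling) division: (40 + 63) / 64 == 1 element.  */
    unsigned int rounded = (2 * MAX_MODE_BITS + HOST_BITS - 1) / HOST_BITS;

    assert (truncated == 0);
    assert (rounded == 1);
    return 0;
  }
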
>>>>>>>>>>
>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>> includes all modes up to TI mode, so by default we already go
>>>>>>>>>> up to 128 bits.
>>>>>>>>>
>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>
>>>>>>>> Actually, no, they are not.  Partial int modes can have bit
>>>>>>>> sizes that are not a power of two, and, if there isn't an int
>>>>>>>> mode that is bigger, we'd want to round up the partial int bit
>>>>>>>> size.  Something like (2 * MAX_BITSIZE_MODE_ANY_INT +
>>>>>>>> HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT should
>>>>>>>> do it.
>>>>>>>>
>>>>>>>>>>> I still would like to have the ability to provide
>>>>>>>>>>> specializations of wide_int for "small" sizes, thus ideally
>>>>>>>>>>> wide_int would be a template templated on the number of HWIs
>>>>>>>>>>> in val.  Interface-wise wide_int<2> should be identical to
>>>>>>>>>>> double_int, thus we should be able to do
>>>>>>>>>>>
>>>>>>>>>>>   typedef wide_int<2> double_int;
>>>>>>>>>>
>>>>>>>>>> If you want to go down this path after the patches get in, go
>>>>>>>>>> for it.  I see no use at all for this.  This was not meant to
>>>>>>>>>> be a plug-in replacement for double_int.  The goal of this
>>>>>>>>>> patch is to get the compiler to do the constant math the way
>>>>>>>>>> that the target does it.  Any such instantiation is by
>>>>>>>>>> definition placing some predefined limit that some target may
>>>>>>>>>> not want.
>>>>>>>>>
>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>> implementations of functions that perform integer math on
>>>>>>>>> two-HWI sized integers.  What I also don't like too much is
>>>>>>>>> that we have two different interfaces to operate on them!
>>>>>>>>> Can't you see how I come to not liking this?  Especially the
>>>>>>>>> latter …
>>>>>>>>
>>>>>>>> double_int is logically dead.  Refactoring wide-int and
>>>>>>>> double-int is a waste of time, as the time is better spent
>>>>>>>> removing double-int from the compiler.  All the necessary
>>>>>>>> semantics and code of double-int _has_ been refactored into
>>>>>>>> wide-int already.  Changing wide-int in any way to vend anything
>>>>>>>> to double-int is wrong, as once double-int is removed, all the
>>>>>>>> API changes made so that double-int can share from wide-int are
>>>>>>>> wasted and must then be removed.  The path forward is the
>>>>>>>> complete removal of double-int; it is wrong, has been wrong and
>>>>>>>> always will be wrong, nothing can change that.
>>>>>>>
>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we
>>>>>>> will get rid of it - you will make compile-time math a _lot_
>>>>>>> slower.  Just profile when you for example change
>>>>>>> get_inner_reference to use wide_ints.
>>>>>>>
>>>>>>> To be able to remove double_int in favor of wide_int requires
>>>>>>> _at least_ templating wide_int on 'len' and providing
>>>>>>> specializations for 1 and 2.
>>>>>>>
>>>>>>> It might be a non-issue for math that operates on trees or RTXen
>>>>>>> due to the allocation overhead we pay, but in recent years we
>>>>>>> transitioned important paths away from using tree math to using
>>>>>>> double_ints _for speed reasons_.
>>>>>>>
>>>>>>> Richard.
>>>>>>
>>>>>> I do not know why you believe this about the speed.  double_int
>>>>>> always does synthetic math, since you do everything at 128-bit
>>>>>> precision.
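
As an aside, here is a minimal sketch of the two styles being compared
(illustration only; the two_hwi type and the function names are made up
and are not the actual double_int or wide-int code).  A fixed 128-bit
addition always works on both halves and propagates a carry, whereas a
precision-aware scheme can use a single native add when the operands
fit in one HWI.

  #include <cassert>
  #include <cstdint>

  /* A "synthetic" 128-bit value and addition: both halves are always
     touched and a carry is propagated, regardless of the values.  */
  struct two_hwi
  {
    uint64_t low, high;
  };

  static two_hwi
  add_synthetic (two_hwi a, two_hwi b)
  {
    two_hwi r;
    r.low = a.low + b.low;
    r.high = a.high + b.high + (r.low < a.low);  /* carry out of the low half */
    return r;
  }

  /* In the spirit of the precision-based argument: when the operands are
     known to fit in a single HWI, one native add suffices.  */
  static uint64_t
  add_native (uint64_t a, uint64_t b)
  {
    return a + b;
  }

  int
  main ()
  {
    two_hwi a = { 5, 0 }, b = { 7, 0 };
    assert (add_synthetic (a, b).low == add_native (5, 7));
    return 0;
  }
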
>>>>>>
>>>>>> The thing about wide_int is that, since it does math to the
>>>>>> precision's size, it almost never uses synthetic operations: the
>>>>>> sizes for almost every instance can be handled using the native
>>>>>> math on the machine.  Almost every call has a check to see if the
>>>>>> operation can be done natively.  I seriously doubt that you are
>>>>>> going to do TI mode math much faster than I do it, and if you do,
>>>>>> who cares.
>>>>>>
>>>>>> The number of calls does not affect the performance in any
>>>>>> negative way, and in fact it is more efficient, since common
>>>>>> things that require more than one operation in double_int are
>>>>>> typically done in a single operation.
>>>>>
>>>>> Simple double-int operations like
>>>>>
>>>>> inline double_int
>>>>> double_int::and_not (double_int b) const
>>>>> {
>>>>>   double_int result;
>>>>>   result.low = low & ~b.low;
>>>>>   result.high = high & ~b.high;
>>>>>   return result;
>>>>> }
>>>>>
>>>>> are always going to be faster than conditionally executing only one
>>>>> operation (but inside an offline function).
>>>>
>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>> appropriate here.
>>>>
>>>> It's interesting that you gave this example, since before you were
>>>> complaining about too many fused ops.  Clearly this one could be
>>>> removed in favour of separate and() and not() operations, but why
>>>> not provide a fused one if there are clients who'll make use of it?
>>>
>>> I was more concerned about fused operations that use precision
>>> or bitsize as input.  That is for example
>>>
>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>> +  bool only_sign_bit_p () const;
>>>
>>> The first is: construct a wide-int with precision prec (and sign- or
>>> zero-extend it) and then call only_sign_bit_p on it.  Such a function
>>> should not be necessary, and existing callers should be questioned
>>> instead of introducing it.
>>>
>>> In fact wide-int seems to have so many "fused" operations that we run
>>> out of sensible, recognizable names for them.  Which results in a lot
>>> of confusion about what the functions actually do (at least for me).
>>
>> Well, I suppose I can't really say anything useful either way on
>> that one, since I'm not writing the patch and I'm not reviewing it :-)
>>
>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>> There doesn't seem to be anything sacrosanct about the current
>>>> choice of what's fused and what isn't.
>>>
>>> Maybe.  I'd rather have seen an initial small wide-int API and fused
>>> operations introduced separately together with the places they are
>>> used.  In the current form it's way too tedious to go over all of
>>> them, match them with callers, look up enough context and then make
>>> up my mind on whether the caller should do something different or
>>> not.
>>>
>>> Thus, consider the big initial API a reason that all this review
>>> takes so long ...
>>>
>>>> The speed problem we had using trees for internal arithmetic isn't
>>>> IMO a good argument for keeping double_int in preference to
>>>> wide_int.  Allocating and constructing tree objects to hold
>>>> temporary values, storing an integer representation in them, then
>>>> calling tree arithmetic routines that pull out the integer
>>>> representation again and create a tree to hold the result, is going
>>>> to be much slower than using either double_int or wide_int.  I'd be
>>>> very surprised if we notice any measurable difference between
>>>> double_int and wide_int here.
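
For concreteness, here is one way to read the precision-taking variant
quoted above (illustration only, on a plain 64-bit value; this is not
the wide-int implementation, just the "extend to prec, then test"
equivalence described above, with a hypothetical function name):

  #include <cassert>
  #include <cstdint>

  /* Treat VAL as a PREC-bit quantity and check that exactly the sign
     bit of that precision is set.  */
  static bool
  only_sign_bit_p_example (uint64_t val, unsigned int prec)
  {
    uint64_t mask = (prec >= 64) ? ~UINT64_C (0)
                                 : (UINT64_C (1) << prec) - 1;
    return (val & mask) == (UINT64_C (1) << (prec - 1));
  }

  int
  main ()
  {
    assert (only_sign_bit_p_example (0x80, 8));    /* 0x80 is the 8-bit sign bit */
    assert (!only_sign_bit_p_example (0x40, 8));   /* not the sign bit */
    assert (!only_sign_bit_p_example (0x81, 8));   /* extra low bit set */
    return 0;
  }
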
>>>>
>>>> I still see no reason to keep double_int around.  The width of a
>>>> host wide integer really shouldn't have any significance.
>>>>
>>>> Your main complaint seems to be that the wide_int API is different
>>>> from the double_int one, but we can't literally use the same API,
>>>> since double_int has an implicit precision and bitsize, and wide_int
>>>> doesn't.  Having a precision that is separate from the underlying
>>>> representation is IMO the most important feature of wide_int, so:
>>>>
>>>>   typedef wide_int<2> double_int;
>>>>
>>>> is never going to be a drop-in, API-compatible replacement for
>>>> double_int.
>>>
>>> My reasoning was that if you strip wide-int of precision and bitsize
>>> you have a double_int class.
>>
>> But you don't!  Because...
>>
>>> Thus wide-int should have a base of that kind and just add precision
>>> / bitsize on top of that.  It wouldn't be a step forward if we end up
>>> replacing double_int uses with wide_int uses with a precision of
>>> 2 * HOST_BITS_PER_WIDE_INT, would it?
>>
>> ...the precision and bitsize aren't an optional extra, either
>> conceptually or in implementation.  wide_int happens to use N
>> HOST_WIDE_INTs under the hood, but the value of N is an internal
>> implementation detail.  No operations are done on N HWIs; they're done
>> on the number of bits in the operands.  Whereas a double_int class
>> does everything on N HWIs.
>
> If that's the only effect then either bitsize or precision is
> redundant (and we also have len ...).  Note I did not mention len
> above, thus the base class would retain 'len' and double-int would
> simply use 2 for it (if you don't template it but make it variable).

But that means that wide_int has to model a P-bit operation as a
"normal" len*HOST_WIDE_INT operation and then fix up the result after
the fact, which seems unnecessarily convoluted.  I still don't see why
a full-precision 2*HOST_WIDE_INT operation (or a full-precision
X*HOST_WIDE_INT operation for any X) has any special meaning.

Richard
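
A conceptual sketch of the distinction being argued above (illustration
only; the function names are made up and this is not the proposed
wide-int code): a double_int-style operation always touches a fixed two
HWIs, while a precision-aware operation touches only the blocks implied
by the operands' precision and clears the bits above that precision in
the top block.

  #include <cassert>
  #include <cstdint>

  static const unsigned int HOST_BITS = 64;  /* stands in for HOST_BITS_PER_WIDE_INT */

  /* Fixed-width style: always process exactly two HWIs.  */
  static void
  and_fixed_2hwi (const uint64_t a[2], const uint64_t b[2], uint64_t r[2])
  {
    r[0] = a[0] & b[0];
    r[1] = a[1] & b[1];
  }

  /* Precision-driven style: process only ceil (prec / HOST_BITS) blocks
     and keep exactly PREC meaningful bits in the result.  */
  static unsigned int
  and_precision (const uint64_t *a, const uint64_t *b, uint64_t *r,
                 unsigned int prec)
  {
    unsigned int len = (prec + HOST_BITS - 1) / HOST_BITS;
    for (unsigned int i = 0; i < len; i++)
      r[i] = a[i] & b[i];
    unsigned int excess = len * HOST_BITS - prec;
    if (excess)
      r[len - 1] &= ~UINT64_C (0) >> excess;  /* clear bits beyond PREC */
    return len;
  }

  int
  main ()
  {
    uint64_t a[2] = { 0xff, 0 }, b[2] = { 0x0f, 0 }, r2[2], rp[2];
    and_fixed_2hwi (a, b, r2);
    unsigned int len = and_precision (a, b, rp, 32);  /* a 32-bit operation */
    assert (len == 1 && rp[0] == 0x0f && r2[0] == 0x0f);
    return 0;
  }
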