From: Kenneth Zadeck
Date: Wed, 03 Apr 2013 19:18:00 -0000
To: Richard Biener
CC: Mike Stump, gcc-patches, Lawrence Crowl, rdsandiford@googlemail.com,
 Ian Lance Taylor
Subject: Re: patch to fix constant math - 4th patch - the wide-int class -
 patch ping for the next stage 1
Message-ID: <515C55D7.7020003@naturalbridge.com>

On 04/03/2013 09:53 AM, Richard Biener wrote:
> On Wed, Apr 3, 2013 at 2:05 PM, Kenneth Zadeck wrote:
>> On 04/03/2013 05:17 AM, Richard Biener wrote:
>>
>>> In the end you will have a variable-size storage in TREE_INT_CST,
>>> thus you will at least have to emit _code_ copying over meta-data
>>> and data from the tree representation to the wide-int (similar for
>>> RTX CONST_DOUBLE/INT).  I'm objecting to the amount of code you
>>> emit and agree that the runtime cost is copying the meta-data
>>> (hopefully optimizable via CSE / SRA) and in most cases one (or
>>> two) iterations of the loop copying the data (not optimizable).
>> I did get rid of the bitsize in the wide-int patch, so at this point
>> the meta-data is the precision and the len.  Not really a lot here.
>> As usual we pay a high price in gcc for not pushing the tree rep
>> down into the rtl level; if we had, it would have been acceptable to
>> have the tree type bleed into the wide-int code.
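(For concreteness, a minimal sketch of the shape being discussed here;
the names and sizes are hypothetical, not the actual patch:)

  typedef long long HOST_WIDE_INT;   /* assuming a 64-bit host type */
  #define WIDE_INT_MAX_ELTS 4        /* assumed; target dependent */

  /* The only meta-data is the precision and the len; the data lives
     in fixed-size storage, so a wide-int can be cheaply stack
     allocated.  Converting from a tree constant copies the two
     meta-data fields plus len HWIs of data.  */
  struct wide_int
  {
    unsigned int precision;                /* precision in bits */
    unsigned int len;                      /* HWIs actually in use */
    HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];  /* fixed-size storage */
  };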
>>>> 2) You present this as if the implementor actually should care
>>>> about the implementation, and you give 3 alternatives: the
>>>> double_int, the current one, and HWI.  We have tried to make it
>>>> so that the client should not care.  Certainly in my experience
>>>> here, I have not found a place to care.
>>> Well, as with the copying overhead for tree, your approach requires
>>> overloading operations for HOST_WIDE_INT operands to be able to say
>>> wi + 1 (which is certainly desirable), or paying the overhead of
>>> using wide_int_one ().
>>>
>>>> In my opinion double_int needs to go away.  That is the main
>>>> thrust of my patches.  There is no place in a compiler for an ABI
>>>> that depends on constants fitting into two words whose size is
>>>> defined by the host.
>>> That's true.  I'm not arguing to preserve "double-int" - I'm
>>> arguing to preserve a way to ask for an integer type on the host
>>> with (at least) N bits.  Almost all double-int users really ask for
>>> an integer type on the host that has at least as many bits as the
>>> pointer representation (or word_mode) on the target (we do have
>>> HOST_WIDEST_INT == 32bits for 64bit pointer targets).  No
>>> double-int user specifically wants 2 * HOST_WIDE_INT precision -
>>> that is just what happens to be there.  Thus I am providing a way
>>> to say get me a host integer with at least N bits (VRP asks for
>>> this, for example).
>>>
>>> What I was asking for is that whatever can provide the above should
>>> share the functional interface with wide-int (or the other way
>>> around).  And I was claiming that wide-int is too fat, because
>>> current users of double-int eventually store double-ints
>>> permanently.
>> The problem is that, in truth, double-int is too fat.  99.something%
>> of all constants fit in 1 HWI, and that is likely to be true forever
>> (I understand that tree vrp may need some thought here).  The rtl
>> level has, for as long as I have known it, had 2 reps for integer
>> constants, so it was relatively easy to slide the CONST_WIDE_INT in.
>> It seems like the right trickery here, rather than adding a storage
>> model for wide-ints, might be to use C++ to invisibly support
>> several (and by "several" I really mean 2) classes of TREE_CSTs.
> The truth is that _now_ TREE_INT_CSTs use double-ints and we have
> CONST_INT and CONST_DOUBLE.  What I (and you) propose would get us to
> use variable-size storage for both, allowing us to use just a single
> HOST_WIDE_INT in the majority of cases.  In my view the constant
> length of the variable-size storage for TREE_INT_CSTs is determined
> by its type (thus, it doesn't have "optimized" variable-size storage
> but "unoptimized" fixed-size storage based on the maximum storage
> requirement for the type).  Similar for RTX CONST_INT, which would
> have fixed-size storage based on the mode-size of the constant.
> Using optimized space (thus using the encoding properties) requires
> you to fit the 'short len' somewhere, which possibly will not pay off
> in the end (for tree we do have that storage available, so we could
> go with optimized storage for it; not sure with RTL, I don't see
> available space there).
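(For reference, a rough sketch of the two storage schemes being
contrasted above; the names are hypothetical, not from either patch:)

  typedef long long HOST_WIDE_INT;  /* assuming a 64-bit host type */

  /* (a) "unoptimized" fixed-size storage: the number of HWIs is
     fixed by the precision of the type or mode, so no len field is
     needed, but a 128-bit constant 1 still occupies two full HWIs.  */
  struct fixed_size_cst
  {
    HOST_WIDE_INT val[2];  /* ceil (precision / HOST_BITS_PER_WIDE_INT) */
  };

  /* (b) "optimized" variable-size storage: only the significant HWIs
     are stored, at the cost of fitting a short len somewhere.  A
     128-bit constant 1 occupies one HWI plus the len.  */
  struct compressed_cst
  {
    unsigned short len;    /* number of significant HWIs */
    HOST_WIDE_INT val[1];  /* trailing array of len elements */
  };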
There are two questions here: the first is that you object to the
fact that we represent small constants efficiently, and the second is
that we take advantage of the fact that fixed-size stack allocation
is effectively free for short-lived objects like wide-ints (as I use
them).

At the rtl level your idea does not work.  rtl constants do not have
a mode or type.  So if you do not compress, how are you going to
determine how many words you need for the constant 1?  I would love
to have a rep that had the mode in it, but that is a huge change that
requires a lot of hacking to every port.

I understand that this makes me vulnerable to the argument that we
should not let the rtl level ever dictate anything about the tree
level, but the truth is that a variable-len rep is almost always used
for big integers.  In our code, most constants of large types are
small numbers.  (Remember, I got into this because the tree constant
prop thinks that left shifting any number by anything greater than
128 always yields 0, and I discovered that that was just the tip of
the iceberg.)  But mostly I support the decision to canonicalize
numbers to the smallest number of HWIs, because most of the
algorithms to do the math can be short-circuited (see the sketch
below this reply).  I admit that if I had to effectively unpack most
numbers to do the math, the canonicalization would be a waste.
However, this is not really relevant to this conversation.  Yes, you
could get rid of the len, but that is such a small part of the
picture.

Furthermore, I am constrained at the rtl level because it is just too
dirty to share the storage.  We tried that, and the amount of
whack-a-mole we were playing was killing us.  I am comfortable making
big changes at the portable level because I can easily test them, but
changes to the fundamental data structures inside every port are more
than I am willing to do.  If you are going to do that, then you are
going to have to start giving rtl constants modes, and that is a huge
change (which, while desirable, I cannot do).

At the tree level you could share the storage; I admit that that is
easy, I just do not think that it is desirable.  The stack allocation
inside the wide-int is very cheap, and the fact that the number that
comes out is almost always one HWI makes this very efficient.  Even
if I were starting from scratch, this would be a strong contender to
be the right representation.

Fundamentally, I do not believe that the amount of copying that I am
proposing at the tree level will be significant.  Yes, if you force
me to use an uncompressed format, you can make what I propose
expensive.
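(To illustrate the canonicalization point above, a rough sketch with
hypothetical names - not the actual patch.  Trimming redundant high
words is a small loop, and once most constants are a single HWI the
general multiword loops can be skipped entirely:)

  typedef long long HOST_WIDE_INT;  /* assuming a 64-bit host type */

  /* Canonical len: drop high-order words that are pure sign
     extension of the word below them.  */
  static unsigned int
  canonical_len (const HOST_WIDE_INT *val, unsigned int len)
  {
    while (len > 1
           && val[len - 1] == (val[len - 2] < 0
                               ? (HOST_WIDE_INT) -1 : 0))
      len--;
    return len;
  }

  /* Example short-circuit: addition when both canonicalized operands
     fit in one HWI.  Returns nonzero on the fast path; otherwise the
     caller falls back to the general multiword loop.  */
  static int
  add_fast (const HOST_WIDE_INT *a, unsigned int alen,
            const HOST_WIDE_INT *b, unsigned int blen,
            HOST_WIDE_INT *res)
  {
    if (alen == 1 && blen == 1)
      {
        /* Unsigned arithmetic avoids signed-overflow issues; the
           caller truncates or extends to the target precision.  */
        *res = (HOST_WIDE_INT) ((unsigned long long) a[0]
                                + (unsigned long long) b[0]);
        return 1;
      }
    return 0;
  }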
>>>> This is not a beauty contest argument.  Public ports are
>>>> beginning to use modes that are larger than two x86-64 HWIs, and
>>>> I have a private port that has such modes, and it is my
>>>> experience that any pass that uses this interface has one of
>>>> three behaviors: it silently gets the wrong answer, it ices, or
>>>> it fails to do the transformation.  If we leave double_int as an
>>>> available option, then any use of it potentially will have one of
>>>> these three behaviors.  And so one of my strong objections to
>>>> this direction is that I do not want to fight this kind of bug
>>>> for the rest of my life.  Having a single storage model that just
>>>> always works is in my opinion a highly desirable option.  What
>>>> you have never answered in a concrete manner is, if we decide to
>>>> provide this generality, what it would be used for.  There is no
>>>> place in a portable compiler where the right answer for every
>>>> target is two HOST wide integers.
>>>>
>>>> However, I will admit that the HWI option has some merits.  We
>>>> try to address this in our implementation by dividing what is
>>>> done inline in wide-int.h into the cases that fit in an HWI, and
>>>> then only dropping into the heavy code in wide-int.c if the mode
>>>> is larger (which it rarely will be).  However, a case could be
>>>> made that for certain kinds of things like string lengths and
>>>> such, we could use another interface or, as you argue, a
>>>> different storage model with the same interface.  I just do not
>>>> see that the cost of the conversion code is really going to show
>>>> up on anyone's radar.
>>> What's the issue with abstracting away the model so a fixed-size
>>> 'len' is possible?  (leaving aside the argument that this would
>>> easily allow an adaptor to tree)
>> I have a particularly pessimistic perspective because I have
>> already written most of this patch.  It is not that I do not want
>> to change that code, it is that I have seen a certain set of
>> mistakes that were made and I do not want to fix them more than
>> once.  At the rtl level you can see the transition from only
>> supporting 32-bit ints to supporting 64-bit ints to finally
>> supporting two HWIs, and that transition code is not pretty.  My
>> rtl patch fixes the places where an optimization was only made if
>> the data type was 32 bits or smaller, as well as the places where
>> the optimization was made only if the data type is smaller than 64
>> bits (in addition to fixing all of the places where the code ices
>> or simply gets the wrong answer if it is larger than TImode).  The
>> tree level is only modestly better, I believe only because it is
>> newer.  I have not seen any 32-bit-only code, but it is littered
>> with transformations that only work for 64 bits.  What is that
>> 64-bit-only code going to look like in 5 years?
> The user interface of wide-int does not depend on whether a storage
> model is abstracted or not.  If you take advantage of the storage
> model by making its interface leaner then it will.  But I guess
> converting everything before settling on the wide-int interface may
> not have been the wisest choice in the end (providing a wide-int
> that can literally replace double-int would have got you testing
> coverage without any change besides double-int.[ch] and
> wide-int.[ch]).
I think that it was exactly the correct decision.  We have made
significant changes to the structure as we have gone along.  I have
basically done most of what you suggested (the interface is
completely functional, I got rid of the bitsize, ...).  What you are
running into is that Mike Stump, Richard Sandiford, and I actually
believe that the storage model is a fundamentally bad idea.
>> I want to make it easier to write the general code than to write
>> the code that only solves the problem for the size of port that the
>> implementor is currently working on.  So I perceive the storage
>> model as a way to keep having to fight this battle forever, because
>> it will allow the implementor to decide that the optimization only
>> needs to be done for a particular-sized integer.
>>
>> However, I get the fact that from your perspective, what you really
>> want is a solution to the data structure problem in tree-vrp.
> No, that's just a convenient example.  What I really want is a
> wide-int that is less visibly a replacement for CONST_DOUBLE.
I think that that is unfair.  Variable-length reps are the standard
technique for doing wide math.
I am just proposing using data structures that are common best
practices in the rest of the world, and adapting them so that they
match the gcc world, rather than just hacking in a gmp interface.
>
>> My patch for tree vrp scans the entire function to find the largest
>> type used in the function and then does all of the math at 2x that
>> size.  But I have to admit that I thought it was weird that you use
>> tree cst as your long term storage.  If this were my pass, I would
>> have allocated a 2d array of some type, as large as the function in
>> one dimension and twice as large as the largest int used in the
>> other dimension, rather than overloading tree-cst, and then had a
>> set of friend functions in double-int to get in and out.  Of course
>> you do not need friends in double-int because the rep is exposed,
>> but in wide-int that is now hidden, since it is now purely
>> functional.
>>
>> I just have to believe that there is a better way to do tree-vrp
>> than messing up wide-int for the rest of the compiler.
> It's not "messing up", it's making wide-int a generally useful thing
> and not tying it so closely to RTL.
Again, this is unfair.
>
>>>> 3) your trick will work at the tree level, but not at the rtl
>>>> level.  The wide-int code cannot share storage with the
>>>> CONST_INTs.  We tried this, and there are a million bugs that
>>>> would have to be fixed to make it work.  It could have worked if
>>>> CONST_INTs had carried a mode around, but since they do not, you
>>>> end up with the same CONST_INT sharing the rep for several
>>>> different types, and that just did not work unless you were
>>>> willing to do substantial cleanups.
>>> I don't believe you.  I am only asking for the adaptors to tree
>>> and RTL to work in an RVALUE-ish way (no modification, as
>>> obviously RTL and tree constants may be shared).  I think your
>>> claim is because you have those precision and bitsize members in
>>> your wide-int, which I believe is a design red herring.  I suppose
>>> we should concentrate on addressing that one first.  Thus, let me
>>> repeat a few questions on your design to eventually let you
>>> understand my concerns:
>>>
>>> Given two wide-ints, a and b, what precision will the result of
>>>      a + b
>>> have?  Consider a having precision 32 and b having precision 64
>>> on a 32-bit HWI host.
>>>
>>> You define wide-int to have no sign information:
>>>
>>> +   The representation does not contain any information about
>>> +   signedness of the represented value, so it can be used to
>>> +   represent both signed and unsigned numbers.  For operations
>>> +   where the results depend on signedness (division, comparisons),
>>> +   the signedness must be specified separately.  For operations
>>> +   where the signedness matters, one of the operands to the
>>> +   operation specifies either wide_int::SIGNED or
>>> +   wide_int::UNSIGNED.
>>>
>>> but the result of operations on mixed precision operands _does_
>>> depend on the sign; nevertheless most operations do not get a
>>> signedness argument.  Nor would that help, because it depends on
>>> the signedness of the individual operands!
>>>
>>> double-int gets around this by having a common "precision" to
>>> which all smaller precision constants have to be
>>> sign-/zero-extended.  So do CONST_INT and CONST_DOUBLE.
>>>
>>> Note that even with the same precision you have introduced the
>>> same problem with the variable len.
>>>
>>> My proposal is simple - all wide-ints are signed!  wide-int is
>>> basically an arbitrary precision signed integer format.
>>> The sign is encoded in the most significant bit of the last active
>>> HWI of the representation
>>> (val[len - 1] & (1 << HOST_BITS_PER_WIDE_INT - 1)).  All values
>>> with less precision than len * HOST_BITS_PER_WIDE_INT are properly
>>> sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.
>>>
>>> This lets you define mixed len operations by implicitly
>>> sign-/zero-extending the operands to whatever len is required for
>>> the operation.
>>>
>>> Furthermore it allows you to get rid of the precision member (and
>>> bitsize anyway).  Conversion from tree / RTL requires information
>>> on the signedness of the input (trivial for tree; for RTL all
>>> constants are signed - well, sign-extended).  Whenever you want to
>>> transfer the wide-int to tree / RTL you have to sign-/zero-extend
>>> according to the desired precision.  If you need something other
>>> than arbitrary precision arithmetic you have to explicitly
>>> truncate / extend at suitable places - with overflow checking
>>> being trivial here.  For optimization purposes selected operations
>>> may benefit from a combined implementation receiving a target
>>> precision and signedness.  Whatever extra meta-data RTL requires
>>> does not belong in wide-int but in the RTX.  Trivially a mode
>>> comes to my mind (on tree we have a type), and trivially each RTX
>>> has a mode.  And each mode has a precision and bitsize.  It lacks
>>> a sign, so all RTL integer constants are sign-extended for
>>> encoding efficiency purposes.  Mixed-mode operations will not
>>> occur (mixed len operations will); mixed-mode ops are exclusively
>>> sign-/zero-extensions and truncations.
>>>
>>> Representation of (unsigned HOST_WIDE_INT)-1 would necessarily
>>> be { 0, (unsigned HOST_WIDE_INT)-1 }, representation of -1 in any
>>> precision would be { -1 }.
>>>
>>> That was my proposal.  Now, can you please properly specify yours?
> And you chose to not answer that fundamental question of how your
> wide-int is _supposed_ to work?  Ok, maybe I shouldn't have
> distracted you with the bits before this.
Sorry, I missed this question by accident.

We did not do infinite precision, and that was by design.  We looked
at the set of programming languages that gcc either compiles or might
compile, and we looked at the set of machines that gcc targets, and
neither of these two sets of entities defines its operations in terms
of infinite precision math.  They always do math within a particular
precision.  Scripting languages do use infinite precision, but gcc
does not compile any of them.  So unless you are going to restrict
your set of operations to those that satisfy the properties of a
ring, it is generally not strictly safe to do your math in infinite
precision.

We do all of the math in the precision defined by the types or modes
that are passed in.

Constants are defined to be the size of the precision unless they can
be compressed.  The len field tells how many words are actually
needed to exactly represent the constant if
(len - 1) * HOST_BITS_PER_WIDE_INT is less than the precision.  The
decompression process is to add a number of words to the high-order
side of the number.  These words must contain a zero if the highest
represented bit is 0, and -1 if the highest represented bit is 1.
I.e., this looks a lot like sign extension, but the numbers
themselves are not inherently signed or unsigned as in your
representation.
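(To make that concrete, a minimal sketch of the decompression step
just described; the names are hypothetical, not the actual patch, and
it assumes the top stored word is itself already extended to a full
HWI:)

  typedef long long HOST_WIDE_INT;  /* assuming a 64-bit host type */

  /* Expand a constant stored as LEN significant HWIs in VAL to the N
     words implied by its precision: the missing high-order words are
     filled with 0 if the highest represented bit is 0, and with -1
     (all ones) if it is 1.  */
  static void
  decompress (const HOST_WIDE_INT *val, unsigned int len,
              HOST_WIDE_INT *out, unsigned int n)
  {
    HOST_WIDE_INT fill = val[len - 1] < 0 ? (HOST_WIDE_INT) -1 : 0;
    unsigned int i;

    for (i = 0; i < len; i++)
      out[i] = val[i];
    for (i = len; i < n; i++)   /* high-order side of the number */
      out[i] = fill;
  }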
It is up to the operators to imply the signedness.  So the unsigned
multiply operation is different from the signed multiply operation,
but these are applied to the bits without an interpretation of their
sign.
> Richard.
>
>>> Thanks,
>>> Richard.
>>>
>>>> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>>>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck wrote:
>>>>>> This patch contains a large number of the changes requested by
>>>>>> Richi.  It does not contain any of the changes that he
>>>>>> requested to abstract the storage layer.  That suggestion
>>>>>> appears to be quite unworkable.
>>>>> I of course took this claim as a challenge ... with the
>>>>> following result.  It is of course quite workable ;)
>>>>>
>>>>> The attached patch implements the core wide-int class and three
>>>>> storage models (fixed size for things like plain HWI and
>>>>> double-int, variable size similar to how your wide-int works,
>>>>> and an adaptor for the double-int as contained in trees).  With
>>>>> that you can now do
>>>>>
>>>>> HOST_WIDE_INT
>>>>> wi_test (tree x)
>>>>> {
>>>>>   // template argument deduction doesn't do the magic we want it
>>>>>   // to do to make this kind of implicit conversion work;
>>>>>   // overload resolution considers this kind of conversion, so
>>>>>   // we need some magic that combines both ... but seeding the
>>>>>   // overload set with some instantiations doesn't seem to be
>>>>>   // possible :/
>>>>>   // wide_int<> w = x + 1;
>>>>>   wide_int<> w;
>>>>>   w += x;
>>>>>   w += 1;
>>>>>   // template argument deduction doesn't deduce the return value
>>>>>   // type, not considering the template default argument
>>>>>   // either ...
>>>>>   // w = wi (x) + 1;
>>>>>   // we could support this by providing rvalue-to-lvalue
>>>>>   // promotion via a traits class?
>>>>>   // otoh it would lead to sub-optimal code anyway, so we should
>>>>>   // make the result available as reference parameter and only
>>>>>   // support wide_int <> res; add (res, x, 1); ?
>>>>>   w = wi (x).operator+ >(1);
>>>>>   wide_int<>::add(w, x, 1);
>>>>>   return w.to_hwi ();
>>>>> }
>>>>>
>>>>> we are somewhat limited with C++ unless we want to get really
>>>>> fancy.  Eventually providing operator+ just doesn't make much
>>>>> sense for generic wide-int combinations (though then the issue
>>>>> is its operands are no longer commutative, which I think is the
>>>>> case with your wide-int or double-int as well - they don't
>>>>> support "1 + wide_int" for obvious reasons).
>>>>>
>>>>> So there are implementation design choices left undecided.
>>>>>
>>>>> Oh, and the operation implementations are crap (they compute
>>>>> nonsense).
>>>>>
>>>>> But you should get the idea.
>>>>>
>>>>> Richard.