From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 29 Sep 2023 09:49:06 +0000 (UTC)
From: Richard Biener
To: Jakub Jelinek
cc: Richard Sandiford, Aldy Hernandez, gcc-patches@gcc.gnu.org
Subject: Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Thu, 28 Sep 2023, Jakub Jelinek wrote:

> Hi!
>
> On Tue, Aug 29, 2023 at 05:09:52PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Aug 29, 2023 at 11:42:48AM +0100, Richard Sandiford wrote:
> > > > I'll note tree-ssa-loop-niter.cc also uses GMP in some cases, widest_int
> > > > is really trying to be poor-man's GMP by limiting the maximum precision.
> > >
> > > I'd characterise widest_int as "a wide_int that is big enough to hold
> > > all supported integer types, without losing sign information".  It's
> > > not big enough to do arbitrary arithmetic without losing precision
> > > (in the way that GMP is).
> > >
> > > If the new limit on integer sizes is 65535 bits for all targets,
> > > then I think that means that widest_int needs to become a 65536-bit type.
> > > (But not with all bits represented all the time, of course.)
> >
> > If the widest_int storage depended on the len rather than the precision
> > for how it is stored, then I think we'd need a new method, called at the
> > start of filling the limbs, which would be told how many limbs there
> > will be (i.e. what set_len will be called with later on), and which
> > would do nothing for all storages but the new widest_int_storage.
>
> So, I've spent some time on this.
> While wide_int is in the patch a fixed/variable number of limbs (aka
> len) storage, depending on precision (precision > WIDE_INT_MAX_PRECISION
> means a heap-allocated limb array, otherwise it is inline), widest_int
> always has a very large precision (WIDEST_INT_MAX_PRECISION, currently
> defined to the INTEGER_CST-imposed limitation of 255 64-bit limbs) but
> uses an inline array for lengths corresponding to up to
> WIDE_INT_MAX_PRECISION bits, and for larger ones, similarly to wide_int,
> a heap-allocated array of limbs.
>
> These changes make both wide_int and widest_int obviously non-POD: not
> trivially default constructible, not trivially copy constructible, not
> trivially destructible, and not trivially copyable, so not a good fit
> for GC and some vec operations.
>
> One common use of wide_int in GC structures was in dwarf2out.{h,cc};
> but as large _BitInt constants don't appear in RTL, we really don't
> need such large precisions there.  So, for wide_int the patch
> introduces rwide_int, a restricted wide_int, which acts like the old
> wide_int (except that it is now trivially default constructible and
> has assertions that precision isn't set above WIDE_INT_MAX_PRECISION).
>
> For widest_int, the nastiness is that, because it always has a huge
> precision (16320 right now):
> a) we need to be told upfront in wide-int.h, before calling the
>    large-value internal functions in wide-int.cc, how many elements
>    we'll need for the result (some reasonable upper estimate is fine)
> b) various of the wide-int.cc functions were lazy, assumed precision
>    is small enough, and often used up to that many elements, which is
>    undesirable; so the patch now tries to decrease that and use xi.len
>    etc. based estimates instead where possible (sometimes only if
>    precision is above WIDE_INT_MAX_PRECISION)
> c) with the higher precision, behavior changes for lrshift (-1, 2)
>    etc., or for unsigned division with the dividend having the most
>    significant bit set in widest_int -- while such values used to be
>    above or equal to 1 << (WIDE_INT_MAX_PRECISION - 2), now, with
>    WIDEST_INT_MAX_PRECISION, they are so much larger; but lrshift on
>    widest_int is I think only done in ccp, and I'd strongly hope that
>    we treat the values as unsigned and so usually of much smaller
>    length; so it is just when we call wi::lrshift (-1, 2) or similar
>    that results change.
>
> I've noticed that for wide_int or widest_int references even simple
> operations like eq_p liked to allocate and immediately free huge
> buffers, which was caused by wide_int doing the allocation on creation
> with a particular precision and e.g. get_binary_precision running into
> that.  So, I've duplicated that to avoid the allocations when all we
> need is just a precision.
>
> The patch below doesn't actually build anymore because of the vec.h
> asserts (which point to useful stuff, though), so temporarily I've
> also applied the
> --- gcc/vec.h.xx	2023-09-28 12:56:09.055786055 +0200
> +++ gcc/vec.h	2023-09-28 13:15:31.760487111 +0200
> @@ -1197,7 +1197,7 @@ template
>  inline void
>  vec::qsort (int (*cmp) (const void *, const void *))
>  {
> -  static_assert (vec_detail::is_trivially_copyable_or_pair ::value, "");
> +// static_assert (vec_detail::is_trivially_copyable_or_pair ::value, "");
>    if (length () > 1)
>      gcc_qsort (address (), length (), sizeof (T), cmp);
>  }
> @@ -1422,7 +1422,7 @@ template
>  void
>  gt_ggc_mx (vec *v)
>  {
> -  static_assert (std::is_trivially_destructible ::value, "");
> +// static_assert (std::is_trivially_destructible ::value, "");
>    extern void gt_ggc_mx (T &);
>    for (unsigned i = 0; i < v->length (); i++)
>      gt_ggc_mx ((*v)[i]);
> hack.
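The behavior change in (c) above can be seen with a small standalone model (an illustration only; `lrshift_neg1` is not a GCC function): a logical right shift of -1 materializes an all-ones pattern of the type's precision, so the result grows with the precision chosen.

```cpp
#include <cstdint>

// Model of wi::lrshift (-1, shift) at a given precision: -1 is the
// all-ones value of that precision, and a logical shift brings in
// zeros from the top, so the result depends on the precision.
// (Illustrative sketch, not GCC's wide-int implementation.)
uint64_t
lrshift_neg1 (unsigned precision, unsigned shift)
{
  uint64_t ones = precision >= 64 ? ~UINT64_C (0)
                                  : (UINT64_C (1) << precision) - 1;
  return ones >> shift;
}
```

With 8-bit precision lrshift (-1, 2) yields 0x3f, with 16-bit precision 0x3fff; raising widest_int's precision from WIDE_INT_MAX_PRECISION to WIDEST_INT_MAX_PRECISION changes such results in the same way.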
> The two spots that trigger are tree-ssa-loop-niter.cc doing qsort on a
> widest_int vector (to be exact, swapping elements in the vector of

For this (besides choosing a fixed smaller widest_int as indicated in
the other mail) sorting could be done indirectly, by sorting a
[0, 1, 2, ..., n-1] index vector instead.

> And, now the question is what to do about this.  I guess for
> omp_general I could just use generic_wide_int or something similar;
> after all, the widest_int wasn't really great when it had a maximum
> precision of WIDE_INT_MAX_PRECISION -- different values on different
> targets -- and it has very few uses and is easy to change (thinking
> about this makes me wonder what we do for offloading if the offload
> host has a different WIDE_INT_MAX_PRECISION from the offload target).
>
> But the more important question is what to do about loop/niters
> analysis.  I think for number-of-iterations analysis it might be OK
> to punt somehow (if there is a way to tell that the number of
> iterations is unknown) if we get some bound which is too large to be
> expressible in some reasonably small fixed precision (whether that is
> WIDE_INT_MAX_PRECISION or something different is a question).  We
> could either introduce yet another widest_int-like storage which
> would still have WIDEST_INT_MAX_PRECISION precision, but would ICE if
> the length is set to something above its fixed width.  One problem is
> that the write_val estimations are often just conservatively larger
> and could trigger even if the value fits in the end.  Or we could use
> generic_wide_int (perhaps call that rwidest_int); the drawback would
> be that it would be slightly harder to use as it has a different
> precision from widest_int, so we'd need to do some from on it or the
> like.  Plus I really don't know the niters code well enough to know
> how to punt.
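The indirect-sort suggestion above can be sketched like this (a generic illustration, with std::string standing in for a non-trivially-copyable widest_int; none of these names are GCC's):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Sort an index vector [0, 1, ..., n-1] by comparing the referenced
// elements; the elements themselves are never swapped, so qsort-style
// trivially-copyable requirements never apply to them.
std::vector<std::size_t>
sorted_indices (const std::vector<std::string> &v)
{
  std::vector<std::size_t> idx (v.size ());
  std::iota (idx.begin (), idx.end (), 0);
  std::sort (idx.begin (), idx.end (),
             [&v] (std::size_t a, std::size_t b) { return v[a] < v[b]; });
  return idx;
}
```

The caller then reads (or permutes) the original vector through the sorted index vector, sidestepping the vec.h static_asserts entirely.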
I think when widest_int is no longer bound by something like the largest
integer mode, but now has to cater for arbitrarily large _BitInt, we
have to get rid of widest_int, or we have to make it variable-precision
and reallocate it like auto_vec.  For GC we can have the storage still
heap allocated, but of course the CTOR/DTOR is going to be a pain (so
better not to use widest_int in GC).

> ipa_bits is even worse, because unlike niter analysis, I think it is
> very much desirable to support IPA VRP of all supported _BitInt sizes.
> Shall we perhaps use trailing_wide_int storage in there, or
> conditionally rwidest_int vs. INTEGER_CSTs for stuff that doesn't fit,
> or something else?

trailing_wide_int storage is the way to go here.

> What about slsr?  This is after bitint lowering, so it shouldn't be
> performing opts on larger BITINT_TYPEs and so could also go with
> rwidest_int.

Just to say, I don't really like adding another "widest" int, but slsr
shouldn't need to GC any of that, so widest_int should be fine?

Richard.
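The trailing storage endorsed for ipa_bits puts the length word and the limbs in one allocation, so the object needs no destructor and stays GC-friendly. A rough model of the idea using the classic trailing-array idiom (the type and its members are hypothetical; GCC's actual trailing_wide_int/trailing_wide_int_storage differ in detail):

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// One allocation holds the header and 'len' limbs; there is nothing
// to run at destruction time, so GC can reclaim the block wholesale.
// (Illustrative sketch only, not GCC's trailing_wide_int.)
struct trailing_int
{
  unsigned len;
  uint64_t val[1];   // really 'len' limbs allocated past the struct

  static trailing_int *make (unsigned len)
  {
    std::size_t size = offsetof (trailing_int, val)
                       + len * sizeof (uint64_t);
    trailing_int *p = static_cast<trailing_int *> (::operator new (size));
    p->len = len;
    return p;
  }
};
```

Because the limbs trail the object rather than living behind a separate pointer, the structure is trivially destructible, which is exactly the property the disabled vec.h static_asserts were checking for.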