From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 29 Sep 2023 09:49:06 +0000 (UTC)
From: Richard Biener
To: Jakub Jelinek
cc: Richard Sandiford, Aldy Hernandez, gcc-patches@gcc.gnu.org
Subject: Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Thu, 28 Sep 2023, Jakub Jelinek wrote:

> Hi!
>
> On Tue, Aug 29, 2023 at 05:09:52PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Aug 29, 2023 at 11:42:48AM +0100, Richard Sandiford wrote:
> > > > I'll note tree-ssa-loop-niter.cc also uses GMP in some cases, widest_int
> > > > is really trying to be poor-man's GMP by limiting the maximum precision.
> > >
> > > I'd characterise widest_int as "a wide_int that is big enough to hold
> > > all supported integer types, without losing sign information".  It's
> > > not big enough to do arbitrary arithmetic without losing precision
> > > (in the way that GMP is).
> > >
> > > If the new limit on integer sizes is 65535 bits for all targets,
> > > then I think that means that widest_int needs to become a 65536-bit type.
> > > (But not with all bits represented all the time, of course.)
> >
> > If the widest_int storage depended on the len rather than the precision
> > for how it is stored, then I think we'd need a new method, called at the
> > start of filling the limbs, which would be told how many limbs there
> > will be (i.e. what set_len will be called with later on), and which
> > would do nothing for all storages but the new widest_int_storage.
>
> So, I've spent some time on this.
> While wide_int is in the patch a fixed/variable number of limbs (aka
> len) storage, depending on precision (precision > WIDE_INT_MAX_PRECISION
> means a heap-allocated limb array, otherwise it is inline), widest_int
> always has a very large precision (WIDEST_INT_MAX_PRECISION, currently
> defined to the INTEGER_CST-imposed limitation of 255 64-bit limbs) but
> uses an inline array for lengths corresponding to up to
> WIDE_INT_MAX_PRECISION bits, and for larger ones, similarly to wide_int,
> a heap-allocated array of limbs.
>
> These changes make both wide_int and widest_int obviously non-POD: not
> trivially default constructible, not trivially copy constructible, not
> trivially destructible, and not trivially copyable, so not a good fit
> for GC and some vec operations.
>
> One common use of wide_int in GC structures was in dwarf2out.{h,cc};
> but as large _BitInt constants don't appear in RTL, we really don't
> need such large precisions there.  So, for wide_int the patch
> introduces rwide_int, a restricted wide_int, which acts like the old
> wide_int (except that it is now trivially default constructible and
> has assertions that precision isn't set above WIDE_INT_MAX_PRECISION).
>
> For widest_int, the nastiness is that, because it always has a huge
> precision (16320 right now):
> a) we need to be told upfront in wide-int.h, before calling the
>    large-value internal functions in wide-int.cc, how many elements
>    we'll need for the result (some reasonable upper estimate is fine)
> b) various of the wide-int.cc functions were lazy, assumed precision
>    is small enough, and often used up to that many elements, which is
>    undesirable; so the patch now tries to decrease that and use xi.len
>    etc. based estimates instead where possible (sometimes only if
>    precision is above WIDE_INT_MAX_PRECISION)
> c) with the higher precision, behavior changes for lrshift (-1, 2)
>    etc., or for unsigned division with the dividend having the most
>    significant bit set in widest_int -- while such values used to be
>    above or equal to 1 << (WIDE_INT_MAX_PRECISION - 2), now, with
>    WIDEST_INT_MAX_PRECISION, they are so much larger; but lrshift on
>    widest_int is I think only done in ccp, and I'd strongly hope that
>    we treat the values as unsigned and so usually of much smaller
>    length; so it is just when we call wi::lrshift (-1, 2) or similar
>    that results change.
>
> I've noticed that for wide_int or widest_int references even simple
> operations like eq_p liked to allocate and immediately free huge
> buffers, which was caused by wide_int doing the allocation on creation
> with a particular precision and e.g. get_binary_precision running into
> that.  So, I've duplicated that to avoid the allocations when all we
> need is just a precision.
>
> The patch below doesn't actually build anymore because of the vec.h
> asserts (which point to useful stuff, though), so temporarily I've
> also applied the
> --- gcc/vec.h.xx	2023-09-28 12:56:09.055786055 +0200
> +++ gcc/vec.h	2023-09-28 13:15:31.760487111 +0200
> @@ -1197,7 +1197,7 @@ template
>  inline void
>  vec::qsort (int (*cmp) (const void *, const void *))
>  {
> -  static_assert (vec_detail::is_trivially_copyable_or_pair ::value, "");
> +// static_assert (vec_detail::is_trivially_copyable_or_pair ::value, "");
>    if (length () > 1)
>      gcc_qsort (address (), length (), sizeof (T), cmp);
>  }
> @@ -1422,7 +1422,7 @@ template
>  void
>  gt_ggc_mx (vec *v)
>  {
> -  static_assert (std::is_trivially_destructible ::value, "");
> +// static_assert (std::is_trivially_destructible ::value, "");
>    extern void gt_ggc_mx (T &);
>    for (unsigned i = 0; i < v->length (); i++)
>      gt_ggc_mx ((*v)[i]);
> hack.
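The behavior change in (c) above can be seen with a small standalone model (an illustration only; `lrshift_neg1` is not a GCC function): a logical right shift of -1 materializes an all-ones pattern of the type's precision, so the result grows with the precision chosen.

```cpp
#include <cstdint>

// Model of wi::lrshift (-1, shift) at a given precision: -1 is the
// all-ones value of that precision, and a logical shift brings in
// zeros from the top, so the result depends on the precision.
// (Illustrative sketch, not GCC's wide-int implementation.)
uint64_t
lrshift_neg1 (unsigned precision, unsigned shift)
{
  uint64_t ones = precision >= 64 ? ~UINT64_C (0)
                                  : (UINT64_C (1) << precision) - 1;
  return ones >> shift;
}
```

With 8-bit precision lrshift (-1, 2) yields 0x3f, with 16-bit precision 0x3fff; raising widest_int's precision from WIDE_INT_MAX_PRECISION to WIDEST_INT_MAX_PRECISION changes such results in the same way.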
> The two spots that trigger are tree-ssa-loop-niter.cc doing qsort on a
> widest_int vector (to be exact, swapping elements in the vector of

For this (besides choosing a fixed smaller widest_int as indicated in
the other mail) sorting could be done indirectly, by sorting a
[0, 1, 2, ..., n-1] index vector instead.

> And, now the question is what to do about this.  I guess for
> omp_general I could just use generic_wide_int or something similar;
> after all, the widest_int wasn't really great when it had a maximum
> precision of WIDE_INT_MAX_PRECISION -- different values on different
> targets -- and it has very few uses and is easy to change (thinking
> about this makes me wonder what we do for offloading if the offload
> host has a different WIDE_INT_MAX_PRECISION from the offload target).
>
> But the more important question is what to do about loop/niters
> analysis.  I think for number-of-iterations analysis it might be OK
> to punt somehow (if there is a way to tell that the number of
> iterations is unknown) if we get some bound which is too large to be
> expressible in some reasonably small fixed precision (whether that is
> WIDE_INT_MAX_PRECISION or something different is a question).  We
> could either introduce yet another widest_int-like storage which
> would still have WIDEST_INT_MAX_PRECISION precision, but would ICE if
> the length is set to something above its fixed width.  One problem is
> that the write_val estimations are often just conservatively larger
> and could trigger even if the value fits in the end.  Or we could use
> generic_wide_int (perhaps call that rwidest_int); the drawback would
> be that it would be slightly harder to use as it has a different
> precision from widest_int, so we'd need to do some from on it or the
> like.  Plus I really don't know the niters code well enough to know
> how to punt.
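The indirect-sort suggestion above can be sketched like this (a generic illustration, with std::string standing in for a non-trivially-copyable widest_int; none of these names are GCC's):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Sort an index vector [0, 1, ..., n-1] by comparing the referenced
// elements; the elements themselves are never swapped, so qsort-style
// trivially-copyable requirements never apply to them.
std::vector<std::size_t>
sorted_indices (const std::vector<std::string> &v)
{
  std::vector<std::size_t> idx (v.size ());
  std::iota (idx.begin (), idx.end (), 0);
  std::sort (idx.begin (), idx.end (),
             [&v] (std::size_t a, std::size_t b) { return v[a] < v[b]; });
  return idx;
}
```

The caller then reads (or permutes) the original vector through the sorted index vector, sidestepping the vec.h static_asserts entirely.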
I think when widest_int is no longer bound by something like the largest
integer mode, but now has to cater for arbitrarily large _BitInt, we
have to get rid of widest_int, or we have to make it variable-precision
and reallocate it like auto_vec.  For GC we can have the storage still
heap allocated, but of course the CTOR/DTOR is going to be a pain (so
better not to use widest_int in GC).

> ipa_bits is even worse, because unlike niter analysis, I think it is
> very much desirable to support IPA VRP of all supported _BitInt sizes.
> Shall we perhaps use trailing_wide_int storage in there, or
> conditionally rwidest_int vs. INTEGER_CSTs for stuff that doesn't fit,
> or something else?

trailing_wide_int storage is the way to go here.

> What about slsr?  This is after bitint lowering, so it shouldn't be
> performing opts on larger BITINT_TYPEs and so could also go with
> rwidest_int.

Just to say, I don't really like adding another "widest" int, but slsr
shouldn't need to GC any of that, so widest_int should be fine?

Richard.
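The trailing storage endorsed for ipa_bits puts the length word and the limbs in one allocation, so the object needs no destructor and stays GC-friendly. A rough model of the idea using the classic trailing-array idiom (the type and its members are hypothetical; GCC's actual trailing_wide_int/trailing_wide_int_storage differ in detail):

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// One allocation holds the header and 'len' limbs; there is nothing
// to run at destruction time, so GC can reclaim the block wholesale.
// (Illustrative sketch only, not GCC's trailing_wide_int.)
struct trailing_int
{
  unsigned len;
  uint64_t val[1];   // really 'len' limbs allocated past the struct

  static trailing_int *make (unsigned len)
  {
    std::size_t size = offsetof (trailing_int, val)
                       + len * sizeof (uint64_t);
    trailing_int *p = static_cast<trailing_int *> (::operator new (size));
    p->len = len;
    return p;
  }
};
```

Because the limbs trail the object rather than living behind a separate pointer, the structure is trivially destructible, which is exactly the property the disabled vec.h static_asserts were checking for.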