From: "aldyh at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e
Date: Tue, 09 May 2023 12:00:20 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #23 from Aldy Hernandez ---
An update on the int_range_max memory bloat work.

As Andrew mentioned, having int_range<25> solves the problem, but it's just
kicking the can down the road.  I ran some stats on what we actually need on a
bootstrap, and 99.7% of ranges fit in a 3 sub-range range, but we need more to
represent switches, etc.

There's no clear winner for choosing <N>, as the distribution for anything
past <3> is rather random.  What I did see was that at no point do we need
more than 125 sub-ranges (on a set of .ii files from a bootstrap).

I've implemented various alternatives using a dynamic approach similar to what
we do for auto_vec.  I played with allocating 2x as much as needed, allocating
10 or 20 more than needed, as well as going from N to 255 in one go.  All of
it required some shuffling to make sure the penalty isn't much wrt virtuals,
etc., but I think the dynamic approach is the way to go.

The question is how much of a performance hit we're willing to take in order
to reduce the memory footprint.  Memory to speed is a linear relationship
here, so we just have to pick a number we're happy with.

Here are some numbers for various sub-range counts (the sub-ranges grow
automatically in union, intersect, invert, and assignment, which are the
methods that can grow the number of sub-ranges):

trunk (wide_ints <255>)  => 40912 bytes
GCC 12 (trees <255>)     => 4112 bytes
auto_int_range<2>        => 432 bytes (5.14% penalty for VRP)
auto_int_range<3>        => 592 bytes (4.01% penalty)
auto_int_range<8>        => 1392 bytes (2.68% penalty)
auto_int_range<10>       => 1712 bytes (2.14% penalty)

As you can see, even at N=10 we're still 24X smaller than trunk and 2.4X
smaller than GCC 12, for a 2.14% performance drop.

I'm tempted to just pick a number and tweak this later, as we have ultimate
flexibility here.  Plus, we can also revert to a very small N and have passes
that care about switches use their own temporaries (auto_int_range<20> or
such).
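For the curious, here's a rough standalone sketch of the growth scheme I'm
describing.  It's a minimal illustration, not the actual patch: the class and
member names are made up, the growth policy shown is the simple 2x one, and
the bounds are plain ints instead of wide_ints.

// Minimal sketch of an auto_vec-style growing range.  All names here
// are hypothetical; bounds are ints for brevity.
#include <algorithm>

struct sub_range { int lo, hi; };

template <unsigned N>
class auto_growing_range
{
public:
  auto_growing_range () : m_store (m_inline), m_capacity (N), m_num (0) {}
  ~auto_growing_range ()
  {
    if (m_store != m_inline)
      delete[] m_store;
  }
  // Non-copyable to keep the sketch simple.
  auto_growing_range (const auto_growing_range &) = delete;
  auto_growing_range &operator= (const auto_growing_range &) = delete;

  // Append a sub-range, spilling from the inline buffer to the heap the
  // first time we outgrow N.  In the real thing, union, intersect,
  // invert, and assignment would funnel through something like this.
  void append (int lo, int hi)
  {
    if (m_num == m_capacity)
      grow (m_capacity * 2);  // 2x policy; +10/+20 or a jump to 255 also work
    m_store[m_num].lo = lo;
    m_store[m_num].hi = hi;
    ++m_num;
  }

  unsigned num_pairs () const { return m_num; }
  const sub_range &operator[] (unsigned i) const { return m_store[i]; }

private:
  void grow (unsigned new_cap)
  {
    sub_range *p = new sub_range[new_cap];
    std::copy (m_store, m_store + m_num, p);
    if (m_store != m_inline)
      delete[] m_store;
    m_store = p;
    m_capacity = new_cap;
  }

  sub_range m_inline[N];  // fast path: no allocation for <= N pairs
  sub_range *m_store;     // points at m_inline until we outgrow it
  unsigned m_capacity;
  unsigned m_num;
};

The point of the layout is that the common case (<= N sub-ranges, i.e. 99.7%
of ranges on a bootstrap) never touches the heap; only the rare switch-heavy
ranges pay for an allocation.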
Note that we got a 13.22% improvement for the wide_int+legacy work, so even
the absolute worst case of a 5.14% penalty would have us sitting on a net
8.76% improvement over GCC 12 (the two compound: 0.8678 * 1.0514 is roughly
0.9124, i.e. still about 8.76% faster than GCC 12).  Bike shedding welcome ;-)