From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <aldyh@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124])
	by sourceware.org (Postfix) with ESMTPS id 78C203858D37
	for <gcc-patches@gcc.gnu.org>; Wed,  5 Oct 2022 10:14:38 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 78C203858D37
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1664964878;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=h8Hv3chzJJ0UiIrXr4TUGzUOvsJHVFLg+6DCAVYz+9o=;
	b=JCaHfZBRRO2FNi1nYMV6Ksgw3NOBdtmi3a1qJQUfTclr8qxqgoHT8saO/eP6sTmva99avg
	lYVcyBTlSy9dYz3s/kAeJI7ZVyKqclGhNyR31k+J7mIOUm4TAGmsyYU5jzZBXlTTzsPcmt
	xUgxQ+WeibNnd4ZLlB1OtXVmLKIZSds=
Received: from mail-oa1-f70.google.com (mail-oa1-f70.google.com
 [209.85.160.70]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-607-4Dc1ezEJN8W6Fi99lZV6dQ-1; Wed, 05 Oct 2022 06:14:37 -0400
X-MC-Unique: 4Dc1ezEJN8W6Fi99lZV6dQ-1
Received: by mail-oa1-f70.google.com with SMTP id 586e51a60fabf-13259536f3bso5724587fac.10
        for <gcc-patches@gcc.gnu.org>; Wed, 05 Oct 2022 03:14:37 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date;
        bh=h8Hv3chzJJ0UiIrXr4TUGzUOvsJHVFLg+6DCAVYz+9o=;
        b=XJZqZjWytqR9xKx5MU3u8zjdnA672Yn4YZA8ioUsRIIhQKKUOHzRj7BZ+XxUzAlKOp
         Wc2NsohavI/laneX3e5TUzQgHQI0/3i2fgFe8B53sc1hKbKzkEip5jqKD/SXFN7V982p
         3OlIhXgUp6qWTwJ0d2zIgdi9PxXO8aEsTbfOj/5T0D3W2EdCD8AiQc5PcNyKK37Q02Gs
         EWwGhCjSa4FANsHcFVJrYbZBzmNek1EOscQX/fn5Vs9QKTmy+lAW11mZWHssS/57iYay
         ACZuGCnPMLJy3ILBfLCiaz/NbFDu7rcqMHv1V+djgIr1NB5a3mapdkh2DPXmIRTCAFzI
         UPaQ==
X-Gm-Message-State: ACrzQf3VfDYVPALiieXQPQlm3zyBFJBmKIBfoyZ0NCigObxC0sTd/tdh
	FWjdatvfzdkcz4OQDSq46oEqA5lnSpR+MSNtm7u0hpBS/cZEtbpPIWLxJkgzvpx2LR2lAqHWpyK
	nVlxuW770lqaHB7wlbmHWcm4h/8FIyyT/QA==
X-Received: by 2002:a05:6870:898e:b0:12b:3e64:e86d with SMTP id f14-20020a056870898e00b0012b3e64e86dmr2155211oaq.265.1664964876086;
        Wed, 05 Oct 2022 03:14:36 -0700 (PDT)
X-Google-Smtp-Source: AMsMyM4puFQuc+HBj5bZANybkn2Nu6qKgKeWKHJHz+Bo/RS/qe+AFkTft+5rOx+UQBxYy/lkK7LEjhgcuukfaEPt6KU=
X-Received: by 2002:a05:6870:898e:b0:12b:3e64:e86d with SMTP id
 f14-20020a056870898e00b0012b3e64e86dmr2155199oaq.265.1664964875768; Wed, 05
 Oct 2022 03:14:35 -0700 (PDT)
MIME-Version: 1.0
References: <CAGm3qMU1u_-_4Fy1wWk59FT+LbvsE1vFKdKq8pQ5ePv9uXZzag@mail.gmail.com>
 <07FCA378-7E86-4E06-B506-FED0C60CE31C@gmail.com> <CAGm3qMXQTs-B8mTBbRQYUn3ykRw1EP5TygjYQF_wfLti6_P4aw@mail.gmail.com>
 <974d3399-7eac-803d-2c64-fb7d7bf3f71f@redhat.com>
In-Reply-To: <974d3399-7eac-803d-2c64-fb7d7bf3f71f@redhat.com>
From: Aldy Hernandez <aldyh@redhat.com>
Date: Wed, 5 Oct 2022 12:14:24 +0200
Message-ID: <CAGm3qMW7bfz5mZ=gavAgf4j2jsZcUJLe_NbEWzuFfc81-PDV9w@mail.gmail.com>
Subject: Re: [COMMITTED] Convert nonzero mask in irange to wide_int.
To: Andrew MacLeod <amacleod@redhat.com>
Cc: Richard Biener <richard.guenther@gmail.com>, GCC patches <gcc-patches@gcc.gnu.org>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, Oct 4, 2022 at 5:42 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
>
> On 10/4/22 11:14, Aldy Hernandez wrote:
> > On Tue, Oct 4, 2022 at 4:34 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >>
> >>
> >>> Am 04.10.2022 um 16:30 schrieb Aldy Hernandez <aldyh@redhat.com>:
> >>>
> >>> =EF=BB=BFOn Tue, Oct 4, 2022 at 3:27 PM Andrew MacLeod <amacleod@redh=
at.com> wrote:
> >>>>
> >>>>> On 10/4/22 08:13, Aldy Hernandez via Gcc-patches wrote:
> >>>>>> On Tue, Oct 4, 2022, 13:28 Aldy Hernandez <aldyh@redhat.com> wrote=
:
> >>>>>> On Tue, Oct 4, 2022 at 9:55 AM Richard Biener
> >>>>>> <richard.guenther@gmail.com> wrote:
> >>>>>>> Am 04.10.2022 um 09:36 schrieb Aldy Hernandez via Gcc-patches <
> >>>>>> gcc-patches@gcc.gnu.org>:
> >>>>>>>> =EF=BB=BFThe reason the nonzero mask was kept in a tree was basi=
cally inertia,
> >>>>>>>> as everything in irange is a tree.  However, there's no need to =
keep
> >>>>>>>> it in a tree, as the conversions to and from wide ints are very
> >>>>>>>> annoying.  That, plus special casing NULL masks to be -1 is pron=
e
> >>>>>>>> to error.
> >>>>>>>>
> >>>>>>>> I have not only rewritten all the uses to assume a wide int, but
> >>>>>>>> have corrected a few places where we weren't propagating the mas=
ks, or
> >>>>>>>> rather pessimizing them to -1.  This will become more important =
in
> >>>>>>>> upcoming patches where we make better use of the masks.
> >>>>>>>>
> >>>>>>>> Performance testing shows a trivial improvement in VRP, as thing=
s like
> >>>>>>>> irange::contains_p() are tied to a tree.  Ughh, can't wait for t=
rees in
> >>>>>>>> iranges to go away.
> >>>>>>> You want trailing wide int storage though.  A wide_int is quite l=
arge.
> >>>>>> Absolutely, this is only for short term storage.  Any time we need
> >>>>>> long term storage, say global ranges in SSA_NAME_RANGE_INFO, we go
> >>>>>> through vrange_storage which will stream things in a more memory
> >>>>>> efficient manner.  For irange, vrange_storage will stream all the
> >>>>>> sub-ranges, including the nonzero bitmask which is the first entry=
 in
> >>>>>> such storage, as trailing_wide_ints.
> >>>>>>
> >>>>>> See irange_storage_slot to see how it lives in GC memory.
> >>>>>>
> >>>>> That being said, the ranger's internal cache uses iranges, albeit w=
ith a
> >>>>> squished down number of subranges (the minimum amount to represent =
the
> >>>>> range).  So each cache entry will now be bigger by the difference b=
etween
> >>>>> one tree and one wide int.
> >>>>>
> >>>>> I wonder if we should change the cache to use vrange_storage. If no=
t now,
> >>>>> then when we convert all the subranges to wide ints.
> >>>>>
> >>>>> Of course, the memory pressure of the cache is not nearly as proble=
matic as
> >>>>> SSA_NAME_RANGE_INFO. The cache only stores names it cares about.
> >>>> Rangers cache can be a memory bottleneck in pathological cases..
> >>>> Certainly not as bad as it use to be, but I'm sure it can still be
> >>>> problematic.    Its suppose to be a memory efficient representation
> >>>> because of that.  The cache can have an entry for any live ssa-name
> >>>> (which means all of them at some point in the IL) multiplied by a fa=
ctor
> >>>> involving the number of dominator blocks and outgoing edges ranges a=
re
> >>>> calculated on.   So while SSA_NAME_RANGE_INFO is a linear thing, the
> >>>> cache lies somewhere between a logarithmic and exponential factor ba=
sed
> >>>> on the CFG size.
> >>> Hmmm, perhaps the ultimate goal here should be to convert the cache t=
o
> >>> use vrange_storage, which uses trailing wide ints for all of the end
> >>> points plus the masks (known_ones included for the next release).
> >>>
> >>>> if you are growing the common cases of 1 to 2 endpoints to more than
> >>>> double in size (and most of the time not be needed), that would not =
be
> >>>> very appealing :-P  If we have any wide-ints, they would need to be =
a
> >>>> memory efficient version.   The Cache uses an irange_allocator, whic=
h is
> >>>> suppose to provide a memory efficient objects.. hence why it trims t=
he
> >>>> number of ranges down to only what is needed.  It seems like a trail=
ing
> >>>> wide-Int might be in order based on that..
> >>>>
> >>>> Andrew
> >>>>
> >>>>
> >>>> PS. which will be more problematic if you eventually introduce a
> >>>> known_ones wide_int.    I thought the mask tracking was/could be
> >>>> something simple like  HOST_WIDE_INT..  then you only tracks masks i=
n
> >>>> types up to the size of a HOST_WIDE_INT.  then storage and masking i=
s
> >>>> all trivial without going thru a wide_int.    Is that not so/possibl=
e?
> >>> That's certainly easy and cheaper to do.  The hard part was fixing al=
l
> >>> the places where we weren't keeping the masks up to date, and that's
> >>> done (sans any bugs ;-)).
> >>>
> >>> Can we get consensus here on only tracking masks for type sizes less
> >>> than HOST_WIDE_INT?  I'd hate to do all the work only to realize we
> >>> need to track 512 bit masks on a 32-bit host cross :-).
> >> 64bits are not enough, 128 might be.  But there=E2=80=99s trailing wid=
e int storage so I don=E2=80=99t see the point in restricting ourselves?
> > Fair enough.  Perhaps we should bite the bullet and convert the cache
> > to vrange_storage which is all set up for streaming irange's with
> > trailing_wide_ints.  No changes should be necessary for irange, since
> > we never have more than 3-4 live at any one time.  It's the cache that
> > needs twiddling.
> >
> Wouldnt it be irange_allocator that needs twiddling?  It purpose in life
> is to allocate iranges for memory storage...  the cache is just a
> client, as is rangers global cache, etc...  that was the intention of
> irange_allocator to isolate clients from having to worry about memory
> storage issues?

Hmmm, we currently have 2 storage mechanisms, vrange_allocator and
vrange_storage.  The first is used for the cache and the other for
global storage (SSA_NAME_RANGE_INFO).  The vrange allocator returns a
plain irange (trees and all), but with the minimum size needed to
represent the range.  It's seamless to use because it returns an
irange that can be used with the same API we're used to.

On the other hand, vrange_storage returns an opaque pointer (i.e. not
a vrange) and has an API used explicitly to set and get the vrange in
storage:

class vrange_storage
{
public:
  vrange_storage (vrange_allocator *alloc) : m_alloc (alloc) { }
  void *alloc_slot (const vrange &r);
  void free (void *slot) { m_alloc->free (slot); }
  void get_vrange (const void *slot, vrange &r, tree type);
  void set_vrange (void *slot, const vrange &r);
  static bool fits_p (const void *slot, const vrange &r);
};

It also stores a range in the minimum size needed, but is more memory
efficient because it stores the sub-ranges as trailing_wide_ints.
However, it doesn't return a convenient irange we can store in
directly.  You must go through get_vrange() and set_vrange().  I
probably went this route because that's how the old
SSA_NAME_RANGE_INFO stuff worked...it was an API to convert an irange
to something streamable.  But in retrospect, we should've probably
merged everything into vrange_allocator.

My initial thought was to convert the ranger's cache to use
vrange_storage, but I must admit that vrange_storage is not as elegant
as just getting an irange/vrange that can be stored directly.

I think a clean solution would be to modify the vrange_allocator
(currently used by the cache) to have an irange_storage of sorts that
inherits from vrange (instead of irange since it contains a tree
storage pointer, and other cruft).  Then we can have irange_storage
define a minimal set of things to make the cache client work
(set_undefined, set_varying, operator=3D).  Every other method would be
gcc_unreachable.  This way, the cache can continue to work as before
and we can have 2 irange's: one on trees (irange) and one with
trailing_wide_ints for storage (irange_storage).  And we'd provide a
way to copy back and forth seamlessly.

Long term we should have one allocator, for both the cache (with
obstacks) and global storage (with GC).  I'm torn between converting
the cache to vrange_storage since it already uses trailing wide ints,
or implement an irange_storage for vrange_allocator that can be
transparently copy between the actual irange.

However... I don't think I have the stomach to overhaul the allocators
this late in the release.  For this release I may opt to put the
nonzero mask back in a tree, but have it always be set.  The NULL =3D=3D
-1 shortcut was very error prone.  The rest of my fixes in this patch
still apply, as they keep better track of the masks, which we need.

Thoughts?
Aldy


Aldy