public inbox for
 help / color / mirror / Atom feed
From: Aldy Hernandez <>
To: Andrew MacLeod <>
Cc: Richard Biener <>,
	GCC patches <>
Subject: Re: [COMMITTED] Convert nonzero mask in irange to wide_int.
Date: Wed, 5 Oct 2022 12:14:24 +0200	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Tue, Oct 4, 2022 at 5:42 PM Andrew MacLeod <> wrote:
> On 10/4/22 11:14, Aldy Hernandez wrote:
> > On Tue, Oct 4, 2022 at 4:34 PM Richard Biener
> > <> wrote:
> >>
> >>
> >>> Am 04.10.2022 um 16:30 schrieb Aldy Hernandez <>:
> >>>
> >>> On Tue, Oct 4, 2022 at 3:27 PM Andrew MacLeod <> wrote:
> >>>>
> >>>>> On 10/4/22 08:13, Aldy Hernandez via Gcc-patches wrote:
> >>>>>> On Tue, Oct 4, 2022, 13:28 Aldy Hernandez <> wrote:
> >>>>>> On Tue, Oct 4, 2022 at 9:55 AM Richard Biener
> >>>>>> <> wrote:
> >>>>>>> Am 04.10.2022 um 09:36 schrieb Aldy Hernandez via Gcc-patches <
> >>>>>>>:
> >>>>>>>> The reason the nonzero mask was kept in a tree was basically inertia,
> >>>>>>>> as everything in irange is a tree.  However, there's no need to keep
> >>>>>>>> it in a tree, as the conversions to and from wide ints are very
> >>>>>>>> annoying.  That, plus special casing NULL masks to be -1 is prone
> >>>>>>>> to error.
> >>>>>>>>
> >>>>>>>> I have not only rewritten all the uses to assume a wide int, but
> >>>>>>>> have corrected a few places where we weren't propagating the masks, or
> >>>>>>>> rather pessimizing them to -1.  This will become more important in
> >>>>>>>> upcoming patches where we make better use of the masks.
> >>>>>>>>
> >>>>>>>> Performance testing shows a trivial improvement in VRP, as things like
> >>>>>>>> irange::contains_p() are tied to a tree.  Ughh, can't wait for trees in
> >>>>>>>> iranges to go away.
> >>>>>>> You want trailing wide int storage though.  A wide_int is quite large.
> >>>>>> Absolutely, this is only for short term storage.  Any time we need
> >>>>>> long term storage, say global ranges in SSA_NAME_RANGE_INFO, we go
> >>>>>> through vrange_storage which will stream things in a more memory
> >>>>>> efficient manner.  For irange, vrange_storage will stream all the
> >>>>>> sub-ranges, including the nonzero bitmask which is the first entry in
> >>>>>> such storage, as trailing_wide_ints.
> >>>>>>
> >>>>>> See irange_storage_slot to see how it lives in GC memory.
> >>>>>>
> >>>>> That being said, the ranger's internal cache uses iranges, albeit with a
> >>>>> squished down number of subranges (the minimum amount to represent the
> >>>>> range).  So each cache entry will now be bigger by the difference between
> >>>>> one tree and one wide int.
> >>>>>
> >>>>> I wonder if we should change the cache to use vrange_storage. If not now,
> >>>>> then when we convert all the subranges to wide ints.
> >>>>>
> >>>>> Of course, the memory pressure of the cache is not nearly as problematic as
> >>>>> SSA_NAME_RANGE_INFO. The cache only stores names it cares about.
> >>>> Rangers cache can be a memory bottleneck in pathological cases..
> >>>> Certainly not as bad as it use to be, but I'm sure it can still be
> >>>> problematic.    Its suppose to be a memory efficient representation
> >>>> because of that.  The cache can have an entry for any live ssa-name
> >>>> (which means all of them at some point in the IL) multiplied by a factor
> >>>> involving the number of dominator blocks and outgoing edges ranges are
> >>>> calculated on.   So while SSA_NAME_RANGE_INFO is a linear thing, the
> >>>> cache lies somewhere between a logarithmic and exponential factor based
> >>>> on the CFG size.
> >>> Hmmm, perhaps the ultimate goal here should be to convert the cache to
> >>> use vrange_storage, which uses trailing wide ints for all of the end
> >>> points plus the masks (known_ones included for the next release).
> >>>
> >>>> if you are growing the common cases of 1 to 2 endpoints to more than
> >>>> double in size (and most of the time not be needed), that would not be
> >>>> very appealing :-P  If we have any wide-ints, they would need to be a
> >>>> memory efficient version.   The Cache uses an irange_allocator, which is
> >>>> suppose to provide a memory efficient objects.. hence why it trims the
> >>>> number of ranges down to only what is needed.  It seems like a trailing
> >>>> wide-Int might be in order based on that..
> >>>>
> >>>> Andrew
> >>>>
> >>>>
> >>>> PS. which will be more problematic if you eventually introduce a
> >>>> known_ones wide_int.    I thought the mask tracking was/could be
> >>>> something simple like  HOST_WIDE_INT..  then you only tracks masks in
> >>>> types up to the size of a HOST_WIDE_INT.  then storage and masking is
> >>>> all trivial without going thru a wide_int.    Is that not so/possible?
> >>> That's certainly easy and cheaper to do.  The hard part was fixing all
> >>> the places where we weren't keeping the masks up to date, and that's
> >>> done (sans any bugs ;-)).
> >>>
> >>> Can we get consensus here on only tracking masks for type sizes less
> >>> than HOST_WIDE_INT?  I'd hate to do all the work only to realize we
> >>> need to track 512 bit masks on a 32-bit host cross :-).
> >> 64bits are not enough, 128 might be.  But there’s trailing wide int storage so I don’t see the point in restricting ourselves?
> > Fair enough.  Perhaps we should bite the bullet and convert the cache
> > to vrange_storage which is all set up for streaming irange's with
> > trailing_wide_ints.  No changes should be necessary for irange, since
> > we never have more than 3-4 live at any one time.  It's the cache that
> > needs twiddling.
> >
> Wouldnt it be irange_allocator that needs twiddling?  It purpose in life
> is to allocate iranges for memory storage...  the cache is just a
> client, as is rangers global cache, etc...  that was the intention of
> irange_allocator to isolate clients from having to worry about memory
> storage issues?

Hmmm, we currently have 2 storage mechanisms, vrange_allocator and
vrange_storage.  The first is used for the cache and the other for
global storage (SSA_NAME_RANGE_INFO).  The vrange allocator returns a
plain irange (trees and all), but with the minimum size needed to
represent the range.  It's seamless to use because it returns an
irange that can be used with the same API we're used to.

On the other hand, vrange_storage returns an opaque pointer (i.e. not
a vrange) and has an API used explicitly to set and get the vrange in

class vrange_storage
  vrange_storage (vrange_allocator *alloc) : m_alloc (alloc) { }
  void *alloc_slot (const vrange &r);
  void free (void *slot) { m_alloc->free (slot); }
  void get_vrange (const void *slot, vrange &r, tree type);
  void set_vrange (void *slot, const vrange &r);
  static bool fits_p (const void *slot, const vrange &r);

It also stores a range in the minimum size needed, but is more memory
efficient because it stores the sub-ranges as trailing_wide_ints.
However, it doesn't return a convenient irange we can store in
directly.  You must go through get_vrange() and set_vrange().  I
probably went this route because that's how the old
SSA_NAME_RANGE_INFO stuff was an API to convert an irange
to something streamable.  But in retrospect, we should've probably
merged everything into vrange_allocator.

My initial thought was to convert the ranger's cache to use
vrange_storage, but I must admit that vrange_storage is not as elegant
as just getting an irange/vrange that can be stored directly.

I think a clean solution would be to modify the vrange_allocator
(currently used by the cache) to have an irange_storage of sorts that
inherits from vrange (instead of irange since it contains a tree
storage pointer, and other cruft).  Then we can have irange_storage
define a minimal set of things to make the cache client work
(set_undefined, set_varying, operator=).  Every other method would be
gcc_unreachable.  This way, the cache can continue to work as before
and we can have 2 irange's: one on trees (irange) and one with
trailing_wide_ints for storage (irange_storage).  And we'd provide a
way to copy back and forth seamlessly.

Long term we should have one allocator, for both the cache (with
obstacks) and global storage (with GC).  I'm torn between converting
the cache to vrange_storage since it already uses trailing wide ints,
or implement an irange_storage for vrange_allocator that can be
transparently copy between the actual irange.

However... I don't think I have the stomach to overhaul the allocators
this late in the release.  For this release I may opt to put the
nonzero mask back in a tree, but have it always be set.  The NULL ==
-1 shortcut was very error prone.  The rest of my fixes in this patch
still apply, as they keep better track of the masks, which we need.



  reply	other threads:[~2022-10-05 10:14 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-04  7:35 Aldy Hernandez
2022-10-04  7:55 ` Richard Biener
2022-10-04 11:28   ` Aldy Hernandez
2022-10-04 12:13     ` Aldy Hernandez
2022-10-04 13:27       ` Andrew MacLeod
2022-10-04 14:30         ` Aldy Hernandez
2022-10-04 14:34           ` Richard Biener
2022-10-04 15:14             ` Aldy Hernandez
2022-10-04 15:42               ` Andrew MacLeod
2022-10-05 10:14                 ` Aldy Hernandez [this message]
2022-10-07  9:23                   ` Aldy Hernandez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='' \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).