From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Vladimir Makarov <vmakarov@redhat.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
Date: Thu, 7 Dec 2023 10:49:13 +0100 (CET) [thread overview]
Message-ID: <q5243qq1-4336-961n-1q42-nr90n9058p18@fhfr.qr> (raw)
In-Reply-To: <ZXGEBmzbYlbFIlyv@tucnak>
On Thu, 7 Dec 2023, Jakub Jelinek wrote:
> Hi!
>
> As documented, --param min-nondebug-insn-uid= is very useful in debugging
> -fcompare-debug issues in RTL dumps, without it it is really hard to
> find differences. With it, DEBUG_INSNs generally use low INSN_UIDs
> (1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up.
> For good results, the parameter should be larger than the number of
> DEBUG_INSNs in all or at least problematic functions, so I typically
> use --param min-nondebug-insn-uid=10000 or --param
> min-nondebug-insn-uid=1000.
>
> The PR is about using --param min-nondebug-insn-uid=2147483647 or
> similar behavior can be achieved with that minus some epsilon,
> INSN_UIDs for the non-debug insns then wrap around and as they are signed,
> all kinds of things break. Obviously, that can happen even without that
> option, but functions containing more than 2147483647 insns usually don't
> compile much earlier due to getting out of memory.
> As it is a debugging option, I'd prefer not to impose any drastically small
> limits on it because if a function has a lot of DEBUG_INSNs, it is useful
> to start still above them, otherwise the allocation of uids will DTRT
> even for DEBUG_INSNs but there will be then differences in non-DEBUG_INSN
> allocations.
>
> So, the following patch uses 0x40000000 limit, half the maximum amount for
> DEBUG_INSNs and half for non-DEBUG_INSNs, it will still result in very
> unlikely overflows in real world.
>
> Note, using large min-nondebug-insn-uid is very expensive for compile time
> memory and compile time, because DF as well as various RTL passes use
> arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements,
> ditto df (df->insns).
>
> Now, in LRA I've ran into ICEs already with
> --param min-nondebug-insn-uid=0x2aaaaaaa
> on 64-bit host. It uses a custom vector management and wants to grow
> allocation 1.5x when growing, but all this computation is done in int,
> so already 0x2aaaaaab * 3 / 2 + 1 overflows to negative value. And
> unlike vec.cc growing which also uses unsigned int type for the above
> (and the + 1 is not there), it also doesn't make sure if there is an
> overflow that it allocates at least as much as needed, vec.cc
> does
> if ...
> else
> /* Grow slower when large. */
> alloc = (alloc * 3 / 2);
>
> /* If this is still too small, set it to the right size. */
> if (alloc < desired)
> alloc = desired;
> so even if there is overflow during the * 1.5 computation, but
> desired is still representable in the range of the alloced counter
> (31-bits in both vec.h and LRA), it doesn't grow exponentially but
> at least works for the current value.
>
> So, one way to fix the LRA issue would be just to use
> lra_insn_recog_data_len = index * 3U / 2;
> if (lra_insn_recog_data_len <= index)
> lra_insn_recog_data_len = index + 1;
> basically do what vec.cc does. I thought we can do better for
> both vec.cc and LRA on 64-bit hosts even without growing the allocated
> counters, but now that I look at it again, perhaps we can't.
> The above overflows already with original alloc or lra_insn_recog_data_len
> 0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
> and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
> 1. I thought (and the patch implements it) that we could use
> alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
> that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
> which is still ok in unsigned, but given that vec.h then stores the
> counter into unsigned m_alloc:31; bit-field, it is too much.
>
> The patch below is what I've actually bootstrapped/regtested on
> x86_64-linux and i686-linux, but given the above I think I should drop
> the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.
>
> With the lra.cc change, one can actually compile simple function
> with -O0 on 64-bit host with --param min-nondebug-insn-uid=0x40000000
> (i.e. the new limit), but already needed quite a big part of my 32GB
> RAM + 24GB swap.
> The patch adds a dg-skip-if for that case though, because such option
> is way too much for 32-bit hosts even at -O0 and empty function,
> and with -O3 on a longer function it is too much for average 64-bit host
> as well. Without the dg-skip-if I got on 64-bit host:
> cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
> and
> cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes
> on 32-bit host. A test requiring more than 532GB of RAM on 64-bit hosts
> is just too much for our testsuite.
>
> 2023-12-07 Jakub Jelinek <jakub@redhat.com>
>
> PR middle-end/112411
> * params.opt (-param=min-nondebug-insn-uid=): Add
> IntegerRange(0, 1073741824).
> * lra.cc (check_and_expand_insn_recog_data): Use (size_t) 3
> rather than 3 in * 3 / 2 computation and if the result is
> smaller or equal to index, use index + 1.
> * vec.cc (vec_prefix::calculate_allocation_1): Use (size_t) 3
> rather than 3 in * 3 / 2 computation.
>
> * gcc.dg/params/blocksort-part.c: Add dg-skip-if for
> --param min-nondebug-insn-uid=1073741824.
>
> --- gcc/params.opt.jj 2023-11-02 07:49:18.010852541 +0100
> +++ gcc/params.opt 2023-12-06 18:55:57.045420935 +0100
> @@ -779,7 +779,7 @@ Common Joined UInteger Var(param_min_loo
> The minimum threshold for probability of semi-invariant condition statement to trigger loop split.
>
> -param=min-nondebug-insn-uid=
> -Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
> +Common Joined UInteger Var(param_min_nondebug_insn_uid) Param IntegerRange(0, 1073741824)
> The minimum UID to be used for a nondebug insn.
This new limit is OK.
> -param=min-size-for-stack-sharing=
> --- gcc/lra.cc.jj 2023-12-05 13:17:29.642260866 +0100
> +++ gcc/lra.cc 2023-12-06 19:52:01.759241999 +0100
> @@ -768,7 +768,9 @@ check_and_expand_insn_recog_data (int in
> if (lra_insn_recog_data_len > index)
> return;
> old = lra_insn_recog_data_len;
> - lra_insn_recog_data_len = index * 3 / 2 + 1;
> + lra_insn_recog_data_len = index * (size_t) 3 / 2;
> + if (lra_insn_recog_data_len <= index)
> + lra_insn_recog_data_len = index + 1;
As is this (though I believe 3U would be better, no need for 64bit
multiplies with the later check).
> lra_insn_recog_data = XRESIZEVEC (lra_insn_recog_data_t,
> lra_insn_recog_data,
> lra_insn_recog_data_len);
> --- gcc/vec.cc.jj 2023-09-27 10:37:39.329838572 +0200
> +++ gcc/vec.cc 2023-12-06 19:53:34.670940078 +0100
> @@ -160,7 +160,7 @@ vec_prefix::calculate_allocation_1 (unsi
> alloc = alloc * 2;
> else
> /* Grow slower when large. */
> - alloc = (alloc * 3 / 2);
> + alloc = (alloc * (size_t) 3 / 2);
I agree this isn't much of an improvement, alloc < desired
catches overflow but we don't catch when desired doesn't fit
the 31 bit limit ...
Richard.
> /* If this is still too small, set it to the right size. */
> if (alloc < desired)
> --- gcc/testsuite/gcc.dg/params/blocksort-part.c.jj 2020-01-12 11:54:37.463397567 +0100
> +++ gcc/testsuite/gcc.dg/params/blocksort-part.c 2023-12-07 08:46:11.656974144 +0100
> @@ -1,4 +1,5 @@
> /* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { "--param stack-clash-protection-*" } } */
> +/* { dg-skip-if "For 32-bit hosts such param is too much and even for 64-bit might require hundreds of GB of RAM" { *-*-* } { "--param min-nondebug-insn-uid=1073741824" } } */
>
> /*-------------------------------------------------------------*/
> /*--- Block sorting machinery ---*/
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
next prev parent reply other threads:[~2023-12-07 9:50 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-07 8:36 Jakub Jelinek
2023-12-07 8:39 ` [PATCH] v2: " Jakub Jelinek
2023-12-07 10:12 ` Richard Biener
2023-12-07 10:32 ` Jakub Jelinek
2023-12-07 12:06 ` Richard Biener
2023-12-08 17:49 ` Vladimir Makarov
2023-12-07 9:49 ` Richard Biener [this message]
2023-12-07 10:54 ` [PATCH] " Jakub Jelinek
2023-12-08 7:37 ` [PATCH] haifa-sched: Avoid overflows in extend_h_i_d [PR112411] Jakub Jelinek
2023-12-08 7:39 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=q5243qq1-4336-961n-1q42-nr90n9058p18@fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=vmakarov@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).