public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Vladimir Makarov <vmakarov@redhat.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
Date: Thu, 7 Dec 2023 10:49:13 +0100 (CET)	[thread overview]
Message-ID: <q5243qq1-4336-961n-1q42-nr90n9058p18@fhfr.qr> (raw)
In-Reply-To: <ZXGEBmzbYlbFIlyv@tucnak>

On Thu, 7 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> As documented, --param min-nondebug-insn-uid= is very useful in debugging
> -fcompare-debug issues in RTL dumps, without it it is really hard to
> find differences.  With it, DEBUG_INSNs generally use low INSN_UIDs
> (1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up.
> For good results, the parameter should be larger than the number of
> DEBUG_INSNs in all or at least problematic functions, so I typically
> use --param min-nondebug-insn-uid=10000 or --param
> min-nondebug-insn-uid=1000.
> 
> The PR is about using --param min-nondebug-insn-uid=2147483647 or
> similar behavior can be achieved with that minus some epsilon,
> INSN_UIDs for the non-debug insns then wrap around and as they are signed,
> all kinds of things break.  Obviously, that can happen even without that
> option, but functions containing more than 2147483647 insns usually don't
> compile much earlier due to getting out of memory.
> As it is a debugging option, I'd prefer not to impose any drastically small
> limits on it because if a function has a lot of DEBUG_INSNs, it is useful
> to start still above them, otherwise the allocation of uids will DTRT
> even for DEBUG_INSNs but there will be then differences in non-DEBUG_INSN
> allocations.
> 
> So, the following patch uses 0x40000000 limit, half the maximum amount for
> DEBUG_INSNs and half for non-DEBUG_INSNs, it will still result in very
> unlikely overflows in real world.
> 
> Note, using large min-nondebug-insn-uid is very expensive for compile time
> memory and compile time, because DF as well as various RTL passes use
> arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements,
> ditto df (df->insns).
> 
> Now, in LRA I've ran into ICEs already with
> --param min-nondebug-insn-uid=0x2aaaaaaa
> on 64-bit host.  It uses a custom vector management and wants to grow
> allocation 1.5x when growing, but all this computation is done in int,
> so already 0x2aaaaaab * 3 / 2 + 1 overflows to negative value.  And
> unlike vec.cc growing which also uses unsigned int type for the above
> (and the + 1 is not there), it also doesn't make sure if there is an
> overflow that it allocates at least as much as needed, vec.cc
> does
>   if ...
>   else
>     /* Grow slower when large.  */
>     alloc = (alloc * 3 / 2);
> 
>   /* If this is still too small, set it to the right size. */
>   if (alloc < desired)
>     alloc = desired;
> so even if there is overflow during the * 1.5 computation, but
> desired is still representable in the range of the alloced counter
> (31-bits in both vec.h and LRA), it doesn't grow exponentially but
> at least works for the current value.
> 
> So, one way to fix the LRA issue would be just to use
>   lra_insn_recog_data_len = index * 3U / 2;
>   if (lra_insn_recog_data_len <= index)
>     lra_insn_recog_data_len = index + 1;
> basically do what vec.cc does.  I thought we can do better for
> both vec.cc and LRA on 64-bit hosts even without growing the allocated
> counters, but now that I look at it again, perhaps we can't.
> The above overflows already with original alloc or lra_insn_recog_data_len
> 0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
> and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
> 1.  I thought (and the patch implements it) that we could use
> alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
> that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
> which is still ok in unsigned, but given that vec.h then stores the
> counter into unsigned m_alloc:31; bit-field, it is too much.
> 
> The patch below is what I've actually bootstrapped/regtested on
> x86_64-linux and i686-linux, but given the above I think I should drop
> the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.
> 
> With the lra.cc change, one can actually compile simple function
> with -O0 on 64-bit host with --param min-nondebug-insn-uid=0x40000000
> (i.e. the new limit), but already needed quite a big part of my 32GB
> RAM + 24GB swap.
> The patch adds a dg-skip-if for that case though, because such option
> is way too much for 32-bit hosts even at -O0 and empty function,
> and with -O3 on a longer function it is too much for average 64-bit host
> as well.  Without the dg-skip-if I got on 64-bit host:
> cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
> and
> cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes
> on 32-bit host.  A test requiring more than 532GB of RAM on 64-bit hosts
> is just too much for our testsuite.
> 
> 2023-12-07  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR middle-end/112411
> 	* params.opt (-param=min-nondebug-insn-uid=): Add
> 	IntegerRange(0, 1073741824).
> 	* lra.cc (check_and_expand_insn_recog_data): Use (size_t) 3
> 	rather than 3 in * 3 / 2 computation and if the result is
> 	smaller or equal to index, use index + 1.
> 	* vec.cc (vec_prefix::calculate_allocation_1): Use (size_t) 3
> 	rather than 3 in * 3 / 2 computation.
> 
> 	* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
> 	--param min-nondebug-insn-uid=1073741824.
> 
> --- gcc/params.opt.jj	2023-11-02 07:49:18.010852541 +0100
> +++ gcc/params.opt	2023-12-06 18:55:57.045420935 +0100
> @@ -779,7 +779,7 @@ Common Joined UInteger Var(param_min_loo
>  The minimum threshold for probability of semi-invariant condition statement to trigger loop split.
>  
>  -param=min-nondebug-insn-uid=
> -Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
> +Common Joined UInteger Var(param_min_nondebug_insn_uid) Param IntegerRange(0, 1073741824)
>  The minimum UID to be used for a nondebug insn.

This new limit is OK.

>  -param=min-size-for-stack-sharing=
> --- gcc/lra.cc.jj	2023-12-05 13:17:29.642260866 +0100
> +++ gcc/lra.cc	2023-12-06 19:52:01.759241999 +0100
> @@ -768,7 +768,9 @@ check_and_expand_insn_recog_data (int in
>    if (lra_insn_recog_data_len > index)
>      return;
>    old = lra_insn_recog_data_len;
> -  lra_insn_recog_data_len = index * 3 / 2 + 1;
> +  lra_insn_recog_data_len = index * (size_t) 3 / 2;
> +  if (lra_insn_recog_data_len <= index)
> +    lra_insn_recog_data_len = index + 1;

As is this (though I believe 3U would be better, no need for 64bit
multiplies with the later check).

>    lra_insn_recog_data = XRESIZEVEC (lra_insn_recog_data_t,
>  				    lra_insn_recog_data,
>  				    lra_insn_recog_data_len);
> --- gcc/vec.cc.jj	2023-09-27 10:37:39.329838572 +0200
> +++ gcc/vec.cc	2023-12-06 19:53:34.670940078 +0100
> @@ -160,7 +160,7 @@ vec_prefix::calculate_allocation_1 (unsi
>      alloc = alloc * 2;
>    else
>      /* Grow slower when large.  */
> -    alloc = (alloc * 3 / 2);
> +    alloc = (alloc * (size_t) 3 / 2);

I agree this isn't much of an improvement, alloc < desired
catches overflow but we don't catch when desired doesn't fit
the 31 bit limit ...

Richard.

>    /* If this is still too small, set it to the right size. */
>    if (alloc < desired)
> --- gcc/testsuite/gcc.dg/params/blocksort-part.c.jj	2020-01-12 11:54:37.463397567 +0100
> +++ gcc/testsuite/gcc.dg/params/blocksort-part.c	2023-12-07 08:46:11.656974144 +0100
> @@ -1,4 +1,5 @@
>  /* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { "--param stack-clash-protection-*" } } */
> +/* { dg-skip-if "For 32-bit hosts such param is too much and even for 64-bit might require hundreds of GB of RAM" { *-*-* } { "--param min-nondebug-insn-uid=1073741824" } } */
>  
>  /*-------------------------------------------------------------*/
>  /*--- Block sorting machinery                               ---*/
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

  parent reply	other threads:[~2023-12-07  9:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-07  8:36 Jakub Jelinek
2023-12-07  8:39 ` [PATCH] v2: " Jakub Jelinek
2023-12-07 10:12   ` Richard Biener
2023-12-07 10:32     ` Jakub Jelinek
2023-12-07 12:06       ` Richard Biener
2023-12-08 17:49   ` Vladimir Makarov
2023-12-07  9:49 ` Richard Biener [this message]
2023-12-07 10:54 ` [PATCH] " Jakub Jelinek
2023-12-08  7:37   ` [PATCH] haifa-sched: Avoid overflows in extend_h_i_d [PR112411] Jakub Jelinek
2023-12-08  7:39     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=q5243qq1-4336-961n-1q42-nr90n9058p18@fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=vmakarov@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).