public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
@ 2023-12-07  8:36 Jakub Jelinek
  2023-12-07  8:39 ` [PATCH] v2: " Jakub Jelinek
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jakub Jelinek @ 2023-12-07  8:36 UTC (permalink / raw)
  To: Richard Biener, Vladimir Makarov; +Cc: gcc-patches

Hi!

As documented, --param min-nondebug-insn-uid= is very useful in debugging
-fcompare-debug issues in RTL dumps, without it it is really hard to
find differences.  With it, DEBUG_INSNs generally use low INSN_UIDs
(1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up.
For good results, the parameter should be larger than the number of
DEBUG_INSNs in all or at least problematic functions, so I typically
use --param min-nondebug-insn-uid=10000 or --param
min-nondebug-insn-uid=1000.

The PR is about using --param min-nondebug-insn-uid=2147483647 or
similar behavior can be achieved with that minus some epsilon,
INSN_UIDs for the non-debug insns then wrap around and as they are signed,
all kinds of things break.  Obviously, that can happen even without that
option, but functions containing more than 2147483647 insns usually don't
compile much earlier due to getting out of memory.
As it is a debugging option, I'd prefer not to impose any drastically small
limits on it because if a function has a lot of DEBUG_INSNs, it is useful
to start still above them, otherwise the allocation of uids will DTRT
even for DEBUG_INSNs but there will be then differences in non-DEBUG_INSN
allocations.

So, the following patch uses 0x40000000 limit, half the maximum amount for
DEBUG_INSNs and half for non-DEBUG_INSNs, it will still result in very
unlikely overflows in real world.

Note, using large min-nondebug-insn-uid is very expensive for compile time
memory and compile time, because DF as well as various RTL passes use
arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements,
ditto df (df->insns).

Now, in LRA I've ran into ICEs already with
--param min-nondebug-insn-uid=0x2aaaaaaa
on 64-bit host.  It uses a custom vector management and wants to grow
allocation 1.5x when growing, but all this computation is done in int,
so already 0x2aaaaaab * 3 / 2 + 1 overflows to negative value.  And
unlike vec.cc growing which also uses unsigned int type for the above
(and the + 1 is not there), it also doesn't make sure if there is an
overflow that it allocates at least as much as needed, vec.cc
does
  if ...
  else
    /* Grow slower when large.  */
    alloc = (alloc * 3 / 2);

  /* If this is still too small, set it to the right size. */
  if (alloc < desired)
    alloc = desired;
so even if there is overflow during the * 1.5 computation, but
desired is still representable in the range of the alloced counter
(31-bits in both vec.h and LRA), it doesn't grow exponentially but
at least works for the current value.

So, one way to fix the LRA issue would be just to use
  lra_insn_recog_data_len = index * 3U / 2;
  if (lra_insn_recog_data_len <= index)
    lra_insn_recog_data_len = index + 1;
basically do what vec.cc does.  I thought we can do better for
both vec.cc and LRA on 64-bit hosts even without growing the allocated
counters, but now that I look at it again, perhaps we can't.
The above overflows already with original alloc or lra_insn_recog_data_len
0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
1.  I thought (and the patch implements it) that we could use
alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
which is still ok in unsigned, but given that vec.h then stores the
counter into unsigned m_alloc:31; bit-field, it is too much.

The patch below is what I've actually bootstrapped/regtested on
x86_64-linux and i686-linux, but given the above I think I should drop
the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.

With the lra.cc change, one can actually compile simple function
with -O0 on 64-bit host with --param min-nondebug-insn-uid=0x40000000
(i.e. the new limit), but already needed quite a big part of my 32GB
RAM + 24GB swap.
The patch adds a dg-skip-if for that case though, because such option
is way too much for 32-bit hosts even at -O0 and empty function,
and with -O3 on a longer function it is too much for average 64-bit host
as well.  Without the dg-skip-if I got on 64-bit host:
cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
and
cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes
on 32-bit host.  A test requiring more than 532GB of RAM on 64-bit hosts
is just too much for our testsuite.

2023-12-07  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/112411
	* params.opt (-param=min-nondebug-insn-uid=): Add
	IntegerRange(0, 1073741824).
	* lra.cc (check_and_expand_insn_recog_data): Use (size_t) 3
	rather than 3 in * 3 / 2 computation and if the result is
	smaller or equal to index, use index + 1.
	* vec.cc (vec_prefix::calculate_allocation_1): Use (size_t) 3
	rather than 3 in * 3 / 2 computation.

	* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
	--param min-nondebug-insn-uid=1073741824.

--- gcc/params.opt.jj	2023-11-02 07:49:18.010852541 +0100
+++ gcc/params.opt	2023-12-06 18:55:57.045420935 +0100
@@ -779,7 +779,7 @@ Common Joined UInteger Var(param_min_loo
 The minimum threshold for probability of semi-invariant condition statement to trigger loop split.
 
 -param=min-nondebug-insn-uid=
-Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
+Common Joined UInteger Var(param_min_nondebug_insn_uid) Param IntegerRange(0, 1073741824)
 The minimum UID to be used for a nondebug insn.
 
 -param=min-size-for-stack-sharing=
--- gcc/lra.cc.jj	2023-12-05 13:17:29.642260866 +0100
+++ gcc/lra.cc	2023-12-06 19:52:01.759241999 +0100
@@ -768,7 +768,9 @@ check_and_expand_insn_recog_data (int in
   if (lra_insn_recog_data_len > index)
     return;
   old = lra_insn_recog_data_len;
-  lra_insn_recog_data_len = index * 3 / 2 + 1;
+  lra_insn_recog_data_len = index * (size_t) 3 / 2;
+  if (lra_insn_recog_data_len <= index)
+    lra_insn_recog_data_len = index + 1;
   lra_insn_recog_data = XRESIZEVEC (lra_insn_recog_data_t,
 				    lra_insn_recog_data,
 				    lra_insn_recog_data_len);
--- gcc/vec.cc.jj	2023-09-27 10:37:39.329838572 +0200
+++ gcc/vec.cc	2023-12-06 19:53:34.670940078 +0100
@@ -160,7 +160,7 @@ vec_prefix::calculate_allocation_1 (unsi
     alloc = alloc * 2;
   else
     /* Grow slower when large.  */
-    alloc = (alloc * 3 / 2);
+    alloc = (alloc * (size_t) 3 / 2);
 
   /* If this is still too small, set it to the right size. */
   if (alloc < desired)
--- gcc/testsuite/gcc.dg/params/blocksort-part.c.jj	2020-01-12 11:54:37.463397567 +0100
+++ gcc/testsuite/gcc.dg/params/blocksort-part.c	2023-12-07 08:46:11.656974144 +0100
@@ -1,4 +1,5 @@
 /* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { "--param stack-clash-protection-*" } } */
+/* { dg-skip-if "For 32-bit hosts such param is too much and even for 64-bit might require hundreds of GB of RAM" { *-*-* } { "--param min-nondebug-insn-uid=1073741824" } } */
 
 /*-------------------------------------------------------------*/
 /*--- Block sorting machinery                               ---*/

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] v2: Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
  2023-12-07  8:36 [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411] Jakub Jelinek
@ 2023-12-07  8:39 ` Jakub Jelinek
  2023-12-07 10:12   ` Richard Biener
  2023-12-08 17:49   ` Vladimir Makarov
  2023-12-07  9:49 ` [PATCH] " Richard Biener
  2023-12-07 10:54 ` Jakub Jelinek
  2 siblings, 2 replies; 10+ messages in thread
From: Jakub Jelinek @ 2023-12-07  8:39 UTC (permalink / raw)
  To: Richard Biener, Vladimir Makarov; +Cc: gcc-patches

On Thu, Dec 07, 2023 at 09:36:22AM +0100, Jakub Jelinek wrote:
> So, one way to fix the LRA issue would be just to use
>   lra_insn_recog_data_len = index * 3U / 2;
>   if (lra_insn_recog_data_len <= index)
>     lra_insn_recog_data_len = index + 1;
> basically do what vec.cc does.  I thought we can do better for
> both vec.cc and LRA on 64-bit hosts even without growing the allocated
> counters, but now that I look at it again, perhaps we can't.
> The above overflows already with original alloc or lra_insn_recog_data_len
> 0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
> and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
> 1.  I thought (and the patch implements it) that we could use
> alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
> that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
> which is still ok in unsigned, but given that vec.h then stores the
> counter into unsigned m_alloc:31; bit-field, it is too much.
> 
> The patch below is what I've actually bootstrapped/regtested on
> x86_64-linux and i686-linux, but given the above I think I should drop
> the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.

Here is so far untested adjusted patch, which does the computation
just in unsigned int rather than size_t, because doing it in size_t
wouldn't improve things.

2023-12-07  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/112411
	* params.opt (-param=min-nondebug-insn-uid=): Add
	IntegerRange(0, 1073741824).
	* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
	in * 3 / 2 computation and if the result is smaller or equal to
	index, use index + 1.

	* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
	--param min-nondebug-insn-uid=1073741824.

--- gcc/params.opt.jj	2023-11-02 07:49:18.010852541 +0100
+++ gcc/params.opt	2023-12-06 18:55:57.045420935 +0100
@@ -779,7 +779,7 @@ Common Joined UInteger Var(param_min_loo
 The minimum threshold for probability of semi-invariant condition statement to trigger loop split.
 
 -param=min-nondebug-insn-uid=
-Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
+Common Joined UInteger Var(param_min_nondebug_insn_uid) Param IntegerRange(0, 1073741824)
 The minimum UID to be used for a nondebug insn.
 
 -param=min-size-for-stack-sharing=
--- gcc/lra.cc.jj	2023-12-05 13:17:29.642260866 +0100
+++ gcc/lra.cc	2023-12-06 19:52:01.759241999 +0100
@@ -768,7 +768,9 @@ check_and_expand_insn_recog_data (int in
   if (lra_insn_recog_data_len > index)
     return;
   old = lra_insn_recog_data_len;
-  lra_insn_recog_data_len = index * 3 / 2 + 1;
+  lra_insn_recog_data_len = index * 3U / 2;
+  if (lra_insn_recog_data_len <= index)
+    lra_insn_recog_data_len = index + 1;
   lra_insn_recog_data = XRESIZEVEC (lra_insn_recog_data_t,
 				    lra_insn_recog_data,
 				    lra_insn_recog_data_len);
--- gcc/testsuite/gcc.dg/params/blocksort-part.c.jj	2020-01-12 11:54:37.463397567 +0100
+++ gcc/testsuite/gcc.dg/params/blocksort-part.c	2023-12-07 08:46:11.656974144 +0100
@@ -1,4 +1,5 @@
 /* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { "--param stack-clash-protection-*" } } */
+/* { dg-skip-if "For 32-bit hosts such param is too much and even for 64-bit might require hundreds of GB of RAM" { *-*-* } { "--param min-nondebug-insn-uid=1073741824" } } */
 
 /*-------------------------------------------------------------*/
 /*--- Block sorting machinery                               ---*/


	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
  2023-12-07  8:36 [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411] Jakub Jelinek
  2023-12-07  8:39 ` [PATCH] v2: " Jakub Jelinek
@ 2023-12-07  9:49 ` Richard Biener
  2023-12-07 10:54 ` Jakub Jelinek
  2 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2023-12-07  9:49 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Vladimir Makarov, gcc-patches

On Thu, 7 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> As documented, --param min-nondebug-insn-uid= is very useful in debugging
> -fcompare-debug issues in RTL dumps, without it it is really hard to
> find differences.  With it, DEBUG_INSNs generally use low INSN_UIDs
> (1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up.
> For good results, the parameter should be larger than the number of
> DEBUG_INSNs in all or at least problematic functions, so I typically
> use --param min-nondebug-insn-uid=10000 or --param
> min-nondebug-insn-uid=1000.
> 
> The PR is about using --param min-nondebug-insn-uid=2147483647 or
> similar behavior can be achieved with that minus some epsilon,
> INSN_UIDs for the non-debug insns then wrap around and as they are signed,
> all kinds of things break.  Obviously, that can happen even without that
> option, but functions containing more than 2147483647 insns usually don't
> compile much earlier due to getting out of memory.
> As it is a debugging option, I'd prefer not to impose any drastically small
> limits on it because if a function has a lot of DEBUG_INSNs, it is useful
> to start still above them, otherwise the allocation of uids will DTRT
> even for DEBUG_INSNs but there will be then differences in non-DEBUG_INSN
> allocations.
> 
> So, the following patch uses 0x40000000 limit, half the maximum amount for
> DEBUG_INSNs and half for non-DEBUG_INSNs, it will still result in very
> unlikely overflows in real world.
> 
> Note, using large min-nondebug-insn-uid is very expensive for compile time
> memory and compile time, because DF as well as various RTL passes use
> arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements,
> ditto df (df->insns).
> 
> Now, in LRA I've ran into ICEs already with
> --param min-nondebug-insn-uid=0x2aaaaaaa
> on 64-bit host.  It uses a custom vector management and wants to grow
> allocation 1.5x when growing, but all this computation is done in int,
> so already 0x2aaaaaab * 3 / 2 + 1 overflows to negative value.  And
> unlike vec.cc growing which also uses unsigned int type for the above
> (and the + 1 is not there), it also doesn't make sure if there is an
> overflow that it allocates at least as much as needed, vec.cc
> does
>   if ...
>   else
>     /* Grow slower when large.  */
>     alloc = (alloc * 3 / 2);
> 
>   /* If this is still too small, set it to the right size. */
>   if (alloc < desired)
>     alloc = desired;
> so even if there is overflow during the * 1.5 computation, but
> desired is still representable in the range of the alloced counter
> (31-bits in both vec.h and LRA), it doesn't grow exponentially but
> at least works for the current value.
> 
> So, one way to fix the LRA issue would be just to use
>   lra_insn_recog_data_len = index * 3U / 2;
>   if (lra_insn_recog_data_len <= index)
>     lra_insn_recog_data_len = index + 1;
> basically do what vec.cc does.  I thought we can do better for
> both vec.cc and LRA on 64-bit hosts even without growing the allocated
> counters, but now that I look at it again, perhaps we can't.
> The above overflows already with original alloc or lra_insn_recog_data_len
> 0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
> and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
> 1.  I thought (and the patch implements it) that we could use
> alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
> that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
> which is still ok in unsigned, but given that vec.h then stores the
> counter into unsigned m_alloc:31; bit-field, it is too much.
> 
> The patch below is what I've actually bootstrapped/regtested on
> x86_64-linux and i686-linux, but given the above I think I should drop
> the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.
> 
> With the lra.cc change, one can actually compile simple function
> with -O0 on 64-bit host with --param min-nondebug-insn-uid=0x40000000
> (i.e. the new limit), but already needed quite a big part of my 32GB
> RAM + 24GB swap.
> The patch adds a dg-skip-if for that case though, because such option
> is way too much for 32-bit hosts even at -O0 and empty function,
> and with -O3 on a longer function it is too much for average 64-bit host
> as well.  Without the dg-skip-if I got on 64-bit host:
> cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
> and
> cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes
> on 32-bit host.  A test requiring more than 532GB of RAM on 64-bit hosts
> is just too much for our testsuite.
> 
> 2023-12-07  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR middle-end/112411
> 	* params.opt (-param=min-nondebug-insn-uid=): Add
> 	IntegerRange(0, 1073741824).
> 	* lra.cc (check_and_expand_insn_recog_data): Use (size_t) 3
> 	rather than 3 in * 3 / 2 computation and if the result is
> 	smaller or equal to index, use index + 1.
> 	* vec.cc (vec_prefix::calculate_allocation_1): Use (size_t) 3
> 	rather than 3 in * 3 / 2 computation.
> 
> 	* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
> 	--param min-nondebug-insn-uid=1073741824.
> 
> --- gcc/params.opt.jj	2023-11-02 07:49:18.010852541 +0100
> +++ gcc/params.opt	2023-12-06 18:55:57.045420935 +0100
> @@ -779,7 +779,7 @@ Common Joined UInteger Var(param_min_loo
>  The minimum threshold for probability of semi-invariant condition statement to trigger loop split.
>  
>  -param=min-nondebug-insn-uid=
> -Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
> +Common Joined UInteger Var(param_min_nondebug_insn_uid) Param IntegerRange(0, 1073741824)
>  The minimum UID to be used for a nondebug insn.

This new limit is OK.

>  -param=min-size-for-stack-sharing=
> --- gcc/lra.cc.jj	2023-12-05 13:17:29.642260866 +0100
> +++ gcc/lra.cc	2023-12-06 19:52:01.759241999 +0100
> @@ -768,7 +768,9 @@ check_and_expand_insn_recog_data (int in
>    if (lra_insn_recog_data_len > index)
>      return;
>    old = lra_insn_recog_data_len;
> -  lra_insn_recog_data_len = index * 3 / 2 + 1;
> +  lra_insn_recog_data_len = index * (size_t) 3 / 2;
> +  if (lra_insn_recog_data_len <= index)
> +    lra_insn_recog_data_len = index + 1;

As is this (though I believe 3U would be better, no need for 64bit
multiplies with the later check).

>    lra_insn_recog_data = XRESIZEVEC (lra_insn_recog_data_t,
>  				    lra_insn_recog_data,
>  				    lra_insn_recog_data_len);
> --- gcc/vec.cc.jj	2023-09-27 10:37:39.329838572 +0200
> +++ gcc/vec.cc	2023-12-06 19:53:34.670940078 +0100
> @@ -160,7 +160,7 @@ vec_prefix::calculate_allocation_1 (unsi
>      alloc = alloc * 2;
>    else
>      /* Grow slower when large.  */
> -    alloc = (alloc * 3 / 2);
> +    alloc = (alloc * (size_t) 3 / 2);

I agree this isn't much of an improvement, alloc < desired
catches overflow but we don't catch when desired doesn't fit
the 31 bit limit ...

Richard.

>    /* If this is still too small, set it to the right size. */
>    if (alloc < desired)
> --- gcc/testsuite/gcc.dg/params/blocksort-part.c.jj	2020-01-12 11:54:37.463397567 +0100
> +++ gcc/testsuite/gcc.dg/params/blocksort-part.c	2023-12-07 08:46:11.656974144 +0100
> @@ -1,4 +1,5 @@
>  /* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { "--param stack-clash-protection-*" } } */
> +/* { dg-skip-if "For 32-bit hosts such param is too much and even for 64-bit might require hundreds of GB of RAM" { *-*-* } { "--param min-nondebug-insn-uid=1073741824" } } */
>  
>  /*-------------------------------------------------------------*/
>  /*--- Block sorting machinery                               ---*/
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] v2: Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
  2023-12-07  8:39 ` [PATCH] v2: " Jakub Jelinek
@ 2023-12-07 10:12   ` Richard Biener
  2023-12-07 10:32     ` Jakub Jelinek
  2023-12-08 17:49   ` Vladimir Makarov
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-12-07 10:12 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Vladimir Makarov, gcc-patches

On Thu, 7 Dec 2023, Jakub Jelinek wrote:

> On Thu, Dec 07, 2023 at 09:36:22AM +0100, Jakub Jelinek wrote:
> > So, one way to fix the LRA issue would be just to use
> >   lra_insn_recog_data_len = index * 3U / 2;
> >   if (lra_insn_recog_data_len <= index)
> >     lra_insn_recog_data_len = index + 1;
> > basically do what vec.cc does.  I thought we can do better for
> > both vec.cc and LRA on 64-bit hosts even without growing the allocated
> > counters, but now that I look at it again, perhaps we can't.
> > The above overflows already with original alloc or lra_insn_recog_data_len
> > 0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
> > and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
> > 1.  I thought (and the patch implements it) that we could use
> > alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
> > that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
> > which is still ok in unsigned, but given that vec.h then stores the
> > counter into unsigned m_alloc:31; bit-field, it is too much.
> > 
> > The patch below is what I've actually bootstrapped/regtested on
> > x86_64-linux and i686-linux, but given the above I think I should drop
> > the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.
> 
> Here is so far untested adjusted patch, which does the computation
> just in unsigned int rather than size_t, because doing it in size_t
> wouldn't improve things.

OK, but ...

> 2023-12-07  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR middle-end/112411
> 	* params.opt (-param=min-nondebug-insn-uid=): Add
> 	IntegerRange(0, 1073741824).
> 	* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
> 	in * 3 / 2 computation and if the result is smaller or equal to
> 	index, use index + 1.
> 
> 	* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
> 	--param min-nondebug-insn-uid=1073741824.

what's this change for?  Do we test the actual param limit?  Can you
skip for the param without specifying the actual upper bound?

Thanks,
Richard.

> --- gcc/params.opt.jj	2023-11-02 07:49:18.010852541 +0100
> +++ gcc/params.opt	2023-12-06 18:55:57.045420935 +0100
> @@ -779,7 +779,7 @@ Common Joined UInteger Var(param_min_loo
>  The minimum threshold for probability of semi-invariant condition statement to trigger loop split.
>  
>  -param=min-nondebug-insn-uid=
> -Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
> +Common Joined UInteger Var(param_min_nondebug_insn_uid) Param IntegerRange(0, 1073741824)
>  The minimum UID to be used for a nondebug insn.
>  
>  -param=min-size-for-stack-sharing=
> --- gcc/lra.cc.jj	2023-12-05 13:17:29.642260866 +0100
> +++ gcc/lra.cc	2023-12-06 19:52:01.759241999 +0100
> @@ -768,7 +768,9 @@ check_and_expand_insn_recog_data (int in
>    if (lra_insn_recog_data_len > index)
>      return;
>    old = lra_insn_recog_data_len;
> -  lra_insn_recog_data_len = index * 3 / 2 + 1;
> +  lra_insn_recog_data_len = index * 3U / 2;
> +  if (lra_insn_recog_data_len <= index)
> +    lra_insn_recog_data_len = index + 1;
>    lra_insn_recog_data = XRESIZEVEC (lra_insn_recog_data_t,
>  				    lra_insn_recog_data,
>  				    lra_insn_recog_data_len);
> --- gcc/testsuite/gcc.dg/params/blocksort-part.c.jj	2020-01-12 11:54:37.463397567 +0100
> +++ gcc/testsuite/gcc.dg/params/blocksort-part.c	2023-12-07 08:46:11.656974144 +0100
> @@ -1,4 +1,5 @@
>  /* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { "--param stack-clash-protection-*" } } */
> +/* { dg-skip-if "For 32-bit hosts such param is too much and even for 64-bit might require hundreds of GB of RAM" { *-*-* } { "--param min-nondebug-insn-uid=1073741824" } } */
>  
>  /*-------------------------------------------------------------*/
>  /*--- Block sorting machinery                               ---*/
> 
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] v2: Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
  2023-12-07 10:12   ` Richard Biener
@ 2023-12-07 10:32     ` Jakub Jelinek
  2023-12-07 12:06       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-12-07 10:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: Vladimir Makarov, gcc-patches

On Thu, Dec 07, 2023 at 11:12:39AM +0100, Richard Biener wrote:
> > 2023-12-07  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	PR middle-end/112411
> > 	* params.opt (-param=min-nondebug-insn-uid=): Add
> > 	IntegerRange(0, 1073741824).
> > 	* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
> > 	in * 3 / 2 computation and if the result is smaller or equal to
> > 	index, use index + 1.
> > 
> > 	* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
> > 	--param min-nondebug-insn-uid=1073741824.
> 
> what's this change for?  Do we test the actual param limit?  Can you
> skip for the param without specifying the actual upper bound?

params.exp iterates over all params which have a range selected and tries
to compile the testcase(s) with both the minimum and if any maximum of the
range.
I think it is useful to test normally with --param min-nondebug-insn-uid=0
the minimum, that means it is off, it is just the maximum which either
doesn't work or requires those hundreds of gigabytes of memory (guess I
should look at what needs that much).
I don't know how else to skip just the maximum test for the param except
to specify the exact value; if params.opt changes that value, people will
notice FAILs of the test and the test can be adjusted too (unless the
maximum is lowered into something so small that it works well even on low
memory 32-bit hosts).

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
  2023-12-07  8:36 [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411] Jakub Jelinek
  2023-12-07  8:39 ` [PATCH] v2: " Jakub Jelinek
  2023-12-07  9:49 ` [PATCH] " Richard Biener
@ 2023-12-07 10:54 ` Jakub Jelinek
  2023-12-08  7:37   ` [PATCH] haifa-sched: Avoid overflows in extend_h_i_d [PR112411] Jakub Jelinek
  2 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-12-07 10:54 UTC (permalink / raw)
  To: Richard Biener, Vladimir Makarov, gcc-patches

On Thu, Dec 07, 2023 at 09:36:23AM +0100, Jakub Jelinek wrote:
> Without the dg-skip-if I got on 64-bit host:
> cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes

I've looked at this and the problem is in haifa-sched.cc:
9047	      h_i_d.safe_grow_cleared (3 * get_max_uid () / 2, true);
get_max_uid () is 0x4000024d with the --param min-nondebug-insn-uid=0x40000000
and so 3 * get_max_uid () / 2 actually overflows to -536870028 but as vec.h
then treats the value as unsigned, it attempts to allocate
0xe0000374U * 152UL bytes, i.e. those 532GB.  If the above is fixed to do
3U * get_max_uid () / 2 instead, it will get slightly better and will only
need 0x60000373U * 152UL bytes, i.e. 228GB.
Even better improvement would be change haifa-sched.cc, so that the
vector contains just pointers to _haifa_insn_data rather than the structures
themselves, then we'd just allocate 12GB for the vector itself and then as
needed for the actual data.  But at the expense of slowing the scheduler a
little bit.  And the question is how sparse typically the insn uids are when
scheduling without --param min-nondebug-insn-uid=

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] v2: Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
  2023-12-07 10:32     ` Jakub Jelinek
@ 2023-12-07 12:06       ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2023-12-07 12:06 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Vladimir Makarov, gcc-patches

On Thu, 7 Dec 2023, Jakub Jelinek wrote:

> On Thu, Dec 07, 2023 at 11:12:39AM +0100, Richard Biener wrote:
> > > 2023-12-07  Jakub Jelinek  <jakub@redhat.com>
> > > 
> > > 	PR middle-end/112411
> > > 	* params.opt (-param=min-nondebug-insn-uid=): Add
> > > 	IntegerRange(0, 1073741824).
> > > 	* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
> > > 	in * 3 / 2 computation and if the result is smaller or equal to
> > > 	index, use index + 1.
> > > 
> > > 	* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
> > > 	--param min-nondebug-insn-uid=1073741824.
> > 
> > what's this change for?  Do we test the actual param limit?  Can you
> > skip for the param without specifying the actual upper bound?
> 
> params.exp iterates over all params which have a range selected and tries
> to compile the testcase(s) with both the minimum and if any maximum of the
> range.
> I think it is useful to test normally with --param min-nondebug-insn-uid=0
> the minimum, that means it is off, it is just the maximum which either
> doesn't work or requires those hundreds of gigabytes of memory (guess I
> should look at what needs that much).
> I don't know how else to skip just the maximum test for the param except
> to specify the exact value; if params.opt changes that value, people will
> notice FAILs of the test and the test can be adjusted too (unless the
> maximum is lowered into something so small that it works well even on low
> memory 32-bit hosts).

Ah, OK then.

Richard.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] haifa-sched: Avoid overflows in extend_h_i_d [PR112411]
  2023-12-07 10:54 ` Jakub Jelinek
@ 2023-12-08  7:37   ` Jakub Jelinek
  2023-12-08  7:39     ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-12-08  7:37 UTC (permalink / raw)
  To: Richard Biener, Vladimir Makarov; +Cc: gcc-patches

On Thu, Dec 07, 2023 at 11:54:01AM +0100, Jakub Jelinek wrote:
> On Thu, Dec 07, 2023 at 09:36:23AM +0100, Jakub Jelinek wrote:
> > Without the dg-skip-if I got on 64-bit host:
> > cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
> 
> I've looked at this and the problem is in haifa-sched.cc:
> 9047	      h_i_d.safe_grow_cleared (3 * get_max_uid () / 2, true);
> get_max_uid () is 0x4000024d with the --param min-nondebug-insn-uid=0x40000000
> and so 3 * get_max_uid () / 2 actually overflows to -536870028 but as vec.h
> then treats the value as unsigned, it attempts to allocate
> 0xe0000374U * 152UL bytes, i.e. those 532GB.  If the above is fixed to do
> 3U * get_max_uid () / 2 instead, it will get slightly better and will only
> need 0x60000373U * 152UL bytes, i.e. 228GB.

Here it is in a patch form.
For the other changes, it would be more work and the question is if it would
be beneficial for average compilation, if the uids aren't sparse enough,
it would waste more memory (8-bytes per uid for the pointer in the array
plus the 152 byte allocation, probably even rounded up for next bucket size
unless we use say pool allocator).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-08  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/112411
	* haifa-sched.cc (extend_h_i_d): Use 3U instead of 3 in
	3 * get_max_uid () / 2 calculation.

--- gcc/haifa-sched.cc.jj	2023-08-08 15:55:06.705161670 +0200
+++ gcc/haifa-sched.cc	2023-12-07 11:57:17.869611646 +0100
@@ -9044,7 +9044,7 @@ extend_h_i_d (void)
   if (reserve > 0
       && ! h_i_d.space (reserve))
     {
-      h_i_d.safe_grow_cleared (3 * get_max_uid () / 2, true);
+      h_i_d.safe_grow_cleared (3U * get_max_uid () / 2, true);
       sched_extend_target ();
     }
 }

	Jakub


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] haifa-sched: Avoid overflows in extend_h_i_d [PR112411]
  2023-12-08  7:37   ` [PATCH] haifa-sched: Avoid overflows in extend_h_i_d [PR112411] Jakub Jelinek
@ 2023-12-08  7:39     ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2023-12-08  7:39 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Vladimir Makarov, gcc-patches

On Fri, 8 Dec 2023, Jakub Jelinek wrote:

> On Thu, Dec 07, 2023 at 11:54:01AM +0100, Jakub Jelinek wrote:
> > On Thu, Dec 07, 2023 at 09:36:23AM +0100, Jakub Jelinek wrote:
> > > Without the dg-skip-if I got on 64-bit host:
> > > cc1: out of memory allocating 571230784744 bytes after a total of 2772992 bytes
> > 
> > I've looked at this and the problem is in haifa-sched.cc:
> > 9047	      h_i_d.safe_grow_cleared (3 * get_max_uid () / 2, true);
> > get_max_uid () is 0x4000024d with the --param min-nondebug-insn-uid=0x40000000
> > and so 3 * get_max_uid () / 2 actually overflows to -536870028 but as vec.h
> > then treats the value as unsigned, it attempts to allocate
> > 0xe0000374U * 152UL bytes, i.e. those 532GB.  If the above is fixed to do
> > 3U * get_max_uid () / 2 instead, it will get slightly better and will only
> > need 0x60000373U * 152UL bytes, i.e. 228GB.
> 
> Here it is in a patch form.
> For the other changes, it would be more work and the question is if it would
> be beneficial for average compilation, if the uids aren't sparse enough,
> it would waste more memory (8-bytes per uid for the pointer in the array
> plus the 152 byte allocation, probably even rounded up for next bucket size
> unless we use say pool allocator).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-12-08  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR middle-end/112411
> 	* haifa-sched.cc (extend_h_i_d): Use 3U instead of 3 in
> 	3 * get_max_uid () / 2 calculation.
> 
> --- gcc/haifa-sched.cc.jj	2023-08-08 15:55:06.705161670 +0200
> +++ gcc/haifa-sched.cc	2023-12-07 11:57:17.869611646 +0100
> @@ -9044,7 +9044,7 @@ extend_h_i_d (void)
>    if (reserve > 0
>        && ! h_i_d.space (reserve))
>      {
> -      h_i_d.safe_grow_cleared (3 * get_max_uid () / 2, true);
> +      h_i_d.safe_grow_cleared (3U * get_max_uid () / 2, true);
>        sched_extend_target ();
>      }
>  }
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] v2: Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411]
  2023-12-07  8:39 ` [PATCH] v2: " Jakub Jelinek
  2023-12-07 10:12   ` Richard Biener
@ 2023-12-08 17:49   ` Vladimir Makarov
  1 sibling, 0 replies; 10+ messages in thread
From: Vladimir Makarov @ 2023-12-08 17:49 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener; +Cc: gcc-patches


On 12/7/23 03:39, Jakub Jelinek wrote:
> On Thu, Dec 07, 2023 at 09:36:22AM +0100, Jakub Jelinek wrote:
>> So, one way to fix the LRA issue would be just to use
>>    lra_insn_recog_data_len = index * 3U / 2;
>>    if (lra_insn_recog_data_len <= index)
>>      lra_insn_recog_data_len = index + 1;
>> basically do what vec.cc does.  I thought we can do better for
>> both vec.cc and LRA on 64-bit hosts even without growing the allocated
>> counters, but now that I look at it again, perhaps we can't.
>> The above overflows already with original alloc or lra_insn_recog_data_len
>> 0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
>> and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
>> 1.  I thought (and the patch implements it) that we could use
>> alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
>> that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
>> which is still ok in unsigned, but given that vec.h then stores the
>> counter into unsigned m_alloc:31; bit-field, it is too much.
>>
>> The patch below is what I've actually bootstrapped/regtested on
>> x86_64-linux and i686-linux, but given the above I think I should drop
>> the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.
> Here is so far untested adjusted patch, which does the computation
> just in unsigned int rather than size_t, because doing it in size_t
> wouldn't improve things.
>
> 2023-12-07  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR middle-end/112411
> 	* params.opt (-param=min-nondebug-insn-uid=): Add
> 	IntegerRange(0, 1073741824).
> 	* lra.cc (check_and_expand_insn_recog_data): Use 3U rather than 3
> 	in * 3 / 2 computation and if the result is smaller or equal to
> 	index, use index + 1.
>
> 	* gcc.dg/params/blocksort-part.c: Add dg-skip-if for
> 	--param min-nondebug-insn-uid=1073741824.
>
Jakub, if you are still waiting for an approval,  LRA change is ok for 
me with given max param.

Thank you for fixing this.




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-12-08 17:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-07  8:36 [PATCH] Add IntegerRange for -param=min-nondebug-insn-uid= and fix vector growing in LRA and vec [PR112411] Jakub Jelinek
2023-12-07  8:39 ` [PATCH] v2: " Jakub Jelinek
2023-12-07 10:12   ` Richard Biener
2023-12-07 10:32     ` Jakub Jelinek
2023-12-07 12:06       ` Richard Biener
2023-12-08 17:49   ` Vladimir Makarov
2023-12-07  9:49 ` [PATCH] " Richard Biener
2023-12-07 10:54 ` Jakub Jelinek
2023-12-08  7:37   ` [PATCH] haifa-sched: Avoid overflows in extend_h_i_d [PR112411] Jakub Jelinek
2023-12-08  7:39     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).