[PATCH] Add param for bb limit to invoke fast

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Add param for bb limit to invoke fast_vrp.
@ 2024-06-21 13:02 Andrew MacLeod
  2024-06-22 13:15 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew MacLeod @ 2024-06-21 13:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: hernandez, aldy, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 892 bytes --]

This patch adds

     --param=vrp-block-limit=N

When the basic block counter for a function exceeded 'N' , VRP is 
invoked with the new fast_vrp algorithm instead.   This algorithm uses a 
lot less memory and processing power, although it does get a few less 
things.

Primary motivation is cases like 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP 
passes consume about 600 seconds of the compile time, and a lot of 
memory.      With fast_vrp, it spends less than 10 seconds total in the 
3 passes of VRP.     This test case has about 400,000 basic blocks.

The default for N in this patch is 150,000,  arbitrarily chosen.

This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0 
as well) on x86_64-pc-linux-gnu, with no regressions.

What do you think, OK for trunk?

Andrew

PS sorry,. it doesn't help the threader in that PR :-(


[-- Attachment #2: 0004-Add-param-for-bb-limit-to-invoke-fast_vrp.patch --]
[-- Type: text/x-patch, Size: 2449 bytes --]

From 3bb9bd3ca8038676e45b0bddcda91cbed7e51662 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Mon, 17 Jun 2024 11:38:46 -0400
Subject: [PATCH 4/5] Add param for bb limit to invoke fast_vrp.

If the basic block count is too high, simply use fast_vrp for all
VRP passes.

	gcc/doc/
	* invoke.texi (vrp-block-limit): Document.

	gcc/
	* params.opt (-param=vrp-block-limit): New.
	* tree-vrp.cc (fvrp_folder::execute): Invoke fast_vrp if block
	count exceeds limit.
---
 gcc/doc/invoke.texi | 3 +++
 gcc/params.opt      | 4 ++++
 gcc/tree-vrp.cc     | 4 ++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5d7a87fde86..f2f8f6334dc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16840,6 +16840,9 @@ this parameter.  The default value of this parameter is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vrp-block-limit
+Maximum number of basic blocks before VRP switches to a lower memory algorithm.
+
 @item vrp-sparse-threshold
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index d34ef545bf0..c17ba17b91b 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vrp-block-limit=
+Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
+Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
+
 -param=vrp-sparse-threshold=
 Common Joined UInteger Var(param_vrp_sparse_threshold) Init(3000) Optimization Param
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 4fc33e63e7d..eef02146ec6 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -1330,9 +1330,9 @@ public:
   unsigned int execute (function *fun) final override
     {
       // Check for fast vrp.
-      if (&data == &pass_data_fast_vrp)
+      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
+	  &data == &pass_data_fast_vrp)
 	return execute_fast_vrp (fun, final_p);
-
       return execute_ranger_vrp (fun, final_p);
     }
 
-- 
2.45.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add param for bb limit to invoke fast_vrp.
  2024-06-21 13:02 [PATCH] Add param for bb limit to invoke fast_vrp Andrew MacLeod
@ 2024-06-22 13:15 ` Richard Biener
  2024-06-25  2:19   ` Andrew MacLeod
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-06-22 13:15 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc-patches, hernandez, aldy

On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> This patch adds
>
>      --param=vrp-block-limit=N
>
> When the basic block counter for a function exceeded 'N' , VRP is
> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
> lot less memory and processing power, although it does get a few less
> things.
>
> Primary motivation is cases like
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
> passes consume about 600 seconds of the compile time, and a lot of
> memory.      With fast_vrp, it spends less than 10 seconds total in the
> 3 passes of VRP.     This test case has about 400,000 basic blocks.
>
> The default for N in this patch is 150,000,  arbitrarily chosen.
>
> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
> as well) on x86_64-pc-linux-gnu, with no regressions.
>
> What do you think, OK for trunk?

+      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
+         &data == &pass_data_fast_vrp)

|| goes to the next line.

Btw, we have -Wdisabled-optimization for these cases which should
say sth like "function has excess of %d number of basic blocks
(--param vrp-block-limit=%d), using fast VRP algorithm"
or so in this case.

As I wrote in the PR the priority should be -O1 compile-time
performance and memory use.

Richard.

> Andrew
>
> PS sorry,. it doesn't help the threader in that PR :-(

It's probably one of the known bottlenecks there - IIRC the path range
queries make O(n^2) work.  I can look at this at some point as I've
dealt with the slowness of this pass in the past.

There is param_max_jump_thread_paths that should limit searching
but there is IIRC no way to limit the work path ranger actually does
when resolving the query.

Richard.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add param for bb limit to invoke fast_vrp.
  2024-06-22 13:15 ` Richard Biener
@ 2024-06-25  2:19   ` Andrew MacLeod
  2024-06-25  2:35     ` Andrew Pinski
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew MacLeod @ 2024-06-25  2:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hernandez, aldy

[-- Attachment #1: Type: text/plain, Size: 3835 bytes --]


On 6/22/24 09:15, Richard Biener wrote:
> On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>> This patch adds
>>
>>       --param=vrp-block-limit=N
>>
>> When the basic block counter for a function exceeded 'N' , VRP is
>> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
>> lot less memory and processing power, although it does get a few less
>> things.
>>
>> Primary motivation is cases like
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
>> passes consume about 600 seconds of the compile time, and a lot of
>> memory.      With fast_vrp, it spends less than 10 seconds total in the
>> 3 passes of VRP.     This test case has about 400,000 basic blocks.
>>
>> The default for N in this patch is 150,000,  arbitrarily chosen.
>>
>> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
>> as well) on x86_64-pc-linux-gnu, with no regressions.
>>
>> What do you think, OK for trunk?
> +      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
> +         &data == &pass_data_fast_vrp)
>
> || goes to the next line.
>
> Btw, we have -Wdisabled-optimization for these cases which should
> say sth like "function has excess of %d number of basic blocks
> (--param vrp-block-limit=%d), using fast VRP algorithm"
> or so in this case.
>
> As I wrote in the PR the priority should be -O1 compile-time
> performance and memory use.


Yeah, I just wanted to use it as a model for "bad" cases for ranger.   
Adjusted patch attached which now issues the warning.

I also found that the transitive relations were causing a small blowup 
in time for relation processing now that I turned relations on for fast 
VRP.  I commited a patch and fast_vrp no longer does transitives.

If you want to experiment with enabling fast VRP at -O1, it should be 
fast all the time now.  I think :-)    This testcase runs in about 95 
seconds on my test machine.  if I turn on VRP, a single VRP pass takes 
about 2.5 seconds.    Its all set up, you can just add:

NEXT_PASS (pass_fast_vrp);

at an appropriate spot.

> Richard.
>
>> Andrew
>>
>> PS sorry,. it doesn't help the threader in that PR :-(
> It's probably one of the known bottlenecks there - IIRC the path range
> queries make O(n^2) work.  I can look at this at some point as I've
> dealt with the slowness of this pass in the past.
>
> There is param_max_jump_thread_paths that should limit searching
> but there is IIRC no way to limit the work path ranger actually does
> when resolving the query.
>
Yeah, Id like to talk to Aldy about revamping the threader now that some 
of the newer facilities are available that fast_vrp uses.

We can calculate all the outgoing ranges for a block at once with :

   // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
   bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = 
NULL);

This is what the fast_vrp routines uses.  We can gather all range 
restrictions generated from an edge efficiently just once and then 
intersect them with a known range as we walk the different paths. We 
don't need the gori exports , nor any of the other on-demand bits where 
we calculate each export range dynamically.. I suspect it would reduce 
the workload and memory impact quite a bit, but I'm not really familiar 
with exactly how the threader uses those things.

It'd require some minor tweaking to the lazy_ssa_cache to make the 
bitmap of names set accessible. This  would provide similar 
functionality to what the gori export () routine provides.  Both 
relations and inferred ranges should only need to be calculated once per 
block as well and could/should/would be applied the same way if they are 
present.   I don't *think* the threader uses any of the def chains, but 
Aldy can chip in.

Andrew

[-- Attachment #2: 0002-Add-param-for-bb-limit-to-invoke-fast_vrp.patch --]
[-- Type: text/x-patch, Size: 3105 bytes --]

From 15f697aad90c35e42a4416d7db6e7289c0f5aae3 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Mon, 17 Jun 2024 11:38:46 -0400
Subject: [PATCH 2/2] Add param for bb limit to invoke fast_vrp.

If the basic block count is too high, simply use fast_vrp for all
VRP passes.

	gcc/doc/
	* invoke.texi (vrp-block-limit): Document.

	gcc/
	* params.opt (-param=vrp-block-limit): New.
	* tree-vrp.cc (fvrp_folder::execute): Invoke fast_vrp if block
	count exceeds limit.
---
 gcc/doc/invoke.texi |  3 +++
 gcc/params.opt      |  4 ++++
 gcc/tree-vrp.cc     | 16 ++++++++++++++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c790e2f3518..80da5e9d306 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16840,6 +16840,9 @@ this parameter.  The default value of this parameter is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vrp-block-limit
+Maximum number of basic blocks before VRP switches to a lower memory algorithm.
+
 @item vrp-sparse-threshold
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index d34ef545bf0..c17ba17b91b 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vrp-block-limit=
+Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
+Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
+
 -param=vrp-sparse-threshold=
 Common Joined UInteger Var(param_vrp_sparse_threshold) Init(3000) Optimization Param
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 26979b706e5..a81fcc696b7 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -60,6 +60,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-cp.h"
 #include "ipa-prop.h"
 #include "attribs.h"
+#include "diagnostic-core.h"
 
 // This class is utilized by VRP and ranger to remove __builtin_unreachable
 // calls, and reflect any resulting global ranges.
@@ -1331,9 +1332,20 @@ public:
   unsigned int execute (function *fun) final override
     {
       // Check for fast vrp.
-      if (&data == &pass_data_fast_vrp)
-	return execute_fast_vrp (fun, final_p);
+      bool use_fvrp = (&data == &pass_data_fast_vrp);
+      if (!use_fvrp && last_basic_block_for_fn (fun) > param_vrp_block_limit)
+	{
+	  use_fvrp = true;
+	  warning (OPT_Wdisabled_optimization,
+		   "Using fast VRP algorithm. %d basic blocks"
+		   " exceeds %s%d limit",
+		   n_basic_blocks_for_fn (fun),
+		   "--param=vrp-block-limit=",
+		   param_vrp_block_limit);
+	}
 
+      if (use_fvrp)
+	return execute_fast_vrp (fun, final_p);
       return execute_ranger_vrp (fun, final_p);
     }
 
-- 
2.45.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add param for bb limit to invoke fast_vrp.
  2024-06-25  2:19   ` Andrew MacLeod
@ 2024-06-25  2:35     ` Andrew Pinski
  2024-06-25  4:27       ` Andrew Pinski
  2024-06-25 13:23       ` Andrew MacLeod
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Pinski @ 2024-06-25  2:35 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Richard Biener, gcc-patches, hernandez, aldy

On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
>
> On 6/22/24 09:15, Richard Biener wrote:
> > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >> This patch adds
> >>
> >>       --param=vrp-block-limit=N
> >>
> >> When the basic block counter for a function exceeded 'N' , VRP is
> >> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
> >> lot less memory and processing power, although it does get a few less
> >> things.
> >>
> >> Primary motivation is cases like
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
> >> passes consume about 600 seconds of the compile time, and a lot of
> >> memory.      With fast_vrp, it spends less than 10 seconds total in the
> >> 3 passes of VRP.     This test case has about 400,000 basic blocks.
> >>
> >> The default for N in this patch is 150,000,  arbitrarily chosen.
> >>
> >> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
> >> as well) on x86_64-pc-linux-gnu, with no regressions.
> >>
> >> What do you think, OK for trunk?
> > +      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
> > +         &data == &pass_data_fast_vrp)
> >
> > || goes to the next line.
> >
> > Btw, we have -Wdisabled-optimization for these cases which should
> > say sth like "function has excess of %d number of basic blocks
> > (--param vrp-block-limit=%d), using fast VRP algorithm"
> > or so in this case.
> >
> > As I wrote in the PR the priority should be -O1 compile-time
> > performance and memory use.
>
>
> Yeah, I just wanted to use it as a model for "bad" cases for ranger.
> Adjusted patch attached which now issues the warning.
>
> I also found that the transitive relations were causing a small blowup
> in time for relation processing now that I turned relations on for fast
> VRP.  I commited a patch and fast_vrp no longer does transitives.
>
> If you want to experiment with enabling fast VRP at -O1, it should be
> fast all the time now.  I think :-)    This testcase runs in about 95
> seconds on my test machine.  if I turn on VRP, a single VRP pass takes
> about 2.5 seconds.    Its all set up, you can just add:
>
> NEXT_PASS (pass_fast_vrp);
>
> at an appropriate spot.
>
> > Richard.
> >
> >> Andrew
> >>
> >> PS sorry,. it doesn't help the threader in that PR :-(
> > It's probably one of the known bottlenecks there - IIRC the path range
> > queries make O(n^2) work.  I can look at this at some point as I've
> > dealt with the slowness of this pass in the past.
> >
> > There is param_max_jump_thread_paths that should limit searching
> > but there is IIRC no way to limit the work path ranger actually does
> > when resolving the query.
> >
> Yeah, Id like to talk to Aldy about revamping the threader now that some
> of the newer facilities are available that fast_vrp uses.
>
> We can calculate all the outgoing ranges for a block at once with :
>
>    // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
>    bool gori_on_edge (class ssa_cache &r, edge e, range_query *query =
> NULL);
>
> This is what the fast_vrp routines uses.  We can gather all range
> restrictions generated from an edge efficiently just once and then
> intersect them with a known range as we walk the different paths. We
> don't need the gori exports , nor any of the other on-demand bits where
> we calculate each export range dynamically.. I suspect it would reduce
> the workload and memory impact quite a bit, but I'm not really familiar
> with exactly how the threader uses those things.
>
> It'd require some minor tweaking to the lazy_ssa_cache to make the
> bitmap of names set accessible. This  would provide similar
> functionality to what the gori export () routine provides.  Both
> relations and inferred ranges should only need to be calculated once per
> block as well and could/should/would be applied the same way if they are
> present.   I don't *think* the threader uses any of the def chains, but
> Aldy can chip in.

+   warning (OPT_Wdisabled_optimization,
+    "Using fast VRP algorithm. %d basic blocks"
+    " exceeds %s%d limit",
+    n_basic_blocks_for_fn (fun),
+    "--param=vrp-block-limit=",
+    param_vrp_block_limit);

This should be:
warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic blocks"
    " exceeds %<%--param=vrp-block-limit=d%> limit",
n_basic_blocks_for_fn (fun), param_vrp_block_limit);

I had thought it was mentioned that options should be quoted but it is
not mentioned in the coding conventions:
https://gcc.gnu.org/codingconventions.html#Diagnostics

But it is mentioned in
https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
; This is why you were getting an error as you mentioned on IRC.


>
> Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add param for bb limit to invoke fast_vrp.
  2024-06-25  2:35     ` Andrew Pinski
@ 2024-06-25  4:27       ` Andrew Pinski
  2024-06-25 19:51         ` [COMMITTED] " Andrew MacLeod
  2024-06-25 21:32         ` [PATCH] " David Malcolm
  2024-06-25 13:23       ` Andrew MacLeod
  1 sibling, 2 replies; 8+ messages in thread
From: Andrew Pinski @ 2024-06-25  4:27 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Richard Biener, gcc-patches, hernandez, aldy

On Mon, Jun 24, 2024 at 7:35 PM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >
> >
> > On 6/22/24 09:15, Richard Biener wrote:
> > > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> > >> This patch adds
> > >>
> > >>       --param=vrp-block-limit=N
> > >>
> > >> When the basic block counter for a function exceeded 'N' , VRP is
> > >> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
> > >> lot less memory and processing power, although it does get a few less
> > >> things.
> > >>
> > >> Primary motivation is cases like
> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
> > >> passes consume about 600 seconds of the compile time, and a lot of
> > >> memory.      With fast_vrp, it spends less than 10 seconds total in the
> > >> 3 passes of VRP.     This test case has about 400,000 basic blocks.
> > >>
> > >> The default for N in this patch is 150,000,  arbitrarily chosen.
> > >>
> > >> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
> > >> as well) on x86_64-pc-linux-gnu, with no regressions.
> > >>
> > >> What do you think, OK for trunk?
> > > +      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
> > > +         &data == &pass_data_fast_vrp)
> > >
> > > || goes to the next line.
> > >
> > > Btw, we have -Wdisabled-optimization for these cases which should
> > > say sth like "function has excess of %d number of basic blocks
> > > (--param vrp-block-limit=%d), using fast VRP algorithm"
> > > or so in this case.
> > >
> > > As I wrote in the PR the priority should be -O1 compile-time
> > > performance and memory use.
> >
> >
> > Yeah, I just wanted to use it as a model for "bad" cases for ranger.
> > Adjusted patch attached which now issues the warning.
> >
> > I also found that the transitive relations were causing a small blowup
> > in time for relation processing now that I turned relations on for fast
> > VRP.  I commited a patch and fast_vrp no longer does transitives.
> >
> > If you want to experiment with enabling fast VRP at -O1, it should be
> > fast all the time now.  I think :-)    This testcase runs in about 95
> > seconds on my test machine.  if I turn on VRP, a single VRP pass takes
> > about 2.5 seconds.    Its all set up, you can just add:
> >
> > NEXT_PASS (pass_fast_vrp);
> >
> > at an appropriate spot.
> >
> > > Richard.
> > >
> > >> Andrew
> > >>
> > >> PS sorry,. it doesn't help the threader in that PR :-(
> > > It's probably one of the known bottlenecks there - IIRC the path range
> > > queries make O(n^2) work.  I can look at this at some point as I've
> > > dealt with the slowness of this pass in the past.
> > >
> > > There is param_max_jump_thread_paths that should limit searching
> > > but there is IIRC no way to limit the work path ranger actually does
> > > when resolving the query.
> > >
> > Yeah, Id like to talk to Aldy about revamping the threader now that some
> > of the newer facilities are available that fast_vrp uses.
> >
> > We can calculate all the outgoing ranges for a block at once with :
> >
> >    // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
> >    bool gori_on_edge (class ssa_cache &r, edge e, range_query *query =
> > NULL);
> >
> > This is what the fast_vrp routines uses.  We can gather all range
> > restrictions generated from an edge efficiently just once and then
> > intersect them with a known range as we walk the different paths. We
> > don't need the gori exports , nor any of the other on-demand bits where
> > we calculate each export range dynamically.. I suspect it would reduce
> > the workload and memory impact quite a bit, but I'm not really familiar
> > with exactly how the threader uses those things.
> >
> > It'd require some minor tweaking to the lazy_ssa_cache to make the
> > bitmap of names set accessible. This  would provide similar
> > functionality to what the gori export () routine provides.  Both
> > relations and inferred ranges should only need to be calculated once per
> > block as well and could/should/would be applied the same way if they are
> > present.   I don't *think* the threader uses any of the def chains, but
> > Aldy can chip in.
>
> +   warning (OPT_Wdisabled_optimization,
> +    "Using fast VRP algorithm. %d basic blocks"
> +    " exceeds %s%d limit",
> +    n_basic_blocks_for_fn (fun),
> +    "--param=vrp-block-limit=",
> +    param_vrp_block_limit);
>
> This should be:
> warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic blocks"
>     " exceeds %<%--param=vrp-block-limit=d%> limit",
> n_basic_blocks_for_fn (fun), param_vrp_block_limit);
>
> I had thought it was mentioned that options should be quoted but it is
> not mentioned in the coding conventions:
> https://gcc.gnu.org/codingconventions.html#Diagnostics
>
> But it is mentioned in
> https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
> ; This is why you were getting an error as you mentioned on IRC.

Note I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115627 to
add it to the coding conventions. I might take a stab at adding the
rules mentioned in
https://inbox.sourceware.org/gcc-patches/8ac62fe2-e4bf-0922-4947-fca9567a0561@gmail.com/
since those are the ones which are detected right now.

Thanks,
Andrew

>
>
> >
> > Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add param for bb limit to invoke fast_vrp.
  2024-06-25  2:35     ` Andrew Pinski
  2024-06-25  4:27       ` Andrew Pinski
@ 2024-06-25 13:23       ` Andrew MacLeod
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew MacLeod @ 2024-06-25 13:23 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Richard Biener, gcc-patches, hernandez, aldy


On 6/24/24 22:35, Andrew Pinski wrote:
> On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>     // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
>>     bool gori_on_edge (class ssa_cache &r, edge e, range_query *query =
>> NULL);
>>
>> This is what the fast_vrp routines uses.  We can gather all range
>> restrictions generated from an edge efficiently just once and then
>> intersect them with a known range as we walk the different paths. We
>> don't need the gori exports , nor any of the other on-demand bits where
>> we calculate each export range dynamically.. I suspect it would reduce
>> the workload and memory impact quite a bit, but I'm not really familiar
>> with exactly how the threader uses those things.
>>
>> It'd require some minor tweaking to the lazy_ssa_cache to make the
>> bitmap of names set accessible. This  would provide similar
>> functionality to what the gori export () routine provides.  Both
>> relations and inferred ranges should only need to be calculated once per
>> block as well and could/should/would be applied the same way if they are
>> present.   I don't *think* the threader uses any of the def chains, but
>> Aldy can chip in.
> +   warning (OPT_Wdisabled_optimization,
> +    "Using fast VRP algorithm. %d basic blocks"
> +    " exceeds %s%d limit",
> +    n_basic_blocks_for_fn (fun),
> +    "--param=vrp-block-limit=",
> +    param_vrp_block_limit);
>
> This should be:
> warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic blocks"
>      " exceeds %<%--param=vrp-block-limit=d%> limit",
> n_basic_blocks_for_fn (fun), param_vrp_block_limit);
>
> I had thought it was mentioned that options should be quoted but it is
> not mentioned in the coding conventions:
> https://gcc.gnu.org/codingconventions.html#Diagnostics
>
> But it is mentioned in
> https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
> ; This is why you were getting an error as you mentioned on IRC.
>
>
I didnt do that because when I did, everything in bracketed by the %< %> 
in the warning came out in bold text.  is that the intended effect?


Andrew.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [COMMITTED] Add param for bb limit to invoke fast_vrp.
  2024-06-25  4:27       ` Andrew Pinski
@ 2024-06-25 19:51         ` Andrew MacLeod
  2024-06-25 21:32         ` [PATCH] " David Malcolm
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew MacLeod @ 2024-06-25 19:51 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Richard Biener, gcc-patches, hernandez, aldy

[-- Attachment #1: Type: text/plain, Size: 1075 bytes --]


On 6/25/24 00:27, Andrew Pinski wrote:
> On Mon, Jun 24, 2024 at 7:35 PM Andrew Pinski <pinskia@gmail.com> wrote:
>>
>>
>> This should be:
>> warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic blocks"
>>      " exceeds %<%--param=vrp-block-limit=d%> limit",
>> n_basic_blocks_for_fn (fun), param_vrp_block_limit);
>>
>> I had thought it was mentioned that options should be quoted but it is
>> not mentioned in the coding conventions:
>> https://gcc.gnu.org/codingconventions.html#Diagnostics
>>
>> But it is mentioned in
>> https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
>> ; This is why you were getting an error as you mentioned on IRC.
> Note I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115627 to
> add it to the coding conventions. I might take a stab at adding the
> rules mentioned in
> https://inbox.sourceware.org/gcc-patches/8ac62fe2-e4bf-0922-4947-fca9567a0561@gmail.com/
> since those are the ones which are detected right now.
>
> Thanks,
> Andrew
>
Thanks.  Final version checked in:

Andrew

[-- Attachment #2: 0001-Add-param-for-bb-limit-to-invoke-fast_vrp.patch --]
[-- Type: text/x-patch, Size: 3035 bytes --]

From 1ea95cc5e099d554764b82df8e972129e9d20885 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Mon, 17 Jun 2024 11:38:46 -0400
Subject: [PATCH] Add param for bb limit to invoke fast_vrp.

If the basic block count is too high, simply use fast_vrp for all
VRP passes.

	* doc/invoke.texi (vrp-block-limit): Document.
	* params.opt (param=vrp-block-limit): New.
	* tree-vrp.cc (fvrp_folder::execute): Invoke fast_vrp if block
	count exceeds limit.
---
 gcc/doc/invoke.texi |  3 +++
 gcc/params.opt      |  4 ++++
 gcc/tree-vrp.cc     | 14 ++++++++++++--
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23d90db2925..729dbc1691e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16849,6 +16849,9 @@ this parameter.  The default value of this parameter is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vrp-block-limit
+Maximum number of basic blocks before VRP switches to a lower memory algorithm.
+
 @item vrp-sparse-threshold
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index d34ef545bf0..c17ba17b91b 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vrp-block-limit=
+Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
+Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
+
 -param=vrp-sparse-threshold=
 Common Joined UInteger Var(param_vrp_sparse_threshold) Init(3000) Optimization Param
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 26979b706e5..e184e9af51e 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -60,6 +60,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-cp.h"
 #include "ipa-prop.h"
 #include "attribs.h"
+#include "diagnostic-core.h"
 
 // This class is utilized by VRP and ranger to remove __builtin_unreachable
 // calls, and reflect any resulting global ranges.
@@ -1331,9 +1332,18 @@ public:
   unsigned int execute (function *fun) final override
     {
       // Check for fast vrp.
-      if (&data == &pass_data_fast_vrp)
+      bool use_fvrp = (&data == &pass_data_fast_vrp);
+      if (!use_fvrp && last_basic_block_for_fn (fun) > param_vrp_block_limit)
+	{
+	  use_fvrp = true;
+	  warning (OPT_Wdisabled_optimization,
+		   "Using fast VRP algorithm. %d basic blocks"
+		   " exceeds %<--param=vrp-block-limit=%d%> limit",
+		   n_basic_blocks_for_fn (fun),
+		   param_vrp_block_limit);
+	}
+      if (use_fvrp)
 	return execute_fast_vrp (fun, final_p);
-
       return execute_ranger_vrp (fun, final_p);
     }
 
-- 
2.45.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add param for bb limit to invoke fast_vrp.
  2024-06-25  4:27       ` Andrew Pinski
  2024-06-25 19:51         ` [COMMITTED] " Andrew MacLeod
@ 2024-06-25 21:32         ` David Malcolm
  1 sibling, 0 replies; 8+ messages in thread
From: David Malcolm @ 2024-06-25 21:32 UTC (permalink / raw)
  To: Andrew Pinski, Andrew MacLeod
  Cc: Richard Biener, gcc-patches, hernandez, aldy

On Mon, 2024-06-24 at 21:27 -0700, Andrew Pinski wrote:
> On Mon, Jun 24, 2024 at 7:35 PM Andrew Pinski <pinskia@gmail.com>
> wrote:
> > 
> > On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod
> > <amacleod@redhat.com> wrote:
> > > 
> > > 
> > > On 6/22/24 09:15, Richard Biener wrote:
> > > > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod
> > > > <amacleod@redhat.com> wrote:
> > > > > This patch adds
> > > > > 
> > > > >       --param=vrp-block-limit=N
> > > > > 
> > > > > When the basic block counter for a function exceeded 'N' ,
> > > > > VRP is
> > > > > invoked with the new fast_vrp algorithm instead.   This
> > > > > algorithm uses a
> > > > > lot less memory and processing power, although it does get a
> > > > > few less
> > > > > things.
> > > > > 
> > > > > Primary motivation is cases like
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which
> > > > > the 3  VRP
> > > > > passes consume about 600 seconds of the compile time, and a
> > > > > lot of
> > > > > memory.      With fast_vrp, it spends less than 10 seconds
> > > > > total in the
> > > > > 3 passes of VRP.     This test case has about 400,000 basic
> > > > > blocks.
> > > > > 
> > > > > The default for N in this patch is 150,000,  arbitrarily
> > > > > chosen.
> > > > > 
> > > > > This bootstraps, (and I bootstrapped it with --param=vrp-
> > > > > block-limit=0
> > > > > as well) on x86_64-pc-linux-gnu, with no regressions.
> > > > > 
> > > > > What do you think, OK for trunk?
> > > > +      if (last_basic_block_for_fn (fun) >
> > > > param_vrp_block_limit ||
> > > > +         &data == &pass_data_fast_vrp)
> > > > 
> > > > > > goes to the next line.
> > > > 
> > > > Btw, we have -Wdisabled-optimization for these cases which
> > > > should
> > > > say sth like "function has excess of %d number of basic blocks
> > > > (--param vrp-block-limit=%d), using fast VRP algorithm"
> > > > or so in this case.
> > > > 
> > > > As I wrote in the PR the priority should be -O1 compile-time
> > > > performance and memory use.
> > > 
> > > 
> > > Yeah, I just wanted to use it as a model for "bad" cases for
> > > ranger.
> > > Adjusted patch attached which now issues the warning.
> > > 
> > > I also found that the transitive relations were causing a small
> > > blowup
> > > in time for relation processing now that I turned relations on
> > > for fast
> > > VRP.  I commited a patch and fast_vrp no longer does transitives.
> > > 
> > > If you want to experiment with enabling fast VRP at -O1, it
> > > should be
> > > fast all the time now.  I think :-)    This testcase runs in
> > > about 95
> > > seconds on my test machine.  if I turn on VRP, a single VRP pass
> > > takes
> > > about 2.5 seconds.    Its all set up, you can just add:
> > > 
> > > NEXT_PASS (pass_fast_vrp);
> > > 
> > > at an appropriate spot.
> > > 
> > > > Richard.
> > > > 
> > > > > Andrew
> > > > > 
> > > > > PS sorry,. it doesn't help the threader in that PR :-(
> > > > It's probably one of the known bottlenecks there - IIRC the
> > > > path range
> > > > queries make O(n^2) work.  I can look at this at some point as
> > > > I've
> > > > dealt with the slowness of this pass in the past.
> > > > 
> > > > There is param_max_jump_thread_paths that should limit
> > > > searching
> > > > but there is IIRC no way to limit the work path ranger actually
> > > > does
> > > > when resolving the query.
> > > > 
> > > Yeah, Id like to talk to Aldy about revamping the threader now
> > > that some
> > > of the newer facilities are available that fast_vrp uses.
> > > 
> > > We can calculate all the outgoing ranges for a block at once with
> > > :
> > > 
> > >    // Fill ssa-cache R with any outgoing ranges on edge E, using
> > > QUERY.
> > >    bool gori_on_edge (class ssa_cache &r, edge e, range_query
> > > *query =
> > > NULL);
> > > 
> > > This is what the fast_vrp routines uses.  We can gather all range
> > > restrictions generated from an edge efficiently just once and
> > > then
> > > intersect them with a known range as we walk the different paths.
> > > We
> > > don't need the gori exports , nor any of the other on-demand bits
> > > where
> > > we calculate each export range dynamically.. I suspect it would
> > > reduce
> > > the workload and memory impact quite a bit, but I'm not really
> > > familiar
> > > with exactly how the threader uses those things.
> > > 
> > > It'd require some minor tweaking to the lazy_ssa_cache to make
> > > the
> > > bitmap of names set accessible. This  would provide similar
> > > functionality to what the gori export () routine provides.  Both
> > > relations and inferred ranges should only need to be calculated
> > > once per
> > > block as well and could/should/would be applied the same way if
> > > they are
> > > present.   I don't *think* the threader uses any of the def
> > > chains, but
> > > Aldy can chip in.
> > 
> > +   warning (OPT_Wdisabled_optimization,
> > +    "Using fast VRP algorithm. %d basic blocks"
> > +    " exceeds %s%d limit",
> > +    n_basic_blocks_for_fn (fun),
> > +    "--param=vrp-block-limit=",
> > +    param_vrp_block_limit);
> > 
> > This should be:
> > warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d
> > basic blocks"
> >     " exceeds %<%--param=vrp-block-limit=d%> limit",
> > n_basic_blocks_for_fn (fun), param_vrp_block_limit);
> > 
> > I had thought it was mentioned that options should be quoted but it
> > is
> > not mentioned in the coding conventions:
> > https://gcc.gnu.org/codingconventions.html#Diagnostics
> > 
> > But it is mentioned in
> > https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
> > ; This is why you were getting an error as you mentioned on IRC.
> 
> Note I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115627 to
> add it to the coding conventions. I might take a stab at adding the
> rules mentioned in
> https://inbox.sourceware.org/gcc-patches/8ac62fe2-e4bf-0922-4947-fca9567a0561@gmail.com/
> since those are the ones which are detected right now.

FWIW there's some overlap between:
https://gcc.gnu.org/codingconventions.html#Diagnostics
and:
https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html#Quoting

Command-line options ought to be quoted in diagnostic messages.  In
particular as of GCC 14  [1] such quoted strings will automatically get
a URL to the documentation for the option in a sufficiently capable
terminal.  I'm hoping to add something similar for quoted *attributes*
in GCC 15; see [2].  FWIW I'm also looking at something that could
support better HTML output (e.g. div classes) for such quoted output
(e.g. via markdown via SARIF output), but that's more experimental.

Dave
[1] e.g. r14-6920-g9e49746da303b8
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655541.html


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-06-25 21:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-21 13:02 [PATCH] Add param for bb limit to invoke fast_vrp Andrew MacLeod
2024-06-22 13:15 ` Richard Biener
2024-06-25  2:19   ` Andrew MacLeod
2024-06-25  2:35     ` Andrew Pinski
2024-06-25  4:27       ` Andrew Pinski
2024-06-25 19:51         ` [COMMITTED] " Andrew MacLeod
2024-06-25 21:32         ` [PATCH] " David Malcolm
2024-06-25 13:23       ` Andrew MacLeod

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).