public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/1] Avoid exponential behavior in scheduler and better logging
@ 2023-11-20 12:06 Maxim Kuvyrkov
  2023-11-20 12:06 ` [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
                   ` (9 more replies)
  0 siblings, 10 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-20 12:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law

Hi,

This patch series fixes exponential behavior in scheduler's find_modifiable_mems().  This fixes PRs [1] and [2], which are compilation time and memory hogs.

The first patch in the series is the actual fix (bootstrapped and regtested on aarch64-linux-gnu), and follow up patches will be improvements to scheduler's logging infrastructure, that enabled me to debug this problem.  As-is, the scheduler has good logging of the actual scheduling process, but calculation of instruction dependencies has almost no logging.

Please don't delay review of the PRs fix to wait for logging patches to be published, as it'll take me another week to clean them up.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111554

Maxim Kuvyrkov (1):
  sched-deps.cc (find_modifiable_mems): Avoid exponential behavior

 gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
@ 2023-11-20 12:06 ` Maxim Kuvyrkov
  2023-11-20 13:09   ` Richard Biener
  2023-11-20 13:52   ` Alexander Monakov
  2023-11-22 11:14 ` [PATCH v3 0/8] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-20 12:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law

This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===
... in this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA.  After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*.  This gets us N*M new
dependencies.

In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics.  After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.

gcc/ChangeLog:

	PR rtl-optimization/96388
	PR rtl-optimization/111554
	* sched-deps.cc (find_inc): Avoid exponential behavior.
---
 gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index c23218890f3..397bb9fd462 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
 /* Once a suitable mem reference has been found and the corresponding data
    in MII has been filled in, this function is called to find a suitable
    add or inc insn involving the register we found in the memory
-   reference.  */
+   reference.
+   If successful, this function will create additional dependencies between
+   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
+   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
+*/
 
 static bool
 find_inc (struct mem_inc_info *mii, bool backwards)
 {
   sd_iterator_def sd_it;
   dep_t dep;
+  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
+  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
 
-  sd_it = sd_iterator_start (mii->mem_insn,
-			     backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
     {
       dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
       rtx_insn *pro = DEP_PRO (dep);
       rtx_insn *con = DEP_CON (dep);
-      rtx_insn *inc_cand = backwards ? pro : con;
+      rtx_insn *inc_cand;
+      int n_inc_deps;
+
+      if (backwards)
+	{
+	  inc_cand = pro;
+	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+	}
+      else
+	{
+	  inc_cand = con;
+	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+	}
+
+      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
+	 for mem_insn.  This by itself is not a problem, since each mem_insn
+	 will have only a few inc_insns associated with it.  However, if
+	 we consider that a single inc_insn may have a lot of mem_insns, AND,
+	 on top of that, a few other inc_insns associated with it --
+	 those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
+	 dependencies created for them.  This may cause an exponential
+	 growth of memory usage and scheduling time.
+	 See PR96388 for details.
+	 We [heuristically] use n_inc_deps as a proxy for the number of MEM
+	 insns, and drop opportunities for breaking modifiable_mem dependencies
+	 when dependency lists grow beyond reasonable size.  */
+      if (n_mem_deps * n_inc_deps
+	  >= param_max_pending_list_length * param_max_pending_list_length)
+	goto next;
+
       if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
 	goto next;
+
       if (parse_add_or_inc (mii, inc_cand, backwards))
 	{
 	  struct dep_replacement *desc;
@@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards)
 	  desc->insn = mii->mem_insn;
 	  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
 			 INSN_SPEC_BACK_DEPS (con));
+
+	  gcc_assert (mii->inc_insn == inc_cand);
 	  if (backwards)
 	    {
 	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 12:06 ` [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
@ 2023-11-20 13:09   ` Richard Biener
  2023-11-20 13:42     ` Maxim Kuvyrkov
  2023-11-20 13:52   ` Alexander Monakov
  1 sibling, 1 reply; 37+ messages in thread
From: Richard Biener @ 2023-11-20 13:09 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Bernd Schmidt, Vladimir Makarov, Jeff Law

On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov
<maxim.kuvyrkov@linaro.org> wrote:
>
> This patch avoids sched-deps.cc:find_inc() creating exponential number
> of dependencies, which become memory and compilation time hogs.
> Consider example (simplified from PR96388) ...
> ===
> sp=sp-4 // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4 // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
> ... in this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA.  After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> dependencies.
>
> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
> 30GB and compilation time of 30 minutes, with sched2 accounting for
> 95% of both metrics.  After this patch the RAM usage is down to 1GB
> and compilation time is down to 3-4 minutes, with sched2 no longer
> standing out on -ftime-report or memory usage.
>
> gcc/ChangeLog:
>
>         PR rtl-optimization/96388
>         PR rtl-optimization/111554
>         * sched-deps.cc (find_inc): Avoid exponential behavior.
> ---
>  gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 41 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index c23218890f3..397bb9fd462 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
>  /* Once a suitable mem reference has been found and the corresponding data
>     in MII has been filled in, this function is called to find a suitable
>     add or inc insn involving the register we found in the memory
> -   reference.  */
> +   reference.
> +   If successful, this function will create additional dependencies between
> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
> +*/
>
>  static bool
>  find_inc (struct mem_inc_info *mii, bool backwards)
>  {
>    sd_iterator_def sd_it;
>    dep_t dep;
> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>
> -  sd_it = sd_iterator_start (mii->mem_insn,
> -                            backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>    while (sd_iterator_cond (&sd_it, &dep))
>      {
>        dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>        rtx_insn *pro = DEP_PRO (dep);
>        rtx_insn *con = DEP_CON (dep);
> -      rtx_insn *inc_cand = backwards ? pro : con;
> +      rtx_insn *inc_cand;
> +      int n_inc_deps;
> +
> +      if (backwards)
> +       {
> +         inc_cand = pro;
> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> +       }
> +      else
> +       {
> +         inc_cand = con;
> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> +       }
> +
> +      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
> +        for mem_insn.  This by itself is not a problem, since each mem_insn
> +        will have only a few inc_insns associated with it.  However, if
> +        we consider that a single inc_insn may have a lot of mem_insns, AND,
> +        on top of that, a few other inc_insns associated with it --
> +        those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
> +        dependencies created for them.  This may cause an exponential
> +        growth of memory usage and scheduling time.
> +        See PR96388 for details.
> +        We [heuristically] use n_inc_deps as a proxy for the number of MEM
> +        insns, and drop opportunities for breaking modifiable_mem dependencies
> +        when dependency lists grow beyond reasonable size.  */
> +      if (n_mem_deps * n_inc_deps
> +         >= param_max_pending_list_length * param_max_pending_list_length)
> +       goto next;
> +
>        if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))

it looks like this check is a lot cheaper than computing sd_lists_size so
can we keep that first?  sd_lists_size might be even more expensive
than parse_add_or_inc, and in principle you only need to count up
to param_max_pending_list_length * param_max_pending_list_length
for n_mem_deps and that devided by n_mem_deps when counting
n_inc_deps, but I guess that's premature optimization at that point.

So OK from my side with moving the if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
before the list counting.

Richard.

>         goto next;
> +
>        if (parse_add_or_inc (mii, inc_cand, backwards))
>         {
>           struct dep_replacement *desc;
> @@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>           desc->insn = mii->mem_insn;
>           move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
>                          INSN_SPEC_BACK_DEPS (con));
> +
> +         gcc_assert (mii->inc_insn == inc_cand);
>           if (backwards)
>             {
>               FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 13:09   ` Richard Biener
@ 2023-11-20 13:42     ` Maxim Kuvyrkov
  2023-11-20 13:45       ` Richard Biener
  0 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-20 13:42 UTC (permalink / raw)
  To: Richard Guenther; +Cc: GCC Patches, Bernd Schmidt, Vladimir Makarov, Jeff Law

> On Nov 20, 2023, at 17:09, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
>> 
>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>> of dependencies, which become memory and compilation time hogs.
>> Consider example (simplified from PR96388) ...
>> ===
>> sp=sp-4 // sp_insnA
>> mem_insnA1[sp+A1]
>> ...
>> mem_insnAN[sp+AN]
>> sp=sp-4 // sp_insnB
>> mem_insnB1[sp+B1]
>> ...
>> mem_insnBM[sp+BM]
>> ===
>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>> to be able to pass sp_insnA, and, while doing this, will create
>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>> backward dependencies.
>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>> dependencies.
>> 
>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
>> 30GB and compilation time of 30 minutes, with sched2 accounting for
>> 95% of both metrics.  After this patch the RAM usage is down to 1GB
>> and compilation time is down to 3-4 minutes, with sched2 no longer
>> standing out on -ftime-report or memory usage.
>> 
>> gcc/ChangeLog:
>> 
>>        PR rtl-optimization/96388
>>        PR rtl-optimization/111554
>>        * sched-deps.cc (find_inc): Avoid exponential behavior.
>> ---
>> gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 41 insertions(+), 4 deletions(-)
>> 
>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
>> index c23218890f3..397bb9fd462 100644
>> --- a/gcc/sched-deps.cc
>> +++ b/gcc/sched-deps.cc
>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
>> /* Once a suitable mem reference has been found and the corresponding data
>>    in MII has been filled in, this function is called to find a suitable
>>    add or inc insn involving the register we found in the memory
>> -   reference.  */
>> +   reference.
>> +   If successful, this function will create additional dependencies between
>> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
>> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
>> +*/
>> 
>> static bool
>> find_inc (struct mem_inc_info *mii, bool backwards)
>> {
>>   sd_iterator_def sd_it;
>>   dep_t dep;
>> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
>> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>> 
>> -  sd_it = sd_iterator_start (mii->mem_insn,
>> -                            backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
>> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>>   while (sd_iterator_cond (&sd_it, &dep))
>>     {
>>       dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>>       rtx_insn *pro = DEP_PRO (dep);
>>       rtx_insn *con = DEP_CON (dep);
>> -      rtx_insn *inc_cand = backwards ? pro : con;
>> +      rtx_insn *inc_cand;
>> +      int n_inc_deps;
>> +
>> +      if (backwards)
>> +       {
>> +         inc_cand = pro;
>> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
>> +       }
>> +      else
>> +       {
>> +         inc_cand = con;
>> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
>> +       }
>> +
>> +      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
>> +        for mem_insn.  This by itself is not a problem, since each mem_insn
>> +        will have only a few inc_insns associated with it.  However, if
>> +        we consider that a single inc_insn may have a lot of mem_insns, AND,
>> +        on top of that, a few other inc_insns associated with it --
>> +        those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
>> +        dependencies created for them.  This may cause an exponential
>> +        growth of memory usage and scheduling time.
>> +        See PR96388 for details.
>> +        We [heuristically] use n_inc_deps as a proxy for the number of MEM
>> +        insns, and drop opportunities for breaking modifiable_mem dependencies
>> +        when dependency lists grow beyond reasonable size.  */
>> +      if (n_mem_deps * n_inc_deps
>> +         >= param_max_pending_list_length * param_max_pending_list_length)
>> +       goto next;
>> +
>>       if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
> 
> it looks like this check is a lot cheaper than computing sd_lists_size so
> can we keep that first?  sd_lists_size might be even more expensive
> than parse_add_or_inc,

sd_lists_size() is cheap, it doesn't walk or count dependencies; it just adds 2-3 integers together.

The reason why sd_lists_size() has a loop is to be able to transparently handle dependencies split among sub-lists, e.g., sd_lists_size(SD_LIST_HARD_BACK | SD_LIST_SPEC_BACK) will return total number of backward dependencies.

--
Maxim Kuvyrkov
https://www.linaro.org



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 13:42     ` Maxim Kuvyrkov
@ 2023-11-20 13:45       ` Richard Biener
  2023-11-20 14:48         ` [PATCH v2] " Maxim Kuvyrkov
  2023-11-20 18:59         ` [PATCH 1/1] " Maxim Kuvyrkov
  0 siblings, 2 replies; 37+ messages in thread
From: Richard Biener @ 2023-11-20 13:45 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC Patches, Bernd Schmidt, Vladimir Makarov, Jeff Law

On Mon, Nov 20, 2023 at 2:42 PM Maxim Kuvyrkov
<maxim.kuvyrkov@linaro.org> wrote:
>
> > On Nov 20, 2023, at 17:09, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov
> > <maxim.kuvyrkov@linaro.org> wrote:
> >>
> >> This patch avoids sched-deps.cc:find_inc() creating exponential number
> >> of dependencies, which become memory and compilation time hogs.
> >> Consider example (simplified from PR96388) ...
> >> ===
> >> sp=sp-4 // sp_insnA
> >> mem_insnA1[sp+A1]
> >> ...
> >> mem_insnAN[sp+AN]
> >> sp=sp-4 // sp_insnB
> >> mem_insnB1[sp+B1]
> >> ...
> >> mem_insnBM[sp+BM]
> >> ===
> >> ... in this example find_modifiable_mems() will arrange for mem_insnA*
> >> to be able to pass sp_insnA, and, while doing this, will create
> >> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> >> is a consumer of sp_insnA.  After this sp_insnB will have N new
> >> backward dependencies.
> >> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> >> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> >> dependencies.
> >>
> >> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
> >> 30GB and compilation time of 30 minutes, with sched2 accounting for
> >> 95% of both metrics.  After this patch the RAM usage is down to 1GB
> >> and compilation time is down to 3-4 minutes, with sched2 no longer
> >> standing out on -ftime-report or memory usage.
> >>
> >> gcc/ChangeLog:
> >>
> >>        PR rtl-optimization/96388
> >>        PR rtl-optimization/111554
> >>        * sched-deps.cc (find_inc): Avoid exponential behavior.
> >> ---
> >> gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
> >> 1 file changed, 41 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> >> index c23218890f3..397bb9fd462 100644
> >> --- a/gcc/sched-deps.cc
> >> +++ b/gcc/sched-deps.cc
> >> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
> >> /* Once a suitable mem reference has been found and the corresponding data
> >>    in MII has been filled in, this function is called to find a suitable
> >>    add or inc insn involving the register we found in the memory
> >> -   reference.  */
> >> +   reference.
> >> +   If successful, this function will create additional dependencies between
> >> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
> >> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
> >> +*/
> >>
> >> static bool
> >> find_inc (struct mem_inc_info *mii, bool backwards)
> >> {
> >>   sd_iterator_def sd_it;
> >>   dep_t dep;
> >> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
> >> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
> >>
> >> -  sd_it = sd_iterator_start (mii->mem_insn,
> >> -                            backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> >> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
> >>   while (sd_iterator_cond (&sd_it, &dep))
> >>     {
> >>       dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
> >>       rtx_insn *pro = DEP_PRO (dep);
> >>       rtx_insn *con = DEP_CON (dep);
> >> -      rtx_insn *inc_cand = backwards ? pro : con;
> >> +      rtx_insn *inc_cand;
> >> +      int n_inc_deps;
> >> +
> >> +      if (backwards)
> >> +       {
> >> +         inc_cand = pro;
> >> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> >> +       }
> >> +      else
> >> +       {
> >> +         inc_cand = con;
> >> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> >> +       }
> >> +
> >> +      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
> >> +        for mem_insn.  This by itself is not a problem, since each mem_insn
> >> +        will have only a few inc_insns associated with it.  However, if
> >> +        we consider that a single inc_insn may have a lot of mem_insns, AND,
> >> +        on top of that, a few other inc_insns associated with it --
> >> +        those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
> >> +        dependencies created for them.  This may cause an exponential
> >> +        growth of memory usage and scheduling time.
> >> +        See PR96388 for details.
> >> +        We [heuristically] use n_inc_deps as a proxy for the number of MEM
> >> +        insns, and drop opportunities for breaking modifiable_mem dependencies
> >> +        when dependency lists grow beyond reasonable size.  */
> >> +      if (n_mem_deps * n_inc_deps
> >> +         >= param_max_pending_list_length * param_max_pending_list_length)
> >> +       goto next;
> >> +
> >>       if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
> >
> > it looks like this check is a lot cheaper than computing sd_lists_size so
> > can we keep that first?  sd_lists_size might be even more expensive
> > than parse_add_or_inc,
>
> sd_lists_size() is cheap, it doesn't walk or count dependencies; it just adds 2-3 integers together.
>
> The reason why sd_lists_size() has a loop is to be able to transparently handle dependencies split among sub-lists, e.g., sd_lists_size(SD_LIST_HARD_BACK | SD_LIST_SPEC_BACK) will return total number of backward dependencies.

I see.  It still better to move the DEP_NONREG/DEP_MULTIPLE checks,
they do not depend on any
of the code above it, no?

Richard.

> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 12:06 ` [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
  2023-11-20 13:09   ` Richard Biener
@ 2023-11-20 13:52   ` Alexander Monakov
  2023-11-20 14:39     ` Maxim Kuvyrkov
  1 sibling, 1 reply; 37+ messages in thread
From: Alexander Monakov @ 2023-11-20 13:52 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: gcc-patches, Bernd Schmidt, Vladimir Makarov, Jeff Law


On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:

> This patch avoids sched-deps.cc:find_inc() creating exponential number
> of dependencies, which become memory and compilation time hogs.
> Consider example (simplified from PR96388) ...
> ===
> sp=sp-4 // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4 // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
> ... in this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA.  After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> dependencies.

It's a bit hard to read this without knowing which value of 'backwards'
is assumed.

Say 'backwards' is true and we are inspecting producer sp_insnB of mem_insnB1.
This is a true dependency. We know we can break it by adjusting B1 by -4, but
we need to be careful not to move such modified mem_insnB1 above sp_insnA, so
we need to iterate over *incoming true dependencies* of sp_insnB and add them.

But the code seems to be iterating over *all incoming dependencies*, so it
will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my
understanding is correct.

Alexander

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 13:52   ` Alexander Monakov
@ 2023-11-20 14:39     ` Maxim Kuvyrkov
  2023-11-20 16:30       ` Alexander Monakov
  0 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-20 14:39 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: GCC Patches, Bernd Schmidt, Vladimir Makarov, Jeff Law

> On Nov 20, 2023, at 17:52, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> 
> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
> 
>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>> of dependencies, which become memory and compilation time hogs.
>> Consider example (simplified from PR96388) ...
>> ===
>> sp=sp-4 // sp_insnA
>> mem_insnA1[sp+A1]
>> ...
>> mem_insnAN[sp+AN]
>> sp=sp-4 // sp_insnB
>> mem_insnB1[sp+B1]
>> ...
>> mem_insnBM[sp+BM]
>> ===
>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>> to be able to pass sp_insnA, and, while doing this, will create
>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>> backward dependencies.
>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>> dependencies.

[For avoidance of doubt, below discussion is about the general implementation of find_modifiable_mems() and not about the patch.]

> 
> It's a bit hard to read this without knowing which value of 'backwards'
> is assumed.
> 
> Say 'backwards' is true and we are inspecting producer sp_insnB of mem_insnB1.
> This is a true dependency. We know we can break it by adjusting B1 by -4, but
> we need to be careful not to move such modified mem_insnB1 above sp_insnA, so
> we need to iterate over *incoming true dependencies* of sp_insnB and add them.
> 
> But the code seems to be iterating over *all incoming dependencies*, so it
> will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
> dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my
> understanding is correct.

Yeap, your understanding is correct.  However, this is what find_modifiable_mems() has to do to avoid complicated analysis of second-level dependencies.

I think, the optimization that find_modifiable_mems() does can be implemented as part of main dependency analysis, and I'm going to discuss that with Bernd.  I might be missing something, but it seems that instruction transformations that find_modifiable_mems() is doing are simpler than what ia64 speculation is doing.  So, I think, we should be able to leverage speculation infrastructure, rather than doing this optimization "on the side" of normal scheduling.

--
Maxim Kuvyrkov
https://www.linaro.org


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 13:45       ` Richard Biener
@ 2023-11-20 14:48         ` Maxim Kuvyrkov
  2023-11-20 18:59         ` [PATCH 1/1] " Maxim Kuvyrkov
  1 sibling, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-20 14:48 UTC (permalink / raw)
  To: gcc-patches, Richard Guenther
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law

This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===
... in this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA.  After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*.  This gets us N*M new
dependencies.

In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics.  After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.

gcc/ChangeLog:

	PR rtl-optimization/96388
	PR rtl-optimization/111554
	* sched-deps.cc (find_inc): Avoid exponential behavior.
---
 gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index c23218890f3..a286ff319e2 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
 /* Once a suitable mem reference has been found and the corresponding data
    in MII has been filled in, this function is called to find a suitable
    add or inc insn involving the register we found in the memory
-   reference.  */
+   reference.
+   If successful, this function will create additional dependencies between
+   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
+   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
+*/
 
 static bool
 find_inc (struct mem_inc_info *mii, bool backwards)
 {
   sd_iterator_def sd_it;
   dep_t dep;
+  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
+  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
 
-  sd_it = sd_iterator_start (mii->mem_insn,
-			     backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
     {
       dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
       rtx_insn *pro = DEP_PRO (dep);
       rtx_insn *con = DEP_CON (dep);
-      rtx_insn *inc_cand = backwards ? pro : con;
+      rtx_insn *inc_cand;
+      int n_inc_deps;
+
       if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
 	goto next;
+
+      if (backwards)
+	{
+	  inc_cand = pro;
+	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+	}
+      else
+	{
+	  inc_cand = con;
+	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+	}
+
+      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
+	 for mem_insn.  This by itself is not a problem, since each mem_insn
+	 will have only a few inc_insns associated with it.  However, if
+	 we consider that a single inc_insn may have a lot of mem_insns, AND,
+	 on top of that, a few other inc_insns associated with it --
+	 those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
+	 dependencies created for them.  This may cause an exponential
+	 growth of memory usage and scheduling time.
+	 See PR96388 for details.
+	 We [heuristically] use n_inc_deps as a proxy for the number of MEM
+	 insns, and drop opportunities for breaking modifiable_mem dependencies
+	 when dependency lists grow beyond reasonable size.  */
+      if (n_mem_deps * n_inc_deps
+	  >= param_max_pending_list_length * param_max_pending_list_length)
+	goto next;
+
       if (parse_add_or_inc (mii, inc_cand, backwards))
 	{
 	  struct dep_replacement *desc;
@@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards)
 	  desc->insn = mii->mem_insn;
 	  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
 			 INSN_SPEC_BACK_DEPS (con));
+
+	  gcc_assert (mii->inc_insn == inc_cand);
 	  if (backwards)
 	    {
 	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 14:39     ` Maxim Kuvyrkov
@ 2023-11-20 16:30       ` Alexander Monakov
  2023-11-21 10:32         ` Maxim Kuvyrkov
  0 siblings, 1 reply; 37+ messages in thread
From: Alexander Monakov @ 2023-11-20 16:30 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC Patches, Bernd Schmidt, Vladimir Makarov, Jeff Law


On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:

> > On Nov 20, 2023, at 17:52, Alexander Monakov <amonakov@ispras.ru> wrote:
> > 
> > 
> > On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
> > 
> >> This patch avoids sched-deps.cc:find_inc() creating exponential number
> >> of dependencies, which become memory and compilation time hogs.
> >> Consider example (simplified from PR96388) ...
> >> ===
> >> sp=sp-4 // sp_insnA
> >> mem_insnA1[sp+A1]
> >> ...
> >> mem_insnAN[sp+AN]
> >> sp=sp-4 // sp_insnB
> >> mem_insnB1[sp+B1]
> >> ...
> >> mem_insnBM[sp+BM]
> >> ===
> >> ... in this example find_modifiable_mems() will arrange for mem_insnA*
> >> to be able to pass sp_insnA, and, while doing this, will create
> >> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> >> is a consumer of sp_insnA.  After this sp_insnB will have N new
> >> backward dependencies.
> >> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> >> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> >> dependencies.
> 
> [For avoidance of doubt, below discussion is about the general implementation
> of find_modifiable_mems() and not about the patch.]

I was saying the commit message is hard to read (unless it's just me).

> > It's a bit hard to read this without knowing which value of 'backwards'
> > is assumed.
> > 
> > Say 'backwards' is true and we are inspecting producer sp_insnB of mem_insnB1.
> > This is a true dependency. We know we can break it by adjusting B1 by -4, but
> > we need to be careful not to move such modified mem_insnB1 above sp_insnA, so
> > we need to iterate over *incoming true dependencies* of sp_insnB and add them.
> > 
> > But the code seems to be iterating over *all incoming dependencies*, so it
> > will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
> > dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my
> > understanding is correct.
> 
> Yeap, your understanding is correct.  However, this is what
> find_modifiable_mems() has to do to avoid complicated analysis of second-level
> dependencies.

What is the reason it cannot simply skip anti-dependencies in the
'if (backwards)' loop, and true dependencies in the 'else' loop?

Alexander

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 13:45       ` Richard Biener
  2023-11-20 14:48         ` [PATCH v2] " Maxim Kuvyrkov
@ 2023-11-20 18:59         ` Maxim Kuvyrkov
  1 sibling, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-20 18:59 UTC (permalink / raw)
  To: Richard Guenther; +Cc: GCC Patches, Bernd Schmidt, Vladimir Makarov, Jeff Law

> On Nov 20, 2023, at 17:45, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Mon, Nov 20, 2023 at 2:42 PM Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
>> 
>>> On Nov 20, 2023, at 17:09, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov
>>> <maxim.kuvyrkov@linaro.org> wrote:
>>>> 
>>>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>>>> of dependencies, which become memory and compilation time hogs.
>>>> Consider example (simplified from PR96388) ...
>>>> ===
>>>> sp=sp-4 // sp_insnA
>>>> mem_insnA1[sp+A1]
>>>> ...
>>>> mem_insnAN[sp+AN]
>>>> sp=sp-4 // sp_insnB
>>>> mem_insnB1[sp+B1]
>>>> ...
>>>> mem_insnBM[sp+BM]
>>>> ===
>>>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>>>> to be able to pass sp_insnA, and, while doing this, will create
>>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>>>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>>>> backward dependencies.
>>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>>>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>>>> dependencies.
>>>> 
>>>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
>>>> 30GB and compilation time of 30 minutes, with sched2 accounting for
>>>> 95% of both metrics.  After this patch the RAM usage is down to 1GB
>>>> and compilation time is down to 3-4 minutes, with sched2 no longer
>>>> standing out on -ftime-report or memory usage.
>>>> 
>>>> gcc/ChangeLog:
>>>> 
>>>>       PR rtl-optimization/96388
>>>>       PR rtl-optimization/111554
>>>>       * sched-deps.cc (find_inc): Avoid exponential behavior.
>>>> ---
>>>> gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
>>>> 1 file changed, 41 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
>>>> index c23218890f3..397bb9fd462 100644
>>>> --- a/gcc/sched-deps.cc
>>>> +++ b/gcc/sched-deps.cc
>>>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
>>>> /* Once a suitable mem reference has been found and the corresponding data
>>>>   in MII has been filled in, this function is called to find a suitable
>>>>   add or inc insn involving the register we found in the memory
>>>> -   reference.  */
>>>> +   reference.
>>>> +   If successful, this function will create additional dependencies between
>>>> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
>>>> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
>>>> +*/
>>>> 
>>>> static bool
>>>> find_inc (struct mem_inc_info *mii, bool backwards)
>>>> {
>>>>  sd_iterator_def sd_it;
>>>>  dep_t dep;
>>>> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
>>>> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>>>> 
>>>> -  sd_it = sd_iterator_start (mii->mem_insn,
>>>> -                            backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
>>>> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>>>>  while (sd_iterator_cond (&sd_it, &dep))
>>>>    {
>>>>      dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>>>>      rtx_insn *pro = DEP_PRO (dep);
>>>>      rtx_insn *con = DEP_CON (dep);
>>>> -      rtx_insn *inc_cand = backwards ? pro : con;
>>>> +      rtx_insn *inc_cand;
>>>> +      int n_inc_deps;
>>>> +
>>>> +      if (backwards)
>>>> +       {
>>>> +         inc_cand = pro;
>>>> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
>>>> +       }
>>>> +      else
>>>> +       {
>>>> +         inc_cand = con;
>>>> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
>>>> +       }
>>>> +
>>>> +      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
>>>> +        for mem_insn.  This by itself is not a problem, since each mem_insn
>>>> +        will have only a few inc_insns associated with it.  However, if
>>>> +        we consider that a single inc_insn may have a lot of mem_insns, AND,
>>>> +        on top of that, a few other inc_insns associated with it --
>>>> +        those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
>>>> +        dependencies created for them.  This may cause an exponential
>>>> +        growth of memory usage and scheduling time.
>>>> +        See PR96388 for details.
>>>> +        We [heuristically] use n_inc_deps as a proxy for the number of MEM
>>>> +        insns, and drop opportunities for breaking modifiable_mem dependencies
>>>> +        when dependency lists grow beyond reasonable size.  */
>>>> +      if (n_mem_deps * n_inc_deps
>>>> +         >= param_max_pending_list_length * param_max_pending_list_length)
>>>> +       goto next;
>>>> +
>>>>      if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
>>> 
>>> it looks like this check is a lot cheaper than computing sd_lists_size so
>>> can we keep that first?  sd_lists_size might be even more expensive
>>> than parse_add_or_inc,
>> 
>> sd_lists_size() is cheap, it doesn't walk or count dependencies; it just adds 2-3 integers together.
>> 
>> The reason why sd_lists_size() has a loop is to be able to transparently handle dependencies split among sub-lists, e.g., sd_lists_size(SD_LIST_HARD_BACK | SD_LIST_SPEC_BACK) will return total number of backward dependencies.
> 
> I see.  It still better to move the DEP_NONREG/DEP_MULTIPLE checks,
> they do not depend on any
> of the code above it, no?

Yes.  Adjusted in v2.

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 16:30       ` Alexander Monakov
@ 2023-11-21 10:32         ` Maxim Kuvyrkov
  2023-11-21 11:05           ` Alexander Monakov
  0 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-21 10:32 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: GCC Patches, Bernd Schmidt, Vladimir Makarov, Jeff Law

> On Nov 20, 2023, at 20:30, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> 
> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
> 
>>> On Nov 20, 2023, at 17:52, Alexander Monakov <amonakov@ispras.ru> wrote:
>>> 
>>> 
>>> On Mon, 20 Nov 2023, Maxim Kuvyrkov wrote:
>>> 
>>>> This patch avoids sched-deps.cc:find_inc() creating exponential number
>>>> of dependencies, which become memory and compilation time hogs.
>>>> Consider example (simplified from PR96388) ...
>>>> ===
>>>> sp=sp-4 // sp_insnA
>>>> mem_insnA1[sp+A1]
>>>> ...
>>>> mem_insnAN[sp+AN]
>>>> sp=sp-4 // sp_insnB
>>>> mem_insnB1[sp+B1]
>>>> ...
>>>> mem_insnBM[sp+BM]
>>>> ===
>>>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
>>>> to be able to pass sp_insnA, and, while doing this, will create
>>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
>>>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>>>> backward dependencies.
>>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
>>>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>>>> dependencies.
>> 
>> [For avoidance of doubt, below discussion is about the general implementation
>> of find_modifiable_mems() and not about the patch.]
> 
> I was saying the commit message is hard to read (unless it's just me).
> 
>>> It's a bit hard to read this without knowing which value of 'backwards'
>>> is assumed.

Oh, sorry, I misunderstood your comment.

In the above example I want to describe outcome that current code generates, without going into details about exactly how it does it.  I'm not sure how to make it more readable, and would appreciate suggestions.


>>> 
>>> Say 'backwards' is true and we are inspecting producer sp_insnB of mem_insnB1.
>>> This is a true dependency. We know we can break it by adjusting B1 by -4, but
>>> we need to be careful not to move such modified mem_insnB1 above sp_insnA, so
>>> we need to iterate over *incoming true dependencies* of sp_insnB and add them.
>>> 
>>> But the code seems to be iterating over *all incoming dependencies*, so it
>>> will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
>>> dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my
>>> understanding is correct.
>> 
>> Yeap, your understanding is correct.  However, this is what
>> find_modifiable_mems() has to do to avoid complicated analysis of second-level
>> dependencies.
> 
> What is the reason it cannot simply skip anti-dependencies in the
> 'if (backwards)' loop, and true dependencies in the 'else' loop?

I /think/, this should be possible.  However, rather than improving current implementation my preference is to rework it by integrating with the main dependency analysis.  This should provide both faster and more precise dependency analysis, which would generate breakable addr/mem dependencies.

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-21 10:32         ` Maxim Kuvyrkov
@ 2023-11-21 11:05           ` Alexander Monakov
  0 siblings, 0 replies; 37+ messages in thread
From: Alexander Monakov @ 2023-11-21 11:05 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC Patches, Bernd Schmidt, Vladimir Makarov, Jeff Law


On Tue, 21 Nov 2023, Maxim Kuvyrkov wrote:

> >>>> This patch avoids sched-deps.cc:find_inc() creating exponential number
> >>>> of dependencies, which become memory and compilation time hogs.
> >>>> Consider example (simplified from PR96388) ...
> >>>> ===
> >>>> sp=sp-4 // sp_insnA
> >>>> mem_insnA1[sp+A1]
> >>>> ...
> >>>> mem_insnAN[sp+AN]
> >>>> sp=sp-4 // sp_insnB
> >>>> mem_insnB1[sp+B1]
> >>>> ...
> >>>> mem_insnBM[sp+BM]
> >>>> ===
> >>>> ... in this example find_modifiable_mems() will arrange for mem_insnA*
> >>>> to be able to pass sp_insnA, and, while doing this, will create
> >>>> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> >>>> is a consumer of sp_insnA.  After this sp_insnB will have N new
> >>>> backward dependencies.
> >>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> >>>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> >>>> dependencies.
> >> 
> >> [For avoidance of doubt, below discussion is about the general implementation
> >> of find_modifiable_mems() and not about the patch.]
> > 
> > I was saying the commit message is hard to read (unless it's just me).
> > 
> >>> It's a bit hard to read this without knowing which value of 'backwards'
> >>> is assumed.
> 
> Oh, sorry, I misunderstood your comment.
> 
> In the above example I want to describe outcome that current code generates,
> without going into details about exactly how it does it.  I'm not sure how to
> make it more readable, and would appreciate suggestions.

I think it would be easier to follow if you could fix a specific value of
'backwards' up front, and then ensure all following statements are consistent
with that, like I did in my explanation. Please feel free to pick up my text
into the commit message, if it helps.

> >>> Say 'backwards' is true and we are inspecting producer sp_insnB of mem_insnB1.
> >>> This is a true dependency. We know we can break it by adjusting B1 by -4, but
> >>> we need to be careful not to move such modified mem_insnB1 above sp_insnA, so
> >>> we need to iterate over *incoming true dependencies* of sp_insnB and add them.
> >>> 
> >>> But the code seems to be iterating over *all incoming dependencies*, so it
> >>> will e.g. take anti-dependency mem_insnA1 -> sp_insnB and create a true
> >>> dependency mem_insnA1 -> mem_insnB1'. This seems utterly inefficient, if my
> >>> understanding is correct.
> >> 
> >> Yeap, your understanding is correct.  However, this is what
> >> find_modifiable_mems() has to do to avoid complicated analysis of second-level
> >> dependencies.
> > 
> > What is the reason it cannot simply skip anti-dependencies in the
> > 'if (backwards)' loop, and true dependencies in the 'else' loop?
> 
> I /think/, this should be possible.  However, rather than improving current
> implementation my preference is to rework it by integrating with the main
> dependency analysis.  This should provide both faster and more precise
> dependency analysis, which would generate breakable addr/mem dependencies.

I see, thank you.

Alexander

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 0/8] Avoid exponential behavior in scheduler and better logging
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
  2023-11-20 12:06 ` [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

Compared to v1 and v2 this is a complete patch series.

The debugging/dumping improvements gently touch IRA, RTL lists, and sel-sched bits to avoid re-inventing or copy-pasting the wheel.

Bootstrapping and regtesting these patches on aarch64-linux-gnu.  OK to merge?

Thanks!

===

This patch series fixes exponential behavior in scheduler's find_modifiable_mems(). This fixes PRs [1] and [2], which are compilation time and memory hogs.

The first patch in the series is the actual fix (bootstrapped and regtested on aarch64-linux-gnu), and follow up patches are improvements to scheduler's logging infrastructure, that enabled me to debug this problem.  As-is, the scheduler has good logging of the actual scheduling process, but calculation of instruction dependencies has almost no logging.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96388
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111554

Maxim Kuvyrkov (8):
  sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  Unify implementations of print_hard_reg_set()
  Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc
  Improve and fix sched-deps.cc: dump_dep() and dump_lists().
  Add a bit more logging scheduler's dependency analysis
  sched_deps.cc: Simplify initialization of dependency contexts
  Improve logging of register data in scheduler dependency analysis
  Improve logging of scheduler dependency analysis context

 gcc/hard-reg-set.h    |   3 +
 gcc/ira-color.cc      |  17 +-
 gcc/ira-conflicts.cc  |  39 +----
 gcc/lists.cc          |  30 +++-
 gcc/rtl.h             |   4 +-
 gcc/sched-deps.cc     | 399 +++++++++++++++++++++++++++++++++++++++---
 gcc/sched-int.h       |   9 +-
 gcc/sched-rgn.cc      |  56 +++---
 gcc/sel-sched-dump.cc |  21 +--
 gcc/sel-sched-dump.h  |   2 +-
 10 files changed, 467 insertions(+), 113 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
  2023-11-20 12:06 ` [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 0/8] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2024-01-15 12:56   ` Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 2/8] Unify implementations of print_hard_reg_set() Maxim Kuvyrkov
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

This patch avoids sched-deps.cc:find_inc() creating exponential number
of dependencies, which become memory and compilation time hogs.
Consider example (simplified from PR96388) ...
===
sp=sp-4 // sp_insnA
mem_insnA1[sp+A1]
...
mem_insnAN[sp+AN]
sp=sp-4 // sp_insnB
mem_insnB1[sp+B1]
...
mem_insnBM[sp+BM]
===

[For simplicity, let's assume find_inc(backwards==true)].
In this example find_modifiable_mems() will arrange for mem_insnA*
to be able to pass sp_insnA, and, while doing this, will create
dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
is a consumer of sp_insnA.  After this sp_insnB will have N new
backward dependencies.
Then find_modifiable_mems() gets to mem_insnB*s and starts to create
N new dependencies for _every_ mem_insnB*.  This gets us N*M new
dependencies.

In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
30GB and compilation time of 30 minutes, with sched2 accounting for
95% of both metrics.  After this patch the RAM usage is down to 1GB
and compilation time is down to 3-4 minutes, with sched2 no longer
standing out on -ftime-report or memory usage.

gcc/ChangeLog:

	PR rtl-optimization/96388
	PR rtl-optimization/111554
	* sched-deps.cc (find_inc): Avoid exponential behavior.
---
 gcc/sched-deps.cc | 48 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index c23218890f3..005fc0f567e 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
 /* Once a suitable mem reference has been found and the corresponding data
    in MII has been filled in, this function is called to find a suitable
    add or inc insn involving the register we found in the memory
-   reference.  */
+   reference.
+   If successful, this function will create additional dependencies between
+   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
+   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
+*/
 
 static bool
 find_inc (struct mem_inc_info *mii, bool backwards)
 {
   sd_iterator_def sd_it;
   dep_t dep;
+  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
+  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
 
-  sd_it = sd_iterator_start (mii->mem_insn,
-			     backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
+  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
   while (sd_iterator_cond (&sd_it, &dep))
     {
       dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
       rtx_insn *pro = DEP_PRO (dep);
       rtx_insn *con = DEP_CON (dep);
-      rtx_insn *inc_cand = backwards ? pro : con;
+      rtx_insn *inc_cand;
+      int n_inc_deps;
+
       if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
 	goto next;
+
+      if (backwards)
+	{
+	  inc_cand = pro;
+	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
+	}
+      else
+	{
+	  inc_cand = con;
+	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
+	}
+
+      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
+	 for mem_insn.  This by itself is not a problem, since each mem_insn
+	 will have only a few inc_insns associated with it.  However, if
+	 we consider that a single inc_insn may have a lot of mem_insns, AND,
+	 on top of that, a few other inc_insns associated with it --
+	 those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
+	 dependencies created for them.  This may cause an exponential
+	 growth of memory usage and scheduling time.
+	 See PR96388 for details.
+	 We [heuristically] use n_inc_deps as a proxy for the number of MEM
+	 insns, and drop opportunities for breaking modifiable_mem dependencies
+	 when dependency lists grow beyond reasonable size.  */
+      if (n_mem_deps * n_inc_deps
+	  >= param_max_pending_list_length * param_max_pending_list_length)
+	goto next;
+
       if (parse_add_or_inc (mii, inc_cand, backwards))
 	{
 	  struct dep_replacement *desc;
@@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards)
 	  desc->insn = mii->mem_insn;
 	  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
 			 INSN_SPEC_BACK_DEPS (con));
+
+	  /* Make sure that n_inc_deps above is consistent with dependencies
+	     we create.  */
+	  gcc_assert (mii->inc_insn == inc_cand);
+
 	  if (backwards)
 	    {
 	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 2/8] Unify implementations of print_hard_reg_set()
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
                   ` (2 preceding siblings ...)
  2023-11-22 11:14 ` [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2023-11-22 15:04   ` Vladimir Makarov
  2023-11-22 11:14 ` [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc Maxim Kuvyrkov
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

We currently have 3 implementations of print_hard_reg_set()
(all with the same name!) in ira-color.cc, ira-conflicts.cc, and
sel-sched-dump.cc.  This patch generalizes implementation in
ira-color.cc, and uses it in all other places.  The declaration
is added to hard-reg-set.h.

The motivation for this patch is the [upcoming] need for
print_hard_reg_set() in sched-deps.cc.

gcc/ChangeLog:

	* hard-reg-set.h (print_hard_reg_set): Declare.
	* ira-color.cc (print_hard_reg_set): Generalize a bit.
	(debug_hard_reg_set, print_hard_regs_subforest,)
	(setup_allocno_available_regs_num): Update.
	* ira-conflicts.cc (print_hard_reg_set): Remove.
	(print_allocno_conflicts): Use global print_hard_reg_set().
	* sel-sched-dump.cc (print_hard_reg_set): Remove.
	(dump_hard_reg_set): Use global print_hard_reg_set().
	* sel-sched-dump.h (dump_hard_reg_set): Mark as DEBUG_FUNCTION.
---
 gcc/hard-reg-set.h    |  3 +++
 gcc/ira-color.cc      | 17 ++++++++++-------
 gcc/ira-conflicts.cc  | 39 ++++-----------------------------------
 gcc/sel-sched-dump.cc | 21 ++++-----------------
 gcc/sel-sched-dump.h  |  2 +-
 5 files changed, 22 insertions(+), 60 deletions(-)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..81bca6df0e5 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -524,4 +524,7 @@ call_used_or_fixed_reg_p (unsigned int regno)
 }
 #endif
 
+/* ira-color.cc */
+extern void print_hard_reg_set (FILE *, HARD_REG_SET, const char *, bool);
+
 #endif /* ! GCC_HARD_REG_SET_H */
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..43564b44933 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -482,11 +482,14 @@ first_common_ancestor_node (allocno_hard_regs_node_t first,
 }
 
 /* Print hard reg set SET to F.  */
-static void
-print_hard_reg_set (FILE *f, HARD_REG_SET set, bool new_line_p)
+void
+print_hard_reg_set (FILE *f, HARD_REG_SET set,
+		    const char *title, bool new_line_p)
 {
   int i, start, end;
 
+  if (title)
+    fprintf (f, "%s", title);
   for (start = end = -1, i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     {
       bool reg_included = TEST_HARD_REG_BIT (set, i);
@@ -516,7 +519,7 @@ print_hard_reg_set (FILE *f, HARD_REG_SET set, bool new_line_p)
 DEBUG_FUNCTION void
 debug_hard_reg_set (HARD_REG_SET set)
 {
-  print_hard_reg_set (stderr, set, true);
+  print_hard_reg_set (stderr, set, NULL, true);
 }
 
 /* Print allocno hard register subforest given by ROOTS and its LEVEL
@@ -534,7 +537,7 @@ print_hard_regs_subforest (FILE *f, allocno_hard_regs_node_t roots,
       for (i = 0; i < level * 2; i++)
 	fprintf (f, " ");
       fprintf (f, "%d:(", node->preorder_num);
-      print_hard_reg_set (f, node->hard_regs->set, false);
+      print_hard_reg_set (f, node->hard_regs->set, NULL, false);
       fprintf (f, ")@%" PRId64"\n", node->hard_regs->cost);
       print_hard_regs_subforest (f, node->first, level + 1);
     }
@@ -2982,12 +2985,12 @@ setup_allocno_available_regs_num (ira_allocno_t a)
      "      Allocno a%dr%d of %s(%d) has %d avail. regs ",
      ALLOCNO_NUM (a), ALLOCNO_REGNO (a),
      reg_class_names[aclass], ira_class_hard_regs_num[aclass], n);
-  print_hard_reg_set (ira_dump_file, data->profitable_hard_regs, false);
+  print_hard_reg_set (ira_dump_file, data->profitable_hard_regs, NULL, false);
   fprintf (ira_dump_file, ", %snode: ",
 	   data->profitable_hard_regs == data->hard_regs_node->hard_regs->set
 	   ? "" : "^");
   print_hard_reg_set (ira_dump_file,
-		      data->hard_regs_node->hard_regs->set, false);
+		      data->hard_regs_node->hard_regs->set, NULL, false);
   for (i = 0; i < nwords; i++)
     {
       ira_object_t obj = ALLOCNO_OBJECT (a, i);
@@ -3000,7 +3003,7 @@ setup_allocno_available_regs_num (ira_allocno_t a)
 	}
       fprintf (ira_dump_file, " (confl regs = ");
       print_hard_reg_set (ira_dump_file, OBJECT_TOTAL_CONFLICT_HARD_REGS (obj),
-			  false);
+			  NULL, false);
       fprintf (ira_dump_file, ")");
     }
   fprintf (ira_dump_file, "\n");
diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index a4d93c8d734..69806b1a15b 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -670,37 +670,6 @@ build_conflicts (void)
 
 \f
 
-/* Print hard reg set SET with TITLE to FILE.  */
-static void
-print_hard_reg_set (FILE *file, const char *title, HARD_REG_SET set)
-{
-  int i, start, end;
-
-  fputs (title, file);
-  for (start = end = -1, i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    {
-      bool reg_included = TEST_HARD_REG_BIT (set, i);
-
-      if (reg_included)
-	{
-	  if (start == -1)
-	    start = i;
-	  end = i;
-	}
-      if (start >= 0 && (!reg_included || i == FIRST_PSEUDO_REGISTER - 1))
-	{
-	  if (start == end)
-	    fprintf (file, " %d", start);
-	  else if (start == end + 1)
-	    fprintf (file, " %d %d", start, end);
-	  else
-	    fprintf (file, " %d-%d", start, end);
-	  start = -1;
-	}
-    }
-  putc ('\n', file);
-}
-
 static void
 print_allocno_conflicts (FILE * file, bool reg_p, ira_allocno_t a)
 {
@@ -759,14 +728,14 @@ print_allocno_conflicts (FILE * file, bool reg_p, ira_allocno_t a)
       conflicting_hard_regs = (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
 			       & ~ira_no_alloc_regs
 			       & reg_class_contents[ALLOCNO_CLASS (a)]);
-      print_hard_reg_set (file, "\n;;     total conflict hard regs:",
-			  conflicting_hard_regs);
+      print_hard_reg_set (file, conflicting_hard_regs,
+			  "\n;;     total conflict hard regs:", true);
 
       conflicting_hard_regs = (OBJECT_CONFLICT_HARD_REGS (obj)
 			       & ~ira_no_alloc_regs
 			       & reg_class_contents[ALLOCNO_CLASS (a)]);
-      print_hard_reg_set (file, ";;     conflict hard regs:",
-			  conflicting_hard_regs);
+      print_hard_reg_set (file, conflicting_hard_regs,
+			  ";;     conflict hard regs:", true);
       putc ('\n', file);
     }
 
diff --git a/gcc/sel-sched-dump.cc b/gcc/sel-sched-dump.cc
index 05de9840937..5b2439cf4e2 100644
--- a/gcc/sel-sched-dump.cc
+++ b/gcc/sel-sched-dump.cc
@@ -535,26 +535,13 @@ dump_insn_vector (rtx_vec_t succs)
       sel_print ("NULL ");
 }
 
-/* Dumps a hard reg set SET to FILE using PREFIX.  */
-static void
-print_hard_reg_set (FILE *file, const char *prefix, HARD_REG_SET set)
-{
-  int i;
-
-  fprintf (file, "%s{ ", prefix);
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    {
-      if (TEST_HARD_REG_BIT (set, i))
-	fprintf (file, "%d ", i);
-    }
-  fprintf (file, "}\n");
-}
-
 /* Dumps a hard reg set SET using PREFIX.  */
-void
+DEBUG_FUNCTION void
 dump_hard_reg_set (const char *prefix, HARD_REG_SET set)
 {
-  print_hard_reg_set (sched_dump, prefix, set);
+  fprintf (sched_dump, "%s{", prefix);
+  print_hard_reg_set (sched_dump, set, NULL, false);
+  fprintf (sched_dump, "}\n");
 }
 
 /* Pretty print INSN.  This is used as a hook.  */
diff --git a/gcc/sel-sched-dump.h b/gcc/sel-sched-dump.h
index 2a207cefda2..e5095b2e986 100644
--- a/gcc/sel-sched-dump.h
+++ b/gcc/sel-sched-dump.h
@@ -214,7 +214,7 @@ extern void dump_av_set (av_set_t);
 extern void dump_lv_set (regset);
 extern void dump_blist (blist_t);
 extern void dump_flist (flist_t);
-extern void dump_hard_reg_set (const char *, HARD_REG_SET);
+DEBUG_FUNCTION extern void dump_hard_reg_set (const char *, HARD_REG_SET);
 extern void sel_debug_cfg_1 (int);
 extern void sel_debug_cfg (void);
 extern void setup_dump_cfg_params (void);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
                   ` (3 preceding siblings ...)
  2023-11-22 11:14 ` [PATCH v3 2/8] Unify implementations of print_hard_reg_set() Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2024-01-15 12:59   ` Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists() Maxim Kuvyrkov
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

This patch simplifies logic behind deps_join(), which will be
important for the upcoming improvements of sched-deps.cc logging.

The only functional change is that when deps_join() is called with
empty state for the 2nd argument, it will not reverse INSN_ and
EXPR_LISTs in the 1st argument.  Before this patch the lists were
reversed due to use of concat_*_LIST().  Now, with copy_*_LIST()
used for this case, the lists will remain in the original order.

gcc/ChangeLog:

	* lists.cc (copy_EXPR_LIST, concat_EXPR_LIST): New functions.
	* rtl.h (copy_EXPR_LIST, concat_EXPR_LIST): Declare.
	* sched-rgn.cc (concat_insn_list, concat_expr_list): New helpers.
	(concat_insn_mem_list): Simplify.
	(deps_join): Update
---
 gcc/lists.cc     | 30 +++++++++++++++++++++++++++-
 gcc/rtl.h        |  4 +++-
 gcc/sched-rgn.cc | 51 +++++++++++++++++++++++++++---------------------
 3 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/gcc/lists.cc b/gcc/lists.cc
index 2cdf37ad533..83e7bf32176 100644
--- a/gcc/lists.cc
+++ b/gcc/lists.cc
@@ -160,6 +160,24 @@ free_INSN_LIST_list (rtx_insn_list **listp)
   free_list ((rtx *)listp, &unused_insn_list);
 }
 
+/* Make a copy of the EXPR_LIST list LINK and return it.  */
+rtx_expr_list *
+copy_EXPR_LIST (rtx_expr_list *link)
+{
+  rtx_expr_list *new_queue;
+  rtx_expr_list **pqueue = &new_queue;
+
+  for (; link; link = link->next ())
+    {
+      rtx x = link->element ();
+      rtx_expr_list *newlink = alloc_EXPR_LIST (REG_NOTE_KIND (link), x, NULL);
+      *pqueue = newlink;
+      pqueue = (rtx_expr_list **)&XEXP (newlink, 1);
+    }
+  *pqueue = NULL;
+  return new_queue;
+}
+
 /* Make a copy of the INSN_LIST list LINK and return it.  */
 rtx_insn_list *
 copy_INSN_LIST (rtx_insn_list *link)
@@ -178,12 +196,22 @@ copy_INSN_LIST (rtx_insn_list *link)
   return new_queue;
 }
 
+/* Duplicate the EXPR_LIST elements of COPY and prepend them to OLD.  */
+rtx_expr_list *
+concat_EXPR_LIST (rtx_expr_list *copy, rtx_expr_list *old)
+{
+  rtx_expr_list *new_rtx = old;
+  for (; copy; copy = copy->next ())
+    new_rtx = alloc_EXPR_LIST (REG_NOTE_KIND (copy), copy->element (), new_rtx);
+  return new_rtx;
+}
+
 /* Duplicate the INSN_LIST elements of COPY and prepend them to OLD.  */
 rtx_insn_list *
 concat_INSN_LIST (rtx_insn_list *copy, rtx_insn_list *old)
 {
   rtx_insn_list *new_rtx = old;
-  for (; copy ; copy = copy->next ())
+  for (; copy; copy = copy->next ())
     {
       new_rtx = alloc_INSN_LIST (copy->insn (), new_rtx);
       PUT_REG_NOTE_KIND (new_rtx, REG_NOTE_KIND (copy));
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e4b6cc0dbb5..7e952d7cbeb 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3764,10 +3764,12 @@ extern void free_EXPR_LIST_list (rtx_expr_list **);
 extern void free_INSN_LIST_list (rtx_insn_list **);
 extern void free_EXPR_LIST_node (rtx);
 extern void free_INSN_LIST_node (rtx);
+extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
 extern rtx_insn_list *alloc_INSN_LIST (rtx, rtx);
+extern rtx_expr_list *copy_EXPR_LIST (rtx_expr_list *);
 extern rtx_insn_list *copy_INSN_LIST (rtx_insn_list *);
+extern rtx_expr_list *concat_EXPR_LIST (rtx_expr_list *, rtx_expr_list *);
 extern rtx_insn_list *concat_INSN_LIST (rtx_insn_list *, rtx_insn_list *);
-extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
 extern void remove_free_INSN_LIST_elem (rtx_insn *, rtx_insn_list **);
 extern rtx remove_list_elem (rtx, rtx *);
 extern rtx_insn *remove_free_INSN_LIST_node (rtx_insn_list **);
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..da3ec0458ff 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -2585,25 +2585,32 @@ add_branch_dependences (rtx_insn *head, rtx_insn *tail)
 
 static class deps_desc *bb_deps;
 
+/* Return a new insn_list with all the elements from the two input lists.  */
+static rtx_insn_list *
+concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old)
+{
+  if (!old)
+    return copy_INSN_LIST (copy);
+  return concat_INSN_LIST (copy, old);
+}
+
+/* Return a new expr_list with all the elements from the two input lists.  */
+static rtx_expr_list *
+concat_expr_list (rtx_expr_list *copy, rtx_expr_list *old)
+{
+  if (!old)
+    return copy_EXPR_LIST (copy);
+  return concat_EXPR_LIST (copy, old);
+}
+
 static void
 concat_insn_mem_list (rtx_insn_list *copy_insns,
 		      rtx_expr_list *copy_mems,
 		      rtx_insn_list **old_insns_p,
 		      rtx_expr_list **old_mems_p)
 {
-  rtx_insn_list *new_insns = *old_insns_p;
-  rtx_expr_list *new_mems = *old_mems_p;
-
-  while (copy_insns)
-    {
-      new_insns = alloc_INSN_LIST (copy_insns->insn (), new_insns);
-      new_mems = alloc_EXPR_LIST (VOIDmode, copy_mems->element (), new_mems);
-      copy_insns = copy_insns->next ();
-      copy_mems = copy_mems->next ();
-    }
-
-  *old_insns_p = new_insns;
-  *old_mems_p = new_mems;
+  *old_insns_p = concat_insn_list (copy_insns, *old_insns_p);
+  *old_mems_p = concat_expr_list (copy_mems, *old_mems_p);
 }
 
 /* Join PRED_DEPS to the SUCC_DEPS.  */
@@ -2619,11 +2626,11 @@ deps_join (class deps_desc *succ_deps, class deps_desc *pred_deps)
       struct deps_reg *pred_rl = &pred_deps->reg_last[reg];
       struct deps_reg *succ_rl = &succ_deps->reg_last[reg];
 
-      succ_rl->uses = concat_INSN_LIST (pred_rl->uses, succ_rl->uses);
-      succ_rl->sets = concat_INSN_LIST (pred_rl->sets, succ_rl->sets);
+      succ_rl->uses = concat_insn_list (pred_rl->uses, succ_rl->uses);
+      succ_rl->sets = concat_insn_list (pred_rl->sets, succ_rl->sets);
       succ_rl->implicit_sets
-	= concat_INSN_LIST (pred_rl->implicit_sets, succ_rl->implicit_sets);
-      succ_rl->clobbers = concat_INSN_LIST (pred_rl->clobbers,
+	= concat_insn_list (pred_rl->implicit_sets, succ_rl->implicit_sets);
+      succ_rl->clobbers = concat_insn_list (pred_rl->clobbers,
                                             succ_rl->clobbers);
       succ_rl->uses_length += pred_rl->uses_length;
       succ_rl->clobbers_length += pred_rl->clobbers_length;
@@ -2641,10 +2648,10 @@ deps_join (class deps_desc *succ_deps, class deps_desc *pred_deps)
                         &succ_deps->pending_write_mems);
 
   succ_deps->pending_jump_insns
-    = concat_INSN_LIST (pred_deps->pending_jump_insns,
+    = concat_insn_list (pred_deps->pending_jump_insns,
                         succ_deps->pending_jump_insns);
   succ_deps->last_pending_memory_flush
-    = concat_INSN_LIST (pred_deps->last_pending_memory_flush,
+    = concat_insn_list (pred_deps->last_pending_memory_flush,
                         succ_deps->last_pending_memory_flush);
 
   succ_deps->pending_read_list_length += pred_deps->pending_read_list_length;
@@ -2653,17 +2660,17 @@ deps_join (class deps_desc *succ_deps, class deps_desc *pred_deps)
 
   /* last_function_call is inherited by successor.  */
   succ_deps->last_function_call
-    = concat_INSN_LIST (pred_deps->last_function_call,
+    = concat_insn_list (pred_deps->last_function_call,
                         succ_deps->last_function_call);
 
   /* last_function_call_may_noreturn is inherited by successor.  */
   succ_deps->last_function_call_may_noreturn
-    = concat_INSN_LIST (pred_deps->last_function_call_may_noreturn,
+    = concat_insn_list (pred_deps->last_function_call_may_noreturn,
                         succ_deps->last_function_call_may_noreturn);
 
   /* sched_before_next_call is inherited by successor.  */
   succ_deps->sched_before_next_call
-    = concat_INSN_LIST (pred_deps->sched_before_next_call,
+    = concat_insn_list (pred_deps->sched_before_next_call,
                         succ_deps->sched_before_next_call);
 }
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists().
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
                   ` (4 preceding siblings ...)
  2023-11-22 11:14 ` [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2024-01-15 13:01   ` Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis Maxim Kuvyrkov
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

Better propagate flags from dump_lists() into dump_dep() and
add a missing "]" in dump_lists().

gcc/ChangeLog:

	* sched-deps.cc (DUMP_DEP_PRO): Improve comment.
	(dump_dep_flags): Remove.
	(DUMP_LISTS_SIZE, DUMP_LISTS_DEPS, DUMP_LISTS_ALL): Continue
	numbering from DUMP_DEP_* flags.
	(dump_lists): Update and fix.
---
 gcc/sched-deps.cc | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 005fc0f567e..4d357079a7a 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -132,7 +132,8 @@ static void dump_ds (FILE *, ds_t);
 /* Define flags for dump_dep ().  */
 
 /* Dump producer of the dependence.  */
-#define DUMP_DEP_PRO (2)
+#define DUMP_DEP_PRO (2) /* Reserve "1" for handling of DUMP_DEP_ALL and
+			    DUMP_LISTS_ALL.  */
 
 /* Dump consumer of the dependence.  */
 #define DUMP_DEP_CON (4)
@@ -206,9 +207,6 @@ dump_dep (FILE *dump, dep_t dep, int flags)
   fprintf (dump, ">");
 }
 
-/* Default flags for dump_dep ().  */
-static int dump_dep_flags = (DUMP_DEP_PRO | DUMP_DEP_CON);
-
 /* Dump all fields of DEP to STDERR.  */
 void
 sd_debug_dep (dep_t dep)
@@ -1454,19 +1452,20 @@ sd_delete_dep (sd_iterator_def sd_it)
 }
 
 /* Dump size of the lists.  */
-#define DUMP_LISTS_SIZE (2)
+#define DUMP_LISTS_SIZE (32) /* (DUMP_DEP_STATUS << 1)  */
 
 /* Dump dependencies of the lists.  */
-#define DUMP_LISTS_DEPS (4)
+#define DUMP_LISTS_DEPS (64)
 
 /* Dump all information about the lists.  */
 #define DUMP_LISTS_ALL (DUMP_LISTS_SIZE | DUMP_LISTS_DEPS)
 
 /* Dump deps_lists of INSN specified by TYPES to DUMP.
-   FLAGS is a bit mask specifying what information about the lists needs
-   to be printed.
+   FLAGS is a bit mask specifying what information about the lists and
+   the individual deps needs to be printed, this is a combination of
+   DUMP_DEP_* and DUMP_LISTS_* flags.
    If FLAGS has the very first bit set, then dump all information about
-   the lists and propagate this bit into the callee dump functions.  */
+   the lists and deps propagate this bit into the callee dump functions.  */
 static void
 dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags)
 {
@@ -1488,10 +1487,12 @@ dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags)
     {
       FOR_EACH_DEP (insn, types, sd_it, dep)
 	{
-	  dump_dep (dump, dep, dump_dep_flags | all);
+	  dump_dep (dump, dep, flags | all);
 	  fprintf (dump, " ");
 	}
     }
+
+  fprintf (dump, "]");
 }
 
 /* Dump all information about deps_lists of INSN specified by TYPES
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
                   ` (5 preceding siblings ...)
  2023-11-22 11:14 ` [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists() Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2024-01-15 13:04   ` Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts Maxim Kuvyrkov
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

gcc/ChangeLog:

	* sched-deps.cc (sd_add_dep, find_inc): Add logging about
	dependency creation.
---
 gcc/sched-deps.cc | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 4d357079a7a..2a87158ba4b 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -1342,6 +1342,13 @@ sd_add_dep (dep_t dep, bool resolved_p)
      in the bitmap caches of dependency information.  */
   if (true_dependency_cache != NULL)
     set_dependency_caches (dep);
+
+  if (sched_verbose >= 9)
+    {
+      fprintf (sched_dump, "created dependency ");
+      dump_dep (sched_dump, dep, 1);
+      fprintf (sched_dump, "\n");
+    }
 }
 
 /* Add or update backward dependence between INSN and ELEM
@@ -4879,18 +4886,33 @@ find_inc (struct mem_inc_info *mii, bool backwards)
 	     we create.  */
 	  gcc_assert (mii->inc_insn == inc_cand);
 
+	  int n_deps_created = 0;
 	  if (backwards)
 	    {
 	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
-		add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
-				  REG_DEP_TRUE);
+		{
+		  add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
+				    REG_DEP_TRUE);
+		  ++n_deps_created;
+		}
 	    }
 	  else
 	    {
 	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_FORW, sd_it, dep)
-		add_dependence_1 (DEP_CON (dep), mii->mem_insn,
-				  REG_DEP_ANTI);
+		{
+		  add_dependence_1 (DEP_CON (dep), mii->mem_insn,
+				    REG_DEP_ANTI);
+		  ++n_deps_created;
+		}
 	    }
+	  if (sched_verbose >= 6)
+	    fprintf (sched_dump,
+		     "created %d deps for mem_insn %d due to "
+		     "inc_insn %d %s deps\n",
+		     n_deps_created, INSN_UID (mii->mem_insn),
+		     INSN_UID (mii->inc_insn),
+		     backwards ? "backward" : "forward");
+
 	  return true;
 	}
     next:
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
                   ` (6 preceding siblings ...)
  2023-11-22 11:14 ` [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2024-01-15 13:05   ` Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 8/8] Improve logging of scheduler dependency analysis context Maxim Kuvyrkov
  9 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

gcc/ChangeLog:

	* sched-deps.cc (init_deps, init_deps_reg_last): Simplify.
	(free_deps): Remove useless code.
---
 gcc/sched-deps.cc | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 2a87158ba4b..e0d3c97d935 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -3927,10 +3927,9 @@ init_deps (class deps_desc *deps, bool lazy_reg_last)
   int max_reg = (reload_completed ? FIRST_PSEUDO_REGISTER : max_reg_num ());
 
   deps->max_reg = max_reg;
-  if (lazy_reg_last)
-    deps->reg_last = NULL;
-  else
-    deps->reg_last = XCNEWVEC (struct deps_reg, max_reg);
+  deps->reg_last = NULL;
+  if (!lazy_reg_last)
+    init_deps_reg_last (deps);
   INIT_REG_SET (&deps->reg_last_in_use);
 
   deps->pending_read_insns = 0;
@@ -3961,9 +3960,7 @@ init_deps (class deps_desc *deps, bool lazy_reg_last)
 void
 init_deps_reg_last (class deps_desc *deps)
 {
-  gcc_assert (deps && deps->max_reg > 0);
-  gcc_assert (deps->reg_last == NULL);
-
+  gcc_assert (deps && deps->max_reg > 0 && deps->reg_last == NULL);
   deps->reg_last = XCNEWVEC (struct deps_reg, deps->max_reg);
 }
 
@@ -4013,8 +4010,6 @@ free_deps (class deps_desc *deps)
      it at all.  */
   free (deps->reg_last);
   deps->reg_last = NULL;
-
-  deps = NULL;
 }
 
 /* Remove INSN from dependence contexts DEPS.  */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
                   ` (7 preceding siblings ...)
  2023-11-22 11:14 ` [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2024-01-15 13:06   ` Maxim Kuvyrkov
  2023-11-22 11:14 ` [PATCH v3 8/8] Improve logging of scheduler dependency analysis context Maxim Kuvyrkov
  9 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

Scheduler dependency analysis uses two main data structures:
1. reg_pending_* data contains effects of INSN on the register file,
   which is then incorporated into ...
2. deps_desc object, which contains commulative information about all
   instructions processed from deps_desc object's initialization.

This patch adds debug dumping of (1).

gcc/ChangeLog:

	* sched-deps.cc (print-rtl.h): Include for str_pattern_slim().
	(dump_reg_pending_data): New function.
	(sched_analyze_insn): Use it.
---
 gcc/sched-deps.cc | 90 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index e0d3c97d935..f9290c82fd2 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "cselib.h"
 #include "function-abi.h"
+#include "print-rtl.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -432,10 +433,24 @@ dep_spec_p (dep_t dep)
   return false;
 }
 
+/* These regsets describe how a single instruction affects registers.
+   Their "life-time" is restricted to a single call of sched_analyze_insn().
+   They are populated by sched_analyze_1() and sched_analyze_2(), and
+   then sched_analyze_insn() transfers data from these into deps->reg_last[i].
+   Near the end sched_analyze_insn() clears these regsets for the next
+   insn.  */
 static regset reg_pending_sets;
 static regset reg_pending_clobbers;
 static regset reg_pending_uses;
 static regset reg_pending_control_uses;
+
+/* Similar to reg_pending_* regsets, this variable specifies whether
+   the current insn analyzed by sched_analyze_insn() is a scheduling
+   barrier that should "split" dependencies inside a block.  Internally
+   sched-deps.cc does this by pretending that the barrier insn uses and
+   sets all registers.
+   Near the end sched_analyze_insn() transfers barrier info from this variable
+   into deps->last_reg_pending_barrier.  */
 static enum reg_pending_barrier_mode reg_pending_barrier;
 
 /* Hard registers implicitly clobbered or used (or may be implicitly
@@ -2880,7 +2895,77 @@ get_implicit_reg_pending_clobbers (HARD_REG_SET *temp, rtx_insn *insn)
   *temp &= ~ira_no_alloc_regs;
 }
 
-/* Analyze an INSN with pattern X to find all dependencies.  */
+/* Dump state of reg_pending_* data for debug purposes.
+   Dump only non-empty data to reduce log clobber.  */
+static void
+dump_reg_pending_data (FILE *file, rtx_insn *insn)
+{
+  fprintf (file, "\n");
+  fprintf (file, ";; sched_analysis after insn %d: %s\n",
+	   INSN_UID (insn), str_pattern_slim (PATTERN (insn)));
+
+  if (!REG_SET_EMPTY_P (reg_pending_sets)
+      || !REG_SET_EMPTY_P (reg_pending_clobbers)
+      || !REG_SET_EMPTY_P (reg_pending_uses)
+      || !REG_SET_EMPTY_P (reg_pending_control_uses))
+    {
+      fprintf (file, ";; insn reg");
+      if (!REG_SET_EMPTY_P (reg_pending_sets))
+	{
+	  fprintf (file, " sets(");
+	  dump_regset (reg_pending_sets, file);
+	  fprintf (file, ")");
+	}
+      if (!REG_SET_EMPTY_P (reg_pending_clobbers))
+	{
+	  fprintf (file, " clobbers(");
+	  dump_regset (reg_pending_clobbers, file);
+	  fprintf (file, ")");
+	}
+      if (!REG_SET_EMPTY_P (reg_pending_uses))
+	{
+	  fprintf (file, " uses(");
+	  dump_regset (reg_pending_uses, file);
+	  fprintf (file, ")");
+	}
+      if (!REG_SET_EMPTY_P (reg_pending_control_uses))
+	{
+	  fprintf (file, " control(");
+	  dump_regset (reg_pending_control_uses, file);
+	  fprintf (file, ")");
+	}
+      fprintf (file, "\n");
+    }
+
+  if (reg_pending_barrier)
+    fprintf (file, ";; insn reg barrier: %d\n", reg_pending_barrier);
+
+  if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers)
+      || !hard_reg_set_empty_p (implicit_reg_pending_uses))
+    {
+      fprintf (file, ";; insn reg");
+      if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers))
+	{
+	  print_hard_reg_set (file, implicit_reg_pending_clobbers,
+			      " implicit clobbers(", false);
+	  fprintf (file, ")");
+	}
+      if (!hard_reg_set_empty_p (implicit_reg_pending_uses))
+	{
+	  print_hard_reg_set (file, implicit_reg_pending_uses,
+			      " implicit uses(", false);
+	  fprintf (file, ")");
+	}
+      fprintf (file, "\n");
+    }
+}
+
+/* Analyze an INSN with pattern X to find all dependencies.
+   This analysis uses two main data structures:
+   1. reg_pending_* data contains effects of INSN on the register file,
+      which is then incorporated into ...
+   2. deps_desc object, which contains commulative information about all
+      instructions processed from deps_desc object's initialization.  */
 static void
 sched_analyze_insn (class deps_desc *deps, rtx x, rtx_insn *insn)
 {
@@ -3328,6 +3413,9 @@ sched_analyze_insn (class deps_desc *deps, rtx x, rtx_insn *insn)
       deps->last_reg_pending_barrier = reg_pending_barrier;
     }
 
+  if (sched_verbose >= 8)
+    dump_reg_pending_data (sched_dump, insn);
+
   CLEAR_REG_SET (reg_pending_uses);
   CLEAR_REG_SET (reg_pending_clobbers);
   CLEAR_REG_SET (reg_pending_sets);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 8/8] Improve logging of scheduler dependency analysis context
  2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
                   ` (8 preceding siblings ...)
  2023-11-22 11:14 ` [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis Maxim Kuvyrkov
@ 2023-11-22 11:14 ` Maxim Kuvyrkov
  2024-01-15 13:08   ` Maxim Kuvyrkov
  9 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2023-11-22 11:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: Maxim Kuvyrkov, Bernd Schmidt, Vladimir Makarov, Jeff Law,
	Alexander Monakov, Richard Guenther

Scheduler dependency analysis uses two main data structures:
1. reg_pending_* data contains effects of INSN on the register file,
   which is then incorporated into ...
2. deps_desc object, which contains commulative information about all
   instructions processed from deps_desc object's initialization.

This patch adds debug dumping of (2).

Dependency analysis contexts (aka deps_desc objects) are huge, but
each instruction affects only a small amount of data in these objects.
Therefore, it is most useful to dump differential information
compared to the dependency state after previous instruction.

gcc/ChangeLog:

	* sched-deps.cc (reset_deps, dump_rtx_insn_list)
	(rtx_insn_list_same_p):	New helper functions.
	(dump_deps_desc_diff): New function to dump dependency information.
	(sched_analysis_prev_deps): New static variable.
	(sched_analyze_insn): Dump dependency information.
	(init_deps_global, finish_deps_global): Handle sched_analysis_prev_deps.
	* sched-int.h (struct deps_reg): Update comments.
	* sched-rgn.cc (concat_insn_list, concat_expr_list): Update comments.
---
 gcc/sched-deps.cc | 197 ++++++++++++++++++++++++++++++++++++++++++++++
 gcc/sched-int.h   |   9 ++-
 gcc/sched-rgn.cc  |   5 ++
 3 files changed, 210 insertions(+), 1 deletion(-)

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index f9290c82fd2..edca9927e23 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -1677,6 +1677,15 @@ delete_all_dependences (rtx_insn *insn)
     sd_delete_dep (sd_it);
 }
 
+/* Re-initialize existing dependency context DEPS to be a copy of FROM.  */
+static void
+reset_deps (class deps_desc *deps, class deps_desc *from)
+{
+  free_deps (deps);
+  init_deps (deps, false);
+  deps_join (deps, from);
+}
+
 /* All insns in a scheduling group except the first should only have
    dependencies on the previous insn in the group.  So we find the
    first instruction in the scheduling group by walking the dependence
@@ -2960,6 +2969,177 @@ dump_reg_pending_data (FILE *file, rtx_insn *insn)
     }
 }
 
+/* Dump rtx_insn_list LIST.
+   Consider moving to lists.cc if there are users outside of sched-deps.cc.  */
+static void
+dump_rtx_insn_list (FILE *file, rtx_insn_list *list)
+{
+  for (; list; list = list->next ())
+    fprintf (file, " %d", INSN_UID (list->insn ()));
+}
+
+/* Return TRUE if lists A and B have same elements in the same order.  */
+static bool
+rtx_insn_list_same_p (rtx_insn_list *a, rtx_insn_list *b)
+{
+  for (; a && b; a = a->next (), b = b->next ())
+    if (a->insn () != b->insn ())
+      return false;
+
+  if (a || b)
+    return false;
+
+  return true;
+}
+
+/* Dump parts of DEPS that are different from PREV.
+   Dumping all information from dependency context produces huge
+   hard-to-analize logs; differential dumping is way more managable.  */
+static void
+dump_deps_desc_diff (FILE *file, class deps_desc *deps, class deps_desc *prev)
+{
+  /* Each "paragraph" is a single line of output.  */
+
+  /* Note on param_max_pending_list_length:
+     During normal dependency analysis various lists should not exceed this
+     limit.  Searching for "!!!" in scheduler logs can point to potential bugs
+     or poorly-handled corner-cases.  */
+
+  if (!rtx_insn_list_same_p (deps->pending_read_insns,
+			     prev->pending_read_insns))
+    {
+      fprintf (file, ";; deps pending mem reads length(%d):",
+	       deps->pending_read_list_length);
+      if ((deps->pending_read_list_length + deps->pending_write_list_length)
+	  >= param_max_pending_list_length)
+	fprintf (file, "%d insns!!!", deps->pending_read_list_length);
+      else
+	dump_rtx_insn_list (file, deps->pending_read_insns);
+      fprintf (file, "\n");
+    }
+
+  if (!rtx_insn_list_same_p (deps->pending_write_insns,
+			     prev->pending_write_insns))
+    {
+      fprintf (file, ";; deps pending mem writes length(%d):",
+	       deps->pending_write_list_length);
+      if ((deps->pending_read_list_length + deps->pending_write_list_length)
+	  >= param_max_pending_list_length)
+	fprintf (file, "%d insns!!!", deps->pending_write_list_length);
+      else
+	dump_rtx_insn_list (file, deps->pending_write_insns);
+      fprintf (file, "\n");
+    }
+
+  if (!rtx_insn_list_same_p (deps->pending_jump_insns,
+			     prev->pending_jump_insns))
+    {
+      fprintf (file, ";; deps pending jump length(%d):",
+	       deps->pending_flush_length);
+      if (deps->pending_flush_length >= param_max_pending_list_length)
+	fprintf (file, "%d insns!!!", deps->pending_flush_length);
+      else
+	dump_rtx_insn_list (file, deps->pending_jump_insns);
+      fprintf (file, "\n");
+    }
+
+  fprintf (file, ";; last");
+  if (!rtx_insn_list_same_p (deps->last_pending_memory_flush,
+			     prev->last_pending_memory_flush))
+    {
+      fprintf (file, " memory_flush(");
+      dump_rtx_insn_list (file, deps->last_pending_memory_flush);
+      fprintf (file, ")");
+    }
+  if (!rtx_insn_list_same_p (deps->last_function_call,
+			     prev->last_function_call))
+    {
+      fprintf (file, " call(");
+      dump_rtx_insn_list (file, deps->last_function_call);
+      fprintf (file, ")");
+    }
+  if (!rtx_insn_list_same_p (deps->last_function_call_may_noreturn,
+			     prev->last_function_call_may_noreturn))
+    {
+      fprintf (file, " noreturn(");
+      dump_rtx_insn_list (file, deps->last_function_call_may_noreturn);
+      fprintf (file, ")");
+    }
+  fprintf (file, "\n");
+
+  fprintf (file, ";; sched_before_next");
+  if (!rtx_insn_list_same_p (deps->sched_before_next_call,
+			     prev->sched_before_next_call))
+    {
+      fprintf (file, " call(");
+      dump_rtx_insn_list (file, deps->sched_before_next_call);
+      fprintf (file, ")");
+    }
+  if (!rtx_insn_list_same_p (deps->sched_before_next_jump,
+			     prev->sched_before_next_jump))
+    {
+      fprintf (file, " jump(");
+      dump_rtx_insn_list (file, deps->sched_before_next_jump);
+      fprintf (file, ")");
+    }
+  fprintf (file, " in_post_call_group_p(%d)\n", deps->in_post_call_group_p);
+
+  fprintf (file, ";; last_debug_insn(%d) last_args_size(%d) last_prologue(",
+	   deps->last_debug_insn ? INSN_UID (deps->last_debug_insn) : -1,
+	   deps->last_args_size ? INSN_UID (deps->last_args_size) : -1);
+  dump_rtx_insn_list (file, deps->last_prologue);
+  fprintf (file, ") last_epilogue(");
+  dump_rtx_insn_list (file, deps->last_epilogue);
+  fprintf (file, ") last_logue_was_epilogue(%d)\n",
+	   deps->last_logue_was_epilogue);
+
+  int i;
+
+  gcc_assert (deps->max_reg == prev->max_reg);
+  for (i = 0; i < deps->max_reg; ++i)
+    {
+      struct deps_reg *reg_last = &deps->reg_last[i];
+      struct deps_reg *reg_prev = &prev->reg_last[i];
+
+      if (rtx_insn_list_same_p (reg_last->uses, reg_prev->uses)
+	  && rtx_insn_list_same_p (reg_last->sets, reg_prev->sets)
+	  && rtx_insn_list_same_p (reg_last->implicit_sets,
+				   reg_prev->implicit_sets)
+	  && rtx_insn_list_same_p (reg_last->control_uses,
+				   reg_prev->control_uses)
+	  && rtx_insn_list_same_p (reg_last->clobbers,
+				   reg_prev->clobbers))
+	continue;
+
+      fprintf (file, ";; reg %u: uses(", i);
+      if (reg_last->uses_length >= param_max_pending_list_length)
+	fprintf (file, "%d insns!!!", reg_last->uses_length);
+      else
+	dump_rtx_insn_list (file, reg_last->uses);
+      if (reg_last->clobbers_length >= param_max_pending_list_length)
+	{
+	  fprintf (file, ") clobbers(");
+	  fprintf (file, "%d insns!!!", reg_last->clobbers_length);
+	}
+      else
+	{
+	  fprintf (file, ") sets(");
+	  dump_rtx_insn_list (file, reg_last->sets);
+	  fprintf (file, ") implicit_sets(");
+	  dump_rtx_insn_list (file, reg_last->implicit_sets);
+	  fprintf (file, ") control_uses(");
+	  dump_rtx_insn_list (file, reg_last->control_uses);
+	  fprintf (file, ") clobbers(");
+	  dump_rtx_insn_list (file, reg_last->clobbers);
+	}
+      fprintf (file, ")\n");
+    }
+}
+
+/* Dependency analysis state before current insn.
+   Used by dump_deps_desc_diff().  */
+static class deps_desc *sched_analysis_prev_deps;
+
 /* Analyze an INSN with pattern X to find all dependencies.
    This analysis uses two main data structures:
    1. reg_pending_* data contains effects of INSN on the register file,
@@ -3627,6 +3807,16 @@ sched_analyze_insn (class deps_desc *deps, rtx x, rtx_insn *insn)
 	  deps->last_logue_was_epilogue = true;
 	}
     }
+
+    if (sched_verbose >= 8)
+      {
+	dump_deps_desc_diff (sched_dump, deps, sched_analysis_prev_deps);
+	reset_deps (sched_analysis_prev_deps, deps);
+	fprintf (sched_dump, ";; ");
+	dump_lists (sched_dump, insn, SD_LIST_BACK,
+		    DUMP_LISTS_ALL | DUMP_DEP_PRO);
+	fprintf (sched_dump, "\n");
+      }
 }
 
 /* Return TRUE if INSN might not always return normally (e.g. call exit,
@@ -4305,6 +4495,9 @@ init_deps_global (void)
       sched_deps_info->note_mem_dep = haifa_note_mem_dep;
       sched_deps_info->note_dep = haifa_note_dep;
    }
+
+  sched_analysis_prev_deps = XNEW (class deps_desc);
+  init_deps (sched_analysis_prev_deps, false);
 }
 
 /* Free everything used by the dependency analysis code.  */
@@ -4316,6 +4509,10 @@ finish_deps_global (void)
   FREE_REG_SET (reg_pending_clobbers);
   FREE_REG_SET (reg_pending_uses);
   FREE_REG_SET (reg_pending_control_uses);
+
+  free_deps (sched_analysis_prev_deps);
+  free (sched_analysis_prev_deps);
+  sched_analysis_prev_deps = NULL;
 }
 
 /* Estimate the weakness of dependence between MEM1 and MEM2.  */
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 64a2f0bcff9..8a3109ce86e 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -455,7 +455,9 @@ struct deps_reg
   int clobbers_length;
 };
 
-/* Describe state of dependencies used during sched_analyze phase.  */
+/* Describe state of dependencies used during sched_analyze phase.
+   Please consider updating sched-deps.cc:dump_deps_desc_diff() when adding
+   new fields.  */
 class deps_desc
 {
 public:
@@ -499,6 +501,11 @@ public:
      large lists.  */
   int pending_flush_length;
 
+  /* Below are lists (and not just a single instructions) to allow inter-block
+     dependency analysis.  Each block
+     may add a single insn to below lists, but when merging dependency
+     analysis context from multiple predecessor blocks, we may get a list.  */
+
   /* The last insn upon which all memory references must depend.
      This is an insn which flushed the pending lists, creating a dependency
      between it and all previously pending memory references.  This creates
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index da3ec0458ff..72f0a2de66e 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -2590,6 +2590,10 @@ static rtx_insn_list *
 concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old)
 {
   if (!old)
+    /* concat_*_LIST() will return a reversed copy of COPY.  Working with
+       reversed copies would complicate dependency dumping in
+       dump_deps_desc_diff(), which internally uses reset_deps() and
+       deps_join().  Therefore, use copy_INSN_LIST() when OLD is NULL.  */
     return copy_INSN_LIST (copy);
   return concat_INSN_LIST (copy, old);
 }
@@ -2599,6 +2603,7 @@ static rtx_expr_list *
 concat_expr_list (rtx_expr_list *copy, rtx_expr_list *old)
 {
   if (!old)
+    /* See comment in concat_insn_list() above.  */
     return copy_EXPR_LIST (copy);
   return concat_EXPR_LIST (copy, old);
 }
-- 
2.34.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 2/8] Unify implementations of print_hard_reg_set()
  2023-11-22 11:14 ` [PATCH v3 2/8] Unify implementations of print_hard_reg_set() Maxim Kuvyrkov
@ 2023-11-22 15:04   ` Vladimir Makarov
  0 siblings, 0 replies; 37+ messages in thread
From: Vladimir Makarov @ 2023-11-22 15:04 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches
  Cc: Bernd Schmidt, Jeff Law, Alexander Monakov, Richard Guenther


On 11/22/23 06:14, Maxim Kuvyrkov wrote:
> We currently have 3 implementations of print_hard_reg_set()
> (all with the same name!) in ira-color.cc, ira-conflicts.cc, and
> sel-sched-dump.cc.  This patch generalizes implementation in
> ira-color.cc, and uses it in all other places.  The declaration
> is added to hard-reg-set.h.
>
> The motivation for this patch is the [upcoming] need for
> print_hard_reg_set() in sched-deps.cc.
>
> gcc/ChangeLog:
>
> 	* hard-reg-set.h (print_hard_reg_set): Declare.
> 	* ira-color.cc (print_hard_reg_set): Generalize a bit.
> 	(debug_hard_reg_set, print_hard_regs_subforest,)
> 	(setup_allocno_available_regs_num): Update.
> 	* ira-conflicts.cc (print_hard_reg_set): Remove.
> 	(print_allocno_conflicts): Use global print_hard_reg_set().
> 	* sel-sched-dump.cc (print_hard_reg_set): Remove.
> 	(dump_hard_reg_set): Use global print_hard_reg_set().
> 	* sel-sched-dump.h (dump_hard_reg_set): Mark as DEBUG_FUNCTION.
>
OK for me.  Thank you for consolidation of the print code, Maxim.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2023-11-22 11:14 ` [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
@ 2024-01-15 12:56   ` Maxim Kuvyrkov
  2024-01-15 18:26     ` Vladimir Makarov
  2024-01-16 14:52     ` Jeff Law
  0 siblings, 2 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-15 12:56 UTC (permalink / raw)
  To: gcc-patches
  Cc: Bernd Schmidt, Vladimir Makarov, Jeff Law, Alexander Monakov,
	Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 5256 bytes --]

Hi Vladimir,
Hi Jeff,

Richard and Alexander have reviewed this patch and [I assume] have no
further comments.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
wrote:

> This patch avoids sched-deps.cc:find_inc() creating exponential number
> of dependencies, which become memory and compilation time hogs.
> Consider example (simplified from PR96388) ...
> ===
> sp=sp-4 // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4 // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
>
> [For simplicity, let's assume find_inc(backwards==true)].
> In this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA.  After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> dependencies.
>
> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of
> 30GB and compilation time of 30 minutes, with sched2 accounting for
> 95% of both metrics.  After this patch the RAM usage is down to 1GB
> and compilation time is down to 3-4 minutes, with sched2 no longer
> standing out on -ftime-report or memory usage.
>
> gcc/ChangeLog:
>
>         PR rtl-optimization/96388
>         PR rtl-optimization/111554
>         * sched-deps.cc (find_inc): Avoid exponential behavior.
> ---
>  gcc/sched-deps.cc | 48 +++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index c23218890f3..005fc0f567e 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii,
> rtx_insn *insn, bool before_mem)
>  /* Once a suitable mem reference has been found and the corresponding data
>     in MII has been filled in, this function is called to find a suitable
>     add or inc insn involving the register we found in the memory
> -   reference.  */
> +   reference.
> +   If successful, this function will create additional dependencies
> between
> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if
> backwards)
> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if
> !backwards).
> +*/
>
>  static bool
>  find_inc (struct mem_inc_info *mii, bool backwards)
>  {
>    sd_iterator_def sd_it;
>    dep_t dep;
> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK :
> SD_LIST_FORW;
> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>
> -  sd_it = sd_iterator_start (mii->mem_insn,
> -                            backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>    while (sd_iterator_cond (&sd_it, &dep))
>      {
>        dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>        rtx_insn *pro = DEP_PRO (dep);
>        rtx_insn *con = DEP_CON (dep);
> -      rtx_insn *inc_cand = backwards ? pro : con;
> +      rtx_insn *inc_cand;
> +      int n_inc_deps;
> +
>        if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
>         goto next;
> +
> +      if (backwards)
> +       {
> +         inc_cand = pro;
> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> +       }
> +      else
> +       {
> +         inc_cand = con;
> +         n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> +       }
> +
> +      /* In the FOR_EACH_DEP loop below we will create additional
> n_inc_deps
> +        for mem_insn.  This by itself is not a problem, since each
> mem_insn
> +        will have only a few inc_insns associated with it.  However, if
> +        we consider that a single inc_insn may have a lot of mem_insns,
> AND,
> +        on top of that, a few other inc_insns associated with it --
> +        those _other inc_insns_ will get (n_mem_deps * number of MEM
> insns)
> +        dependencies created for them.  This may cause an exponential
> +        growth of memory usage and scheduling time.
> +        See PR96388 for details.
> +        We [heuristically] use n_inc_deps as a proxy for the number of MEM
> +        insns, and drop opportunities for breaking modifiable_mem
> dependencies
> +        when dependency lists grow beyond reasonable size.  */
> +      if (n_mem_deps * n_inc_deps
> +         >= param_max_pending_list_length * param_max_pending_list_length)
> +       goto next;
> +
>        if (parse_add_or_inc (mii, inc_cand, backwards))
>         {
>           struct dep_replacement *desc;
> @@ -4838,6 +4873,11 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>           desc->insn = mii->mem_insn;
>           move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
>                          INSN_SPEC_BACK_DEPS (con));
> +
> +         /* Make sure that n_inc_deps above is consistent with
> dependencies
> +            we create.  */
> +         gcc_assert (mii->inc_insn == inc_cand);
> +
>           if (backwards)
>             {
>               FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc
  2023-11-22 11:14 ` [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc Maxim Kuvyrkov
@ 2024-01-15 12:59   ` Maxim Kuvyrkov
  0 siblings, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-15 12:59 UTC (permalink / raw)
  To: gcc-patches
  Cc: Bernd Schmidt, Vladimir Makarov, Jeff Law, Alexander Monakov,
	Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 8379 bytes --]

Dear RTL maintainers,

Gently ping.  This patch adds a couple of new functions to lists.cc, which
then are used to simplify logic in the scheduler.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
wrote:

> This patch simplifies logic behind deps_join(), which will be
> important for the upcoming improvements of sched-deps.cc logging.
>
> The only functional change is that when deps_join() is called with
> empty state for the 2nd argument, it will not reverse INSN_ and
> EXPR_LISTs in the 1st argument.  Before this patch the lists were
> reversed due to use of concat_*_LIST().  Now, with copy_*_LIST()
> used for this case, the lists will remain in the original order.
>
> gcc/ChangeLog:
>
>         * lists.cc (copy_EXPR_LIST, concat_EXPR_LIST): New functions.
>         * rtl.h (copy_EXPR_LIST, concat_EXPR_LIST): Declare.
>         * sched-rgn.cc (concat_insn_list, concat_expr_list): New helpers.
>         (concat_insn_mem_list): Simplify.
>         (deps_join): Update
> ---
>  gcc/lists.cc     | 30 +++++++++++++++++++++++++++-
>  gcc/rtl.h        |  4 +++-
>  gcc/sched-rgn.cc | 51 +++++++++++++++++++++++++++---------------------
>  3 files changed, 61 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/lists.cc b/gcc/lists.cc
> index 2cdf37ad533..83e7bf32176 100644
> --- a/gcc/lists.cc
> +++ b/gcc/lists.cc
> @@ -160,6 +160,24 @@ free_INSN_LIST_list (rtx_insn_list **listp)
>    free_list ((rtx *)listp, &unused_insn_list);
>  }
>
> +/* Make a copy of the EXPR_LIST list LINK and return it.  */
> +rtx_expr_list *
> +copy_EXPR_LIST (rtx_expr_list *link)
> +{
> +  rtx_expr_list *new_queue;
> +  rtx_expr_list **pqueue = &new_queue;
> +
> +  for (; link; link = link->next ())
> +    {
> +      rtx x = link->element ();
> +      rtx_expr_list *newlink = alloc_EXPR_LIST (REG_NOTE_KIND (link), x,
> NULL);
> +      *pqueue = newlink;
> +      pqueue = (rtx_expr_list **)&XEXP (newlink, 1);
> +    }
> +  *pqueue = NULL;
> +  return new_queue;
> +}
> +
>  /* Make a copy of the INSN_LIST list LINK and return it.  */
>  rtx_insn_list *
>  copy_INSN_LIST (rtx_insn_list *link)
> @@ -178,12 +196,22 @@ copy_INSN_LIST (rtx_insn_list *link)
>    return new_queue;
>  }
>
> +/* Duplicate the EXPR_LIST elements of COPY and prepend them to OLD.  */
> +rtx_expr_list *
> +concat_EXPR_LIST (rtx_expr_list *copy, rtx_expr_list *old)
> +{
> +  rtx_expr_list *new_rtx = old;
> +  for (; copy; copy = copy->next ())
> +    new_rtx = alloc_EXPR_LIST (REG_NOTE_KIND (copy), copy->element (),
> new_rtx);
> +  return new_rtx;
> +}
> +
>  /* Duplicate the INSN_LIST elements of COPY and prepend them to OLD.  */
>  rtx_insn_list *
>  concat_INSN_LIST (rtx_insn_list *copy, rtx_insn_list *old)
>  {
>    rtx_insn_list *new_rtx = old;
> -  for (; copy ; copy = copy->next ())
> +  for (; copy; copy = copy->next ())
>      {
>        new_rtx = alloc_INSN_LIST (copy->insn (), new_rtx);
>        PUT_REG_NOTE_KIND (new_rtx, REG_NOTE_KIND (copy));
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index e4b6cc0dbb5..7e952d7cbeb 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -3764,10 +3764,12 @@ extern void free_EXPR_LIST_list (rtx_expr_list **);
>  extern void free_INSN_LIST_list (rtx_insn_list **);
>  extern void free_EXPR_LIST_node (rtx);
>  extern void free_INSN_LIST_node (rtx);
> +extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
>  extern rtx_insn_list *alloc_INSN_LIST (rtx, rtx);
> +extern rtx_expr_list *copy_EXPR_LIST (rtx_expr_list *);
>  extern rtx_insn_list *copy_INSN_LIST (rtx_insn_list *);
> +extern rtx_expr_list *concat_EXPR_LIST (rtx_expr_list *, rtx_expr_list *);
>  extern rtx_insn_list *concat_INSN_LIST (rtx_insn_list *, rtx_insn_list *);
> -extern rtx_expr_list *alloc_EXPR_LIST (int, rtx, rtx);
>  extern void remove_free_INSN_LIST_elem (rtx_insn *, rtx_insn_list **);
>  extern rtx remove_list_elem (rtx, rtx *);
>  extern rtx_insn *remove_free_INSN_LIST_node (rtx_insn_list **);
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..da3ec0458ff 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -2585,25 +2585,32 @@ add_branch_dependences (rtx_insn *head, rtx_insn
> *tail)
>
>  static class deps_desc *bb_deps;
>
> +/* Return a new insn_list with all the elements from the two input
> lists.  */
> +static rtx_insn_list *
> +concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old)
> +{
> +  if (!old)
> +    return copy_INSN_LIST (copy);
> +  return concat_INSN_LIST (copy, old);
> +}
> +
> +/* Return a new expr_list with all the elements from the two input
> lists.  */
> +static rtx_expr_list *
> +concat_expr_list (rtx_expr_list *copy, rtx_expr_list *old)
> +{
> +  if (!old)
> +    return copy_EXPR_LIST (copy);
> +  return concat_EXPR_LIST (copy, old);
> +}
> +
>  static void
>  concat_insn_mem_list (rtx_insn_list *copy_insns,
>                       rtx_expr_list *copy_mems,
>                       rtx_insn_list **old_insns_p,
>                       rtx_expr_list **old_mems_p)
>  {
> -  rtx_insn_list *new_insns = *old_insns_p;
> -  rtx_expr_list *new_mems = *old_mems_p;
> -
> -  while (copy_insns)
> -    {
> -      new_insns = alloc_INSN_LIST (copy_insns->insn (), new_insns);
> -      new_mems = alloc_EXPR_LIST (VOIDmode, copy_mems->element (),
> new_mems);
> -      copy_insns = copy_insns->next ();
> -      copy_mems = copy_mems->next ();
> -    }
> -
> -  *old_insns_p = new_insns;
> -  *old_mems_p = new_mems;
> +  *old_insns_p = concat_insn_list (copy_insns, *old_insns_p);
> +  *old_mems_p = concat_expr_list (copy_mems, *old_mems_p);
>  }
>
>  /* Join PRED_DEPS to the SUCC_DEPS.  */
> @@ -2619,11 +2626,11 @@ deps_join (class deps_desc *succ_deps, class
> deps_desc *pred_deps)
>        struct deps_reg *pred_rl = &pred_deps->reg_last[reg];
>        struct deps_reg *succ_rl = &succ_deps->reg_last[reg];
>
> -      succ_rl->uses = concat_INSN_LIST (pred_rl->uses, succ_rl->uses);
> -      succ_rl->sets = concat_INSN_LIST (pred_rl->sets, succ_rl->sets);
> +      succ_rl->uses = concat_insn_list (pred_rl->uses, succ_rl->uses);
> +      succ_rl->sets = concat_insn_list (pred_rl->sets, succ_rl->sets);
>        succ_rl->implicit_sets
> -       = concat_INSN_LIST (pred_rl->implicit_sets,
> succ_rl->implicit_sets);
> -      succ_rl->clobbers = concat_INSN_LIST (pred_rl->clobbers,
> +       = concat_insn_list (pred_rl->implicit_sets,
> succ_rl->implicit_sets);
> +      succ_rl->clobbers = concat_insn_list (pred_rl->clobbers,
>                                              succ_rl->clobbers);
>        succ_rl->uses_length += pred_rl->uses_length;
>        succ_rl->clobbers_length += pred_rl->clobbers_length;
> @@ -2641,10 +2648,10 @@ deps_join (class deps_desc *succ_deps, class
> deps_desc *pred_deps)
>                          &succ_deps->pending_write_mems);
>
>    succ_deps->pending_jump_insns
> -    = concat_INSN_LIST (pred_deps->pending_jump_insns,
> +    = concat_insn_list (pred_deps->pending_jump_insns,
>                          succ_deps->pending_jump_insns);
>    succ_deps->last_pending_memory_flush
> -    = concat_INSN_LIST (pred_deps->last_pending_memory_flush,
> +    = concat_insn_list (pred_deps->last_pending_memory_flush,
>                          succ_deps->last_pending_memory_flush);
>
>    succ_deps->pending_read_list_length +=
> pred_deps->pending_read_list_length;
> @@ -2653,17 +2660,17 @@ deps_join (class deps_desc *succ_deps, class
> deps_desc *pred_deps)
>
>    /* last_function_call is inherited by successor.  */
>    succ_deps->last_function_call
> -    = concat_INSN_LIST (pred_deps->last_function_call,
> +    = concat_insn_list (pred_deps->last_function_call,
>                          succ_deps->last_function_call);
>
>    /* last_function_call_may_noreturn is inherited by successor.  */
>    succ_deps->last_function_call_may_noreturn
> -    = concat_INSN_LIST (pred_deps->last_function_call_may_noreturn,
> +    = concat_insn_list (pred_deps->last_function_call_may_noreturn,
>                          succ_deps->last_function_call_may_noreturn);
>
>    /* sched_before_next_call is inherited by successor.  */
>    succ_deps->sched_before_next_call
> -    = concat_INSN_LIST (pred_deps->sched_before_next_call,
> +    = concat_insn_list (pred_deps->sched_before_next_call,
>                          succ_deps->sched_before_next_call);
>  }
>
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists().
  2023-11-22 11:14 ` [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists() Maxim Kuvyrkov
@ 2024-01-15 13:01   ` Maxim Kuvyrkov
  0 siblings, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-15 13:01 UTC (permalink / raw)
  To: gcc-patches
  Cc: Bernd Schmidt, Vladimir Makarov, Jeff Law, Alexander Monakov,
	Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 3083 bytes --]

Dear scheduler maintainers,

Gentle ping.  This patch is borderline trivial and affects only the lucky
few who debug sched-deps.cc code.  OK to merge?

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
wrote:

> Better propagate flags from dump_lists() into dump_dep() and
> add a missing "]" in dump_lists().
>
> gcc/ChangeLog:
>
>         * sched-deps.cc (DUMP_DEP_PRO): Improve comment.
>         (dump_dep_flags): Remove.
>         (DUMP_LISTS_SIZE, DUMP_LISTS_DEPS, DUMP_LISTS_ALL): Continue
>         numbering from DUMP_DEP_* flags.
>         (dump_lists): Update and fix.
> ---
>  gcc/sched-deps.cc | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 005fc0f567e..4d357079a7a 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -132,7 +132,8 @@ static void dump_ds (FILE *, ds_t);
>  /* Define flags for dump_dep ().  */
>
>  /* Dump producer of the dependence.  */
> -#define DUMP_DEP_PRO (2)
> +#define DUMP_DEP_PRO (2) /* Reserve "1" for handling of DUMP_DEP_ALL and
> +                           DUMP_LISTS_ALL.  */
>
>  /* Dump consumer of the dependence.  */
>  #define DUMP_DEP_CON (4)
> @@ -206,9 +207,6 @@ dump_dep (FILE *dump, dep_t dep, int flags)
>    fprintf (dump, ">");
>  }
>
> -/* Default flags for dump_dep ().  */
> -static int dump_dep_flags = (DUMP_DEP_PRO | DUMP_DEP_CON);
> -
>  /* Dump all fields of DEP to STDERR.  */
>  void
>  sd_debug_dep (dep_t dep)
> @@ -1454,19 +1452,20 @@ sd_delete_dep (sd_iterator_def sd_it)
>  }
>
>  /* Dump size of the lists.  */
> -#define DUMP_LISTS_SIZE (2)
> +#define DUMP_LISTS_SIZE (32) /* (DUMP_DEP_STATUS << 1)  */
>
>  /* Dump dependencies of the lists.  */
> -#define DUMP_LISTS_DEPS (4)
> +#define DUMP_LISTS_DEPS (64)
>
>  /* Dump all information about the lists.  */
>  #define DUMP_LISTS_ALL (DUMP_LISTS_SIZE | DUMP_LISTS_DEPS)
>
>  /* Dump deps_lists of INSN specified by TYPES to DUMP.
> -   FLAGS is a bit mask specifying what information about the lists needs
> -   to be printed.
> +   FLAGS is a bit mask specifying what information about the lists and
> +   the individual deps needs to be printed, this is a combination of
> +   DUMP_DEP_* and DUMP_LISTS_* flags.
>     If FLAGS has the very first bit set, then dump all information about
> -   the lists and propagate this bit into the callee dump functions.  */
> +   the lists and deps propagate this bit into the callee dump functions.
> */
>  static void
>  dump_lists (FILE *dump, rtx insn, sd_list_types_def types, int flags)
>  {
> @@ -1488,10 +1487,12 @@ dump_lists (FILE *dump, rtx insn,
> sd_list_types_def types, int flags)
>      {
>        FOR_EACH_DEP (insn, types, sd_it, dep)
>         {
> -         dump_dep (dump, dep, dump_dep_flags | all);
> +         dump_dep (dump, dep, flags | all);
>           fprintf (dump, " ");
>         }
>      }
> +
> +  fprintf (dump, "]");
>  }
>
>  /* Dump all information about deps_lists of INSN specified by TYPES
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis
  2023-11-22 11:14 ` [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis Maxim Kuvyrkov
@ 2024-01-15 13:04   ` Maxim Kuvyrkov
  0 siblings, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-15 13:04 UTC (permalink / raw)
  To: gcc-patches
  Cc: Bernd Schmidt, Vladimir Makarov, Jeff Law, Alexander Monakov,
	Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 2588 bytes --]

Dear scheduler maintainers,

Gentle ping.  This is a trivial patch, which makes debugging sched-deps.cc
slightly more enjoyable.

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
wrote:

> gcc/ChangeLog:
>
>         * sched-deps.cc (sd_add_dep, find_inc): Add logging about
>         dependency creation.
> ---
>  gcc/sched-deps.cc | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 4d357079a7a..2a87158ba4b 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -1342,6 +1342,13 @@ sd_add_dep (dep_t dep, bool resolved_p)
>       in the bitmap caches of dependency information.  */
>    if (true_dependency_cache != NULL)
>      set_dependency_caches (dep);
> +
> +  if (sched_verbose >= 9)
> +    {
> +      fprintf (sched_dump, "created dependency ");
> +      dump_dep (sched_dump, dep, 1);
> +      fprintf (sched_dump, "\n");
> +    }
>  }
>
>  /* Add or update backward dependence between INSN and ELEM
> @@ -4879,18 +4886,33 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>              we create.  */
>           gcc_assert (mii->inc_insn == inc_cand);
>
> +         int n_deps_created = 0;
>           if (backwards)
>             {
>               FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> -               add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
> -                                 REG_DEP_TRUE);
> +               {
> +                 add_dependence_1 (mii->mem_insn, DEP_PRO (dep),
> +                                   REG_DEP_TRUE);
> +                 ++n_deps_created;
> +               }
>             }
>           else
>             {
>               FOR_EACH_DEP (mii->inc_insn, SD_LIST_FORW, sd_it, dep)
> -               add_dependence_1 (DEP_CON (dep), mii->mem_insn,
> -                                 REG_DEP_ANTI);
> +               {
> +                 add_dependence_1 (DEP_CON (dep), mii->mem_insn,
> +                                   REG_DEP_ANTI);
> +                 ++n_deps_created;
> +               }
>             }
> +         if (sched_verbose >= 6)
> +           fprintf (sched_dump,
> +                    "created %d deps for mem_insn %d due to "
> +                    "inc_insn %d %s deps\n",
> +                    n_deps_created, INSN_UID (mii->mem_insn),
> +                    INSN_UID (mii->inc_insn),
> +                    backwards ? "backward" : "forward");
> +
>           return true;
>         }
>      next:
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts
  2023-11-22 11:14 ` [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts Maxim Kuvyrkov
@ 2024-01-15 13:05   ` Maxim Kuvyrkov
  0 siblings, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-15 13:05 UTC (permalink / raw)
  To: gcc-patches
  Cc: Bernd Schmidt, Vladimir Makarov, Jeff Law, Alexander Monakov,
	Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 1659 bytes --]

Dear scheduler maintainers,

Gentle ping.  This is a trivial cleanup.

On Wed, 22 Nov 2023 at 15:14, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
wrote:

> gcc/ChangeLog:
>
>         * sched-deps.cc (init_deps, init_deps_reg_last): Simplify.
>         (free_deps): Remove useless code.
> ---
>  gcc/sched-deps.cc | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 2a87158ba4b..e0d3c97d935 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -3927,10 +3927,9 @@ init_deps (class deps_desc *deps, bool
> lazy_reg_last)
>    int max_reg = (reload_completed ? FIRST_PSEUDO_REGISTER : max_reg_num
> ());
>
>    deps->max_reg = max_reg;
> -  if (lazy_reg_last)
> -    deps->reg_last = NULL;
> -  else
> -    deps->reg_last = XCNEWVEC (struct deps_reg, max_reg);
> +  deps->reg_last = NULL;
> +  if (!lazy_reg_last)
> +    init_deps_reg_last (deps);
>    INIT_REG_SET (&deps->reg_last_in_use);
>
>    deps->pending_read_insns = 0;
> @@ -3961,9 +3960,7 @@ init_deps (class deps_desc *deps, bool lazy_reg_last)
>  void
>  init_deps_reg_last (class deps_desc *deps)
>  {
> -  gcc_assert (deps && deps->max_reg > 0);
> -  gcc_assert (deps->reg_last == NULL);
> -
> +  gcc_assert (deps && deps->max_reg > 0 && deps->reg_last == NULL);
>    deps->reg_last = XCNEWVEC (struct deps_reg, deps->max_reg);
>  }
>
> @@ -4013,8 +4010,6 @@ free_deps (class deps_desc *deps)
>       it at all.  */
>    free (deps->reg_last);
>    deps->reg_last = NULL;
> -
> -  deps = NULL;
>  }
>
>  /* Remove INSN from dependence contexts DEPS.  */
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis
  2023-11-22 11:14 ` [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis Maxim Kuvyrkov
@ 2024-01-15 13:06   ` Maxim Kuvyrkov
  0 siblings, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-15 13:06 UTC (permalink / raw)
  To: gcc-patches
  Cc: Bernd Schmidt, Vladimir Makarov, Jeff Law, Alexander Monakov,
	Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 5830 bytes --]

Dear scheduler maintainers,

Gentle ping.  This patch improves debugging output, it does not touch
scheduling logic.

On Wed, 22 Nov 2023 at 15:15, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
wrote:

> Scheduler dependency analysis uses two main data structures:
> 1. reg_pending_* data contains effects of INSN on the register file,
>    which is then incorporated into ...
> 2. deps_desc object, which contains commulative information about all
>    instructions processed from deps_desc object's initialization.
>
> This patch adds debug dumping of (1).
>
> gcc/ChangeLog:
>
>         * sched-deps.cc (print-rtl.h): Include for str_pattern_slim().
>         (dump_reg_pending_data): New function.
>         (sched_analyze_insn): Use it.
> ---
>  gcc/sched-deps.cc | 90 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 89 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index e0d3c97d935..f9290c82fd2 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "sched-int.h"
>  #include "cselib.h"
>  #include "function-abi.h"
> +#include "print-rtl.h"
>
>  #ifdef INSN_SCHEDULING
>
> @@ -432,10 +433,24 @@ dep_spec_p (dep_t dep)
>    return false;
>  }
>
> +/* These regsets describe how a single instruction affects registers.
> +   Their "life-time" is restricted to a single call of
> sched_analyze_insn().
> +   They are populated by sched_analyze_1() and sched_analyze_2(), and
> +   then sched_analyze_insn() transfers data from these into
> deps->reg_last[i].
> +   Near the end sched_analyze_insn() clears these regsets for the next
> +   insn.  */
>  static regset reg_pending_sets;
>  static regset reg_pending_clobbers;
>  static regset reg_pending_uses;
>  static regset reg_pending_control_uses;
> +
> +/* Similar to reg_pending_* regsets, this variable specifies whether
> +   the current insn analyzed by sched_analyze_insn() is a scheduling
> +   barrier that should "split" dependencies inside a block.  Internally
> +   sched-deps.cc does this by pretending that the barrier insn uses and
> +   sets all registers.
> +   Near the end sched_analyze_insn() transfers barrier info from this
> variable
> +   into deps->last_reg_pending_barrier.  */
>  static enum reg_pending_barrier_mode reg_pending_barrier;
>
>  /* Hard registers implicitly clobbered or used (or may be implicitly
> @@ -2880,7 +2895,77 @@ get_implicit_reg_pending_clobbers (HARD_REG_SET
> *temp, rtx_insn *insn)
>    *temp &= ~ira_no_alloc_regs;
>  }
>
> -/* Analyze an INSN with pattern X to find all dependencies.  */
> +/* Dump state of reg_pending_* data for debug purposes.
> +   Dump only non-empty data to reduce log clobber.  */
> +static void
> +dump_reg_pending_data (FILE *file, rtx_insn *insn)
> +{
> +  fprintf (file, "\n");
> +  fprintf (file, ";; sched_analysis after insn %d: %s\n",
> +          INSN_UID (insn), str_pattern_slim (PATTERN (insn)));
> +
> +  if (!REG_SET_EMPTY_P (reg_pending_sets)
> +      || !REG_SET_EMPTY_P (reg_pending_clobbers)
> +      || !REG_SET_EMPTY_P (reg_pending_uses)
> +      || !REG_SET_EMPTY_P (reg_pending_control_uses))
> +    {
> +      fprintf (file, ";; insn reg");
> +      if (!REG_SET_EMPTY_P (reg_pending_sets))
> +       {
> +         fprintf (file, " sets(");
> +         dump_regset (reg_pending_sets, file);
> +         fprintf (file, ")");
> +       }
> +      if (!REG_SET_EMPTY_P (reg_pending_clobbers))
> +       {
> +         fprintf (file, " clobbers(");
> +         dump_regset (reg_pending_clobbers, file);
> +         fprintf (file, ")");
> +       }
> +      if (!REG_SET_EMPTY_P (reg_pending_uses))
> +       {
> +         fprintf (file, " uses(");
> +         dump_regset (reg_pending_uses, file);
> +         fprintf (file, ")");
> +       }
> +      if (!REG_SET_EMPTY_P (reg_pending_control_uses))
> +       {
> +         fprintf (file, " control(");
> +         dump_regset (reg_pending_control_uses, file);
> +         fprintf (file, ")");
> +       }
> +      fprintf (file, "\n");
> +    }
> +
> +  if (reg_pending_barrier)
> +    fprintf (file, ";; insn reg barrier: %d\n", reg_pending_barrier);
> +
> +  if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers)
> +      || !hard_reg_set_empty_p (implicit_reg_pending_uses))
> +    {
> +      fprintf (file, ";; insn reg");
> +      if (!hard_reg_set_empty_p (implicit_reg_pending_clobbers))
> +       {
> +         print_hard_reg_set (file, implicit_reg_pending_clobbers,
> +                             " implicit clobbers(", false);
> +         fprintf (file, ")");
> +       }
> +      if (!hard_reg_set_empty_p (implicit_reg_pending_uses))
> +       {
> +         print_hard_reg_set (file, implicit_reg_pending_uses,
> +                             " implicit uses(", false);
> +         fprintf (file, ")");
> +       }
> +      fprintf (file, "\n");
> +    }
> +}
> +
> +/* Analyze an INSN with pattern X to find all dependencies.
> +   This analysis uses two main data structures:
> +   1. reg_pending_* data contains effects of INSN on the register file,
> +      which is then incorporated into ...
> +   2. deps_desc object, which contains commulative information about all
> +      instructions processed from deps_desc object's initialization.  */
>  static void
>  sched_analyze_insn (class deps_desc *deps, rtx x, rtx_insn *insn)
>  {
> @@ -3328,6 +3413,9 @@ sched_analyze_insn (class deps_desc *deps, rtx x,
> rtx_insn *insn)
>        deps->last_reg_pending_barrier = reg_pending_barrier;
>      }
>
> +  if (sched_verbose >= 8)
> +    dump_reg_pending_data (sched_dump, insn);
> +
>    CLEAR_REG_SET (reg_pending_uses);
>    CLEAR_REG_SET (reg_pending_clobbers);
>    CLEAR_REG_SET (reg_pending_sets);
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 8/8] Improve logging of scheduler dependency analysis context
  2023-11-22 11:14 ` [PATCH v3 8/8] Improve logging of scheduler dependency analysis context Maxim Kuvyrkov
@ 2024-01-15 13:08   ` Maxim Kuvyrkov
  0 siblings, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-15 13:08 UTC (permalink / raw)
  To: gcc-patches
  Cc: Bernd Schmidt, Vladimir Makarov, Jeff Law, Alexander Monakov,
	Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 12831 bytes --]

Dear scheduler maintainers,

Gentle ping.  This patch improves debugging output, it does not touch
scheduling logic.

On Wed, 22 Nov 2023 at 15:15, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
wrote:

> Scheduler dependency analysis uses two main data structures:
> 1. reg_pending_* data contains effects of INSN on the register file,
>    which is then incorporated into ...
> 2. deps_desc object, which contains commulative information about all
>    instructions processed from deps_desc object's initialization.
>
> This patch adds debug dumping of (2).
>
> Dependency analysis contexts (aka deps_desc objects) are huge, but
> each instruction affects only a small amount of data in these objects.
> Therefore, it is most useful to dump differential information
> compared to the dependency state after previous instruction.
>
> gcc/ChangeLog:
>
>         * sched-deps.cc (reset_deps, dump_rtx_insn_list)
>         (rtx_insn_list_same_p): New helper functions.
>         (dump_deps_desc_diff): New function to dump dependency information.
>         (sched_analysis_prev_deps): New static variable.
>         (sched_analyze_insn): Dump dependency information.
>         (init_deps_global, finish_deps_global): Handle
> sched_analysis_prev_deps.
>         * sched-int.h (struct deps_reg): Update comments.
>         * sched-rgn.cc (concat_insn_list, concat_expr_list): Update
> comments.
> ---
>  gcc/sched-deps.cc | 197 ++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/sched-int.h   |   9 ++-
>  gcc/sched-rgn.cc  |   5 ++
>  3 files changed, 210 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index f9290c82fd2..edca9927e23 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -1677,6 +1677,15 @@ delete_all_dependences (rtx_insn *insn)
>      sd_delete_dep (sd_it);
>  }
>
> +/* Re-initialize existing dependency context DEPS to be a copy of FROM.
> */
> +static void
> +reset_deps (class deps_desc *deps, class deps_desc *from)
> +{
> +  free_deps (deps);
> +  init_deps (deps, false);
> +  deps_join (deps, from);
> +}
> +
>  /* All insns in a scheduling group except the first should only have
>     dependencies on the previous insn in the group.  So we find the
>     first instruction in the scheduling group by walking the dependence
> @@ -2960,6 +2969,177 @@ dump_reg_pending_data (FILE *file, rtx_insn *insn)
>      }
>  }
>
> +/* Dump rtx_insn_list LIST.
> +   Consider moving to lists.cc if there are users outside of
> sched-deps.cc.  */
> +static void
> +dump_rtx_insn_list (FILE *file, rtx_insn_list *list)
> +{
> +  for (; list; list = list->next ())
> +    fprintf (file, " %d", INSN_UID (list->insn ()));
> +}
> +
> +/* Return TRUE if lists A and B have same elements in the same order.  */
> +static bool
> +rtx_insn_list_same_p (rtx_insn_list *a, rtx_insn_list *b)
> +{
> +  for (; a && b; a = a->next (), b = b->next ())
> +    if (a->insn () != b->insn ())
> +      return false;
> +
> +  if (a || b)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Dump parts of DEPS that are different from PREV.
> +   Dumping all information from dependency context produces huge
> +   hard-to-analize logs; differential dumping is way more managable.  */
> +static void
> +dump_deps_desc_diff (FILE *file, class deps_desc *deps, class deps_desc
> *prev)
> +{
> +  /* Each "paragraph" is a single line of output.  */
> +
> +  /* Note on param_max_pending_list_length:
> +     During normal dependency analysis various lists should not exceed
> this
> +     limit.  Searching for "!!!" in scheduler logs can point to potential
> bugs
> +     or poorly-handled corner-cases.  */
> +
> +  if (!rtx_insn_list_same_p (deps->pending_read_insns,
> +                            prev->pending_read_insns))
> +    {
> +      fprintf (file, ";; deps pending mem reads length(%d):",
> +              deps->pending_read_list_length);
> +      if ((deps->pending_read_list_length +
> deps->pending_write_list_length)
> +         >= param_max_pending_list_length)
> +       fprintf (file, "%d insns!!!", deps->pending_read_list_length);
> +      else
> +       dump_rtx_insn_list (file, deps->pending_read_insns);
> +      fprintf (file, "\n");
> +    }
> +
> +  if (!rtx_insn_list_same_p (deps->pending_write_insns,
> +                            prev->pending_write_insns))
> +    {
> +      fprintf (file, ";; deps pending mem writes length(%d):",
> +              deps->pending_write_list_length);
> +      if ((deps->pending_read_list_length +
> deps->pending_write_list_length)
> +         >= param_max_pending_list_length)
> +       fprintf (file, "%d insns!!!", deps->pending_write_list_length);
> +      else
> +       dump_rtx_insn_list (file, deps->pending_write_insns);
> +      fprintf (file, "\n");
> +    }
> +
> +  if (!rtx_insn_list_same_p (deps->pending_jump_insns,
> +                            prev->pending_jump_insns))
> +    {
> +      fprintf (file, ";; deps pending jump length(%d):",
> +              deps->pending_flush_length);
> +      if (deps->pending_flush_length >= param_max_pending_list_length)
> +       fprintf (file, "%d insns!!!", deps->pending_flush_length);
> +      else
> +       dump_rtx_insn_list (file, deps->pending_jump_insns);
> +      fprintf (file, "\n");
> +    }
> +
> +  fprintf (file, ";; last");
> +  if (!rtx_insn_list_same_p (deps->last_pending_memory_flush,
> +                            prev->last_pending_memory_flush))
> +    {
> +      fprintf (file, " memory_flush(");
> +      dump_rtx_insn_list (file, deps->last_pending_memory_flush);
> +      fprintf (file, ")");
> +    }
> +  if (!rtx_insn_list_same_p (deps->last_function_call,
> +                            prev->last_function_call))
> +    {
> +      fprintf (file, " call(");
> +      dump_rtx_insn_list (file, deps->last_function_call);
> +      fprintf (file, ")");
> +    }
> +  if (!rtx_insn_list_same_p (deps->last_function_call_may_noreturn,
> +                            prev->last_function_call_may_noreturn))
> +    {
> +      fprintf (file, " noreturn(");
> +      dump_rtx_insn_list (file, deps->last_function_call_may_noreturn);
> +      fprintf (file, ")");
> +    }
> +  fprintf (file, "\n");
> +
> +  fprintf (file, ";; sched_before_next");
> +  if (!rtx_insn_list_same_p (deps->sched_before_next_call,
> +                            prev->sched_before_next_call))
> +    {
> +      fprintf (file, " call(");
> +      dump_rtx_insn_list (file, deps->sched_before_next_call);
> +      fprintf (file, ")");
> +    }
> +  if (!rtx_insn_list_same_p (deps->sched_before_next_jump,
> +                            prev->sched_before_next_jump))
> +    {
> +      fprintf (file, " jump(");
> +      dump_rtx_insn_list (file, deps->sched_before_next_jump);
> +      fprintf (file, ")");
> +    }
> +  fprintf (file, " in_post_call_group_p(%d)\n",
> deps->in_post_call_group_p);
> +
> +  fprintf (file, ";; last_debug_insn(%d) last_args_size(%d)
> last_prologue(",
> +          deps->last_debug_insn ? INSN_UID (deps->last_debug_insn) : -1,
> +          deps->last_args_size ? INSN_UID (deps->last_args_size) : -1);
> +  dump_rtx_insn_list (file, deps->last_prologue);
> +  fprintf (file, ") last_epilogue(");
> +  dump_rtx_insn_list (file, deps->last_epilogue);
> +  fprintf (file, ") last_logue_was_epilogue(%d)\n",
> +          deps->last_logue_was_epilogue);
> +
> +  int i;
> +
> +  gcc_assert (deps->max_reg == prev->max_reg);
> +  for (i = 0; i < deps->max_reg; ++i)
> +    {
> +      struct deps_reg *reg_last = &deps->reg_last[i];
> +      struct deps_reg *reg_prev = &prev->reg_last[i];
> +
> +      if (rtx_insn_list_same_p (reg_last->uses, reg_prev->uses)
> +         && rtx_insn_list_same_p (reg_last->sets, reg_prev->sets)
> +         && rtx_insn_list_same_p (reg_last->implicit_sets,
> +                                  reg_prev->implicit_sets)
> +         && rtx_insn_list_same_p (reg_last->control_uses,
> +                                  reg_prev->control_uses)
> +         && rtx_insn_list_same_p (reg_last->clobbers,
> +                                  reg_prev->clobbers))
> +       continue;
> +
> +      fprintf (file, ";; reg %u: uses(", i);
> +      if (reg_last->uses_length >= param_max_pending_list_length)
> +       fprintf (file, "%d insns!!!", reg_last->uses_length);
> +      else
> +       dump_rtx_insn_list (file, reg_last->uses);
> +      if (reg_last->clobbers_length >= param_max_pending_list_length)
> +       {
> +         fprintf (file, ") clobbers(");
> +         fprintf (file, "%d insns!!!", reg_last->clobbers_length);
> +       }
> +      else
> +       {
> +         fprintf (file, ") sets(");
> +         dump_rtx_insn_list (file, reg_last->sets);
> +         fprintf (file, ") implicit_sets(");
> +         dump_rtx_insn_list (file, reg_last->implicit_sets);
> +         fprintf (file, ") control_uses(");
> +         dump_rtx_insn_list (file, reg_last->control_uses);
> +         fprintf (file, ") clobbers(");
> +         dump_rtx_insn_list (file, reg_last->clobbers);
> +       }
> +      fprintf (file, ")\n");
> +    }
> +}
> +
> +/* Dependency analysis state before current insn.
> +   Used by dump_deps_desc_diff().  */
> +static class deps_desc *sched_analysis_prev_deps;
> +
>  /* Analyze an INSN with pattern X to find all dependencies.
>     This analysis uses two main data structures:
>     1. reg_pending_* data contains effects of INSN on the register file,
> @@ -3627,6 +3807,16 @@ sched_analyze_insn (class deps_desc *deps, rtx x,
> rtx_insn *insn)
>           deps->last_logue_was_epilogue = true;
>         }
>      }
> +
> +    if (sched_verbose >= 8)
> +      {
> +       dump_deps_desc_diff (sched_dump, deps, sched_analysis_prev_deps);
> +       reset_deps (sched_analysis_prev_deps, deps);
> +       fprintf (sched_dump, ";; ");
> +       dump_lists (sched_dump, insn, SD_LIST_BACK,
> +                   DUMP_LISTS_ALL | DUMP_DEP_PRO);
> +       fprintf (sched_dump, "\n");
> +      }
>  }
>
>  /* Return TRUE if INSN might not always return normally (e.g. call exit,
> @@ -4305,6 +4495,9 @@ init_deps_global (void)
>        sched_deps_info->note_mem_dep = haifa_note_mem_dep;
>        sched_deps_info->note_dep = haifa_note_dep;
>     }
> +
> +  sched_analysis_prev_deps = XNEW (class deps_desc);
> +  init_deps (sched_analysis_prev_deps, false);
>  }
>
>  /* Free everything used by the dependency analysis code.  */
> @@ -4316,6 +4509,10 @@ finish_deps_global (void)
>    FREE_REG_SET (reg_pending_clobbers);
>    FREE_REG_SET (reg_pending_uses);
>    FREE_REG_SET (reg_pending_control_uses);
> +
> +  free_deps (sched_analysis_prev_deps);
> +  free (sched_analysis_prev_deps);
> +  sched_analysis_prev_deps = NULL;
>  }
>
>  /* Estimate the weakness of dependence between MEM1 and MEM2.  */
> diff --git a/gcc/sched-int.h b/gcc/sched-int.h
> index 64a2f0bcff9..8a3109ce86e 100644
> --- a/gcc/sched-int.h
> +++ b/gcc/sched-int.h
> @@ -455,7 +455,9 @@ struct deps_reg
>    int clobbers_length;
>  };
>
> -/* Describe state of dependencies used during sched_analyze phase.  */
> +/* Describe state of dependencies used during sched_analyze phase.
> +   Please consider updating sched-deps.cc:dump_deps_desc_diff() when
> adding
> +   new fields.  */
>  class deps_desc
>  {
>  public:
> @@ -499,6 +501,11 @@ public:
>       large lists.  */
>    int pending_flush_length;
>
> +  /* Below are lists (and not just a single instructions) to allow
> inter-block
> +     dependency analysis.  Each block
> +     may add a single insn to below lists, but when merging dependency
> +     analysis context from multiple predecessor blocks, we may get a
> list.  */
> +
>    /* The last insn upon which all memory references must depend.
>       This is an insn which flushed the pending lists, creating a
> dependency
>       between it and all previously pending memory references.  This
> creates
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index da3ec0458ff..72f0a2de66e 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -2590,6 +2590,10 @@ static rtx_insn_list *
>  concat_insn_list (rtx_insn_list *copy, rtx_insn_list *old)
>  {
>    if (!old)
> +    /* concat_*_LIST() will return a reversed copy of COPY.  Working with
> +       reversed copies would complicate dependency dumping in
> +       dump_deps_desc_diff(), which internally uses reset_deps() and
> +       deps_join().  Therefore, use copy_INSN_LIST() when OLD is NULL.  */
>      return copy_INSN_LIST (copy);
>    return concat_INSN_LIST (copy, old);
>  }
> @@ -2599,6 +2603,7 @@ static rtx_expr_list *
>  concat_expr_list (rtx_expr_list *copy, rtx_expr_list *old)
>  {
>    if (!old)
> +    /* See comment in concat_insn_list() above.  */
>      return copy_EXPR_LIST (copy);
>    return concat_EXPR_LIST (copy, old);
>  }
> --
> 2.34.1
>
>

-- 
Maxim Kuvyrkov
www.linaro.org

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2024-01-15 12:56   ` Maxim Kuvyrkov
@ 2024-01-15 18:26     ` Vladimir Makarov
  2024-01-16 14:52     ` Jeff Law
  1 sibling, 0 replies; 37+ messages in thread
From: Vladimir Makarov @ 2024-01-15 18:26 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches
  Cc: Bernd Schmidt, Jeff Law, Alexander Monakov, Richard Guenther


On 1/15/24 07:56, Maxim Kuvyrkov wrote:
> Hi Vladimir,
> Hi Jeff,
>
> Richard and Alexander have reviewed this patch and [I assume] have no 
> further comments.  OK to merge?
>
>
I trust Richard and Alexander therefore I did not do additional review 
of the patches and have no any comment.  Richard's or Alexander's 
approval is enough for comitting the patches.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2024-01-15 12:56   ` Maxim Kuvyrkov
  2024-01-15 18:26     ` Vladimir Makarov
@ 2024-01-16 14:52     ` Jeff Law
  2024-01-17  6:51       ` Richard Biener
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Law @ 2024-01-16 14:52 UTC (permalink / raw)
  To: Maxim Kuvyrkov, gcc-patches
  Cc: Bernd Schmidt, Vladimir Makarov, Alexander Monakov, Richard Guenther



On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> Hi Vladimir,
> Hi Jeff,
> 
> Richard and Alexander have reviewed this patch and [I assume] have no 
> further comments.  OK to merge?
I think the question is whether or not we're too late.  I know that 
Richard S has held off on his late-combine pass and I'm holding off on 
the ext-dce work due to the fact that we're well past stage1 close.

I think the release managers ought to have the final say on this.

jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2024-01-16 14:52     ` Jeff Law
@ 2024-01-17  6:51       ` Richard Biener
  2024-01-17  7:39         ` Maxim Kuvyrkov
  0 siblings, 1 reply; 37+ messages in thread
From: Richard Biener @ 2024-01-17  6:51 UTC (permalink / raw)
  To: Jeff Law
  Cc: Maxim Kuvyrkov, gcc-patches, Bernd Schmidt, Vladimir Makarov,
	Alexander Monakov

On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> > Hi Vladimir,
> > Hi Jeff,
> >
> > Richard and Alexander have reviewed this patch and [I assume] have no
> > further comments.  OK to merge?
> I think the question is whether or not we're too late.  I know that
> Richard S has held off on his late-combine pass and I'm holding off on
> the ext-dce work due to the fact that we're well past stage1 close.
>
> I think the release managers ought to have the final say on this.

I'm fine with this now, it doesn't change code generation.

Richard.

>
> jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2024-01-17  6:51       ` Richard Biener
@ 2024-01-17  7:39         ` Maxim Kuvyrkov
  2024-01-17 15:02           ` Richard Biener
  0 siblings, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-17  7:39 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Jeff Law, GCC Patches, Bernd Schmidt, Vladimir Makarov,
	Alexander Monakov

> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>> 
>> 
>> 
>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>> Hi Vladimir,
>>> Hi Jeff,
>>> 
>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>> further comments.  OK to merge?
>> I think the question is whether or not we're too late.  I know that
>> Richard S has held off on his late-combine pass and I'm holding off on
>> the ext-dce work due to the fact that we're well past stage1 close.
>> 
>> I think the release managers ought to have the final say on this.
> 
> I'm fine with this now, it doesn't change code generation.

Thanks, Richard.

I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2024-01-17  7:39         ` Maxim Kuvyrkov
@ 2024-01-17 15:02           ` Richard Biener
  2024-01-17 15:05             ` Maxim Kuvyrkov
  2024-01-17 18:54             ` H.J. Lu
  0 siblings, 2 replies; 37+ messages in thread
From: Richard Biener @ 2024-01-17 15:02 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: Jeff Law, GCC Patches, Bernd Schmidt, Vladimir Makarov,
	Alexander Monakov

On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
<maxim.kuvyrkov@linaro.org> wrote:
>
> > On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> >>> Hi Vladimir,
> >>> Hi Jeff,
> >>>
> >>> Richard and Alexander have reviewed this patch and [I assume] have no
> >>> further comments.  OK to merge?
> >> I think the question is whether or not we're too late.  I know that
> >> Richard S has held off on his late-combine pass and I'm holding off on
> >> the ext-dce work due to the fact that we're well past stage1 close.
> >>
> >> I think the release managers ought to have the final say on this.
> >
> > I'm fine with this now, it doesn't change code generation.
>
> Thanks, Richard.
>
> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.

This seems to have caused a compare-debug bootstrap issue on x86_64-linux,

gcc/fortran/f95-lang.o differs

does n_mem_deps or n_inc_deps include debug insns?

Richard.

> Regards,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2024-01-17 15:02           ` Richard Biener
@ 2024-01-17 15:05             ` Maxim Kuvyrkov
  2024-01-17 15:44               ` Maxim Kuvyrkov
  2024-01-17 18:54             ` H.J. Lu
  1 sibling, 1 reply; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-17 15:05 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Jeff Law, GCC Patches, Bernd Schmidt, Vladimir Makarov,
	Alexander Monakov

> On Jan 17, 2024, at 19:02, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
>> 
>>> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>>>> Hi Vladimir,
>>>>> Hi Jeff,
>>>>> 
>>>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>>>> further comments.  OK to merge?
>>>> I think the question is whether or not we're too late.  I know that
>>>> Richard S has held off on his late-combine pass and I'm holding off on
>>>> the ext-dce work due to the fact that we're well past stage1 close.
>>>> 
>>>> I think the release managers ought to have the final say on this.
>>> 
>>> I'm fine with this now, it doesn't change code generation.
>> 
>> Thanks, Richard.
>> 
>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.
> 
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
> 
> gcc/fortran/f95-lang.o differs
> 
> does n_mem_deps or n_inc_deps include debug insns?

Thanks, investigating.

--
Maxim Kuvyrkov
https://www.linaro.org


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2024-01-17 15:05             ` Maxim Kuvyrkov
@ 2024-01-17 15:44               ` Maxim Kuvyrkov
  0 siblings, 0 replies; 37+ messages in thread
From: Maxim Kuvyrkov @ 2024-01-17 15:44 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Jeff Law, GCC Patches, Bernd Schmidt, Vladimir Makarov,
	Alexander Monakov

> On Jan 17, 2024, at 19:05, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> 
>> On Jan 17, 2024, at 19:02, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
>> <maxim.kuvyrkov@linaro.org> wrote:
>>> 
>>>> On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
>>>> 
>>>> On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
>>>>>> Hi Vladimir,
>>>>>> Hi Jeff,
>>>>>> 
>>>>>> Richard and Alexander have reviewed this patch and [I assume] have no
>>>>>> further comments.  OK to merge?
>>>>> I think the question is whether or not we're too late.  I know that
>>>>> Richard S has held off on his late-combine pass and I'm holding off on
>>>>> the ext-dce work due to the fact that we're well past stage1 close.
>>>>> 
>>>>> I think the release managers ought to have the final say on this.
>>>> 
>>>> I'm fine with this now, it doesn't change code generation.
>>> 
>>> Thanks, Richard.
>>> 
>>> I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.
>> 
>> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>> 
>> gcc/fortran/f95-lang.o differs
>> 
>> does n_mem_deps or n_inc_deps include debug insns?
> 
> Thanks, investigating.

Hi Richard,

Yes, both n_mem_deps or n_inc_deps include debug insns, I posted a patch for this in https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643267.html .  Testing it now.

If you prefer, I can revert the fix for PR96388 and PR111554.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
  2024-01-17 15:02           ` Richard Biener
  2024-01-17 15:05             ` Maxim Kuvyrkov
@ 2024-01-17 18:54             ` H.J. Lu
  1 sibling, 0 replies; 37+ messages in thread
From: H.J. Lu @ 2024-01-17 18:54 UTC (permalink / raw)
  To: Richard Biener
  Cc: Maxim Kuvyrkov, Jeff Law, GCC Patches, Bernd Schmidt,
	Vladimir Makarov, Alexander Monakov

On Wed, Jan 17, 2024 at 7:02 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Jan 17, 2024 at 8:39 AM Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
> >
> > > On Jan 17, 2024, at 10:51, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Tue, Jan 16, 2024 at 3:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> > >>
> > >>
> > >>
> > >> On 1/15/24 05:56, Maxim Kuvyrkov wrote:
> > >>> Hi Vladimir,
> > >>> Hi Jeff,
> > >>>
> > >>> Richard and Alexander have reviewed this patch and [I assume] have no
> > >>> further comments.  OK to merge?
> > >> I think the question is whether or not we're too late.  I know that
> > >> Richard S has held off on his late-combine pass and I'm holding off on
> > >> the ext-dce work due to the fact that we're well past stage1 close.
> > >>
> > >> I think the release managers ought to have the final say on this.
> > >
> > > I'm fine with this now, it doesn't change code generation.
> >
> > Thanks, Richard.
> >
> > I'll merge the fix for PR96388 and PR111554 -- patch 1/8.  I'll commit cleanups and improvements to scheduler logging -- patches 2/8 - 8/8 -- when stage1 opens.
>
> This seems to have caused a compare-debug bootstrap issue on x86_64-linux,
>
> gcc/fortran/f95-lang.o differs
>
> does n_mem_deps or n_inc_deps include debug insns?
>
> Richard.

FWIW, I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113456

-- 
H.J.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2024-01-17 18:54 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-20 12:06 [PATCH 0/1] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
2023-11-20 12:06 ` [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
2023-11-20 13:09   ` Richard Biener
2023-11-20 13:42     ` Maxim Kuvyrkov
2023-11-20 13:45       ` Richard Biener
2023-11-20 14:48         ` [PATCH v2] " Maxim Kuvyrkov
2023-11-20 18:59         ` [PATCH 1/1] " Maxim Kuvyrkov
2023-11-20 13:52   ` Alexander Monakov
2023-11-20 14:39     ` Maxim Kuvyrkov
2023-11-20 16:30       ` Alexander Monakov
2023-11-21 10:32         ` Maxim Kuvyrkov
2023-11-21 11:05           ` Alexander Monakov
2023-11-22 11:14 ` [PATCH v3 0/8] Avoid exponential behavior in scheduler and better logging Maxim Kuvyrkov
2023-11-22 11:14 ` [PATCH v3 1/8] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior Maxim Kuvyrkov
2024-01-15 12:56   ` Maxim Kuvyrkov
2024-01-15 18:26     ` Vladimir Makarov
2024-01-16 14:52     ` Jeff Law
2024-01-17  6:51       ` Richard Biener
2024-01-17  7:39         ` Maxim Kuvyrkov
2024-01-17 15:02           ` Richard Biener
2024-01-17 15:05             ` Maxim Kuvyrkov
2024-01-17 15:44               ` Maxim Kuvyrkov
2024-01-17 18:54             ` H.J. Lu
2023-11-22 11:14 ` [PATCH v3 2/8] Unify implementations of print_hard_reg_set() Maxim Kuvyrkov
2023-11-22 15:04   ` Vladimir Makarov
2023-11-22 11:14 ` [PATCH v3 3/8] Simplify handling of INSN_ and EXPR_LISTs in sched-rgn.cc Maxim Kuvyrkov
2024-01-15 12:59   ` Maxim Kuvyrkov
2023-11-22 11:14 ` [PATCH v3 4/8] Improve and fix sched-deps.cc: dump_dep() and dump_lists() Maxim Kuvyrkov
2024-01-15 13:01   ` Maxim Kuvyrkov
2023-11-22 11:14 ` [PATCH v3 5/8] Add a bit more logging scheduler's dependency analysis Maxim Kuvyrkov
2024-01-15 13:04   ` Maxim Kuvyrkov
2023-11-22 11:14 ` [PATCH v3 6/8] sched_deps.cc: Simplify initialization of dependency contexts Maxim Kuvyrkov
2024-01-15 13:05   ` Maxim Kuvyrkov
2023-11-22 11:14 ` [PATCH v3 7/8] Improve logging of register data in scheduler dependency analysis Maxim Kuvyrkov
2024-01-15 13:06   ` Maxim Kuvyrkov
2023-11-22 11:14 ` [PATCH v3 8/8] Improve logging of scheduler dependency analysis context Maxim Kuvyrkov
2024-01-15 13:08   ` Maxim Kuvyrkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).