[PATCH V2] Disable sched1 in functions that call setjmp

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH V2] Disable sched1 in functions that call setjmp
@ 2022-12-22 17:32 Jose E. Marchesi
  2022-12-22 17:56 ` Alexander Monakov
  0 siblings, 1 reply; 24+ messages in thread
From: Jose E. Marchesi @ 2022-12-22 17:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: qing.zhao

When the following testcase is built with -fschedule-insns in either
x86_64 or aarch64:

  #include <stdio.h>
  #include <setjmp.h>
  #include <stdbool.h>

  jmp_buf ex_buf__;

   #define TRY do{ if( !setjmp(ex_buf__) ){
   #define CATCH } else {
   #define ETRY } }while(0)
   #define THROW longjmp(ex_buf__, 1)

  int f(int x)
  {
    int arr[] = {1,2,6,8,9,10};
    int lo=0;
    int hi=5;

    while(lo<=hi) {
          int mid=(lo+hi)/2;

          if(arr[mid]==x) {
            THROW;
          } else if(arr[mid]<x) {
            lo=mid+1;
          } else if(arr[mid]>x) {
            hi=mid-1;
          }
    }

    return -1;
  }

  int
  main(int argc, char** argv)
  {
    int a=2;
    bool b=false;

    TRY
    {
     a=f(a);
     b=true;
    }
    CATCH
    {
     printf("a : %d\n",a);
     printf("Got Exception!\n");
    }
    ETRY;

    if(b) {
          printf("b is true!\n");
    }
    return 0;
  }

The first instruction scheduler pass reorders instructions in the TRY
block in a way `b=true' gets executed before the call to the function
`f'.  This optimization is wrong, because `main' calls setjmp and `f'
is known to call longjmp.

As discussed in BZ 57067, the root cause for this is the fact that
setjmp is not properly modeled in RTL, and therefore the backend
passes have no normalized way to handle this situation.

As Alexander Monakov noted in the BZ, many RTL passes refuse to touch
functions that call setjmp.  This includes for example gcse,
store_motion and cprop.  This patch adds the sched1 pass to that list.

Note that the other instruction scheduling passes are still allowed to
run on these functions, since they reorder instructions within basic
blocks, and therefore they cannot cross function calls.

This doesn't fix the fundamental issue, but at least assures that
sched1 wont perform invalid transformation in correct C programs.

regtested in aarch64-linux-gnu.

gcc/ChangeLog:

	PR rtl-optimization/57067
	* sched-rgn.cc (pass_sched::gate): Disable pass if current
	function calls setjmp.
---
 gcc/sched-rgn.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index 420c45dffb4..c536d0b8dea 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3847,7 +3847,8 @@ bool
 pass_sched::gate (function *)
 {
 #ifdef INSN_SCHEDULING
-  return optimize > 0 && flag_schedule_insns && dbg_cnt (sched_func);
+  return optimize > 0 && flag_schedule_insns
+    && !cfun->calls_setjmp && dbg_cnt (sched_func);
 #else
   return 0;
 #endif
-- 
2.30.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-22 17:32 [PATCH V2] Disable sched1 in functions that call setjmp Jose E. Marchesi
@ 2022-12-22 17:56 ` Alexander Monakov
  2022-12-22 19:28   ` Qing Zhao
  2022-12-23 10:05   ` Richard Sandiford
  0 siblings, 2 replies; 24+ messages in thread
From: Alexander Monakov @ 2022-12-22 17:56 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: gcc-patches, qing.zhao


On Thu, 22 Dec 2022, Jose E. Marchesi via Gcc-patches wrote:

> The first instruction scheduler pass reorders instructions in the TRY
> block in a way `b=true' gets executed before the call to the function
> `f'.  This optimization is wrong, because `main' calls setjmp and `f'
> is known to call longjmp.
> 
> As discussed in BZ 57067, the root cause for this is the fact that
> setjmp is not properly modeled in RTL, and therefore the backend
> passes have no normalized way to handle this situation.
> 
> As Alexander Monakov noted in the BZ, many RTL passes refuse to touch
> functions that call setjmp.  This includes for example gcse,
> store_motion and cprop.  This patch adds the sched1 pass to that list.
> 
> Note that the other instruction scheduling passes are still allowed to
> run on these functions, since they reorder instructions within basic
> blocks, and therefore they cannot cross function calls.
> 
> This doesn't fix the fundamental issue, but at least assures that
> sched1 wont perform invalid transformation in correct C programs.

I think scheduling across calls in the pre-RA scheduler is simply an oversight,
we do not look at dataflow information and with 50% chance risk extending
lifetime of a pseudoregister across a call, causing higher register pressure at
the point of the call, and potentially an extra spill.

Therefore I would suggest to indeed solve the root cause, with (untested):

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 948aa0c3b..343fe2bfa 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -3688,7 +3688,13 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)

       CANT_MOVE (insn) = 1;

-      if (find_reg_note (insn, REG_SETJMP, NULL))
+      if (!reload_completed)
+       {
+         /* Do not schedule across calls, this is prone to extending lifetime
+            of a pseudo and causing extra spill later on.  */
+         reg_pending_barrier = MOVE_BARRIER;
+       }
+      else if (find_reg_note (insn, REG_SETJMP, NULL))
         {
           /* This is setjmp.  Assume that all registers, not just
              hard registers, may be clobbered by this call.  */

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-22 17:56 ` Alexander Monakov
@ 2022-12-22 19:28   ` Qing Zhao
  2022-12-23  7:33     ` Alexander Monakov
  2022-12-23 10:05   ` Richard Sandiford
  1 sibling, 1 reply; 24+ messages in thread
From: Qing Zhao @ 2022-12-22 19:28 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Jose Marchesi, gcc-patches



> On Dec 22, 2022, at 12:56 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> 
> On Thu, 22 Dec 2022, Jose E. Marchesi via Gcc-patches wrote:
> 
>> The first instruction scheduler pass reorders instructions in the TRY
>> block in a way `b=true' gets executed before the call to the function
>> `f'.  This optimization is wrong, because `main' calls setjmp and `f'
>> is known to call longjmp.
>> 
>> As discussed in BZ 57067, the root cause for this is the fact that
>> setjmp is not properly modeled in RTL, and therefore the backend
>> passes have no normalized way to handle this situation.
>> 
>> As Alexander Monakov noted in the BZ, many RTL passes refuse to touch
>> functions that call setjmp.  This includes for example gcse,
>> store_motion and cprop.  This patch adds the sched1 pass to that list.
>> 
>> Note that the other instruction scheduling passes are still allowed to
>> run on these functions, since they reorder instructions within basic
>> blocks, and therefore they cannot cross function calls.
>> 
>> This doesn't fix the fundamental issue, but at least assures that
>> sched1 wont perform invalid transformation in correct C programs.
> 
> I think scheduling across calls in the pre-RA scheduler is simply an oversight,
> we do not look at dataflow information and with 50% chance risk extending
> lifetime of a pseudoregister across a call, causing higher register pressure at
> the point of the call, and potentially an extra spill.

I am a little confused, you mean pre-RA scheduler does not look at the data flow
 information at all when scheduling insns across calls currently?

Qing


> 
> Therefore I would suggest to indeed solve the root cause, with (untested):
> 
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 948aa0c3b..343fe2bfa 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -3688,7 +3688,13 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)
> 
>       CANT_MOVE (insn) = 1;
> 
> -      if (find_reg_note (insn, REG_SETJMP, NULL))
> +      if (!reload_completed)
> +       {
> +         /* Do not schedule across calls, this is prone to extending lifetime
> +            of a pseudo and causing extra spill later on.  */
> +         reg_pending_barrier = MOVE_BARRIER;
> +       }
> +      else if (find_reg_note (insn, REG_SETJMP, NULL))
>         {
>           /* This is setjmp.  Assume that all registers, not just
>              hard registers, may be clobbered by this call.  */
> 
> Alexander


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-22 19:28   ` Qing Zhao
@ 2022-12-23  7:33     ` Alexander Monakov
  2022-12-23 14:34       ` Qing Zhao
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2022-12-23  7:33 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Jose Marchesi, gcc-patches


On Thu, 22 Dec 2022, Qing Zhao wrote:

> > I think scheduling across calls in the pre-RA scheduler is simply an oversight,
> > we do not look at dataflow information and with 50% chance risk extending
> > lifetime of a pseudoregister across a call, causing higher register pressure at
> > the point of the call, and potentially an extra spill.
> 
> I am a little confused, you mean pre-RA scheduler does not look at the data flow
>  information at all when scheduling insns across calls currently?

I think it does not inspect liveness info, and may extend lifetime of a pseudo
across a call, transforming
 
  call foo
  reg = 1
  ...
  use reg

to

  reg = 1
  call foo
  ...
  use reg

but this is undesirable, because now register allocation cannot select a
call-clobbered register for 'reg'.

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-22 17:56 ` Alexander Monakov
  2022-12-22 19:28   ` Qing Zhao
@ 2022-12-23 10:05   ` Richard Sandiford
  2022-12-23 10:42     ` Jose E. Marchesi
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Sandiford @ 2022-12-23 10:05 UTC (permalink / raw)
  To: Alexander Monakov via Gcc-patches
  Cc: Jose E. Marchesi, Alexander Monakov, qing.zhao

Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, 22 Dec 2022, Jose E. Marchesi via Gcc-patches wrote:
>
>> The first instruction scheduler pass reorders instructions in the TRY
>> block in a way `b=true' gets executed before the call to the function
>> `f'.  This optimization is wrong, because `main' calls setjmp and `f'
>> is known to call longjmp.
>> 
>> As discussed in BZ 57067, the root cause for this is the fact that
>> setjmp is not properly modeled in RTL, and therefore the backend
>> passes have no normalized way to handle this situation.
>> 
>> As Alexander Monakov noted in the BZ, many RTL passes refuse to touch
>> functions that call setjmp.  This includes for example gcse,
>> store_motion and cprop.  This patch adds the sched1 pass to that list.
>> 
>> Note that the other instruction scheduling passes are still allowed to
>> run on these functions, since they reorder instructions within basic
>> blocks, and therefore they cannot cross function calls.
>> 
>> This doesn't fix the fundamental issue, but at least assures that
>> sched1 wont perform invalid transformation in correct C programs.
>
> I think scheduling across calls in the pre-RA scheduler is simply an oversight,
> we do not look at dataflow information and with 50% chance risk extending
> lifetime of a pseudoregister across a call, causing higher register pressure at
> the point of the call, and potentially an extra spill.
>
> Therefore I would suggest to indeed solve the root cause, with (untested):
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 948aa0c3b..343fe2bfa 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -3688,7 +3688,13 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)
>
>        CANT_MOVE (insn) = 1;
>
> -      if (find_reg_note (insn, REG_SETJMP, NULL))
> +      if (!reload_completed)
> +       {
> +         /* Do not schedule across calls, this is prone to extending lifetime
> +            of a pseudo and causing extra spill later on.  */
> +         reg_pending_barrier = MOVE_BARRIER;
> +       }
> +      else if (find_reg_note (insn, REG_SETJMP, NULL))
>          {
>            /* This is setjmp.  Assume that all registers, not just
>               hard registers, may be clobbered by this call.  */

+1 for trying this FWIW.  There's still plenty of time to try an
alternative solution if there are unexpected performance problems.

Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 10:05   ` Richard Sandiford
@ 2022-12-23 10:42     ` Jose E. Marchesi
  2023-01-13 18:20       ` [PATCH] sched-deps: do not schedule pseudos across calls [PR108117] Alexander Monakov
  0 siblings, 1 reply; 24+ messages in thread
From: Jose E. Marchesi @ 2022-12-23 10:42 UTC (permalink / raw)
  To: Alexander Monakov via Gcc-patches
  Cc: Alexander Monakov, qing.zhao, richard.sandiford


> Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On Thu, 22 Dec 2022, Jose E. Marchesi via Gcc-patches wrote:
>>
>>> The first instruction scheduler pass reorders instructions in the TRY
>>> block in a way `b=true' gets executed before the call to the function
>>> `f'.  This optimization is wrong, because `main' calls setjmp and `f'
>>> is known to call longjmp.
>>> 
>>> As discussed in BZ 57067, the root cause for this is the fact that
>>> setjmp is not properly modeled in RTL, and therefore the backend
>>> passes have no normalized way to handle this situation.
>>> 
>>> As Alexander Monakov noted in the BZ, many RTL passes refuse to touch
>>> functions that call setjmp.  This includes for example gcse,
>>> store_motion and cprop.  This patch adds the sched1 pass to that list.
>>> 
>>> Note that the other instruction scheduling passes are still allowed to
>>> run on these functions, since they reorder instructions within basic
>>> blocks, and therefore they cannot cross function calls.
>>> 
>>> This doesn't fix the fundamental issue, but at least assures that
>>> sched1 wont perform invalid transformation in correct C programs.
>>
>> I think scheduling across calls in the pre-RA scheduler is simply an oversight,
>> we do not look at dataflow information and with 50% chance risk extending
>> lifetime of a pseudoregister across a call, causing higher register pressure at
>> the point of the call, and potentially an extra spill.
>>
>> Therefore I would suggest to indeed solve the root cause, with (untested):
>>
>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
>> index 948aa0c3b..343fe2bfa 100644
>> --- a/gcc/sched-deps.cc
>> +++ b/gcc/sched-deps.cc
>> @@ -3688,7 +3688,13 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)
>>
>>        CANT_MOVE (insn) = 1;
>>
>> -      if (find_reg_note (insn, REG_SETJMP, NULL))
>> +      if (!reload_completed)
>> +       {
>> +         /* Do not schedule across calls, this is prone to extending lifetime
>> +            of a pseudo and causing extra spill later on.  */
>> +         reg_pending_barrier = MOVE_BARRIER;
>> +       }
>> +      else if (find_reg_note (insn, REG_SETJMP, NULL))
>>          {
>>            /* This is setjmp.  Assume that all registers, not just
>>               hard registers, may be clobbered by this call.  */
>
> +1 for trying this FWIW.  There's still plenty of time to try an
> alternative solution if there are unexpected performance problems.

Let me see if Alexander's patch fixes the issue at hand (it must) and
will also do some regression testing.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23  7:33     ` Alexander Monakov
@ 2022-12-23 14:34       ` Qing Zhao
  2022-12-23 15:03         ` Alexander Monakov
  0 siblings, 1 reply; 24+ messages in thread
From: Qing Zhao @ 2022-12-23 14:34 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Jose Marchesi, gcc-patches



> On Dec 23, 2022, at 2:33 AM, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> 
> On Thu, 22 Dec 2022, Qing Zhao wrote:
> 
>>> I think scheduling across calls in the pre-RA scheduler is simply an oversight,
>>> we do not look at dataflow information and with 50% chance risk extending
>>> lifetime of a pseudoregister across a call, causing higher register pressure at
>>> the point of the call, and potentially an extra spill.
>> 
>> I am a little confused, you mean pre-RA scheduler does not look at the data flow
>> information at all when scheduling insns across calls currently?
> 
> I think it does not inspect liveness info, and may extend lifetime of a pseudo
> across a call, transforming
> 
>  call foo
>  reg = 1
>  ...
>  use reg
> 
> to
> 
>  reg = 1
>  call foo
>  ...
>  use reg
> 
> but this is undesirable, because now register allocation cannot select a
> call-clobbered register for 'reg’.
Okay, thanks for the explanation.

Then, why not just check the liveness info instead of inhibiting all scheduling across calls?

Qing
> 
> Alexander


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 14:34       ` Qing Zhao
@ 2022-12-23 15:03         ` Alexander Monakov
  2022-12-23 17:27           ` Jose E. Marchesi
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2022-12-23 15:03 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Jose Marchesi, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1444 bytes --]

On Fri, 23 Dec 2022, Qing Zhao wrote:
> >> I am a little confused, you mean pre-RA scheduler does not look at the data flow
> >> information at all when scheduling insns across calls currently?
> > 
> > I think it does not inspect liveness info, and may extend lifetime of a pseudo
> > across a call, transforming
> > 
> >  call foo
> >  reg = 1
> >  ...
> >  use reg
> > 
> > to
> > 
> >  reg = 1
> >  call foo
> >  ...
> >  use reg
> > 
> > but this is undesirable, because now register allocation cannot select a
> > call-clobbered register for 'reg’.
> Okay, thanks for the explanation.
> 
> Then, why not just check the liveness info instead of inhibiting all scheduling across calls?

Because there's almost nothing to gain from pre-RA scheduling across calls in
the first place. Remember that the call transfers control flow elsewhere and
therefore the scheduler has no idea about the pipeline state after the call
and after the return, so modeling-wise it's a gamble.

For instructions that lie on a critical path such scheduling can be useful when
it substantially reduces the difference between the priority of the call and
nearby instructions of the critical path. But we don't track which instructions
are on critical path(s) and which are not.

(scheduling across calls in sched2 is somewhat dubious as well, but
it doesn't risk register pressure issues, and on VLIW CPUs it at least
can result in better VLIW packing)

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 15:03         ` Alexander Monakov
@ 2022-12-23 17:27           ` Jose E. Marchesi
  2022-12-23 17:31             ` Alexander Monakov
  0 siblings, 1 reply; 24+ messages in thread
From: Jose E. Marchesi @ 2022-12-23 17:27 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Qing Zhao, gcc-patches


> On Fri, 23 Dec 2022, Qing Zhao wrote:
>> >> I am a little confused, you mean pre-RA scheduler does not look at the data flow
>> >> information at all when scheduling insns across calls currently?
>> > 
>> > I think it does not inspect liveness info, and may extend lifetime of a pseudo
>> > across a call, transforming
>> > 
>> >  call foo
>> >  reg = 1
>> >  ...
>> >  use reg
>> > 
>> > to
>> > 
>> >  reg = 1
>> >  call foo
>> >  ...
>> >  use reg
>> > 
>> > but this is undesirable, because now register allocation cannot select a
>> > call-clobbered register for 'reg’.
>> Okay, thanks for the explanation.
>> 
>> Then, why not just check the liveness info instead of inhibiting all scheduling across calls?
>
> Because there's almost nothing to gain from pre-RA scheduling across calls in
> the first place. Remember that the call transfers control flow elsewhere and
> therefore the scheduler has no idea about the pipeline state after the call
> and after the return, so modeling-wise it's a gamble.
>
> For instructions that lie on a critical path such scheduling can be useful when
> it substantially reduces the difference between the priority of the call and
> nearby instructions of the critical path. But we don't track which instructions
> are on critical path(s) and which are not.
>
> (scheduling across calls in sched2 is somewhat dubious as well, but
> it doesn't risk register pressure issues, and on VLIW CPUs it at least
> can result in better VLIW packing)

Does sched2 actually schedule across calls?  All the comments in the
source code stress the fact that the second scheduler pass (after
register allocation) works in regions that correspond to basic blocks:
"(after reload, each region is of one block)".

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 17:27           ` Jose E. Marchesi
@ 2022-12-23 17:31             ` Alexander Monakov
  2022-12-23 17:45               ` Jose E. Marchesi
  2022-12-23 19:21               ` Qing Zhao
  0 siblings, 2 replies; 24+ messages in thread
From: Alexander Monakov @ 2022-12-23 17:31 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: Qing Zhao, gcc-patches


On Fri, 23 Dec 2022, Jose E. Marchesi wrote:

> > (scheduling across calls in sched2 is somewhat dubious as well, but
> > it doesn't risk register pressure issues, and on VLIW CPUs it at least
> > can result in better VLIW packing)
> 
> Does sched2 actually schedule across calls?  All the comments in the
> source code stress the fact that the second scheduler pass (after
> register allocation) works in regions that correspond to basic blocks:
> "(after reload, each region is of one block)".

A call instruction does not end a basic block.

(also, with -fsched2-use-superblocks sched2 works on regions like sched1)

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 17:31             ` Alexander Monakov
@ 2022-12-23 17:45               ` Jose E. Marchesi
  2022-12-23 19:21               ` Qing Zhao
  1 sibling, 0 replies; 24+ messages in thread
From: Jose E. Marchesi @ 2022-12-23 17:45 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Qing Zhao, gcc-patches


> On Fri, 23 Dec 2022, Jose E. Marchesi wrote:
>
>> > (scheduling across calls in sched2 is somewhat dubious as well, but
>> > it doesn't risk register pressure issues, and on VLIW CPUs it at least
>> > can result in better VLIW packing)
>> 
>> Does sched2 actually schedule across calls?  All the comments in the
>> source code stress the fact that the second scheduler pass (after
>> register allocation) works in regions that correspond to basic blocks:
>> "(after reload, each region is of one block)".
>
> A call instruction does not end a basic block.

Ok, so my original assumption in the patch explaining why I disabled
sched1 but not sched2 was not correct.  Good to know.

> (also, with -fsched2-use-superblocks sched2 works on regions like sched1)
>
> Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 17:31             ` Alexander Monakov
  2022-12-23 17:45               ` Jose E. Marchesi
@ 2022-12-23 19:21               ` Qing Zhao
  2022-12-23 19:36                 ` Alexander Monakov
  1 sibling, 1 reply; 24+ messages in thread
From: Qing Zhao @ 2022-12-23 19:21 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Jose Marchesi, gcc-patches

Then, sched2 still can move insn across calls? 
So does sched2 have the same issue of incorrectly moving  the insn across a call which has unknown control flow?

Qing

> On Dec 23, 2022, at 12:31 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> 
> On Fri, 23 Dec 2022, Jose E. Marchesi wrote:
> 
>>> (scheduling across calls in sched2 is somewhat dubious as well, but
>>> it doesn't risk register pressure issues, and on VLIW CPUs it at least
>>> can result in better VLIW packing)
>> 
>> Does sched2 actually schedule across calls?  All the comments in the
>> source code stress the fact that the second scheduler pass (after
>> register allocation) works in regions that correspond to basic blocks:
>> "(after reload, each region is of one block)".
> 
> A call instruction does not end a basic block.
> 
> (also, with -fsched2-use-superblocks sched2 works on regions like sched1)
> 
> Alexander


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 19:21               ` Qing Zhao
@ 2022-12-23 19:36                 ` Alexander Monakov
  2022-12-23 19:57                   ` Qing Zhao
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Monakov @ 2022-12-23 19:36 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Jose Marchesi, gcc-patches

On Fri, 23 Dec 2022, Qing Zhao wrote:

> Then, sched2 still can move insn across calls? 
> So does sched2 have the same issue of incorrectly moving  the insn across a call which has unknown control flow?

I think problems are unlikely because register allocator assigns pseudos that
cross setjmp to memory.

I think you hit the problem with sched1 because most testing is done on x86 and
sched1 is not enabled there, otherwise the problem would have been noticed much
earlier.

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 19:36                 ` Alexander Monakov
@ 2022-12-23 19:57                   ` Qing Zhao
  2022-12-24  8:10                     ` Alexander Monakov
  0 siblings, 1 reply; 24+ messages in thread
From: Qing Zhao @ 2022-12-23 19:57 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Jose Marchesi, gcc-patches

> On Dec 23, 2022, at 2:36 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> 
> 
> On Fri, 23 Dec 2022, Qing Zhao wrote:
> 
>> Then, sched2 still can move insn across calls? 
>> So does sched2 have the same issue of incorrectly moving  the insn across a call which has unknown control flow?
> 
> I think problems are unlikely because register allocator assigns pseudos that
> cross setjmp to memory.
> 
> I think you hit the problem with sched1 because most testing is done on x86 and
> sched1 is not enabled there, otherwise the problem would have been noticed much
> earlier.

Yes, the problem with this bug is in sched1 on aarch64.  On x86 the same issue will be exposed when explicitly enable sched1 with -fschedule-insns. 

BTW, Why sched1 is not enabled on x86 by default?

Another question is:  As discussed in the original bug PR57067: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067
The root cause of this issue related to the abnormal control flow edges (from setjmp/longjmp) cannot be represented correctly at RTL stage, shall we fix
this root cause instead? 

Qing

> Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-23 19:57                   ` Qing Zhao
@ 2022-12-24  8:10                     ` Alexander Monakov
  2022-12-24  9:26                       ` Richard Biener
  2023-01-05 18:11                       ` Qing Zhao
  0 siblings, 2 replies; 24+ messages in thread
From: Alexander Monakov @ 2022-12-24  8:10 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Jose Marchesi, gcc-patches

On Fri, 23 Dec 2022, Qing Zhao wrote:

> BTW, Why sched1 is not enabled on x86 by default?

Register allocation is tricky on x86 due to small number of general-purpose
registers, and sched1 can make it even more difficult. I think before register
pressure modeling was added, sched1 could not be enabled because then allocation
would sometimes fail, and now there's no incentive to enable it, as it is not so
important for modern x86 CPUs. Perhaps someone else has a more comprehensive
answer.

> Another question is:  As discussed in the original bug PR57067:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067 The root cause of this
> issue related to the abnormal control flow edges (from setjmp/longjmp) cannot
> be represented correctly at RTL stage, shall we fix this root cause instead? 

You'd need an experienced reviewer to work with you, especially on high-level
design decisions such as "How ABNORMAL_DISPATCHER should be represented on RTL".
I'm afraid it's not just a matter of applying a small patch in one place.

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-24  8:10                     ` Alexander Monakov
@ 2022-12-24  9:26                       ` Richard Biener
  2022-12-24  9:58                         ` Jose E. Marchesi
  2023-01-05 18:11                       ` Qing Zhao
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Biener @ 2022-12-24  9:26 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Qing Zhao, Jose Marchesi, gcc-patches



> Am 24.12.2022 um 09:11 schrieb Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org>:
> 
> 
>> On Fri, 23 Dec 2022, Qing Zhao wrote:
>> 
>> BTW, Why sched1 is not enabled on x86 by default?
> 
> Register allocation is tricky on x86 due to small number of general-purpose
> registers, and sched1 can make it even more difficult. I think before register
> pressure modeling was added, sched1 could not be enabled because then allocation
> would sometimes fail, and now there's no incentive to enable it, as it is not so
> important for modern x86 CPUs. Perhaps someone else has a more comprehensive
> answer.
> 
>> Another question is:  As discussed in the original bug PR57067:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067 The root cause of this
>> issue related to the abnormal control flow edges (from setjmp/longjmp) cannot
>> be represented correctly at RTL stage, shall we fix this root cause instead? 
> 
> You'd need an experienced reviewer to work with you, especially on high-level
> design decisions such as "How ABNORMAL_DISPATCHER should be represented on RTL".
> I'm afraid it's not just a matter of applying a small patch in one place.

For nonlocal goto we Thread the abnormal dispatcher.  Of course by regenerating abnormal edges, not by keeping and modifying them.  We cannot re-generate the (optimal) set of abnormal edges for setjmp so we want to preserve those edges.  But as you say it’s a very non-trivial change.

Richard 

> Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-24  9:26                       ` Richard Biener
@ 2022-12-24  9:58                         ` Jose E. Marchesi
  2022-12-24 10:06                           ` Richard Biener
                                             ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Jose E. Marchesi @ 2022-12-24  9:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: Alexander Monakov, Qing Zhao, gcc-patches


>> Am 24.12.2022 um 09:11 schrieb Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org>:
>> 
>> 
>>> On Fri, 23 Dec 2022, Qing Zhao wrote:
>>> 
>>> BTW, Why sched1 is not enabled on x86 by default?
>> 
>> Register allocation is tricky on x86 due to small number of general-purpose
>> registers, and sched1 can make it even more difficult. I think before register
>> pressure modeling was added, sched1 could not be enabled because then allocation
>> would sometimes fail, and now there's no incentive to enable it, as it is not so
>> important for modern x86 CPUs. Perhaps someone else has a more comprehensive
>> answer.
>> 
>>> Another question is:  As discussed in the original bug PR57067:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067 The root cause of this
>>> issue related to the abnormal control flow edges (from setjmp/longjmp) cannot
>>> be represented correctly at RTL stage, shall we fix this root cause instead? 
>> 
>> You'd need an experienced reviewer to work with you, especially on high-level
>> design decisions such as "How ABNORMAL_DISPATCHER should be represented on RTL".
>> I'm afraid it's not just a matter of applying a small patch in one place.
>
> For nonlocal goto we Thread the abnormal dispatcher.  Of course by
> regenerating abnormal edges, not by keeping and modifying them.  We
> cannot re-generate the (optimal) set of abnormal edges for setjmp so
> we want to preserve those edges.  But as you say it’s a very
> non-trivial change.

Allright, so we have two short-term alternatives for at least remove the
possibility that GCC generates wrong code for valid C when the scheduler
is turned on:

a) To disable sched1 in functions that call setjmp.

b) To change deps_analyze_insn so instructions are not moved across
   function calls before register allocation (!reload_completed).

Both patches fix our particular use cases and are regression free in
aarch64-linux-gnu.

However, there is something I don't understand: wouldn't sched2
introduce the same problem when -fsched2-use-superblocks is specified?
In that case, the option a) would need to be expanded to disable sched2
as well, and b) wouldn't have effect (!after_reload)?

Using -fsched2-use-superblocks doesn't trigger the problem in our use
case.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-24  9:58                         ` Jose E. Marchesi
@ 2022-12-24 10:06                           ` Richard Biener
  2022-12-24 10:18                           ` Alexander Monakov
  2022-12-26  0:29                           ` Segher Boessenkool
  2 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2022-12-24 10:06 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: Alexander Monakov, Qing Zhao, gcc-patches



> Am 24.12.2022 um 10:54 schrieb Jose E. Marchesi <jose.marchesi@oracle.com>:
> 
> 
>>>> Am 24.12.2022 um 09:11 schrieb Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org>:
>>> 
>>> 
>>>> On Fri, 23 Dec 2022, Qing Zhao wrote:
>>>> 
>>>> BTW, Why sched1 is not enabled on x86 by default?
>>> 
>>> Register allocation is tricky on x86 due to small number of general-purpose
>>> registers, and sched1 can make it even more difficult. I think before register
>>> pressure modeling was added, sched1 could not be enabled because then allocation
>>> would sometimes fail, and now there's no incentive to enable it, as it is not so
>>> important for modern x86 CPUs. Perhaps someone else has a more comprehensive
>>> answer.
>>> 
>>>> Another question is:  As discussed in the original bug PR57067:
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067 The root cause of this
>>>> issue related to the abnormal control flow edges (from setjmp/longjmp) cannot
>>>> be represented correctly at RTL stage, shall we fix this root cause instead? 
>>> 
>>> You'd need an experienced reviewer to work with you, especially on high-level
>>> design decisions such as "How ABNORMAL_DISPATCHER should be represented on RTL".
>>> I'm afraid it's not just a matter of applying a small patch in one place.
>> 
>> For nonlocal goto we Thread the abnormal dispatcher.  Of course by
>> regenerating abnormal edges, not by keeping and modifying them.  We
>> cannot re-generate the (optimal) set of abnormal edges for setjmp so
>> we want to preserve those edges.  But as you say it’s a very
>> non-trivial change.
> 
> Allright, so we have two short-term alternatives for at least remove the
> possibility that GCC generates wrong code for valid C when the scheduler
> is turned on:
> 
> a) To disable sched1 in functions that call setjmp.
> 
> b) To change deps_analyze_insn so instructions are not moved across
>   function calls before register allocation (!reload_completed).
> 
> Both patches fix our particular use cases and are regression free in
> aarch64-linux-gnu.
> 
> However, there is something I don't understand: wouldn't sched2
> introduce the same problem when -fsched2-use-superblocks is specified?
> In that case, the option a) would need to be expanded to disable sched2
> as well, and b) wouldn't have effect (!after_reload)?
> 
> Using -fsched2-use-superblocks doesn't trigger the problem in our use
> case.

I’d go with b) and revisit sched2 when we run into a testcase that’s mishandled.

Richard 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-24  9:58                         ` Jose E. Marchesi
  2022-12-24 10:06                           ` Richard Biener
@ 2022-12-24 10:18                           ` Alexander Monakov
  2022-12-26  0:29                           ` Segher Boessenkool
  2 siblings, 0 replies; 24+ messages in thread
From: Alexander Monakov @ 2022-12-24 10:18 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: Richard Biener, Qing Zhao, gcc-patches

On Sat, 24 Dec 2022, Jose E. Marchesi wrote:

> However, there is something I don't understand: wouldn't sched2
> introduce the same problem when -fsched2-use-superblocks is specified?

Superblocks are irrelevant, a call instruction does not end a basic block
and the problematic motion happens within a BB on your testcase. Didn't you
ask about this already?

> In that case, the option a) would need to be expanded to disable sched2
> as well, and b) wouldn't have effect (!after_reload)?

See my response to Qing Zhao, I think due to special-casing of pseudos
that are live at setjmp during register allocation, sched2 will not move
them in such manner (they should be assigned to memory and I don't expect
sched2 will move such MEMs across calls). But of course there may be holes
in this theory.

On some targets disabling sched2 is not so easy because it's responsible
for VLIW packing (bundling on ia64).

Alexander

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-24  9:58                         ` Jose E. Marchesi
  2022-12-24 10:06                           ` Richard Biener
  2022-12-24 10:18                           ` Alexander Monakov
@ 2022-12-26  0:29                           ` Segher Boessenkool
  2 siblings, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2022-12-26  0:29 UTC (permalink / raw)
  To: Jose E. Marchesi
  Cc: Richard Biener, Alexander Monakov, Qing Zhao, gcc-patches

Hi!

On Sat, Dec 24, 2022 at 10:58:41AM +0100, Jose E. Marchesi via Gcc-patches wrote:
> Allright, so we have two short-term alternatives for at least remove the
> possibility that GCC generates wrong code for valid C when the scheduler
> is turned on:
> 
> a) To disable sched1 in functions that call setjmp.

That is a heavy hammer.

> b) To change deps_analyze_insn so instructions are not moved across
>    function calls before register allocation (!reload_completed).

And this is way heavier still.

OTOH, it is possible b) actually improves code (improves performance) in
general (and maybe even without such a reload_completed check).

> Both patches fix our particular use cases and are regression free in
> aarch64-linux-gnu.

Did you also check for performance regressions?


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2] Disable sched1 in functions that call setjmp
  2022-12-24  8:10                     ` Alexander Monakov
  2022-12-24  9:26                       ` Richard Biener
@ 2023-01-05 18:11                       ` Qing Zhao
  1 sibling, 0 replies; 24+ messages in thread
From: Qing Zhao @ 2023-01-05 18:11 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Jose Marchesi, gcc-patches

Alexander,

(Sorry for the late reply due to holiday vacation).

> On Dec 24, 2022, at 3:10 AM, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> 
> On Fri, 23 Dec 2022, Qing Zhao wrote:
> 
>> BTW, Why sched1 is not enabled on x86 by default?
> 
> Register allocation is tricky on x86 due to small number of general-purpose
> registers, and sched1 can make it even more difficult. I think before register
> pressure modeling was added, sched1 could not be enabled because then allocation
> would sometimes fail, and now there's no incentive to enable it, as it is not so
> important for modern x86 CPUs. Perhaps someone else has a more comprehensive
> answer.

Okay. I see. Thanks for the explanation of the history. 
> 
>> Another question is:  As discussed in the original bug PR57067:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57067 The root cause of this
>> issue related to the abnormal control flow edges (from setjmp/longjmp) cannot
>> be represented correctly at RTL stage, shall we fix this root cause instead? 
> 
> You'd need an experienced reviewer to work with you, especially on high-level
> design decisions such as "How ABNORMAL_DISPATCHER should be represented on RTL".
> I'm afraid it's not just a matter of applying a small patch in one place.
I see. (And I guess so, fixing this is not a trivial work).

Qing
> 
> Alexander


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] sched-deps: do not schedule pseudos across calls [PR108117]
  2022-12-23 10:42     ` Jose E. Marchesi
@ 2023-01-13 18:20       ` Alexander Monakov
  2023-01-13 18:23         ` Richard Sandiford
  2023-01-13 18:33         ` Jose E. Marchesi
  0 siblings, 2 replies; 24+ messages in thread
From: Alexander Monakov @ 2023-01-13 18:20 UTC (permalink / raw)
  To: Jose E. Marchesi
  Cc: Alexander Monakov via Gcc-patches, qing.zhao, richard.sandiford


On Fri, 23 Dec 2022, Jose E. Marchesi wrote:

> > +1 for trying this FWIW.  There's still plenty of time to try an
> > alternative solution if there are unexpected performance problems.
> 
> Let me see if Alexander's patch fixes the issue at hand (it must) and
> will also do some regression testing.

Hi, I'm not sure at which court the ball is, but in the interest at moving
things forward here's the complete patch with the testcase. OK to apply?

---8<---

From: Alexander Monakov <amonakov@ispras.ru>
Date: Fri, 13 Jan 2023 21:04:02 +0300
Subject: [PATCH] sched-deps: do not schedule pseudos across calls [PR108117]

Scheduling across calls in the pre-RA scheduler is problematic: we do
not take liveness info into account, and are thus prone to extending
lifetime of a pseudo over the loop, requiring a callee-saved hardreg
or causing a spill.

If current function called a setjmp, lifting an assignment over a call
may be incorrect if a longjmp would happen before the assignment.

Thanks to Jose Marchesi for testing on AArch64.

gcc/ChangeLog:

	PR rtl-optimization/108117
	PR rtl-optimization/108132
	* sched-deps.cc (deps_analyze_insn): Do not schedule across
	calls before reload.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/108117
	PR rtl-optimization/108132
	* gcc.dg/pr108117.c: New test.
---
 gcc/sched-deps.cc               |  9 ++++++++-
 gcc/testsuite/gcc.dg/pr108117.c | 30 ++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr108117.c

diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
index 948aa0c3b..5dc4fa4cd 100644
--- a/gcc/sched-deps.cc
+++ b/gcc/sched-deps.cc
@@ -3688,7 +3688,14 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)
 
       CANT_MOVE (insn) = 1;
 
-      if (find_reg_note (insn, REG_SETJMP, NULL))
+      if (!reload_completed)
+	{
+	  /* Scheduling across calls may increase register pressure by extending
+	     live ranges of pseudos over the call.  Worse, in presence of setjmp
+	     it may incorrectly move up an assignment over a longjmp.  */
+	  reg_pending_barrier = MOVE_BARRIER;
+	}
+      else if (find_reg_note (insn, REG_SETJMP, NULL))
         {
           /* This is setjmp.  Assume that all registers, not just
              hard registers, may be clobbered by this call.  */
diff --git a/gcc/testsuite/gcc.dg/pr108117.c b/gcc/testsuite/gcc.dg/pr108117.c
new file mode 100644
index 000000000..ae151693e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr108117.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-require-effective-target nonlocal_goto } */
+/* { dg-options "-O2 -fschedule-insns" } */
+
+#include <stdio.h>
+#include <setjmp.h>
+
+jmp_buf ex_buf;
+
+__attribute__((noipa))
+void fn_throw(int x)
+{
+   if (x)
+      longjmp(ex_buf, 1);
+}
+
+int main(void)
+{
+    int vb = 0; // NB: not volatile, not modified after setjmp
+
+    if (!setjmp(ex_buf)) {
+        fn_throw(1);
+        vb = 1; // not reached in the abstract machine
+    }
+
+    if (vb) {
+        printf("Failed, vb = %d!\n", vb);
+        return 1;
+    }
+}
-- 
2.37.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] sched-deps: do not schedule pseudos across calls [PR108117]
  2023-01-13 18:20       ` [PATCH] sched-deps: do not schedule pseudos across calls [PR108117] Alexander Monakov
@ 2023-01-13 18:23         ` Richard Sandiford
  2023-01-13 18:33         ` Jose E. Marchesi
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Sandiford @ 2023-01-13 18:23 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Jose E. Marchesi, Alexander Monakov via Gcc-patches, qing.zhao

Alexander Monakov <amonakov@ispras.ru> writes:
> On Fri, 23 Dec 2022, Jose E. Marchesi wrote:
>
>> > +1 for trying this FWIW.  There's still plenty of time to try an
>> > alternative solution if there are unexpected performance problems.
>> 
>> Let me see if Alexander's patch fixes the issue at hand (it must) and
>> will also do some regression testing.
>
> Hi, I'm not sure at which court the ball is, but in the interest at moving
> things forward here's the complete patch with the testcase. OK to apply?
>
> ---8<---
>
> From: Alexander Monakov <amonakov@ispras.ru>
> Date: Fri, 13 Jan 2023 21:04:02 +0300
> Subject: [PATCH] sched-deps: do not schedule pseudos across calls [PR108117]
>
> Scheduling across calls in the pre-RA scheduler is problematic: we do
> not take liveness info into account, and are thus prone to extending
> lifetime of a pseudo over the loop, requiring a callee-saved hardreg
> or causing a spill.
>
> If current function called a setjmp, lifting an assignment over a call
> may be incorrect if a longjmp would happen before the assignment.
>
> Thanks to Jose Marchesi for testing on AArch64.
>
> gcc/ChangeLog:
>
> 	PR rtl-optimization/108117
> 	PR rtl-optimization/108132
> 	* sched-deps.cc (deps_analyze_insn): Do not schedule across
> 	calls before reload.
>
> gcc/testsuite/ChangeLog:
>
> 	PR rtl-optimization/108117
> 	PR rtl-optimization/108132
> 	* gcc.dg/pr108117.c: New test.

OK, thanks.

Richard

> ---
>  gcc/sched-deps.cc               |  9 ++++++++-
>  gcc/testsuite/gcc.dg/pr108117.c | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 38 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr108117.c
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 948aa0c3b..5dc4fa4cd 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -3688,7 +3688,14 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)
>  
>        CANT_MOVE (insn) = 1;
>  
> -      if (find_reg_note (insn, REG_SETJMP, NULL))
> +      if (!reload_completed)
> +	{
> +	  /* Scheduling across calls may increase register pressure by extending
> +	     live ranges of pseudos over the call.  Worse, in presence of setjmp
> +	     it may incorrectly move up an assignment over a longjmp.  */
> +	  reg_pending_barrier = MOVE_BARRIER;
> +	}
> +      else if (find_reg_note (insn, REG_SETJMP, NULL))
>          {
>            /* This is setjmp.  Assume that all registers, not just
>               hard registers, may be clobbered by this call.  */
> diff --git a/gcc/testsuite/gcc.dg/pr108117.c b/gcc/testsuite/gcc.dg/pr108117.c
> new file mode 100644
> index 000000000..ae151693e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr108117.c
> @@ -0,0 +1,30 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target nonlocal_goto } */
> +/* { dg-options "-O2 -fschedule-insns" } */
> +
> +#include <stdio.h>
> +#include <setjmp.h>
> +
> +jmp_buf ex_buf;
> +
> +__attribute__((noipa))
> +void fn_throw(int x)
> +{
> +   if (x)
> +      longjmp(ex_buf, 1);
> +}
> +
> +int main(void)
> +{
> +    int vb = 0; // NB: not volatile, not modified after setjmp
> +
> +    if (!setjmp(ex_buf)) {
> +        fn_throw(1);
> +        vb = 1; // not reached in the abstract machine
> +    }
> +
> +    if (vb) {
> +        printf("Failed, vb = %d!\n", vb);
> +        return 1;
> +    }
> +}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] sched-deps: do not schedule pseudos across calls [PR108117]
  2023-01-13 18:20       ` [PATCH] sched-deps: do not schedule pseudos across calls [PR108117] Alexander Monakov
  2023-01-13 18:23         ` Richard Sandiford
@ 2023-01-13 18:33         ` Jose E. Marchesi
  1 sibling, 0 replies; 24+ messages in thread
From: Jose E. Marchesi @ 2023-01-13 18:33 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Alexander Monakov via Gcc-patches, qing.zhao, richard.sandiford


> On Fri, 23 Dec 2022, Jose E. Marchesi wrote:
>
>> > +1 for trying this FWIW.  There's still plenty of time to try an
>> > alternative solution if there are unexpected performance problems.
>> 
>> Let me see if Alexander's patch fixes the issue at hand (it must) and
>> will also do some regression testing.
>
> Hi, I'm not sure at which court the ball is, but in the interest at moving
> things forward here's the complete patch with the testcase. OK to
> apply?

Thanks for this.
We were actually on it, but of course busy with other stuff :)

>
> ---8<---
>
> From: Alexander Monakov <amonakov@ispras.ru>
> Date: Fri, 13 Jan 2023 21:04:02 +0300
> Subject: [PATCH] sched-deps: do not schedule pseudos across calls [PR108117]
>
> Scheduling across calls in the pre-RA scheduler is problematic: we do
> not take liveness info into account, and are thus prone to extending
> lifetime of a pseudo over the loop, requiring a callee-saved hardreg
> or causing a spill.
>
> If current function called a setjmp, lifting an assignment over a call
> may be incorrect if a longjmp would happen before the assignment.
>
> Thanks to Jose Marchesi for testing on AArch64.
>
> gcc/ChangeLog:
>
> 	PR rtl-optimization/108117
> 	PR rtl-optimization/108132
> 	* sched-deps.cc (deps_analyze_insn): Do not schedule across
> 	calls before reload.
>
> gcc/testsuite/ChangeLog:
>
> 	PR rtl-optimization/108117
> 	PR rtl-optimization/108132
> 	* gcc.dg/pr108117.c: New test.
> ---
>  gcc/sched-deps.cc               |  9 ++++++++-
>  gcc/testsuite/gcc.dg/pr108117.c | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 38 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr108117.c
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index 948aa0c3b..5dc4fa4cd 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -3688,7 +3688,14 @@ deps_analyze_insn (class deps_desc *deps, rtx_insn *insn)
>  
>        CANT_MOVE (insn) = 1;
>  
> -      if (find_reg_note (insn, REG_SETJMP, NULL))
> +      if (!reload_completed)
> +	{
> +	  /* Scheduling across calls may increase register pressure by extending
> +	     live ranges of pseudos over the call.  Worse, in presence of setjmp
> +	     it may incorrectly move up an assignment over a longjmp.  */
> +	  reg_pending_barrier = MOVE_BARRIER;
> +	}
> +      else if (find_reg_note (insn, REG_SETJMP, NULL))
>          {
>            /* This is setjmp.  Assume that all registers, not just
>               hard registers, may be clobbered by this call.  */
> diff --git a/gcc/testsuite/gcc.dg/pr108117.c b/gcc/testsuite/gcc.dg/pr108117.c
> new file mode 100644
> index 000000000..ae151693e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr108117.c
> @@ -0,0 +1,30 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target nonlocal_goto } */
> +/* { dg-options "-O2 -fschedule-insns" } */
> +
> +#include <stdio.h>
> +#include <setjmp.h>
> +
> +jmp_buf ex_buf;
> +
> +__attribute__((noipa))
> +void fn_throw(int x)
> +{
> +   if (x)
> +      longjmp(ex_buf, 1);
> +}
> +
> +int main(void)
> +{
> +    int vb = 0; // NB: not volatile, not modified after setjmp
> +
> +    if (!setjmp(ex_buf)) {
> +        fn_throw(1);
> +        vb = 1; // not reached in the abstract machine
> +    }
> +
> +    if (vb) {
> +        printf("Failed, vb = %d!\n", vb);
> +        return 1;
> +    }
> +}

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-01-13 18:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-22 17:32 [PATCH V2] Disable sched1 in functions that call setjmp Jose E. Marchesi
2022-12-22 17:56 ` Alexander Monakov
2022-12-22 19:28   ` Qing Zhao
2022-12-23  7:33     ` Alexander Monakov
2022-12-23 14:34       ` Qing Zhao
2022-12-23 15:03         ` Alexander Monakov
2022-12-23 17:27           ` Jose E. Marchesi
2022-12-23 17:31             ` Alexander Monakov
2022-12-23 17:45               ` Jose E. Marchesi
2022-12-23 19:21               ` Qing Zhao
2022-12-23 19:36                 ` Alexander Monakov
2022-12-23 19:57                   ` Qing Zhao
2022-12-24  8:10                     ` Alexander Monakov
2022-12-24  9:26                       ` Richard Biener
2022-12-24  9:58                         ` Jose E. Marchesi
2022-12-24 10:06                           ` Richard Biener
2022-12-24 10:18                           ` Alexander Monakov
2022-12-26  0:29                           ` Segher Boessenkool
2023-01-05 18:11                       ` Qing Zhao
2022-12-23 10:05   ` Richard Sandiford
2022-12-23 10:42     ` Jose E. Marchesi
2023-01-13 18:20       ` [PATCH] sched-deps: do not schedule pseudos across calls [PR108117] Alexander Monakov
2023-01-13 18:23         ` Richard Sandiford
2023-01-13 18:33         ` Jose E. Marchesi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).