* Re: [21/32] Remove global call sets: LRA
@ 2019-10-06 8:45 Uros Bizjak
2019-10-06 14:32 ` Richard Sandiford
0 siblings, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2019-10-06 8:45 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford, H. J. Lu
>>> This caused:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
>
> Thanks for reducing & tracking down the underlying cause.
>
>> This change doesn't work with -mzeroupper. When -mzeroupper is used,
>> the upper bits of vector registers are clobbered upon callee return if
>> any YMM/ZMM registers are used in the callee. Even if YMM7 isn't used,
>> the upper bits of YMM7 can still be clobbered by vzeroupper when YMM1
>> is used.
>
> The problem here really is that the pattern is just:
>
> (define_insn "avx_vzeroupper"
> [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
> "TARGET_AVX"
> "vzeroupper"
> ...)
>
> and so its effect on the registers isn't modelled at all in rtl.
> Maybe one option would be to add a parallel:
>
> (set (reg:V2DI N) (reg:V2DI N))
>
> for each register. Or we could do something like I did for the SVE
> tlsdesc calls, although here that would mean using a call pattern for
> something that isn't really a call. Or we could reinstate clobber_high
> and use that, but that's very much third out of three.
>
> I don't think we should add target hooks to get around this, since that's
> IMO papering over the issue.
>
> I'll try the parallel set thing first.
Please note that the vzeroupper insertion pass runs after register
allocation, so in effect the vzeroupper pattern is hidden from the
register allocator.
Uros.
Uros.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [21/32] Remove global call sets: LRA
2019-10-06 8:45 [21/32] Remove global call sets: LRA Uros Bizjak
@ 2019-10-06 14:32 ` Richard Sandiford
2019-10-07 6:04 ` Uros Bizjak
0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2019-10-06 14:32 UTC (permalink / raw)
To: Uros Bizjak; +Cc: gcc-patches, H. J. Lu
Uros Bizjak <ubizjak@gmail.com> writes:
>>>> This caused:
>>>>
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
>>
>> Thanks for reducing & tracking down the underlying cause.
>>
>>> This change doesn't work with -mzeroupper. When -mzeroupper is used,
>>> the upper bits of vector registers are clobbered upon callee return if
>>> any YMM/ZMM registers are used in the callee. Even if YMM7 isn't used,
>>> the upper bits of YMM7 can still be clobbered by vzeroupper when YMM1
>>> is used.
>>
>> The problem here really is that the pattern is just:
>>
>> (define_insn "avx_vzeroupper"
>> [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
>> "TARGET_AVX"
>> "vzeroupper"
>> ...)
>>
>> and so its effect on the registers isn't modelled at all in rtl.
>> Maybe one option would be to add a parallel:
>>
>> (set (reg:V2DI N) (reg:V2DI N))
>>
>> for each register. Or we could do something like I did for the SVE
>> tlsdesc calls, although here that would mean using a call pattern for
>> something that isn't really a call. Or we could reinstate clobber_high
>> and use that, but that's very much third out of three.
>>
>> I don't think we should add target hooks to get around this, since that's
>> IMO papering over the issue.
>>
>> I'll try the parallel set thing first.
>
> Please note that the vzeroupper insertion pass runs after register
> allocation, so in effect the vzeroupper pattern is hidden from the
> register allocator.
Right, but even post-RA passes rely on the register usage being accurate.
Same for collect_fn_hard_reg_usage, which is the issue here.
The info collected by collect_fn_hard_reg_usage was always wrong for
vzeroupper. What changed with my patch is that we now use that info
for partly call-clobbered registers as well as "normally" clobbered
registers. So this is another instance of a problem that was previously
being masked by having ix86_hard_regno_call_part_clobbered enforce Win64
rules for all ABIs.
My first idea of adding:
(set (reg:V2DI N) (reg:V2DI N))
for all clobbered registers didn't work well because it left previously-
dead registers upwards exposed (obvious in hindsight). And the second
idea of using a fake call would require too many "is this really a call?"
hacks.
So in the end I went for a subpass that chooses between:
(set (reg:V2DI N) (reg:V2DI N))
and
(clobber (reg:V2DI N))
depending on whether register N is live or not. This fixes the testcase
and doesn't seem to regress code quality for the tests I've tried.
Tested on x86_64-linux-gnu. OK to install?
Richard
2019-10-06 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR target/91994
* config/i386/sse.md (avx_vzeroupper): Turn into a define_expand
and wrap the unspec_volatile in a parallel.
(*avx_vzeroupper): New define_insn. Use a match_parallel around
the unspec_volatile.
* config/i386/predicates.md (vzeroupper_pattern): Expect the
unspec_volatile to be wrapped in a parallel.
* config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper)
(ix86_add_reg_usage_to_vzerouppers): New functions.
(rest_of_handle_insert_vzeroupper): Use them to add register
usage information to the vzeroupper instructions.
gcc/testsuite/
PR target/91994
* gcc.target/i386/pr91994.c: New test.
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md 2019-09-17 15:27:10.214075253 +0100
+++ gcc/config/i386/sse.md 2019-10-06 15:19:10.062769500 +0100
@@ -19622,9 +19622,16 @@ (define_insn "*avx_vzeroall"
(set_attr "mode" "OI")])
;; Clear the upper 128bits of AVX registers, equivalent to a NOP
-;; if the upper 128bits are unused.
-(define_insn "avx_vzeroupper"
- [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
+;; if the upper 128bits are unused. Initially we expand the instructions
+;; as though they had no effect on the SSE registers, but later add SETs and
+;; CLOBBERs to the PARALLEL to model the real effect.
+(define_expand "avx_vzeroupper"
+ [(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
+ "TARGET_AVX")
+
+(define_insn "*avx_vzeroupper"
+ [(match_parallel 0 "vzeroupper_pattern"
+ [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
"TARGET_AVX"
"vzeroupper"
[(set_attr "type" "sse")
Index: gcc/config/i386/predicates.md
===================================================================
--- gcc/config/i386/predicates.md 2019-09-10 19:56:45.337178032 +0100
+++ gcc/config/i386/predicates.md 2019-10-06 15:19:10.054769556 +0100
@@ -1441,8 +1441,9 @@ (define_predicate "vzeroall_pattern"
;; return true if OP is a vzeroupper pattern.
(define_predicate "vzeroupper_pattern"
- (and (match_code "unspec_volatile")
- (match_test "XINT (op, 1) == UNSPECV_VZEROUPPER")))
+ (and (match_code "parallel")
+ (match_code "unspec_volatile" "a")
+ (match_test "XINT (XVECEXP (op, 0, 0), 1) == UNSPECV_VZEROUPPER")))
;; Return true if OP is an addsub vec_merge operation
(define_predicate "addsub_vm_operator"
Index: gcc/config/i386/i386-features.c
===================================================================
--- gcc/config/i386/i386-features.c 2019-09-21 13:56:08.895934718 +0100
+++ gcc/config/i386/i386-features.c 2019-10-06 15:19:10.054769556 +0100
@@ -1757,6 +1757,68 @@ convert_scalars_to_vector (bool timode_p
return 0;
}
+/* Modify the vzeroupper pattern in INSN so that it describes the effect
+ that the instruction has on the SSE registers. LIVE_REGS are the set
+ of registers that are live across the instruction.
+
+ For a live register R we use:
+
+ (set (reg:V2DF R) (reg:V2DF R))
+
+ which preserves the low 128 bits but clobbers the upper bits.
+ For a dead register we just use:
+
+ (clobber (reg:V2DF R))
+
+ which invalidates any previous contents of R and stops R from becoming
+ live across the vzeroupper in future. */
+
+static void
+ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
+{
+ rtx pattern = PATTERN (insn);
+ unsigned int nregs = TARGET_64BIT ? 16 : 8;
+ rtvec vec = rtvec_alloc (nregs + 1);
+ RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
+ for (unsigned int i = 0; i < nregs; ++i)
+ {
+ unsigned int regno = GET_SSE_REGNO (i);
+ rtx reg = gen_rtx_REG (V2DImode, regno);
+ if (bitmap_bit_p (live_regs, regno))
+ RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
+ else
+ RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
+ }
+ XVEC (pattern, 0) = vec;
+ df_insn_rescan (insn);
+}
+
+/* Walk the vzeroupper instructions in the function and annotate them
+ with the effect that they have on the SSE registers. */
+
+static void
+ix86_add_reg_usage_to_vzerouppers (void)
+{
+ basic_block bb;
+ rtx_insn *insn;
+ auto_bitmap live_regs;
+
+ df_analyze ();
+ FOR_EACH_BB_FN (bb, cfun)
+ {
+ bitmap_copy (live_regs, df_get_live_out (bb));
+ df_simulate_initialize_backwards (bb, live_regs);
+ FOR_BB_INSNS_REVERSE (bb, insn)
+ {
+ if (!NONDEBUG_INSN_P (insn))
+ continue;
+ if (vzeroupper_pattern (PATTERN (insn), VOIDmode))
+ ix86_add_reg_usage_to_vzeroupper (insn, live_regs);
+ df_simulate_one_insn_backwards (bb, insn, live_regs);
+ }
+ }
+}
+
static unsigned int
rest_of_handle_insert_vzeroupper (void)
{
@@ -1773,6 +1835,7 @@ rest_of_handle_insert_vzeroupper (void)
/* Call optimize_mode_switching. */
g->get_passes ()->execute_pass_mode_switching ();
+ ix86_add_reg_usage_to_vzerouppers ();
return 0;
}
Index: gcc/testsuite/gcc.target/i386/pr91994.c
===================================================================
--- /dev/null 2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/i386/pr91994.c 2019-10-06 15:19:10.062769500 +0100
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O2 -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+#include <immintrin.h>
+
+__m256i x1, x2, x3;
+
+__attribute__ ((noinline))
+static void
+foo (void)
+{
+ x1 = x2;
+}
+
+void
+bar (void)
+{
+ __m256i x = x1;
+ foo ();
+ x3 = x;
+}
+
+__attribute__ ((noinline))
+void
+avx_test (void)
+{
+ __m256i x = _mm256_set1_epi8 (3);
+ x1 = x;
+ bar ();
+ if (__builtin_memcmp (&x3, &x, sizeof (x)))
+ __builtin_abort ();
+}
* Re: [21/32] Remove global call sets: LRA
2019-10-06 14:32 ` Richard Sandiford
@ 2019-10-07 6:04 ` Uros Bizjak
0 siblings, 0 replies; 8+ messages in thread
From: Uros Bizjak @ 2019-10-07 6:04 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, H. J. Lu
On Sun, Oct 6, 2019 at 4:32 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Uros Bizjak <ubizjak@gmail.com> writes:
> >>>> This caused:
> >>>>
> >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
> >>
> >> Thanks for reducing & tracking down the underlying cause.
> >>
> >>> This change doesn't work with -mzeroupper. When -mzeroupper is used,
> >>> the upper bits of vector registers are clobbered upon callee return if
> >>> any YMM/ZMM registers are used in the callee. Even if YMM7 isn't used,
> >>> the upper bits of YMM7 can still be clobbered by vzeroupper when YMM1
> >>> is used.
> >>
> >> The problem here really is that the pattern is just:
> >>
> >> (define_insn "avx_vzeroupper"
> >> [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
> >> "TARGET_AVX"
> >> "vzeroupper"
> >> ...)
> >>
> >> and so its effect on the registers isn't modelled at all in rtl.
> >> Maybe one option would be to add a parallel:
> >>
> >> (set (reg:V2DI N) (reg:V2DI N))
> >>
> >> for each register. Or we could do something like I did for the SVE
> >> tlsdesc calls, although here that would mean using a call pattern for
> >> something that isn't really a call. Or we could reinstate clobber_high
> >> and use that, but that's very much third out of three.
> >>
> >> I don't think we should add target hooks to get around this, since that's
> >> IMO papering over the issue.
> >>
> >> I'll try the parallel set thing first.
> >
> > Please note that the vzeroupper insertion pass runs after register
> > allocation, so in effect the vzeroupper pattern is hidden from the
> > register allocator.
>
> Right, but even post-RA passes rely on the register usage being accurate.
> Same for collect_fn_hard_reg_usage, which is the issue here.
>
> The info collected by collect_fn_hard_reg_usage was always wrong for
> vzeroupper. What changed with my patch is that we now use that info
> for partly call-clobbered registers as well as "normally" clobbered
> registers. So this is another instance of a problem that was previously
> being masked by having ix86_hard_regno_call_part_clobbered enforce Win64
> rules for all ABIs.
>
> My first idea of adding:
>
> (set (reg:V2DI N) (reg:V2DI N))
>
> for all clobbered registers didn't work well because it left previously-
> dead registers upwards exposed (obvious in hindsight). And the second
> idea of using a fake call would require too many "is this really a call?"
> hacks.
>
> So in the end I went for a subpass that chooses between:
>
> (set (reg:V2DI N) (reg:V2DI N))
>
> and
>
> (clobber (reg:V2DI N))
>
> depending on whether register N is live or not. This fixes the testcase
> and doesn't seem to regress code quality for the tests I've tried.
>
> Tested on x86_64-linux-gnu. OK to install?
>
> Richard
>
>
> 2019-10-06 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
> PR target/91994
> * config/i386/sse.md (avx_vzeroupper): Turn into a define_expand
> and wrap the unspec_volatile in a parallel.
> (*avx_vzeroupper): New define_insn. Use a match_parallel around
> the unspec_volatile.
> * config/i386/predicates.md (vzeroupper_pattern): Expect the
> unspec_volatile to be wrapped in a parallel.
> * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper)
> (ix86_add_reg_usage_to_vzerouppers): New functions.
> (rest_of_handle_insert_vzeroupper): Use them to add register
> usage information to the vzeroupper instructions.
>
> gcc/testsuite/
> PR target/91994
> * gcc.target/i386/pr91994.c: New test.
LGTM.
Thanks,
Uros.
> Index: gcc/config/i386/sse.md
> ===================================================================
> --- gcc/config/i386/sse.md 2019-09-17 15:27:10.214075253 +0100
> +++ gcc/config/i386/sse.md 2019-10-06 15:19:10.062769500 +0100
> @@ -19622,9 +19622,16 @@ (define_insn "*avx_vzeroall"
> (set_attr "mode" "OI")])
>
> ;; Clear the upper 128bits of AVX registers, equivalent to a NOP
> -;; if the upper 128bits are unused.
> -(define_insn "avx_vzeroupper"
> - [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
> +;; if the upper 128bits are unused. Initially we expand the instructions
> +;; as though they had no effect on the SSE registers, but later add SETs and
> +;; CLOBBERs to the PARALLEL to model the real effect.
> +(define_expand "avx_vzeroupper"
> + [(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> + "TARGET_AVX")
> +
> +(define_insn "*avx_vzeroupper"
> + [(match_parallel 0 "vzeroupper_pattern"
> + [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> "TARGET_AVX"
> "vzeroupper"
> [(set_attr "type" "sse")
> Index: gcc/config/i386/predicates.md
> ===================================================================
> --- gcc/config/i386/predicates.md 2019-09-10 19:56:45.337178032 +0100
> +++ gcc/config/i386/predicates.md 2019-10-06 15:19:10.054769556 +0100
> @@ -1441,8 +1441,9 @@ (define_predicate "vzeroall_pattern"
>
> ;; return true if OP is a vzeroupper pattern.
> (define_predicate "vzeroupper_pattern"
> - (and (match_code "unspec_volatile")
> - (match_test "XINT (op, 1) == UNSPECV_VZEROUPPER")))
> + (and (match_code "parallel")
> + (match_code "unspec_volatile" "a")
> + (match_test "XINT (XVECEXP (op, 0, 0), 1) == UNSPECV_VZEROUPPER")))
>
> ;; Return true if OP is an addsub vec_merge operation
> (define_predicate "addsub_vm_operator"
> Index: gcc/config/i386/i386-features.c
> ===================================================================
> --- gcc/config/i386/i386-features.c 2019-09-21 13:56:08.895934718 +0100
> +++ gcc/config/i386/i386-features.c 2019-10-06 15:19:10.054769556 +0100
> @@ -1757,6 +1757,68 @@ convert_scalars_to_vector (bool timode_p
> return 0;
> }
>
> +/* Modify the vzeroupper pattern in INSN so that it describes the effect
> + that the instruction has on the SSE registers. LIVE_REGS are the set
> + of registers that are live across the instruction.
> +
> + For a live register R we use:
> +
> + (set (reg:V2DF R) (reg:V2DF R))
> +
> + which preserves the low 128 bits but clobbers the upper bits.
> + For a dead register we just use:
> +
> + (clobber (reg:V2DF R))
> +
> + which invalidates any previous contents of R and stops R from becoming
> + live across the vzeroupper in future. */
> +
> +static void
> +ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
> +{
> + rtx pattern = PATTERN (insn);
> + unsigned int nregs = TARGET_64BIT ? 16 : 8;
> + rtvec vec = rtvec_alloc (nregs + 1);
> + RTVEC_ELT (vec, 0) = XVECEXP (pattern, 0, 0);
> + for (unsigned int i = 0; i < nregs; ++i)
> + {
> + unsigned int regno = GET_SSE_REGNO (i);
> + rtx reg = gen_rtx_REG (V2DImode, regno);
> + if (bitmap_bit_p (live_regs, regno))
> + RTVEC_ELT (vec, i + 1) = gen_rtx_SET (reg, reg);
> + else
> + RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> + }
> + XVEC (pattern, 0) = vec;
> + df_insn_rescan (insn);
> +}
> +
> +/* Walk the vzeroupper instructions in the function and annotate them
> + with the effect that they have on the SSE registers. */
> +
> +static void
> +ix86_add_reg_usage_to_vzerouppers (void)
> +{
> + basic_block bb;
> + rtx_insn *insn;
> + auto_bitmap live_regs;
> +
> + df_analyze ();
> + FOR_EACH_BB_FN (bb, cfun)
> + {
> + bitmap_copy (live_regs, df_get_live_out (bb));
> + df_simulate_initialize_backwards (bb, live_regs);
> + FOR_BB_INSNS_REVERSE (bb, insn)
> + {
> + if (!NONDEBUG_INSN_P (insn))
> + continue;
> + if (vzeroupper_pattern (PATTERN (insn), VOIDmode))
> + ix86_add_reg_usage_to_vzeroupper (insn, live_regs);
> + df_simulate_one_insn_backwards (bb, insn, live_regs);
> + }
> + }
> +}
> +
> static unsigned int
> rest_of_handle_insert_vzeroupper (void)
> {
> @@ -1773,6 +1835,7 @@ rest_of_handle_insert_vzeroupper (void)
>
> /* Call optimize_mode_switching. */
> g->get_passes ()->execute_pass_mode_switching ();
> + ix86_add_reg_usage_to_vzerouppers ();
> return 0;
> }
>
> Index: gcc/testsuite/gcc.target/i386/pr91994.c
> ===================================================================
> --- /dev/null 2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/i386/pr91994.c 2019-10-06 15:19:10.062769500 +0100
> @@ -0,0 +1,35 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target avx } */
> +/* { dg-options "-O2 -mavx -mvzeroupper" } */
> +
> +#include "avx-check.h"
> +
> +#include <immintrin.h>
> +
> +__m256i x1, x2, x3;
> +
> +__attribute__ ((noinline))
> +static void
> +foo (void)
> +{
> + x1 = x2;
> +}
> +
> +void
> +bar (void)
> +{
> + __m256i x = x1;
> + foo ();
> + x3 = x;
> +}
> +
> +__attribute__ ((noinline))
> +void
> +avx_test (void)
> +{
> + __m256i x = _mm256_set1_epi8 (3);
> + x1 = x;
> + bar ();
> + if (__builtin_memcmp (&x3, &x, sizeof (x)))
> + __builtin_abort ();
> +}
* Re: [21/32] Remove global call sets: LRA
2019-10-04 21:52 ` H.J. Lu
@ 2019-10-05 13:33 ` Richard Sandiford
0 siblings, 0 replies; 8+ messages in thread
From: Richard Sandiford @ 2019-10-05 13:33 UTC (permalink / raw)
To: H.J. Lu; +Cc: GCC Patches
"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Fri, Oct 4, 2019 at 11:03 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Wed, Sep 11, 2019 at 12:14 PM Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> >
>> > lra_reg has an actual_call_used_reg_set field that is only used during
>> > inheritance. This in turn required a special lra_create_live_ranges
>> > pass for flag_ipa_ra to set up this field. This patch instead makes
>> > the inheritance code do its own live register tracking, using the
>> > same ABI-mask-and-clobber-set pair as for IRA.
>> >
>> > Tracking ABIs simplifies (and cheapens) the logic in lra-lives.c and
>> > means we no longer need a separate path for -fipa-ra. It also means
>> > we can remove TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.
>> >
>> > The patch also strengthens the sanity check in lra_assigns so that
>> > we check that reg_renumber is consistent with the whole conflict set,
>> > not just the call-clobbered registers.
>> >
>> >
>> > 2019-09-11 Richard Sandiford <richard.sandiford@arm.com>
>> >
>> > gcc/
>> > * target.def (return_call_with_max_clobbers): Delete.
>> > * doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
>> > * doc/tm.texi: Regenerate.
>> > * config/aarch64/aarch64.c (aarch64_return_call_with_max_clobbers)
>> > (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
>> > * lra-int.h (lra_reg::actual_call_used_reg_set): Delete.
>> > (lra_reg::call_insn): Delete.
>> > * lra.c: Include function-abi.h.
>> > (initialize_lra_reg_info_element): Don't initialize the fields above.
>> > (lra): Use crtl->abi to test whether the current function needs to
>> > save a register in the prologue. Remove special pre-inheritance
>> > lra_create_live_ranges pass for flag_ipa_ra.
>> > * lra-assigns.c: Include function-abi.h.
>> > (find_hard_regno_for_1): Use crtl->abi to test whether the current
>> > function needs to save a register in the prologue.
>> > (lra_assign): Assert that registers aren't allocated to a
>> > conflicting register, rather than checking only for overlaps
>> > with call_used_or_fixed_regs. Do this even for flag_ipa_ra,
>> > and for registers that are not live across a call.
>> > * lra-constraints.c (last_call_for_abi): New variable.
>> > (full_and_partial_call_clobbers): Likewise.
>> > (setup_next_usage_insn): Remove the register from
>> > full_and_partial_call_clobbers.
>> > (need_for_call_save_p): Use call_clobbered_in_region_p to test
>> > whether the register needs a caller save.
>> > (need_for_split_p): Use full_and_partial_reg_clobbers instead
>> > of call_used_or_fixed_regs.
>> > (inherit_in_ebb): Initialize and maintain last_call_for_abi and
>> > full_and_partial_call_clobbers.
>> > * lra-lives.c (check_pseudos_live_through_calls): Replace
>> > last_call_used_reg_set and call_insn arguments with an abi argument.
>> > Remove handling of lra_reg::call_insn. Use function_abi::mode_clobbers
>> > as the set of conflicting registers.
>> > (calls_have_same_clobbers_p): Delete.
>> > (process_bb_lives): Track the ABI of the last call instead of an
>> > insn/HARD_REG_SET pair. Update calls to
>> > check_pseudos_live_through_calls. Use eh_edge_abi to calculate
>> > the set of registers that could be clobbered by an EH edge.
>> > Include partially-clobbered as well as fully-clobbered registers.
>> > (lra_create_live_ranges_1): Don't initialize lra_reg::call_insn.
>> > * lra-remat.c: Include function-abi.h.
>> > (call_used_regs_arr_len, call_used_regs_arr): Delete.
>> > (set_bb_regs): Use call_insn_abi to get the set of call-clobbered
>> > registers and bitmap_view to combine them into dead_regs.
>> > (call_used_input_regno_present_p): Take a function_abi argument
>> > and use it to test whether a register is call-clobbered.
>> > (calculate_gen_cands): Use call_insn_abi to get the ABI of the
>> > call insn target. Update the call to call_used_input_regno_present_p.
>> > (do_remat): Likewise.
>> > (lra_remat): Remove the initialization of call_used_regs_arr_len
>> > and call_used_regs_arr.
>>
>> This caused:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
Thanks for reducing & tracking down the underlying cause.
> This change doesn't work with -mzeroupper. When -mzeroupper is used,
> the upper bits of vector registers are clobbered upon callee return if
> any YMM/ZMM registers are used in the callee. Even if YMM7 isn't used,
> the upper bits of YMM7 can still be clobbered by vzeroupper when YMM1
> is used.
The problem here really is that the pattern is just:
(define_insn "avx_vzeroupper"
[(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
"TARGET_AVX"
"vzeroupper"
...)
and so its effect on the registers isn't modelled at all in rtl.
Maybe one option would be to add a parallel:
(set (reg:V2DI N) (reg:V2DI N))
for each register. Or we could do something like I did for the SVE
tlsdesc calls, although here that would mean using a call pattern for
something that isn't really a call. Or we could reinstate clobber_high
and use that, but that's very much third out of three.
I don't think we should add target hooks to get around this, since that's
IMO papering over the issue.
I'll try the parallel set thing first.
Richard
* Re: [21/32] Remove global call sets: LRA
2019-10-04 18:03 ` H.J. Lu
@ 2019-10-04 21:52 ` H.J. Lu
2019-10-05 13:33 ` Richard Sandiford
0 siblings, 1 reply; 8+ messages in thread
From: H.J. Lu @ 2019-10-04 21:52 UTC (permalink / raw)
To: Richard Sandiford; +Cc: GCC Patches
On Fri, Oct 4, 2019 at 11:03 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Wed, Sep 11, 2019 at 12:14 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > lra_reg has an actual_call_used_reg_set field that is only used during
> > inheritance. This in turn required a special lra_create_live_ranges
> > pass for flag_ipa_ra to set up this field. This patch instead makes
> > the inheritance code do its own live register tracking, using the
> > same ABI-mask-and-clobber-set pair as for IRA.
> >
> > Tracking ABIs simplifies (and cheapens) the logic in lra-lives.c and
> > means we no longer need a separate path for -fipa-ra. It also means
> > we can remove TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.
> >
> > The patch also strengthens the sanity check in lra_assigns so that
> > we check that reg_renumber is consistent with the whole conflict set,
> > not just the call-clobbered registers.
> >
> >
> > 2019-09-11 Richard Sandiford <richard.sandiford@arm.com>
> >
> > gcc/
> > * target.def (return_call_with_max_clobbers): Delete.
> > * doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
> > * doc/tm.texi: Regenerate.
> > * config/aarch64/aarch64.c (aarch64_return_call_with_max_clobbers)
> > (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
> > * lra-int.h (lra_reg::actual_call_used_reg_set): Delete.
> > (lra_reg::call_insn): Delete.
> > * lra.c: Include function-abi.h.
> > (initialize_lra_reg_info_element): Don't initialize the fields above.
> > (lra): Use crtl->abi to test whether the current function needs to
> > save a register in the prologue. Remove special pre-inheritance
> > lra_create_live_ranges pass for flag_ipa_ra.
> > * lra-assigns.c: Include function-abi.h.
> > (find_hard_regno_for_1): Use crtl->abi to test whether the current
> > function needs to save a register in the prologue.
> > (lra_assign): Assert that registers aren't allocated to a
> > conflicting register, rather than checking only for overlaps
> > with call_used_or_fixed_regs. Do this even for flag_ipa_ra,
> > and for registers that are not live across a call.
> > * lra-constraints.c (last_call_for_abi): New variable.
> > (full_and_partial_call_clobbers): Likewise.
> > (setup_next_usage_insn): Remove the register from
> > full_and_partial_call_clobbers.
> > (need_for_call_save_p): Use call_clobbered_in_region_p to test
> > whether the register needs a caller save.
> > (need_for_split_p): Use full_and_partial_reg_clobbers instead
> > of call_used_or_fixed_regs.
> > (inherit_in_ebb): Initialize and maintain last_call_for_abi and
> > full_and_partial_call_clobbers.
> > * lra-lives.c (check_pseudos_live_through_calls): Replace
> > last_call_used_reg_set and call_insn arguments with an abi argument.
> > Remove handling of lra_reg::call_insn. Use function_abi::mode_clobbers
> > as the set of conflicting registers.
> > (calls_have_same_clobbers_p): Delete.
> > (process_bb_lives): Track the ABI of the last call instead of an
> > insn/HARD_REG_SET pair. Update calls to
> > check_pseudos_live_through_calls. Use eh_edge_abi to calculate
> > the set of registers that could be clobbered by an EH edge.
> > Include partially-clobbered as well as fully-clobbered registers.
> > (lra_create_live_ranges_1): Don't initialize lra_reg::call_insn.
> > * lra-remat.c: Include function-abi.h.
> > (call_used_regs_arr_len, call_used_regs_arr): Delete.
> > (set_bb_regs): Use call_insn_abi to get the set of call-clobbered
> > registers and bitmap_view to combine them into dead_regs.
> > (call_used_input_regno_present_p): Take a function_abi argument
> > and use it to test whether a register is call-clobbered.
> > (calculate_gen_cands): Use call_insn_abi to get the ABI of the
> > call insn target. Update the call to call_used_input_regno_present_p.
> > (do_remat): Likewise.
> > (lra_remat): Remove the initialization of call_used_regs_arr_len
> > and call_used_regs_arr.
>
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
>
This change doesn't work with -mzeroupper. When -mzeroupper is used,
the upper bits of vector registers are clobbered upon callee return if
any YMM/ZMM registers are used in the callee. Even if YMM7 isn't used,
the upper bits of YMM7 can still be clobbered by vzeroupper when YMM1
is used.
--
H.J.
* Re: [21/32] Remove global call sets: LRA
2019-09-11 19:14 ` [21/32] Remove global call sets: LRA Richard Sandiford
2019-09-30 15:29 ` Jeff Law
@ 2019-10-04 18:03 ` H.J. Lu
2019-10-04 21:52 ` H.J. Lu
1 sibling, 1 reply; 8+ messages in thread
From: H.J. Lu @ 2019-10-04 18:03 UTC (permalink / raw)
To: Richard Sandiford; +Cc: GCC Patches
On Wed, Sep 11, 2019 at 12:14 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> lra_reg has an actual_call_used_reg_set field that is only used during
> inheritance. This in turn required a special lra_create_live_ranges
> pass for flag_ipa_ra to set up this field. This patch instead makes
> the inheritance code do its own live register tracking, using the
> same ABI-mask-and-clobber-set pair as for IRA.
>
> Tracking ABIs simplifies (and cheapens) the logic in lra-lives.c and
> means we no longer need a separate path for -fipa-ra. It also means
> we can remove TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.
>
> The patch also strengthens the sanity check in lra_assigns so that
> we check that reg_renumber is consistent with the whole conflict set,
> not just the call-clobbered registers.
>
>
> 2019-09-11 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
> * target.def (return_call_with_max_clobbers): Delete.
> * doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
> * doc/tm.texi: Regenerate.
> * config/aarch64/aarch64.c (aarch64_return_call_with_max_clobbers)
> (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
> * lra-int.h (lra_reg::actual_call_used_reg_set): Delete.
> (lra_reg::call_insn): Delete.
> * lra.c: Include function-abi.h.
> (initialize_lra_reg_info_element): Don't initialize the fields above.
> (lra): Use crtl->abi to test whether the current function needs to
> save a register in the prologue. Remove special pre-inheritance
> lra_create_live_ranges pass for flag_ipa_ra.
> * lra-assigns.c: Include function-abi.h.
> (find_hard_regno_for_1): Use crtl->abi to test whether the current
> function needs to save a register in the prologue.
> (lra_assign): Assert that registers aren't allocated to a
> conflicting register, rather than checking only for overlaps
> with call_used_or_fixed_regs. Do this even for flag_ipa_ra,
> and for registers that are not live across a call.
> * lra-constraints.c (last_call_for_abi): New variable.
> (full_and_partial_call_clobbers): Likewise.
> (setup_next_usage_insn): Remove the register from
> full_and_partial_call_clobbers.
> (need_for_call_save_p): Use call_clobbered_in_region_p to test
> whether the register needs a caller save.
> (need_for_split_p): Use full_and_partial_reg_clobbers instead
> of call_used_or_fixed_regs.
> (inherit_in_ebb): Initialize and maintain last_call_for_abi and
> full_and_partial_call_clobbers.
> * lra-lives.c (check_pseudos_live_through_calls): Replace
> last_call_used_reg_set and call_insn arguments with an abi argument.
> Remove handling of lra_reg::call_insn. Use function_abi::mode_clobbers
> as the set of conflicting registers.
> (calls_have_same_clobbers_p): Delete.
> (process_bb_lives): Track the ABI of the last call instead of an
> insn/HARD_REG_SET pair. Update calls to
> check_pseudos_live_through_calls. Use eh_edge_abi to calculate
> the set of registers that could be clobbered by an EH edge.
> Include partially-clobbered as well as fully-clobbered registers.
> (lra_create_live_ranges_1): Don't initialize lra_reg::call_insn.
> * lra-remat.c: Include function-abi.h.
> (call_used_regs_arr_len, call_used_regs_arr): Delete.
> (set_bb_regs): Use call_insn_abi to get the set of call-clobbered
> registers and bitmap_view to combine them into dead_regs.
> (call_used_input_regno_present_p): Take a function_abi argument
> and use it to test whether a register is call-clobbered.
> (calculate_gen_cands): Use call_insn_abi to get the ABI of the
> call insn target. Update the call to call_used_input_regno_present_p.
> (do_remat): Likewise.
> (lra_remat): Remove the initialization of call_used_regs_arr_len
> and call_used_regs_arr.
This caused:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
--
H.J.
* Re: [21/32] Remove global call sets: LRA
2019-09-11 19:14 ` [21/32] Remove global call sets: LRA Richard Sandiford
@ 2019-09-30 15:29 ` Jeff Law
2019-10-04 18:03 ` H.J. Lu
1 sibling, 0 replies; 8+ messages in thread
From: Jeff Law @ 2019-09-30 15:29 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 9/11/19 1:14 PM, Richard Sandiford wrote:
> lra_reg has an actual_call_used_reg_set field that is only used during
> inheritance. This in turn required a special lra_create_live_ranges
> pass for flag_ipa_ra to set up this field. This patch instead makes
> the inheritance code do its own live register tracking, using the
> same ABI-mask-and-clobber-set pair as for IRA.
>
> Tracking ABIs simplifies (and cheapens) the logic in lra-lives.c and
> means we no longer need a separate path for -fipa-ra. It also means
> we can remove TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.
>
> The patch also strengthens the sanity check in lra_assign so that
> we check that reg_renumber is consistent with the whole conflict set,
> not just the call-clobbered registers.
>
>
> 2019-09-11 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
> * target.def (return_call_with_max_clobbers): Delete.
> * doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
> * doc/tm.texi: Regenerate.
> * config/aarch64/aarch64.c (aarch64_return_call_with_max_clobbers)
> (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
> * lra-int.h (lra_reg::actual_call_used_reg_set): Delete.
> (lra_reg::call_insn): Delete.
> * lra.c: Include function-abi.h.
> (initialize_lra_reg_info_element): Don't initialize the fields above.
> (lra): Use crtl->abi to test whether the current function needs to
> save a register in the prologue. Remove special pre-inheritance
> lra_create_live_ranges pass for flag_ipa_ra.
> * lra-assigns.c: Include function-abi.h.
> (find_hard_regno_for_1): Use crtl->abi to test whether the current
> function needs to save a register in the prologue.
> (lra_assign): Assert that registers aren't allocated to a
> conflicting register, rather than checking only for overlaps
> with call_used_or_fixed_regs. Do this even for flag_ipa_ra,
> and for registers that are not live across a call.
> * lra-constraints.c (last_call_for_abi): New variable.
> (full_and_partial_call_clobbers): Likewise.
> (setup_next_usage_insn): Remove the register from
> full_and_partial_call_clobbers.
> (need_for_call_save_p): Use call_clobbered_in_region_p to test
> whether the register needs a caller save.
> (need_for_split_p): Use full_and_partial_reg_clobbers instead
> of call_used_or_fixed_regs.
> (inherit_in_ebb): Initialize and maintain last_call_for_abi and
> full_and_partial_call_clobbers.
> * lra-lives.c (check_pseudos_live_through_calls): Replace
> last_call_used_reg_set and call_insn arguments with an abi argument.
> Remove handling of lra_reg::call_insn. Use function_abi::mode_clobbers
> as the set of conflicting registers.
> (calls_have_same_clobbers_p): Delete.
> (process_bb_lives): Track the ABI of the last call instead of an
> insn/HARD_REG_SET pair. Update calls to
> check_pseudos_live_through_calls. Use eh_edge_abi to calculate
> the set of registers that could be clobbered by an EH edge.
> Include partially-clobbered as well as fully-clobbered registers.
> (lra_create_live_ranges_1): Don't initialize lra_reg::call_insn.
> * lra-remat.c: Include function-abi.h.
> (call_used_regs_arr_len, call_used_regs_arr): Delete.
> (set_bb_regs): Use call_insn_abi to get the set of call-clobbered
> registers and bitmap_view to combine them into dead_regs.
> (call_used_input_regno_present_p): Take a function_abi argument
> and use it to test whether a register is call-clobbered.
> (calculate_gen_cands): Use call_insn_abi to get the ABI of the
> call insn target. Update the call to call_used_input_regno_present_p.
> (do_remat): Likewise.
> (lra_remat): Remove the initialization of call_used_regs_arr_len
> and call_used_regs_arr.
OK
jeff
* [21/32] Remove global call sets: LRA
2019-09-11 19:02 [00/32] Support multiple ABIs in the same translation unit Richard Sandiford
@ 2019-09-11 19:14 ` Richard Sandiford
2019-09-30 15:29 ` Jeff Law
2019-10-04 18:03 ` H.J. Lu
0 siblings, 2 replies; 8+ messages in thread
From: Richard Sandiford @ 2019-09-11 19:14 UTC (permalink / raw)
To: gcc-patches
lra_reg has an actual_call_used_reg_set field that is only used during
inheritance. This in turn required a special lra_create_live_ranges
pass for flag_ipa_ra to set up this field. This patch instead makes
the inheritance code do its own live register tracking, using the
same ABI-mask-and-clobber-set pair as for IRA.
Tracking ABIs simplifies (and cheapens) the logic in lra-lives.c and
means we no longer need a separate path for -fipa-ra. It also means
we can remove TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.
The patch also strengthens the sanity check in lra_assign so that
we check that reg_renumber is consistent with the whole conflict set,
not just the call-clobbered registers.
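To make the ABI-mask-and-clobber-set pair concrete, here is a small self-contained sketch of the scheme (a simplified model only, not GCC code; the two-ABI setup, the integer hard_reg_set, and the struct name are invented for illustration). It keeps a per-ABI "last call number" plus one accumulated clobber set, which is what need_for_call_save_p consults:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simplified model of the lra-constraints.c scheme (not GCC code):
// NUM_ABI_IDS and hard_reg_set are stand-ins for the real types.
constexpr int NUM_ABI_IDS = 2;  // e.g. the base ABI and a vector ABI
using hard_reg_set = uint64_t;  // one bit per hard register

struct call_tracker
{
  int calls_num = 0;                                 // calls seen in the EBB
  std::array<int, NUM_ABI_IDS> last_call_for_abi {}; // per-ABI call number
  hard_reg_set full_and_partial_call_clobbers = 0;   // accumulated clobbers

  // Record a call with the given ABI id and clobber set.
  void note_call (int abi_id, hard_reg_set clobbers)
  {
    ++calls_num;
    last_call_for_abi[abi_id] = calls_num;
    full_and_partial_call_clobbers |= clobbers;
  }

  // Does a pseudo in hard register REGNO, last used when the call count
  // was LAST_USE, need a caller save?  In the real code the ABI mask is
  // passed to call_clobbered_in_region_p, which also handles mode-based
  // partial clobbers; here we just test the accumulated set.
  bool need_for_call_save_p (int last_use, int regno) const
  {
    if (last_use >= calls_num)
      return false;  // no call crosses the pseudo's live region
    unsigned int abis = 0;
    for (int i = 0; i < NUM_ABI_IDS; ++i)
      if (last_call_for_abi[i] > last_use)
        abis |= 1u << i;  // ABIs with calls in the region
    assert (abis != 0);
    return (full_and_partial_call_clobbers >> regno) & 1;
  }
};
```

Removing a register from full_and_partial_call_clobbers when it is used again (as setup_next_usage_insn does in the patch) keeps the accumulated set from over-approximating across unrelated regions.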
2019-09-11 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.def (return_call_with_max_clobbers): Delete.
* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
* doc/tm.texi: Regenerate.
* config/aarch64/aarch64.c (aarch64_return_call_with_max_clobbers)
(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): Delete.
* lra-int.h (lra_reg::actual_call_used_reg_set): Delete.
(lra_reg::call_insn): Delete.
* lra.c: Include function-abi.h.
(initialize_lra_reg_info_element): Don't initialize the fields above.
(lra): Use crtl->abi to test whether the current function needs to
save a register in the prologue. Remove special pre-inheritance
lra_create_live_ranges pass for flag_ipa_ra.
* lra-assigns.c: Include function-abi.h.
(find_hard_regno_for_1): Use crtl->abi to test whether the current
function needs to save a register in the prologue.
(lra_assign): Assert that registers aren't allocated to a
conflicting register, rather than checking only for overlaps
with call_used_or_fixed_regs. Do this even for flag_ipa_ra,
and for registers that are not live across a call.
* lra-constraints.c (last_call_for_abi): New variable.
(full_and_partial_call_clobbers): Likewise.
(setup_next_usage_insn): Remove the register from
full_and_partial_call_clobbers.
(need_for_call_save_p): Use call_clobbered_in_region_p to test
whether the register needs a caller save.
(need_for_split_p): Use full_and_partial_reg_clobbers instead
of call_used_or_fixed_regs.
(inherit_in_ebb): Initialize and maintain last_call_for_abi and
full_and_partial_call_clobbers.
* lra-lives.c (check_pseudos_live_through_calls): Replace
last_call_used_reg_set and call_insn arguments with an abi argument.
Remove handling of lra_reg::call_insn. Use function_abi::mode_clobbers
as the set of conflicting registers.
(calls_have_same_clobbers_p): Delete.
(process_bb_lives): Track the ABI of the last call instead of an
insn/HARD_REG_SET pair. Update calls to
check_pseudos_live_through_calls. Use eh_edge_abi to calculate
the set of registers that could be clobbered by an EH edge.
Include partially-clobbered as well as fully-clobbered registers.
(lra_create_live_ranges_1): Don't initialize lra_reg::call_insn.
* lra-remat.c: Include function-abi.h.
(call_used_regs_arr_len, call_used_regs_arr): Delete.
(set_bb_regs): Use call_insn_abi to get the set of call-clobbered
registers and bitmap_view to combine them into dead_regs.
(call_used_input_regno_present_p): Take a function_abi argument
and use it to test whether a register is call-clobbered.
(calculate_gen_cands): Use call_insn_abi to get the ABI of the
call insn target. Update the call to call_used_input_regno_present_p.
(do_remat): Likewise.
(lra_remat): Remove the initialization of call_used_regs_arr_len
and call_used_regs_arr.
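The lra-lives.c side of the change relies on function_abi::mode_clobbers to fold partial clobbers into a pseudo's conflict set. A rough standalone model of that idea follows (the struct name, the integer hard_reg_set, and the two-mode enum are invented for the example; the real interface is mode-general):

```cpp
#include <cassert>
#include <cstdint>

// Rough model (not GCC code) of how an ABI's partial clobbers become
// mode-dependent conflicts, as in function_abi::mode_clobbers.
using hard_reg_set = uint32_t;          // one bit per hard register
enum machine_mode_model { SI_MODE, V4SI_MODE };

struct function_abi_model
{
  hard_reg_set full_reg_clobbers;       // clobbered in every mode
  hard_reg_set partial_reg_clobbers;    // low part preserved, high part not

  // Registers that conflict with a value of mode M that lives across a
  // call: fully-clobbered registers always do; partially-clobbered
  // registers only when M is wider than the preserved low part (here:
  // only the vector mode).
  hard_reg_set mode_clobbers (machine_mode_model m) const
  {
    return full_reg_clobbers
           | (m == V4SI_MODE ? partial_reg_clobbers : 0);
  }
};

// check_pseudos_live_through_calls then reduces to one statement:
//   conflict_hard_regs |= abi.mode_clobbers (PSEUDO_REGNO_MODE (regno));
inline void
add_call_conflicts (hard_reg_set &conflict_hard_regs,
                    const function_abi_model &abi,
                    machine_mode_model m)
{
  conflict_hard_regs |= abi.mode_clobbers (m);
}
```

This is why the patch can drop the per-call HARD_REG_SET tracking: a single ABI per call region plus mode_clobbers recovers both the full and the partial clobber information.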
Index: gcc/target.def
===================================================================
--- gcc/target.def 2019-09-11 19:47:32.906202859 +0100
+++ gcc/target.def 2019-09-11 19:48:38.549740292 +0100
@@ -5786,20 +5786,6 @@ for targets that don't have partly call-
hook_bool_uint_uint_mode_false)
DEFHOOK
-(return_call_with_max_clobbers,
- "This hook returns a pointer to the call that partially clobbers the\n\
-most registers. If a platform supports multiple ABIs where the registers\n\
-that are partially clobbered may vary, this function compares two\n\
-calls and returns a pointer to the one that clobbers the most registers.\n\
-If both calls clobber the same registers, @var{call_1} must be returned.\n\
-\n\
-The registers clobbered in different ABIs must be a proper subset or\n\
-superset of all other ABIs. @var{call_1} must always be a call insn,\n\
-call_2 may be NULL or a call insn.",
- rtx_insn *, (rtx_insn *call_1, rtx_insn *call_2),
- NULL)
-
-DEFHOOK
(get_multilib_abi_name,
"This hook returns name of multilib ABI name.",
const char *, (void),
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in 2019-09-11 19:47:24.414262702 +0100
+++ gcc/doc/tm.texi.in 2019-09-11 19:48:38.545740321 +0100
@@ -1718,8 +1718,6 @@ must be defined. Modern ports should de
@cindex call-saved register
@hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
-@hook TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
-
@hook TARGET_GET_MULTILIB_ABI_NAME
@findex fixed_regs
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi 2019-09-11 19:47:32.898202916 +0100
+++ gcc/doc/tm.texi 2019-09-11 19:48:38.545740321 +0100
@@ -1941,18 +1941,6 @@ The default implementation returns false
for targets that don't have partly call-clobbered registers.
@end deftypefn
-@deftypefn {Target Hook} {rtx_insn *} TARGET_RETURN_CALL_WITH_MAX_CLOBBERS (rtx_insn *@var{call_1}, rtx_insn *@var{call_2})
-This hook returns a pointer to the call that partially clobbers the
-most registers. If a platform supports multiple ABIs where the registers
-that are partially clobbered may vary, this function compares two
-calls and returns a pointer to the one that clobbers the most registers.
-If both calls clobber the same registers, @var{call_1} must be returned.
-
-The registers clobbered in different ABIs must be a proper subset or
-superset of all other ABIs. @var{call_1} must always be a call insn,
-call_2 may be NULL or a call insn.
-@end deftypefn
-
@deftypefn {Target Hook} {const char *} TARGET_GET_MULTILIB_ABI_NAME (void)
This hook returns name of multilib ABI name.
@end deftypefn
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2019-09-11 19:47:32.858203198 +0100
+++ gcc/config/aarch64/aarch64.c 2019-09-11 19:48:38.541740349 +0100
@@ -1926,19 +1926,6 @@ aarch64_hard_regno_call_part_clobbered (
return false;
}
-/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS. */
-
-rtx_insn *
-aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
-{
- gcc_assert (CALL_P (call_1) && CALL_P (call_2));
-
- if (!aarch64_simd_call_p (call_1) || aarch64_simd_call_p (call_2))
- return call_1;
- else
- return call_2;
-}
-
/* Implement REGMODE_NATURAL_SIZE. */
poly_uint64
aarch64_regmode_natural_size (machine_mode mode)
@@ -20804,10 +20791,6 @@ #define TARGET_HARD_REGNO_CALL_PART_CLOB
#undef TARGET_CALL_INSN_ABI
#define TARGET_CALL_INSN_ABI aarch64_call_insn_abi
-#undef TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
-#define TARGET_RETURN_CALL_WITH_MAX_CLOBBERS \
- aarch64_return_call_with_max_clobbers
-
#undef TARGET_CONSTANT_ALIGNMENT
#define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
Index: gcc/lra-int.h
===================================================================
--- gcc/lra-int.h 2019-08-19 15:57:56.818306311 +0100
+++ gcc/lra-int.h 2019-09-11 19:48:38.545740321 +0100
@@ -73,10 +73,6 @@ struct lra_copy
/* The following fields are defined only for pseudos. */
/* Hard registers with which the pseudo conflicts. */
HARD_REG_SET conflict_hard_regs;
- /* Call used registers with which the pseudo conflicts, taking into account
- the registers used by functions called from calls which cross the
- pseudo. */
- HARD_REG_SET actual_call_used_reg_set;
/* We assign hard registers to reload pseudos which can occur in few
places. So two hard register preferences are enough for them.
The following fields define the preferred hard registers. If
@@ -104,8 +100,6 @@ struct lra_copy
int val;
/* Offset from relative eliminate register to pesudo reg. */
poly_int64 offset;
- /* Call instruction, if any, that may affect this psuedo reg. */
- rtx_insn *call_insn;
/* These members are set up in lra-lives.c and updated in
lra-coalesce.c. */
/* The biggest size mode in which each pseudo reg is referred in
Index: gcc/lra.c
===================================================================
--- gcc/lra.c 2019-09-10 19:56:45.357177891 +0100
+++ gcc/lra.c 2019-09-11 19:48:38.549740292 +0100
@@ -121,6 +121,7 @@ Software Foundation; either version 3, o
#include "lra.h"
#include "lra-int.h"
#include "print-rtl.h"
+#include "function-abi.h"
/* Dump bitmap SET with TITLE and BB INDEX. */
void
@@ -1323,7 +1324,6 @@ initialize_lra_reg_info_element (int i)
lra_reg_info[i].no_stack_p = false;
#endif
CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
- CLEAR_HARD_REG_SET (lra_reg_info[i].actual_call_used_reg_set);
lra_reg_info[i].preferred_hard_regno1 = -1;
lra_reg_info[i].preferred_hard_regno2 = -1;
lra_reg_info[i].preferred_hard_regno_profit1 = 0;
@@ -1336,7 +1336,6 @@ initialize_lra_reg_info_element (int i)
lra_reg_info[i].val = get_new_reg_value ();
lra_reg_info[i].offset = 0;
lra_reg_info[i].copies = NULL;
- lra_reg_info[i].call_insn = NULL;
}
/* Initialize common reg info and copies. */
@@ -2420,7 +2419,9 @@ lra (FILE *f)
if (crtl->saves_all_registers)
for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
- if (!call_used_or_fixed_reg_p (i) && !fixed_regs[i] && !LOCAL_REGNO (i))
+ if (!crtl->abi->clobbers_full_reg_p (i)
+ && !fixed_regs[i]
+ && !LOCAL_REGNO (i))
df_set_regs_ever_live (i, true);
/* We don't DF from now and avoid its using because it is to
@@ -2478,19 +2479,7 @@ lra (FILE *f)
}
/* Do inheritance only for regular algorithms. */
if (! lra_simple_p)
- {
- if (flag_ipa_ra)
- {
- if (live_p)
- lra_clear_live_ranges ();
- /* As a side-effect of lra_create_live_ranges, we calculate
- actual_call_used_reg_set, which is needed during
- lra_inheritance. */
- lra_create_live_ranges (true, true);
- live_p = true;
- }
- lra_inheritance ();
- }
+ lra_inheritance ();
if (live_p)
lra_clear_live_ranges ();
bool fails_p;
Index: gcc/lra-assigns.c
===================================================================
--- gcc/lra-assigns.c 2019-09-10 19:56:32.573268120 +0100
+++ gcc/lra-assigns.c 2019-09-11 19:48:38.545740321 +0100
@@ -94,6 +94,7 @@ Software Foundation; either version 3, o
#include "params.h"
#include "lra.h"
#include "lra-int.h"
+#include "function-abi.h"
/* Current iteration number of the pass and current iteration number
of the pass after the latest spill pass when any former reload
@@ -654,7 +655,7 @@ find_hard_regno_for_1 (int regno, int *c
for (j = 0;
j < hard_regno_nregs (hard_regno, PSEUDO_REGNO_MODE (regno));
j++)
- if (! TEST_HARD_REG_BIT (call_used_or_fixed_regs, hard_regno + j)
+ if (! crtl->abi->clobbers_full_reg_p (hard_regno + j)
&& ! df_regs_ever_live_p (hard_regno + j))
/* It needs save restore. */
hard_regno_costs[hard_regno]
@@ -1634,14 +1635,14 @@ lra_assign (bool &fails_p)
bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
create_live_range_start_chains ();
setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
- if (! lra_asm_error_p && flag_checking && !flag_ipa_ra)
+ if (! lra_asm_error_p && flag_checking)
/* Check correctness of allocation for call-crossed pseudos but
only when there are no asm errors as in the case of errors the
asm is removed and it can result in incorrect allocation. */
for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
- if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
- && lra_reg_info[i].call_insn
- && overlaps_hard_reg_set_p (call_used_or_fixed_regs,
+ if (lra_reg_info[i].nrefs != 0
+ && reg_renumber[i] >= 0
+ && overlaps_hard_reg_set_p (lra_reg_info[i].conflict_hard_regs,
PSEUDO_REGNO_MODE (i), reg_renumber[i]))
gcc_unreachable ();
/* Setup insns to process on the next constraint pass. */
Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c 2019-09-11 19:47:32.898202916 +0100
+++ gcc/lra-constraints.c 2019-09-11 19:48:38.545740321 +0100
@@ -5147,6 +5147,14 @@ clear_invariants (void)
/* Number of calls passed so far in current EBB. */
static int calls_num;
+/* Index ID is the CALLS_NUM associated with the last call we saw with
+ ABI identifier ID. */
+static int last_call_for_abi[NUM_ABI_IDS];
+
+/* Which registers have been fully or partially clobbered by a call
+ since they were last used. */
+static HARD_REG_SET full_and_partial_call_clobbers;
+
/* Current reload pseudo check for validity of elements in
USAGE_INSNS. */
static int curr_usage_insns_check;
@@ -5190,6 +5198,10 @@ setup_next_usage_insn (int regno, rtx in
usage_insns[regno].reloads_num = reloads_num;
usage_insns[regno].calls_num = calls_num;
usage_insns[regno].after_p = after_p;
+ if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
+ remove_from_hard_reg_set (&full_and_partial_call_clobbers,
+ PSEUDO_REGNO_MODE (regno),
+ reg_renumber[regno]);
}
/* The function is used to form list REGNO usages which consists of
@@ -5435,17 +5447,19 @@ inherit_reload_reg (bool def_p, int orig
need_for_call_save_p (int regno)
{
lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
- return (usage_insns[regno].calls_num < calls_num
- && (overlaps_hard_reg_set_p
- ((flag_ipa_ra &&
- ! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
- ? lra_reg_info[regno].actual_call_used_reg_set
- : call_used_or_fixed_regs,
- PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
- || (targetm.hard_regno_call_part_clobbered
- (lra_reg_info[regno].call_insn
- ? call_insn_abi (lra_reg_info[regno].call_insn).id () : 0,
- reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
+ if (usage_insns[regno].calls_num < calls_num)
+ {
+ unsigned int abis = 0;
+ for (unsigned int i = 0; i < NUM_ABI_IDS; ++i)
+ if (last_call_for_abi[i] > usage_insns[regno].calls_num)
+ abis |= 1 << i;
+ gcc_assert (abis);
+ if (call_clobbered_in_region_p (abis, full_and_partial_call_clobbers,
+ PSEUDO_REGNO_MODE (regno),
+ reg_renumber[regno]))
+ return true;
+ }
+ return false;
}
/* Global registers occurring in the current EBB. */
@@ -5485,8 +5499,7 @@ need_for_split_p (HARD_REG_SET potential
true) the assign pass assumes that all pseudos living
through calls are assigned to call saved hard regs. */
&& (regno >= FIRST_PSEUDO_REGISTER
- || ! TEST_HARD_REG_BIT (call_used_or_fixed_regs, regno)
- || usage_insns[regno].calls_num == calls_num)
+ || !TEST_HARD_REG_BIT (full_and_partial_call_clobbers, regno))
/* We need at least 2 reloads to make pseudo splitting
profitable. We should provide hard regno splitting in
any case to solve 1st insn scheduling problem when
@@ -6238,6 +6251,9 @@ inherit_in_ebb (rtx_insn *head, rtx_insn
curr_usage_insns_check++;
clear_invariants ();
reloads_num = calls_num = 0;
+ for (unsigned int i = 0; i < NUM_ABI_IDS; ++i)
+ last_call_for_abi[i] = 0;
+ CLEAR_HARD_REG_SET (full_and_partial_call_clobbers);
bitmap_clear (&check_only_regs);
bitmap_clear (&invalid_invariant_regs);
last_processed_bb = NULL;
@@ -6451,6 +6467,10 @@ inherit_in_ebb (rtx_insn *head, rtx_insn
int regno, hard_regno;
calls_num++;
+ function_abi abi = call_insn_abi (curr_insn);
+ last_call_for_abi[abi.id ()] = calls_num;
+ full_and_partial_call_clobbers
+ |= abi.full_and_partial_reg_clobbers ();
if ((cheap = find_reg_note (curr_insn,
REG_RETURNED, NULL_RTX)) != NULL_RTX
&& ((cheap = XEXP (cheap, 0)), true)
@@ -6460,7 +6480,7 @@ inherit_in_ebb (rtx_insn *head, rtx_insn
/* If there are pending saves/restores, the
optimization is not worth. */
&& usage_insns[regno].calls_num == calls_num - 1
- && TEST_HARD_REG_BIT (call_used_or_fixed_regs, hard_regno))
+ && abi.clobbers_reg_p (GET_MODE (cheap), hard_regno))
{
/* Restore the pseudo from the call result as
REG_RETURNED note says that the pseudo value is
@@ -6483,6 +6503,9 @@ inherit_in_ebb (rtx_insn *head, rtx_insn
/* We don't need to save/restore of the pseudo from
this call. */
usage_insns[regno].calls_num = calls_num;
+ remove_from_hard_reg_set
+ (&full_and_partial_call_clobbers,
+ GET_MODE (cheap), hard_regno);
bitmap_set_bit (&check_only_regs, regno);
}
}
Index: gcc/lra-lives.c
===================================================================
--- gcc/lra-lives.c 2019-09-11 19:47:32.898202916 +0100
+++ gcc/lra-lives.c 2019-09-11 19:48:38.549740292 +0100
@@ -576,40 +576,21 @@ lra_setup_reload_pseudo_preferenced_hard
}
}
-/* Check that REGNO living through calls and setjumps, set up conflict
- regs using LAST_CALL_USED_REG_SET, and clear corresponding bits in
- PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.
- CALL_INSN is a call that is representative of all calls in the region
- described by the PSEUDOS_LIVE_THROUGH_* sets, in terms of the registers
- that it preserves and clobbers. */
+/* Check whether REGNO lives through calls and setjmps and clear
+ the corresponding bits in PSEUDOS_LIVE_THROUGH_CALLS and
+ PSEUDOS_LIVE_THROUGH_SETJUMPS. All calls in the region described
+ by PSEUDOS_LIVE_THROUGH_CALLS have the given ABI. */
static inline void
-check_pseudos_live_through_calls (int regno,
- HARD_REG_SET last_call_used_reg_set,
- rtx_insn *call_insn)
+check_pseudos_live_through_calls (int regno, const function_abi &abi)
{
- int hr;
- rtx_insn *old_call_insn;
-
if (! sparseset_bit_p (pseudos_live_through_calls, regno))
return;
- function_abi abi = call_insn_abi (call_insn);
- old_call_insn = lra_reg_info[regno].call_insn;
- if (!old_call_insn
- || (targetm.return_call_with_max_clobbers
- && targetm.return_call_with_max_clobbers (old_call_insn, call_insn)
- == call_insn))
- lra_reg_info[regno].call_insn = call_insn;
+ machine_mode mode = PSEUDO_REGNO_MODE (regno);
sparseset_clear_bit (pseudos_live_through_calls, regno);
- lra_reg_info[regno].conflict_hard_regs |= last_call_used_reg_set;
-
- for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
- if (targetm.hard_regno_call_part_clobbered (abi.id (), hr,
- PSEUDO_REGNO_MODE (regno)))
- add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
- PSEUDO_REGNO_MODE (regno), hr);
+ lra_reg_info[regno].conflict_hard_regs |= abi.mode_clobbers (mode);
if (! sparseset_bit_p (pseudos_live_through_setjumps, regno))
return;
sparseset_clear_bit (pseudos_live_through_setjumps, regno);
@@ -630,19 +611,6 @@ reg_early_clobber_p (const struct lra_in
&& TEST_BIT (reg->early_clobber_alts, n_alt)));
}
-/* Return true if call instructions CALL1 and CALL2 use ABIs that
- preserve the same set of registers. */
-
-static bool
-calls_have_same_clobbers_p (rtx_insn *call1, rtx_insn *call2)
-{
- if (!targetm.return_call_with_max_clobbers)
- return false;
-
- return (targetm.return_call_with_max_clobbers (call1, call2) == call1
- && targetm.return_call_with_max_clobbers (call2, call1) == call2);
-}
-
/* Process insns of the basic block BB to update pseudo live ranges,
pseudo hard register conflicts, and insn notes. We do it on
backward scan of BB insns. CURR_POINT is the program point where
@@ -662,15 +630,13 @@ process_bb_lives (basic_block bb, int &c
rtx_insn *next;
rtx link, *link_loc;
bool need_curr_point_incr;
- HARD_REG_SET last_call_used_reg_set;
- rtx_insn *call_insn = NULL;
- rtx_insn *last_call_insn = NULL;
+ /* Only has a meaningful value once we've seen a call. */
+ function_abi last_call_abi = default_function_abi;
reg_live_out = df_get_live_out (bb);
sparseset_clear (pseudos_live);
sparseset_clear (pseudos_live_through_calls);
sparseset_clear (pseudos_live_through_setjumps);
- CLEAR_HARD_REG_SET (last_call_used_reg_set);
REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
hard_regs_live &= ~eliminable_regset;
EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
@@ -876,9 +842,8 @@ process_bb_lives (basic_block bb, int &c
{
update_pseudo_point (reg->regno, curr_point, USE_POINT);
mark_regno_live (reg->regno, reg->biggest_mode);
- check_pseudos_live_through_calls (reg->regno,
- last_call_used_reg_set,
- call_insn);
+ /* ??? Should be a no-op for unused registers. */
+ check_pseudos_live_through_calls (reg->regno, last_call_abi);
}
if (!HARD_REGISTER_NUM_P (reg->regno))
@@ -927,37 +892,13 @@ process_bb_lives (basic_block bb, int &c
if (call_p)
{
- call_insn = curr_insn;
- if (! flag_ipa_ra && ! targetm.return_call_with_max_clobbers)
- last_call_used_reg_set = call_used_or_fixed_regs;
- else
- {
- HARD_REG_SET this_call_used_reg_set
- = call_insn_abi (curr_insn).full_reg_clobbers ();
- /* ??? This preserves traditional behavior; it might not
- be needed. */
- this_call_used_reg_set |= fixed_reg_set;
-
- bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
- && (last_call_used_reg_set
- != this_call_used_reg_set))
- || (last_call_insn && ! calls_have_same_clobbers_p
- (call_insn,
- last_call_insn));
+ function_abi call_abi = call_insn_abi (curr_insn);
- EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
- {
- lra_reg_info[j].actual_call_used_reg_set
- |= this_call_used_reg_set;
+ if (last_call_abi != call_abi)
+ EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
+ check_pseudos_live_through_calls (j, last_call_abi);
- if (flush)
- check_pseudos_live_through_calls (j,
- last_call_used_reg_set,
- last_call_insn);
- }
- last_call_used_reg_set = this_call_used_reg_set;
- last_call_insn = call_insn;
- }
+ last_call_abi = call_abi;
sparseset_ior (pseudos_live_through_calls,
pseudos_live_through_calls, pseudos_live);
@@ -995,9 +936,7 @@ process_bb_lives (basic_block bb, int &c
if (reg->type == OP_IN)
update_pseudo_point (reg->regno, curr_point, USE_POINT);
mark_regno_live (reg->regno, reg->biggest_mode);
- check_pseudos_live_through_calls (reg->regno,
- last_call_used_reg_set,
- call_insn);
+ check_pseudos_live_through_calls (reg->regno, last_call_abi);
}
for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
@@ -1091,10 +1030,10 @@ process_bb_lives (basic_block bb, int &c
}
/* Pseudos can't go in stack regs at the start of a basic block that
- is reached by an abnormal edge. Likewise for call clobbered regs,
- because caller-save, fixup_abnormal_edges and possibly the table
- driven EH machinery are not quite ready to handle such pseudos
- live across such edges. */
+ is reached by an abnormal edge. Likewise for registers that are at
+ least partly call clobbered, because caller-save, fixup_abnormal_edges
+ and possibly the table driven EH machinery are not quite ready to
+ handle such pseudos live across such edges. */
if (bb_has_abnormal_pred (bb))
{
#ifdef STACK_REGS
@@ -1109,7 +1048,7 @@ process_bb_lives (basic_block bb, int &c
if (!cfun->has_nonlocal_label
&& has_abnormal_call_or_eh_pred_edge_p (bb))
for (px = 0; HARD_REGISTER_NUM_P (px); px++)
- if (call_used_or_fixed_reg_p (px)
+ if (eh_edge_abi.clobbers_at_least_part_of_reg_p (px)
#ifdef REAL_PIC_OFFSET_TABLE_REGNUM
/* We should create a conflict of PIC pseudo with PIC
hard reg as PIC hard reg can have a wrong value after
@@ -1166,7 +1105,7 @@ process_bb_lives (basic_block bb, int &c
if (sparseset_cardinality (pseudos_live_through_calls) == 0)
break;
if (sparseset_bit_p (pseudos_live_through_calls, j))
- check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
+ check_pseudos_live_through_calls (j, last_call_abi);
}
for (i = 0; HARD_REGISTER_NUM_P (i); ++i)
@@ -1400,7 +1339,6 @@ lra_create_live_ranges_1 (bool all_p, bo
lra_reg_info[i].biggest_mode = GET_MODE (regno_reg_rtx[i]);
else
lra_reg_info[i].biggest_mode = VOIDmode;
- lra_reg_info[i].call_insn = NULL;
if (!HARD_REGISTER_NUM_P (i)
&& lra_reg_info[i].nrefs != 0)
{
Index: gcc/lra-remat.c
===================================================================
--- gcc/lra-remat.c 2019-09-10 19:56:45.357177891 +0100
+++ gcc/lra-remat.c 2019-09-11 19:48:38.549740292 +0100
@@ -65,16 +65,11 @@ Software Foundation; either version 3, o
#include "recog.h"
#include "lra.h"
#include "lra-int.h"
+#include "function-abi.h"
/* Number of candidates for rematerialization. */
static unsigned int cands_num;
-/* The following is used for representation of call_used_or_fixed_regs in
- form array whose elements are hard register numbers with nonzero bit
- in CALL_USED_OR_FIXED_REGS. */
-static int call_used_regs_arr_len;
-static int call_used_regs_arr[FIRST_PSEUDO_REGISTER];
-
/* Bitmap used for different calculations. */
static bitmap_head temp_bitmap;
@@ -633,9 +628,12 @@ set_bb_regs (basic_block bb, rtx_insn *i
bitmap_set_bit (&subreg_regs, regno);
}
if (CALL_P (insn))
- for (int i = 0; i < call_used_regs_arr_len; i++)
- bitmap_set_bit (&get_remat_bb_data (bb)->dead_regs,
- call_used_regs_arr[i]);
+ {
+ function_abi abi = call_insn_abi (insn);
+ /* Partially-clobbered registers might still be live. */
+ bitmap_ior_into (&get_remat_bb_data (bb)->dead_regs,
+ bitmap_view<HARD_REG_SET> (abi.full_reg_clobbers ()));
+ }
}
/* Calculate changed_regs and dead_regs for each BB. */
@@ -698,7 +696,7 @@ reg_overlap_for_remat_p (lra_insn_reg *r
/* Return true if a call used register is an input operand of INSN. */
static bool
-call_used_input_regno_present_p (rtx_insn *insn)
+call_used_input_regno_present_p (const function_abi &abi, rtx_insn *insn)
{
int iter;
lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
@@ -709,8 +707,9 @@ call_used_input_regno_present_p (rtx_ins
for (reg = (iter == 0 ? id->regs : static_id->hard_regs);
reg != NULL;
reg = reg->next)
- if (reg->type == OP_IN && reg->regno < FIRST_PSEUDO_REGISTER
- && TEST_HARD_REG_BIT (call_used_or_fixed_regs, reg->regno))
+ if (reg->type == OP_IN
+ && reg->regno < FIRST_PSEUDO_REGISTER
+ && abi.clobbers_reg_p (reg->biggest_mode, reg->regno))
return true;
return false;
}
@@ -799,18 +798,21 @@ calculate_gen_cands (void)
}
if (CALL_P (insn))
- EXECUTE_IF_SET_IN_BITMAP (gen_insns, 0, uid, bi)
- {
- rtx_insn *insn2 = lra_insn_recog_data[uid]->insn;
+ {
+ function_abi abi = call_insn_abi (insn);
+ EXECUTE_IF_SET_IN_BITMAP (gen_insns, 0, uid, bi)
+ {
+ rtx_insn *insn2 = lra_insn_recog_data[uid]->insn;
- cand = insn_to_cand[INSN_UID (insn2)];
- gcc_assert (cand != NULL);
- if (call_used_input_regno_present_p (insn2))
- {
- bitmap_clear_bit (gen_cands, cand->index);
- bitmap_set_bit (&temp_bitmap, uid);
- }
- }
+ cand = insn_to_cand[INSN_UID (insn2)];
+ gcc_assert (cand != NULL);
+ if (call_used_input_regno_present_p (abi, insn2))
+ {
+ bitmap_clear_bit (gen_cands, cand->index);
+ bitmap_set_bit (&temp_bitmap, uid);
+ }
+ }
+ }
bitmap_and_compl_into (gen_insns, &temp_bitmap);
cand = insn_to_cand[INSN_UID (insn)];
@@ -1205,13 +1207,16 @@ do_remat (void)
}
if (CALL_P (insn))
- EXECUTE_IF_SET_IN_BITMAP (avail_cands, 0, cid, bi)
- {
- cand = all_cands[cid];
+ {
+ function_abi abi = call_insn_abi (insn);
+ EXECUTE_IF_SET_IN_BITMAP (avail_cands, 0, cid, bi)
+ {
+ cand = all_cands[cid];
- if (call_used_input_regno_present_p (cand->insn))
- bitmap_set_bit (&temp_bitmap, cand->index);
- }
+ if (call_used_input_regno_present_p (abi, cand->insn))
+ bitmap_set_bit (&temp_bitmap, cand->index);
+ }
+ }
bitmap_and_compl_into (avail_cands, &temp_bitmap);
@@ -1307,10 +1312,6 @@ lra_remat (void)
insn_to_cand_activation = XCNEWVEC (cand_t, get_max_uid ());
regno_cands = XCNEWVEC (cand_t, max_regno);
all_cands.create (8000);
- call_used_regs_arr_len = 0;
- for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
- if (call_used_or_fixed_reg_p (i))
- call_used_regs_arr[call_used_regs_arr_len++] = i;
initiate_cand_table ();
create_remat_bb_data ();
bitmap_initialize (&temp_bitmap, &reg_obstack);