From: Andrew Pinski <pinskia@gmail.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Subject: Re: Add a new combine pass
Date: Mon, 18 Nov 2019 02:04:00 -0000 [thread overview]
Message-ID: <CA+=Sn1kz221_SyF5FdoX1JuydzgHsTifNQc2UpViVm59aF8HDw@mail.gmail.com> (raw)
In-Reply-To: <mpteey6xeip.fsf@arm.com>
On Sun, Nov 17, 2019 at 3:35 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> (It's 23:35 local time, so it's still just about stage 1. :-))
>
> While working on SVE, I've noticed several cases in which we fail
> to combine instructions because the combined form would need to be
> placed earlier in the instruction stream than the last of the
> instructions being combined. This includes one very important
> case in the handling of the first fault register (FFR).
>
> Combine currently requires the combined instruction to live at the same
> location as i3. I thought about trying to relax that restriction, but it
> would be difficult to do with the current pass structure while keeping
> everything linear-ish time.
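>
> For example (an illustrative sketch, not taken from the patch;
> the register numbers are made up):
>
>   I1: (set (reg:SI R1) (plus:SI (reg:SI R2) (reg:SI R3)))
>   I2: (set (reg:SI R2) ...)
>   I3: (set (reg:SI R4) (neg:SI (reg:SI R1)))
>
> Substituting I1 into I3 gives a pattern that still reads R2, so the
> combined instruction is only valid at I1's location, before I2
> clobbers R2; combine can only place it at I3, so the combination fails.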
>
> So this patch instead goes for an option that has been talked about
> several times over the years: writing a new combine pass that just
> does instruction combination, and not all the other optimisations
> that have been bolted onto combine over time. E.g. it deliberately
> doesn't do things like nonzero-bits tracking, since that really ought
> to be a separate, more global, optimisation.
>
> This is still far from being a realistic replacement for even the
> combine parts of the current combine pass. E.g.:
>
> - it only handles combinations that can be built up from individual
> two-instruction combinations.
>
> - it doesn't allow new hard register clobbers to be added.
>
> - it doesn't have the special treatment of CC operations.
>
> - etc.
>
> But we have to start somewhere.
>
> On a more positive note, the pass handles things that the current
> combine pass doesn't:
>
> - the main motivating feature mentioned above: it works out where
> the combined instruction could validly live and moves it there
> if necessary. If there is a range of valid places, it tries
> to pick the best one based on register pressure (although only
> with a simple heuristic for now).
>
> - once it has combined two instructions, it can try combining the
> result with both later and earlier code, i.e. it can combine
> in both directions.
>
> - it tries using REG_EQUAL notes for the final instruction.
>
> - it can parallelise two independent instructions that both read from
> the same register or both read from memory.
>
> This last feature is useful for generating more load-pair combinations
> on AArch64. In some cases it can also produce more store-pair combinations,
> but only for consecutive stores. However, since the pass currently does
> this in a very greedy, peephole way, it only allows load/store-pair
> combinations if the first memory access has a higher alignment than
> the second, i.e. if we can be sure that the combined access is naturally
> aligned. This should help it to make better decisions than the post-RA
> peephole pass in some cases while not being too aggressive.
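>
> For instance (an illustrative sketch, not taken from the patch),
> two independent loads from the same base register:
>
>   (set (reg:DI R0) (mem:DI (reg:DI R2)))
>   (set (reg:DI R1) (mem:DI (plus:DI (reg:DI R2) (const_int 8))))
>
> can be parallelised into:
>
>   (parallel [(set (reg:DI R0) (mem:DI (reg:DI R2)))
>              (set (reg:DI R1) (mem:DI (plus:DI (reg:DI R2)
>                                                (const_int 8))))])
>
> which the AArch64 port can then match as a load pair (LDP).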
>
> The pass is supposed to be linear time without debug insns.
> It only tries a constant number C of combinations per instruction
> and its bookkeeping updates are constant-time. Once it has combined two
> instructions, it'll try up to C combinations on the result, but this can
> be counted against the instruction that was deleted by the combination
> and so effectively just doubles the constant. (Note that C depends
> on MAX_RECOG_OPERANDS and the new NUM_RANGE_USERS constant.)
>
> Unfortunately, debug updates via propagate_for_debug are more expensive.
> This could probably be fixed if the pass did more to track debug insns
> itself, but using propagate_for_debug matches combine's behaviour.
>
> The patch adds two instances of the new pass: one before combine and
> one after it. By default both are disabled, but this can be changed
> using the new 3-bit run-combine param, where:
>
> - bit 0 selects the new pre-combine pass
> - bit 1 selects the main combine pass
> - bit 2 selects the new post-combine pass
>
> The idea is that run-combine=3 can be used to see which combinations
> are missed by the new pass, while run-combine=6 (which I hope to be
> the production setting for AArch64 at -O2+) just uses the new pass
> to mop up cases that normal combine misses. Maybe in some distant
> future, the pass will be good enough for run-combine=[14] to be a
> realistic option.
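>
> For example, with this patch applied, the intended -O2 setting for
> AArch64 would correspond to a command line like:
>
>   gcc -O2 --param=run-combine=6 foo.c
>
> i.e. the main combine pass (bit 1) plus the new post-combine
> pass (bit 2).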
>
> I ended up having to add yet another validate_simplify_* routine,
> this time to do the equivalent of:
>
> newx = simplify_replace_rtx (*loc, old_rtx, new_rtx);
> validate_change (insn, loc, newx, 1);
>
> but in a more memory-efficient way. validate_replace_rtx isn't suitable
> because it deliberately only tries simplifications in limited cases:
>
> /* Do changes needed to keep rtx consistent. Don't do any other
> simplifications, as it is not our job. */
>
> And validate_simplify_insn isn't useful for this case because it works
> on patterns that have already had changes made to them and expects
> those patterns to be valid rtxes. simplify-replace operations instead
> need to simplify as they go, when the original modes are still to hand.
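>
> The expected usage pattern is therefore something like (a sketch,
> not code from the patch):
>
>   if (validate_simplify_replace_rtx (insn, &PATTERN (insn),
>                                      old_rtx, new_rtx)
>       && verify_changes (0))
>     confirm_change_group ();
>   else
>     cancel_changes (0);
>
> with the replacements joining the current validation group rather
> than being committed immediately.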
>
> As far as compile-time goes, I tried compiling optabs.ii at -O2
> with an --enable-checking=release compiler:
>
> run-combine=2 (normal combine): 100.0% (baseline)
> run-combine=4 (new pass only): 98.0%
> run-combine=6 (both passes): 100.3%
>
> where the results are easily outside the noise. So the pass on
> its own is quicker than combine, but that's not a fair comparison
> when it doesn't do everything combine does. Running both passes
> only has a slight overhead.
>
> To get a feel for the effect on multiple targets, I did my usual
> bogo-comparison of number of lines of asm for gcc.c-torture, gcc.dg
> and g++.dg, this time comparing run-combine=2 and run-combine=6
> using -O2 -ftree-vectorize:
>
> Target Tests Delta Best Worst Median
> ====== ===== ===== ==== ===== ======
> aarch64-linux-gnu 3974 -39393 -2275 90 -2
> aarch64_be-linux-gnu 3389 -36683 -2275 165 -2
> alpha-linux-gnu 4154 -62860 -2132 335 -2
> amdgcn-amdhsa 4818 9079 -7987 51850 -2
> arc-elf 2868 -63710 -18998 286 -1
> arm-linux-gnueabi 4053 -80404 -10019 605 -2
> arm-linux-gnueabihf 4053 -80404 -10019 605 -2
> avr-elf 3620 38513 -2386 23364 2
> bfin-elf 2691 -32973 -1483 1127 -2
> bpf-elf 5581 -78105 -11064 113 -3
> c6x-elf 3915 -31710 -2441 1560 -2
> cr16-elf 6030 192102 -1757 60009 12
> cris-elf 2217 -30794 -1716 294 -2
> csky-elf 2003 -24989 -9999 1468 -2
> epiphany-elf 3345 -19416 -1803 4594 -2
> fr30-elf 3562 -15077 -1921 2334 -1
> frv-linux-gnu 2423 -16589 -1736 999 -1
> ft32-elf 2246 -46337 -15988 433 -2
> h8300-elf 2581 -33553 -1403 168 -2
> hppa64-hp-hpux11.23 3926 -120876 -50134 1056 -2
> i686-apple-darwin 3562 -46851 -1764 310 -2
> i686-pc-linux-gnu 2902 -3639 -4809 6848 -2
> ia64-linux-gnu 2900 -158870 -14006 428 -7
> iq2000-elf 2929 -54690 -2904 2576 -3
> lm32-elf 5265 162519 -1918 8004 5
> m32r-elf 1861 -25296 -2713 1004 -2
> m68k-linux-gnu 2520 -241573 -21879 200 -3
> mcore-elf 2378 -28532 -1810 1635 -2
> microblaze-elf 2782 -137363 -9516 1986 -2
> mipsel-linux-gnu 2443 -38422 -8331 458 -1
> mipsisa64-linux-gnu 2287 -60294 -12214 432 -2
> mmix 4910 -136549 -13616 599 -2
> mn10300-elf 2944 -29151 -2488 132 -1
> moxie-rtems 1935 -12364 -1002 125 -1
> msp430-elf 2379 -37007 -2163 176 -2
> nds32le-elf 2356 -27551 -2126 163 -1
> nios2-linux-gnu 1572 -44828 -23613 92 -2
> nvptx-none 1014 -17337 -1590 16 -3
> or1k-elf 2724 -92816 -14144 56 -3
> pdp11 1897 -27296 -1370 534 -2
> powerpc-ibm-aix7.0 2909 -58829 -10026 2001 -2
> powerpc64-linux-gnu 3685 -60551 -12158 2001 -1
> powerpc64le-linux-gnu 3501 -61846 -10024 765 -2
> pru-elf 1574 -29734 -19998 1718 -1
> riscv32-elf 2357 -22506 -10002 10175 -1
> riscv64-elf 3320 -56777 -10002 226 -2
> rl78-elf 2113 -232328 -18607 4065 -3
> rx-elf 2800 -38515 -896 491 -2
> s390-linux-gnu 3582 -75626 -12098 3999 -2
> s390x-linux-gnu 3761 -73473 -13748 3999 -2
> sh-linux-gnu 2350 -26401 -1003 522 -2
> sparc-linux-gnu 3279 -49518 -2175 2223 -2
> sparc64-linux-gnu 3849 -123084 -30200 2141 -2
> tilepro-linux-gnu 2737 -35562 -3458 2848 -2
> v850-elf 9002 -169126 -49996 76 -4
> vax-netbsdelf 3325 -57734 -10000 1989 -2
> visium-elf 1860 -17006 -1006 1066 -2
> x86_64-darwin 3278 -48933 -9999 1408 -2
> x86_64-linux-gnu 3008 -43887 -9999 3248 -2
> xstormy16-elf 2497 -26569 -2051 89 -2
> xtensa-elf 2161 -31231 -6910 138 -2
>
> So running both passes does seem to have a significant benefit
> on most targets, but there are some nasty-looking outliers.
> The usual caveat applies: the number of lines is a very poor measure;
> it's just to get a feel.
>
> Bootstrapped & regression-tested on aarch64-linux-gnu and
> x86_64-linux-gnu with both run-combine=3 as the default (so that the new
> pass runs first) and with run-combine=6 as the default (so that the new
> pass runs second). There were no new execution failures. A couple of
> guality.exp tests that already failed for most options started failing
> for a couple more. Enabling the pass fixes the XFAILs in:
>
> gcc.target/aarch64/sve/acle/general/ptrue_pat_[234].c
>
> Inevitably there was some scan-assembler fallout for other tests.
> E.g. in gcc.target/aarch64/vmov_n_1.c:
>
> #define INHIB_OPTIMIZATION asm volatile ("" : : : "memory")
> ...
> INHIB_OPTIMIZATION; \
> (a) = TEST (test, data_len); \
> INHIB_OPTIMIZATION; \
> (b) = VMOV_OBSCURE_INST (reg_len, data_len, data_type) (&(a)); \
>
> is no longer effective at preventing the move into (a) from being
> merged into (b), because the pass can merge at the point of (a). I think
> this is a valid thing to do -- the asm semantics are still satisfied,
> and asm volatile ("" : : : "memory") never acted as a register barrier.
> But perhaps we should deal with this as a special case?
Not really. I think the testcase should be changed to use:

  INHIB_OPT_VAR (a);

instead, where INHIB_OPT_VAR is defined as:

  #define INHIB_OPT_VAR(a) asm("":"+X"(a));

since the existing barrier is obviously not testing the right thing in
the first place. Even then, this testcase is huge and really should be
broken up into separate testcases.
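
For example (a sketch of how the quoted sequence would then read):

  (a) = TEST (test, data_len);
  INHIB_OPT_VAR (a);
  (b) = VMOV_OBSCURE_INST (reg_len, data_len, data_type) (&(a));

so that (a) is explicitly read and written by the asm, which is what
actually stops the move into (a) from being combined with its use
in (b).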
Thanks,
Andrew
>
> Richard
>
>
> 2019-11-17 Richard Sandiford <richard.sandiford@arm.com>
>
> gcc/
> * Makefile.in (OBJS): Add combine2.o.
> * params.opt (--param=run-combine): New option.
> * doc/invoke.texi: Document it.
> * tree-pass.h (make_pass_combine2_before): Declare.
> (make_pass_combine2_after): Likewise.
> * passes.def: Add them.
> * timevar.def (TV_COMBINE2): New timevar.
> * cfgrtl.h (update_cfg_for_uncondjump): Declare.
> * combine.c (update_cfg_for_uncondjump): Move to...
> * cfgrtl.c (update_cfg_for_uncondjump): ...here.
> * simplify-rtx.c (simplify_truncation): Handle comparisons.
> * recog.h (validate_simplify_replace_rtx): Declare.
> * recog.c (validate_simplify_replace_rtx_1): New function.
> (validate_simplify_replace_rtx_uses): Likewise.
> (validate_simplify_replace_rtx): Likewise.
> * combine2.c: New file.
>
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in 2019-11-14 14:34:27.599783740 +0000
> +++ gcc/Makefile.in 2019-11-17 23:15:31.188500613 +0000
> @@ -1261,6 +1261,7 @@ OBJS = \
> cgraphunit.o \
> cgraphclones.o \
> combine.o \
> + combine2.o \
> combine-stack-adj.o \
> compare-elim.o \
> context.o \
> Index: gcc/params.opt
> ===================================================================
> --- gcc/params.opt 2019-11-14 14:34:26.339792215 +0000
> +++ gcc/params.opt 2019-11-17 23:15:31.200500531 +0000
> @@ -768,6 +768,10 @@ Use internal function id in profile look
> Common Joined UInteger Var(param_rpo_vn_max_loop_depth) Init(7) IntegerRange(2, 65536) Param
> Maximum depth of a loop nest to fully value-number optimistically.
>
> +-param=run-combine=
> +Target Joined UInteger Var(param_run_combine) Init(2) IntegerRange(0, 7) Param
> +Choose which of the 3 available combine passes to run: bit 1 for the main combine pass, bit 0 for an earlier variant of the combine pass, and bit 2 for a later variant of the combine pass.
> +
> -param=sccvn-max-alias-queries-per-access=
> Common Joined UInteger Var(param_sccvn_max_alias_queries_per_access) Init(1000) Param
> Maximum number of disambiguations to perform per memory access.
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi 2019-11-16 10:43:45.597105823 +0000
> +++ gcc/doc/invoke.texi 2019-11-17 23:15:31.200500531 +0000
> @@ -11807,6 +11807,11 @@ in combiner for a pseudo register as las
> @item max-combine-insns
> The maximum number of instructions the RTL combiner tries to combine.
>
> +@item run-combine
> +Choose which of the 3 available combine passes to run: bit 1 for the main
> +combine pass, bit 0 for an earlier variant of the combine pass, and bit 2
> +for a later variant of the combine pass.
> +
> @item integer-share-limit
> Small integer constants can use a shared data structure, reducing the
> compiler's memory usage and increasing its speed. This sets the maximum
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h 2019-10-29 08:29:03.096444049 +0000
> +++ gcc/tree-pass.h 2019-11-17 23:15:31.204500501 +0000
> @@ -562,7 +562,9 @@ extern rtl_opt_pass *make_pass_reginfo_i
> extern rtl_opt_pass *make_pass_inc_dec (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_stack_ptr_mod (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_initialize_regs (gcc::context *ctxt);
> +extern rtl_opt_pass *make_pass_combine2_before (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_combine (gcc::context *ctxt);
> +extern rtl_opt_pass *make_pass_combine2_after (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_if_after_combine (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_jump_after_combine (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_ree (gcc::context *ctxt);
> Index: gcc/passes.def
> ===================================================================
> --- gcc/passes.def 2019-10-29 08:29:03.224443133 +0000
> +++ gcc/passes.def 2019-11-17 23:15:31.200500531 +0000
> @@ -437,7 +437,9 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_inc_dec);
> NEXT_PASS (pass_initialize_regs);
> NEXT_PASS (pass_ud_rtl_dce);
> + NEXT_PASS (pass_combine2_before);
> NEXT_PASS (pass_combine);
> + NEXT_PASS (pass_combine2_after);
> NEXT_PASS (pass_if_after_combine);
> NEXT_PASS (pass_jump_after_combine);
> NEXT_PASS (pass_partition_blocks);
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def 2019-10-11 15:43:53.403498517 +0100
> +++ gcc/timevar.def 2019-11-17 23:15:31.204500501 +0000
> @@ -251,6 +251,7 @@ DEFTIMEVAR (TV_AUTO_INC_DEC , "
> DEFTIMEVAR (TV_CSE2 , "CSE 2")
> DEFTIMEVAR (TV_BRANCH_PROB , "branch prediction")
> DEFTIMEVAR (TV_COMBINE , "combiner")
> +DEFTIMEVAR (TV_COMBINE2 , "second combiner")
> DEFTIMEVAR (TV_IFCVT , "if-conversion")
> DEFTIMEVAR (TV_MODE_SWITCH , "mode switching")
> DEFTIMEVAR (TV_SMS , "sms modulo scheduling")
> Index: gcc/cfgrtl.h
> ===================================================================
> --- gcc/cfgrtl.h 2019-03-08 18:15:39.320730391 +0000
> +++ gcc/cfgrtl.h 2019-11-17 23:15:31.192500584 +0000
> @@ -47,6 +47,7 @@ extern void fixup_partitions (void);
> extern bool purge_dead_edges (basic_block);
> extern bool purge_all_dead_edges (void);
> extern bool fixup_abnormal_edges (void);
> +extern void update_cfg_for_uncondjump (rtx_insn *);
> extern rtx_insn *unlink_insn_chain (rtx_insn *, rtx_insn *);
> extern void relink_block_chain (bool);
> extern rtx_insn *duplicate_insn_chain (rtx_insn *, rtx_insn *);
> Index: gcc/combine.c
> ===================================================================
> --- gcc/combine.c 2019-11-13 08:42:45.537368745 +0000
> +++ gcc/combine.c 2019-11-17 23:15:31.192500584 +0000
> @@ -2530,42 +2530,6 @@ reg_subword_p (rtx x, rtx reg)
> && GET_MODE_CLASS (GET_MODE (x)) == MODE_INT;
> }
>
> -/* Delete the unconditional jump INSN and adjust the CFG correspondingly.
> - Note that the INSN should be deleted *after* removing dead edges, so
> - that the kept edge is the fallthrough edge for a (set (pc) (pc))
> - but not for a (set (pc) (label_ref FOO)). */
> -
> -static void
> -update_cfg_for_uncondjump (rtx_insn *insn)
> -{
> - basic_block bb = BLOCK_FOR_INSN (insn);
> - gcc_assert (BB_END (bb) == insn);
> -
> - purge_dead_edges (bb);
> -
> - delete_insn (insn);
> - if (EDGE_COUNT (bb->succs) == 1)
> - {
> - rtx_insn *insn;
> -
> - single_succ_edge (bb)->flags |= EDGE_FALLTHRU;
> -
> - /* Remove barriers from the footer if there are any. */
> - for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
> - if (BARRIER_P (insn))
> - {
> - if (PREV_INSN (insn))
> - SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn);
> - else
> - BB_FOOTER (bb) = NEXT_INSN (insn);
> - if (NEXT_INSN (insn))
> - SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn);
> - }
> - else if (LABEL_P (insn))
> - break;
> - }
> -}
> -
> /* Return whether PAT is a PARALLEL of exactly N register SETs followed
> by an arbitrary number of CLOBBERs. */
> static bool
> @@ -15096,7 +15060,10 @@ const pass_data pass_data_combine =
> {}
>
> /* opt_pass methods: */
> - virtual bool gate (function *) { return (optimize > 0); }
> + virtual bool gate (function *)
> + {
> + return optimize > 0 && (param_run_combine & 2) != 0;
> + }
> virtual unsigned int execute (function *)
> {
> return rest_of_handle_combine ();
> Index: gcc/cfgrtl.c
> ===================================================================
> --- gcc/cfgrtl.c 2019-10-17 14:22:55.523309009 +0100
> +++ gcc/cfgrtl.c 2019-11-17 23:15:31.188500613 +0000
> @@ -3409,6 +3409,42 @@ fixup_abnormal_edges (void)
> return inserted;
> }
>
> +/* Delete the unconditional jump INSN and adjust the CFG correspondingly.
> + Note that the INSN should be deleted *after* removing dead edges, so
> + that the kept edge is the fallthrough edge for a (set (pc) (pc))
> + but not for a (set (pc) (label_ref FOO)). */
> +
> +void
> +update_cfg_for_uncondjump (rtx_insn *insn)
> +{
> + basic_block bb = BLOCK_FOR_INSN (insn);
> + gcc_assert (BB_END (bb) == insn);
> +
> + purge_dead_edges (bb);
> +
> + delete_insn (insn);
> + if (EDGE_COUNT (bb->succs) == 1)
> + {
> + rtx_insn *insn;
> +
> + single_succ_edge (bb)->flags |= EDGE_FALLTHRU;
> +
> + /* Remove barriers from the footer if there are any. */
> + for (insn = BB_FOOTER (bb); insn; insn = NEXT_INSN (insn))
> + if (BARRIER_P (insn))
> + {
> + if (PREV_INSN (insn))
> + SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn);
> + else
> + BB_FOOTER (bb) = NEXT_INSN (insn);
> + if (NEXT_INSN (insn))
> + SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn);
> + }
> + else if (LABEL_P (insn))
> + break;
> + }
> +}
> +
> /* Cut the insns from FIRST to LAST out of the insns stream. */
>
> rtx_insn *
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c 2019-11-16 15:33:36.642840131 +0000
> +++ gcc/simplify-rtx.c 2019-11-17 23:15:31.204500501 +0000
> @@ -851,6 +851,12 @@ simplify_truncation (machine_mode mode,
> && trunc_int_for_mode (INTVAL (XEXP (op, 1)), mode) == -1)
> return constm1_rtx;
>
> + /* (truncate:A (cmp X Y)) is (cmp:A X Y): we can compute the result
> + in a narrower mode if useful. */
> + if (COMPARISON_P (op))
> + return simplify_gen_relational (GET_CODE (op), mode, VOIDmode,
> + XEXP (op, 0), XEXP (op, 1));
> +
> return NULL_RTX;
> }
>
> Index: gcc/recog.h
> ===================================================================
> --- gcc/recog.h 2019-09-09 18:58:28.860430363 +0100
> +++ gcc/recog.h 2019-11-17 23:15:31.204500501 +0000
> @@ -111,6 +111,7 @@ extern int validate_replace_rtx_part_nos
> extern void validate_replace_rtx_group (rtx, rtx, rtx_insn *);
> extern void validate_replace_src_group (rtx, rtx, rtx_insn *);
> extern bool validate_simplify_insn (rtx_insn *insn);
> +extern bool validate_simplify_replace_rtx (rtx_insn *, rtx *, rtx, rtx);
> extern int num_changes_pending (void);
> extern int next_insn_tests_no_inequality (rtx_insn *);
> extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode);
> Index: gcc/recog.c
> ===================================================================
> --- gcc/recog.c 2019-10-01 09:55:35.150088599 +0100
> +++ gcc/recog.c 2019-11-17 23:15:31.204500501 +0000
> @@ -922,6 +922,226 @@ validate_simplify_insn (rtx_insn *insn)
> }
> return ((num_changes_pending () > 0) && (apply_change_group () > 0));
> }
> +
> +/* A subroutine of validate_simplify_replace_rtx. Apply the replacement
> + described by R to LOC. Return true on success; leave the caller
> + to clean up on failure. */
> +
> +static bool
> +validate_simplify_replace_rtx_1 (validate_replace_src_data &r, rtx *loc)
> +{
> + rtx x = *loc;
> + enum rtx_code code = GET_CODE (x);
> + machine_mode mode = GET_MODE (x);
> +
> + if (rtx_equal_p (x, r.from))
> + {
> + validate_unshare_change (r.insn, loc, r.to, 1);
> + return true;
> + }
> +
> + /* Recursively apply the substitution and see if we can simplify
> + the result. This specifically shouldn't use simplify_gen_*,
> + since we want to avoid generating new expressions where possible. */
> + int old_num_changes = num_validated_changes ();
> + rtx newx = NULL_RTX;
> + bool recurse_p = false;
> + switch (GET_RTX_CLASS (code))
> + {
> + case RTX_UNARY:
> + {
> + machine_mode op0_mode = GET_MODE (XEXP (x, 0));
> + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0)))
> + return false;
> +
> + newx = simplify_unary_operation (code, mode, XEXP (x, 0), op0_mode);
> + break;
> + }
> +
> + case RTX_BIN_ARITH:
> + case RTX_COMM_ARITH:
> + {
> + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))
> + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1)))
> + return false;
> +
> + newx = simplify_binary_operation (code, mode,
> + XEXP (x, 0), XEXP (x, 1));
> + break;
> + }
> +
> + case RTX_COMPARE:
> + case RTX_COMM_COMPARE:
> + {
> + machine_mode op_mode = (GET_MODE (XEXP (x, 0)) != VOIDmode
> + ? GET_MODE (XEXP (x, 0))
> + : GET_MODE (XEXP (x, 1)));
> + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))
> + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1)))
> + return false;
> +
> + newx = simplify_relational_operation (code, mode, op_mode,
> + XEXP (x, 0), XEXP (x, 1));
> + break;
> + }
> +
> + case RTX_TERNARY:
> + case RTX_BITFIELD_OPS:
> + {
> + machine_mode op0_mode = GET_MODE (XEXP (x, 0));
> + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))
> + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1))
> + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 2)))
> + return false;
> +
> + newx = simplify_ternary_operation (code, mode, op0_mode,
> + XEXP (x, 0), XEXP (x, 1),
> + XEXP (x, 2));
> + break;
> + }
> +
> + case RTX_EXTRA:
> + if (code == SUBREG)
> + {
> + machine_mode inner_mode = GET_MODE (SUBREG_REG (x));
> + if (!validate_simplify_replace_rtx_1 (r, &SUBREG_REG (x)))
> + return false;
> +
> + rtx inner = SUBREG_REG (x);
> + newx = simplify_subreg (mode, inner, inner_mode, SUBREG_BYTE (x));
> + /* Reject the same cases that simplify_gen_subreg would. */
> + if (!newx
> + && (GET_CODE (inner) == SUBREG
> + || GET_CODE (inner) == CONCAT
> + || GET_MODE (inner) == VOIDmode
> + || !validate_subreg (mode, inner_mode,
> + inner, SUBREG_BYTE (x))))
> + return false;
> + break;
> + }
> + else
> + recurse_p = true;
> + break;
> +
> + case RTX_OBJ:
> + if (code == LO_SUM)
> + {
> + if (!validate_simplify_replace_rtx_1 (r, &XEXP (x, 0))
> + || !validate_simplify_replace_rtx_1 (r, &XEXP (x, 1)))
> + return false;
> +
> + /* (lo_sum (high x) y) -> y where x and y have the same base. */
> + rtx op0 = XEXP (x, 0);
> + rtx op1 = XEXP (x, 1);
> + if (GET_CODE (op0) == HIGH)
> + {
> + rtx base0, base1, offset0, offset1;
> + split_const (XEXP (op0, 0), &base0, &offset0);
> + split_const (op1, &base1, &offset1);
> + if (rtx_equal_p (base0, base1))
> + newx = op1;
> + }
> + }
> + else if (code == REG)
> + {
> + if (REG_P (r.from) && reg_overlap_mentioned_p (x, r.from))
> + return false;
> + }
> + else
> + recurse_p = true;
> + break;
> +
> + case RTX_CONST_OBJ:
> + break;
> +
> + case RTX_AUTOINC:
> + if (reg_overlap_mentioned_p (XEXP (x, 0), r.from))
> + return false;
> + recurse_p = true;
> + break;
> +
> + case RTX_MATCH:
> + case RTX_INSN:
> + gcc_unreachable ();
> + }
> +
> + if (recurse_p)
> + {
> + const char *fmt = GET_RTX_FORMAT (code);
> + for (int i = 0; fmt[i]; i++)
> + switch (fmt[i])
> + {
> + case 'E':
> + for (int j = 0; j < XVECLEN (x, i); j++)
> + if (!validate_simplify_replace_rtx_1 (r, &XVECEXP (x, i, j)))
> + return false;
> + break;
> +
> + case 'e':
> + if (XEXP (x, i)
> + && !validate_simplify_replace_rtx_1 (r, &XEXP (x, i)))
> + return false;
> + break;
> + }
> + }
> +
> + if (newx && !rtx_equal_p (x, newx))
> + {
> + /* There's no longer any point unsharing the substitutions made
> + for subexpressions, since we'll just copy this one instead. */
> + for (int i = old_num_changes; i < num_changes; ++i)
> + changes[i].unshare = false;
> + validate_unshare_change (r.insn, loc, newx, 1);
> + }
> +
> + return true;
> +}
> +
> +/* A note_uses callback for validate_simplify_replace_rtx.
> + DATA points to a validate_replace_src_data object. */
> +
> +static void
> +validate_simplify_replace_rtx_uses (rtx *loc, void *data)
> +{
> + validate_replace_src_data &r = *(validate_replace_src_data *) data;
> + if (r.insn && !validate_simplify_replace_rtx_1 (r, loc))
> + r.insn = NULL;
> +}
> +
> +/* Try to perform the equivalent of:
> +
> + newx = simplify_replace_rtx (*loc, OLD_RTX, NEW_RTX);
> + validate_change (INSN, LOC, newx, 1);
> +
> + but without generating as much garbage rtl when the resulting
> + pattern doesn't match.
> +
> + Return true if we were able to replace all uses of OLD_RTX in *LOC
> + and if the result conforms to general rtx rules (e.g. for whether
> + subregs are meaningful).
> +
> + When returning true, add all replacements to the current validation group,
> + leaving the caller to test it in the normal way. Leave both *LOC and the
> + validation group unchanged on failure. */
> +
> +bool
> +validate_simplify_replace_rtx (rtx_insn *insn, rtx *loc,
> + rtx old_rtx, rtx new_rtx)
> +{
> + validate_replace_src_data r;
> + r.from = old_rtx;
> + r.to = new_rtx;
> + r.insn = insn;
> +
> + unsigned int num_changes = num_validated_changes ();
> + note_uses (loc, validate_simplify_replace_rtx_uses, &r);
> + if (!r.insn)
> + {
> + cancel_changes (num_changes);
> + return false;
> + }
> + return true;
> +}
>
> /* Return 1 if the insn using CC0 set by INSN does not contain
> any ordered tests applied to the condition codes.
> Index: gcc/combine2.c
> ===================================================================
> --- /dev/null 2019-09-17 11:41:18.176664108 +0100
> +++ gcc/combine2.c 2019-11-17 23:15:31.196500559 +0000
> @@ -0,0 +1,1576 @@
> +/* Combine instructions
> + Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "df.h"
> +#include "tree-pass.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "insn-config.h"
> +#include "recog.h"
> +#include "print-rtl.h"
> +#include "rtl-iter.h"
> +#include "predict.h"
> +#include "cfgcleanup.h"
> +#include "cfghooks.h"
> +#include "cfgrtl.h"
> +#include "alias.h"
> +#include "valtrack.h"
> +
> +/* This pass tries to combine instructions in the following ways:
> +
> + (1) If we have two dependent instructions:
> +
> + I1: (set DEST1 SRC1)
> + I2: (...DEST1...)
> +
> + and I2 is the only user of DEST1, the pass tries to combine them into:
> +
> + I2: (...SRC1...)
> +
> + (2) If we have two dependent instructions:
> +
> + I1: (set DEST1 SRC1)
> + I2: (...DEST1...)
> +
> + the pass tries to combine them into:
> +
> + I2: (parallel [(set DEST1 SRC1) (...SRC1...)])
> +
> + or:
> +
> + I2: (parallel [(...SRC1...) (set DEST1 SRC1)])
> +
> + (3) If we have two independent instructions:
> +
> + I1: (set DEST1 SRC1)
> + I2: (set DEST2 SRC2)
> +
> + that read from memory or from the same register, the pass tries to
> + combine them into:
> +
> + I2: (parallel [(set DEST1 SRC1) (set DEST2 SRC2)])
> +
> + or:
> +
> + I2: (parallel [(set DEST2 SRC2) (set DEST1 SRC1)])
> +
> + If the combined form is a valid instruction, the pass tries to find a
> + place between I1 and I2 inclusive for the new instruction. If there
> + are multiple valid locations, it tries to pick the best one by taking
> + the effect on register pressure into account.
> +
> + If a combination succeeds and produces a single set, the pass tries to
> + combine the new form with earlier or later instructions.
> +
> + The pass currently optimizes each basic block separately. It walks
> + the instructions in reverse order, building up live ranges for registers
> + and memory. It then uses these live ranges to look for possible
> + combination opportunities and to decide where the combined instructions
> + could be placed.
> +
> + The pass represents positions in the block using point numbers,
> + with higher numbers indicating earlier instructions. The numbering
> + scheme is that:
> +
> + - the end of the current instruction sequence has an even base point B.
> +
> + - instructions initially have odd-numbered points B + 1, B + 3, etc.
> + with B + 1 being the final instruction in the sequence.
> +
> + - even points after B represent gaps between instructions where combined
> + instructions could be placed.
> +
> + Thus even points initially represent no instructions and odd points
> + initially represent single instructions. However, when picking a
> + place for a combined instruction, the pass may choose somewhere
> + in between the original two instructions, so that over time a point
> + may come to represent several instructions. When this happens,
> + the pass maintains the invariant that all instructions with the same
> + point number are independent of each other and thus can be treated as
> + acting in parallel (or as acting in any arbitrary sequence).
> +
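> + For example (a worked instance of the scheme above): if B is 10
> + and the sequence contains three instructions I1, I2 and I3 in
> + program order, then I3 has point 11, I2 has point 13 and I1 has
> + point 15, and a combined instruction placed between I1 and I2
> + would go at the even point 14.
> +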
> + TODOs:
> +
> + - Handle 3-instruction combinations, and possibly more.
> +
> + - Handle existing clobbers more efficiently. At the moment we can't
> + move an instruction that clobbers R across another instruction that
> + clobbers R.
> +
> + - Allow hard register clobbers to be added, like combine does.
> +
> + - Perhaps work on EBBs, or SESE regions. */
> +
> +namespace {
> +
> +/* The number of explicit uses to record in a live range. */
> +const unsigned int NUM_RANGE_USERS = 4;
> +
> +/* The maximum number of instructions that we can combine at once. */
> +const unsigned int MAX_COMBINE_INSNS = 2;
> +
> +/* A fake cost for instructions that we haven't costed yet. */
> +const unsigned int UNKNOWN_COST = ~0U;
> +
> +class combine2
> +{
> +public:
> + combine2 (function *);
> + ~combine2 ();
> +
> + void execute ();
> +
> +private:
> + struct insn_info_rec;
> +
> + /* Describes the live range of a register or of memory. For simplicity,
> + we treat memory as a single entity.
> +
> + If we had a fully-accurate live range, updating it to account for a
> + moved instruction would be a linear-time operation. Doing this for
> + each combination would then make the pass quadratic. We therefore
> + just maintain a list of NUM_RANGE_USERS use insns and use simple,
> + conservatively-correct behavior for the rest. */
> + struct live_range_rec
> + {
> + /* Which instruction provides the dominating definition, or null if
> + we don't know yet. */
> + insn_info_rec *producer;
> +
> + /* A selection of instructions that use the resource, in program order. */
> + insn_info_rec *users[NUM_RANGE_USERS];
> +
> + /* An inclusive range of points that covers instructions not mentioned
> + in USERS. Both values are zero if there are no such instructions.
> +
> + Once we've included a use U at point P in this range, we continue
> + to assume that some kind of use exists at P whatever happens to U
> + afterwards. */
> + unsigned int first_extra_use;
> + unsigned int last_extra_use;
> +
> + /* The register number this range describes, or INVALID_REGNUM
> + for memory. */
> + unsigned int regno;
> +
> + /* Forms a linked list of ranges for the same resource, in program
> + order. */
> + live_range_rec *prev_range;
> + live_range_rec *next_range;
> + };
> +
> + /* Pass-specific information about an instruction. */
> + struct insn_info_rec
> + {
> + /* The instruction itself. */
> + rtx_insn *insn;
> +
> + /* A null-terminated list of live ranges for the things that this
> + instruction defines. */
> + live_range_rec **defs;
> +
> + /* A null-terminated list of live ranges for the things that this
> + instruction uses. */
> + live_range_rec **uses;
> +
> + /* The point at which the instruction appears. */
> + unsigned int point;
> +
> + /* The cost of the instruction, or UNKNOWN_COST if we haven't
> + measured it yet. */
> + unsigned int cost;
> + };
> +
> + /* Describes one attempt to combine instructions. */
> + struct combination_attempt_rec
> + {
> + /* The instruction that we're currently trying to optimize.
> + If the combination succeeds, we'll use this insn_info_rec
> + to describe the new instruction. */
> + insn_info_rec *new_home;
> +
> + /* The instructions we're combining, in program order. */
> + insn_info_rec *sequence[MAX_COMBINE_INSNS];
> +
> + /* If we're substituting SEQUENCE[0] into SEQUENCE[1], this is the
> + live range that describes the substituted register. */
> + live_range_rec *def_use_range;
> +
> + /* The earliest and latest points at which we could insert the
> + combined instruction. */
> + unsigned int earliest_point;
> + unsigned int latest_point;
> +
> + /* The cost of the new instruction, once we have a successful match. */
> + unsigned int new_cost;
> + };
> +
> + /* Pass-specific information about a register. */
> + struct reg_info_rec
> + {
> + /* The live range associated with the last reference to the register. */
> + live_range_rec *range;
> +
> + /* The point at which the last reference occurred. */
> + unsigned int next_ref;
> +
> + /* True if the register is currently live. We record this here rather
> + than in a separate bitmap because (a) there's a natural hole for
> + it on LP64 hosts and (b) we only refer to it when updating the
> + other fields, and so recording it here should give better locality. */
> + unsigned int live_p : 1;
> + };
> +
> + live_range_rec *new_live_range (unsigned int, live_range_rec *);
> + live_range_rec *reg_live_range (unsigned int);
> + live_range_rec *mem_live_range ();
> + bool add_range_use (live_range_rec *, insn_info_rec *);
> + void remove_range_use (live_range_rec *, insn_info_rec *);
> + bool has_single_use_p (live_range_rec *);
> + bool known_last_use_p (live_range_rec *, insn_info_rec *);
> + unsigned int find_earliest_point (insn_info_rec *, insn_info_rec *);
> + unsigned int find_latest_point (insn_info_rec *, insn_info_rec *);
> + bool start_combination (combination_attempt_rec &, insn_info_rec *,
> + insn_info_rec *, live_range_rec * = NULL);
> + bool verify_combination (combination_attempt_rec &);
> + int estimate_reg_pressure_delta (insn_info_rec *);
> + void commit_combination (combination_attempt_rec &, bool);
> + bool try_parallel_sets (combination_attempt_rec &, rtx, rtx);
> + bool try_parallelize_insns (combination_attempt_rec &);
> + bool try_combine_def_use_1 (combination_attempt_rec &, rtx, rtx, bool);
> + bool try_combine_def_use (combination_attempt_rec &, rtx, rtx);
> + bool try_combine_two_uses (combination_attempt_rec &);
> + bool try_combine (insn_info_rec *, rtx, unsigned int);
> + bool optimize_insn (insn_info_rec *);
> + void record_defs (insn_info_rec *);
> + void record_reg_use (insn_info_rec *, df_ref);
> + void record_uses (insn_info_rec *);
> + void process_insn (insn_info_rec *);
> + void start_sequence ();
> +
> + /* The function we're optimizing. */
> + function *m_fn;
> +
> + /* The highest pseudo register number plus one. */
> + unsigned int m_num_regs;
> +
> + /* The current basic block. */
> + basic_block m_bb;
> +
> + /* True if we should optimize the current basic block for speed. */
> + bool m_optimize_for_speed_p;
> +
> + /* The point number to allocate to the next instruction we visit
> + in the backward traversal. */
> + unsigned int m_point;
> +
> + /* The point number corresponding to the end of the current
> + instruction sequence, i.e. the lowest point number about which
> + we still have valid information. */
> + unsigned int m_end_of_sequence;
> +
> + /* The point number corresponding to the end of the current basic block.
> + This is the same as M_END_OF_SEQUENCE when processing the last
> + instruction sequence in a basic block. */
> + unsigned int m_end_of_bb;
> +
> + /* The memory live range, or null if we haven't yet found a memory
> + reference in the current instruction sequence. */
> + live_range_rec *m_mem_range;
> +
> + /* Gives information about each register. We track both hard and
> + pseudo registers. */
> + auto_vec<reg_info_rec> m_reg_info;
> +
> + /* A bitmap of registers whose entry in m_reg_info is valid. */
> + auto_sbitmap m_valid_regs;
> +
> + /* If nonnull, an unused 2-element PARALLEL that we can use to test
> + instruction combinations. */
> + rtx m_spare_parallel;
> +
> + /* A bitmap of instructions that we've already tried to combine with. */
> + auto_bitmap m_tried_insns;
> +
> + /* A temporary bitmap used to hold register numbers. */
> + auto_bitmap m_true_deps;
> +
> + /* An obstack used for allocating insn_info_recs and for building
> + up their lists of definitions and uses. */
> + obstack m_insn_obstack;
> +
> + /* An obstack used for allocating live_range_recs. */
> + obstack m_range_obstack;
> +
> + /* Start-of-object pointers for the two obstacks. */
> + char *m_insn_obstack_start;
> + char *m_range_obstack_start;
> +
> + /* A list of instructions that we've optimized and whose new forms
> + change the cfg. */
> + auto_vec<rtx_insn *> m_cfg_altering_insns;
> +
> + /* The INSN_UIDs of all instructions in M_CFG_ALTERING_INSNS. */
> + auto_bitmap m_cfg_altering_insn_ids;
> +
> + /* We can insert new instructions at point P * 2 by inserting them
> + after M_POINTS[P - M_END_OF_SEQUENCE / 2]. We can insert new
> + instructions at point P * 2 + 1 by inserting them before
> + M_POINTS[P - M_END_OF_SEQUENCE / 2]. */
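> + /* For example (a worked instance of the above): with
> + M_END_OF_SEQUENCE == 10, the even point 14 == 7 * 2 maps to
> + M_POINTS[7 - 10 / 2] == M_POINTS[2], matching the
> + (new_point - m_end_of_sequence) / 2 indexing used in
> + commit_combination. */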
> + auto_vec<rtx_insn *, 256> m_points;
> +};
> +
> +combine2::combine2 (function *fn)
> + : m_fn (fn),
> + m_num_regs (max_reg_num ()),
> + m_bb (NULL),
> + m_optimize_for_speed_p (false),
> + m_point (2),
> + m_end_of_sequence (m_point),
> + m_end_of_bb (m_point),
> + m_mem_range (NULL),
> + m_reg_info (m_num_regs),
> + m_valid_regs (m_num_regs),
> + m_spare_parallel (NULL_RTX)
> +{
> + gcc_obstack_init (&m_insn_obstack);
> + gcc_obstack_init (&m_range_obstack);
> + m_reg_info.quick_grow (m_num_regs);
> + bitmap_clear (m_valid_regs);
> + m_insn_obstack_start = XOBNEWVAR (&m_insn_obstack, char, 0);
> + m_range_obstack_start = XOBNEWVAR (&m_range_obstack, char, 0);
> +}
> +
> +combine2::~combine2 ()
> +{
> + obstack_free (&m_insn_obstack, NULL);
> + obstack_free (&m_range_obstack, NULL);
> +}
> +
> +/* Return true if it's possible in principle to combine INSN with
> + other instructions. ALLOW_ASMS_P is true if the caller can cope
> + with asm statements. */
> +
> +static bool
> +combinable_insn_p (rtx_insn *insn, bool allow_asms_p)
> +{
> + rtx pattern = PATTERN (insn);
> +
> + if (GET_CODE (pattern) == USE || GET_CODE (pattern) == CLOBBER)
> + return false;
> +
> + if (JUMP_P (insn) && find_reg_note (insn, REG_NON_LOCAL_GOTO, NULL_RTX))
> + return false;
> +
> + if (!allow_asms_p && asm_noperands (PATTERN (insn)) >= 0)
> + return false;
> +
> + return true;
> +}
> +
> +/* Return true if it's possible in principle to move INSN somewhere else,
> + as long as all dependencies are satisfied. */
> +
> +static bool
> +movable_insn_p (rtx_insn *insn)
> +{
> + if (JUMP_P (insn))
> + return false;
> +
> + if (volatile_refs_p (PATTERN (insn)))
> + return false;
> +
> + return true;
> +}
> +
> +/* Create and return a new live range for REGNO. NEXT is the next range
> + in program order, or null if this is the first live range in the
> + sequence. */
> +
> +combine2::live_range_rec *
> +combine2::new_live_range (unsigned int regno, live_range_rec *next)
> +{
> + live_range_rec *range = XOBNEW (&m_range_obstack, live_range_rec);
> + memset (range, 0, sizeof (*range));
> +
> + range->regno = regno;
> + range->next_range = next;
> + if (next)
> + next->prev_range = range;
> + return range;
> +}
> +
> +/* Return the current live range for register REGNO, creating a new
> + one if necessary. */
> +
> +combine2::live_range_rec *
> +combine2::reg_live_range (unsigned int regno)
> +{
> + /* Initialize the liveness flag, if it isn't already valid for this BB. */
> + bool first_ref_p = !bitmap_bit_p (m_valid_regs, regno);
> + if (first_ref_p || m_reg_info[regno].next_ref < m_end_of_bb)
> + m_reg_info[regno].live_p = bitmap_bit_p (df_get_live_out (m_bb), regno);
> +
> + /* See if we already have a live range associated with the current
> + instruction sequence. */
> + live_range_rec *range = NULL;
> + if (!first_ref_p && m_reg_info[regno].next_ref >= m_end_of_sequence)
> + range = m_reg_info[regno].range;
> +
> + /* Create a new range if this is the first reference to REGNO in the
> + current instruction sequence or if the current range has been closed
> + off by a definition. */
> + if (!range || range->producer)
> + {
> + range = new_live_range (regno, range);
> +
> + /* If the register is live after the current sequence, treat that
> + as a fake use at the end of the sequence. */
> + if (!range->next_range && m_reg_info[regno].live_p)
> + range->first_extra_use = range->last_extra_use = m_end_of_sequence;
> +
> + /* Record that this is now the current range for REGNO. */
> + if (first_ref_p)
> + bitmap_set_bit (m_valid_regs, regno);
> + m_reg_info[regno].range = range;
> + m_reg_info[regno].next_ref = m_point;
> + }
> + return range;
> +}
> +
> +/* Return the current live range for memory, treating memory as a single
> + entity. Create a new live range if necessary. */
> +
> +combine2::live_range_rec *
> +combine2::mem_live_range ()
> +{
> + if (!m_mem_range || m_mem_range->producer)
> + m_mem_range = new_live_range (INVALID_REGNUM, m_mem_range);
> + return m_mem_range;
> +}
> +
> +/* Record that instruction USER uses the resource described by RANGE.
> + Return true if this is new information. */
> +
> +bool
> +combine2::add_range_use (live_range_rec *range, insn_info_rec *user)
> +{
> + /* See if we've already recorded the instruction, or if there's a
> + spare use slot we can use. */
> + unsigned int i = 0;
> + for (; i < NUM_RANGE_USERS && range->users[i]; ++i)
> + if (range->users[i] == user)
> + return false;
> +
> + if (i == NUM_RANGE_USERS)
> + {
> + /* Since we've processed USER recently, assume that it's more
> + interesting to record explicitly than the last user in the
> + current list. Evict that last user and describe it in the
> + overflow "extra use" range instead. */
> + insn_info_rec *ousted_user = range->users[--i];
> + if (range->first_extra_use < ousted_user->point)
> + range->first_extra_use = ousted_user->point;
> + if (range->last_extra_use > ousted_user->point)
> + range->last_extra_use = ousted_user->point;
> + }
> +
> + /* Insert USER while keeping the list sorted. */
> + for (; i > 0 && range->users[i - 1]->point < user->point; --i)
> + range->users[i] = range->users[i - 1];
> + range->users[i] = user;
> + return true;
> +}
> +
> +/* Remove USER from the uses recorded for RANGE, if we can.
> + There's nothing we can do if USER was described in the
> + overflow "extra use" range. */
> +
> +void
> +combine2::remove_range_use (live_range_rec *range, insn_info_rec *user)
> +{
> + for (unsigned int i = 0; i < NUM_RANGE_USERS; ++i)
> + if (range->users[i] == user)
> + {
> + for (unsigned int j = i; j < NUM_RANGE_USERS - 1; ++j)
> + range->users[j] = range->users[j + 1];
> + range->users[NUM_RANGE_USERS - 1] = NULL;
> + break;
> + }
> +}
> +
> +/* Return true if RANGE has a single known user. */
> +
> +bool
> +combine2::has_single_use_p (live_range_rec *range)
> +{
> + return range->users[0] && !range->users[1] && !range->first_extra_use;
> +}
> +
> +/* Return true if we know that USER is the last user of RANGE. */
> +
> +bool
> +combine2::known_last_use_p (live_range_rec *range, insn_info_rec *user)
> +{
> + if (range->last_extra_use && range->last_extra_use <= user->point)
> + return false;
> +
> + for (unsigned int i = 0; i < NUM_RANGE_USERS && range->users[i]; ++i)
> + if (range->users[i] == user)
> + return i == NUM_RANGE_USERS - 1 || !range->users[i + 1];
> + else if (range->users[i]->point == user->point)
> + return false;
> +
> + gcc_unreachable ();
> +}
> +
> +/* Find the earliest point that we could move I2 up in order to combine
> + it with I1. Ignore any dependencies between I1 and I2; leave the
> + caller to deal with those instead. */
> +
> +unsigned int
> +combine2::find_earliest_point (insn_info_rec *i2, insn_info_rec *i1)
> +{
> + if (!movable_insn_p (i2->insn))
> + return i2->point;
> +
> + /* Start by optimistically assuming that we can move the instruction
> + all the way up to I1. */
> + unsigned int point = i1->point;
> +
> + /* Make sure that the new position preserves all necessary true dependencies
> + on earlier instructions. */
> + for (live_range_rec **use = i2->uses; *use; ++use)
> + {
> + live_range_rec *range = *use;
> + if (range->producer
> + && range->producer != i1
> + && point >= range->producer->point)
> + point = range->producer->point - 1;
> + }
> +
> + /* Make sure that the new position preserves all necessary output and
> + anti dependencies on earlier instructions. */
> + for (live_range_rec **def = i2->defs; *def; ++def)
> + if (live_range_rec *range = (*def)->prev_range)
> + {
> + if (range->producer
> + && range->producer != i1
> + && point >= range->producer->point)
> + point = range->producer->point - 1;
> +
> + for (unsigned int i = NUM_RANGE_USERS; i-- > 0;)
> + if (range->users[i] && range->users[i] != i1)
> + {
> + if (point >= range->users[i]->point)
> + point = range->users[i]->point - 1;
> + break;
> + }
> +
> + if (range->last_extra_use && point >= range->last_extra_use)
> + point = range->last_extra_use - 1;
> + }
> +
> + return point;
> +}
> +
> +/* Find the latest point that we could move I1 down in order to combine
> + it with I2. Ignore any dependencies between I1 and I2; leave the
> + caller to deal with those instead. */
> +
> +unsigned int
> +combine2::find_latest_point (insn_info_rec *i1, insn_info_rec *i2)
> +{
> + if (!movable_insn_p (i1->insn))
> + return i1->point;
> +
> + /* Start by optimistically assuming that we can move the instruction
> + all the way down to I2. */
> + unsigned int point = i2->point;
> +
> + /* Make sure that the new position preserves all necessary anti dependencies
> + on later instructions. */
> + for (live_range_rec **use = i1->uses; *use; ++use)
> + if (live_range_rec *range = (*use)->next_range)
> + if (range->producer != i2 && point <= range->producer->point)
> + point = range->producer->point + 1;
> +
> + /* Make sure that the new position preserves all necessary output and
> + true dependencies on later instructions. */
> + for (live_range_rec **def = i1->defs; *def; ++def)
> + {
> + live_range_rec *range = *def;
> +
> + for (unsigned int i = 0; i < NUM_RANGE_USERS; ++i)
> + if (range->users[i] != i2)
> + {
> + if (range->users[i] && point <= range->users[i]->point)
> + point = range->users[i]->point + 1;
> + break;
> + }
> +
> + if (range->first_extra_use && point <= range->first_extra_use)
> + point = range->first_extra_use + 1;
> +
> + live_range_rec *next_range = range->next_range;
> + if (next_range
> + && next_range->producer != i2
> + && point <= next_range->producer->point)
> + point = next_range->producer->point + 1;
> + }
> +
> + return point;
> +}
> +
> +/* Initialize ATTEMPT for an attempt to combine instructions I1 and I2,
> + where I1 is the instruction that we're currently trying to optimize.
> + If DEF_USE_RANGE is nonnull, I1 defines the value described by
> + DEF_USE_RANGE and I2 uses it. */
> +
> +bool
> +combine2::start_combination (combination_attempt_rec &attempt,
> + insn_info_rec *i1, insn_info_rec *i2,
> + live_range_rec *def_use_range)
> +{
> + attempt.new_home = i1;
> + attempt.sequence[0] = i1;
> + attempt.sequence[1] = i2;
> + if (attempt.sequence[0]->point < attempt.sequence[1]->point)
> + std::swap (attempt.sequence[0], attempt.sequence[1]);
> + attempt.def_use_range = def_use_range;
> +
> + /* Check that the instructions have no true dependencies other than
> + DEF_USE_RANGE. */
> + bitmap_clear (m_true_deps);
> + for (live_range_rec **def = attempt.sequence[0]->defs; *def; ++def)
> + if (*def != def_use_range)
> + bitmap_set_bit (m_true_deps, (*def)->regno);
> + for (live_range_rec **use = attempt.sequence[1]->uses; *use; ++use)
> + if (*use != def_use_range && bitmap_bit_p (m_true_deps, (*use)->regno))
> + return false;
> +
> + /* Calculate the range of points at which the combined instruction
> + could live. */
> + attempt.earliest_point = find_earliest_point (attempt.sequence[1],
> + attempt.sequence[0]);
> + attempt.latest_point = find_latest_point (attempt.sequence[0],
> + attempt.sequence[1]);
> + if (attempt.earliest_point < attempt.latest_point)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "cannot combine %d and %d: no suitable"
> + " location for combined insn\n",
> + INSN_UID (attempt.sequence[0]->insn),
> + INSN_UID (attempt.sequence[1]->insn));
> + return false;
> + }
> +
> + /* Make sure we have valid costs for the original instructions before
> + we start changing their patterns. */
> + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> + if (attempt.sequence[i]->cost == UNKNOWN_COST)
> + attempt.sequence[i]->cost = insn_cost (attempt.sequence[i]->insn,
> + m_optimize_for_speed_p);
> + return true;
> +}
> +
> +/* Check whether the combination attempt described by ATTEMPT matches
> + an .md instruction (or matches its constraints, in the case of an
> + asm statement). If so, calculate the cost of the new instruction
> + and check whether it's cheap enough. */
> +
> +bool
> +combine2::verify_combination (combination_attempt_rec &attempt)
> +{
> + rtx_insn *insn = attempt.sequence[1]->insn;
> +
> + bool ok_p = verify_changes (0);
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + if (!ok_p)
> + fprintf (dump_file, "failed to match this instruction:\n");
> + else if (const char *name = get_insn_name (INSN_CODE (insn)))
> + fprintf (dump_file, "successfully matched this instruction to %s:\n",
> + name);
> + else
> + fprintf (dump_file, "successfully matched this instruction:\n");
> + print_rtl_single (dump_file, PATTERN (insn));
> + }
> + if (!ok_p)
> + return false;
> +
> + unsigned int cost1 = attempt.sequence[0]->cost;
> + unsigned int cost2 = attempt.sequence[1]->cost;
> + attempt.new_cost = insn_cost (insn, m_optimize_for_speed_p);
> + ok_p = (attempt.new_cost <= cost1 + cost2);
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "original cost = %d + %d, replacement cost = %d; %s\n",
> + cost1, cost2, attempt.new_cost,
> + ok_p ? "keeping replacement" : "rejecting replacement");
> + if (!ok_p)
> + return false;
> +
> + confirm_change_group ();
> + return true;
> +}
> +
> +/* Return true if we should consider register REGNO when calculating
> + register pressure estimates. */
> +
> +static bool
> +count_reg_pressure_p (unsigned int regno)
> +{
> + if (regno == INVALID_REGNUM)
> + return false;
> +
> + /* Unallocatable registers aren't interesting. */
> + if (HARD_REGISTER_NUM_P (regno) && fixed_regs[regno])
> + return false;
> +
> + return true;
> +}
> +
> +/* Try to estimate the effect that the original form of INSN_INFO
> + had on register pressure, in the form "born - dying". */
> +
> +int
> +combine2::estimate_reg_pressure_delta (insn_info_rec *insn_info)
> +{
> + int delta = 0;
> +
> + for (live_range_rec **def = insn_info->defs; *def; ++def)
> + if (count_reg_pressure_p ((*def)->regno))
> + delta += 1;
> +
> + for (live_range_rec **use = insn_info->uses; *use; ++use)
> + if (count_reg_pressure_p ((*use)->regno)
> + && known_last_use_p (*use, insn_info))
> + delta -= 1;
> +
> + return delta;
> +}
> +
> +/* We've moved FROM_INSN's pattern to TO_INSN and are about to delete
> + FROM_INSN. Copy any useful information to TO_INSN before doing that. */
> +
> +static void
> +transfer_insn (rtx_insn *to_insn, rtx_insn *from_insn)
> +{
> + INSN_LOCATION (to_insn) = INSN_LOCATION (from_insn);
> + INSN_CODE (to_insn) = INSN_CODE (from_insn);
> + REG_NOTES (to_insn) = REG_NOTES (from_insn);
> +}
> +
> +/* The combination attempt in ATTEMPT has succeeded and is currently
> + part of an open validate_change group. Commit to making the change
> + and decide where the new instruction should go.
> +
> + KEPT_DEF_P is true if the new instruction continues to perform
> + the definition described by ATTEMPT.def_use_range. */
> +
> +void
> +combine2::commit_combination (combination_attempt_rec &attempt,
> + bool kept_def_p)
> +{
> + insn_info_rec *new_home = attempt.new_home;
> + rtx_insn *old_insn = attempt.sequence[0]->insn;
> + rtx_insn *new_insn = attempt.sequence[1]->insn;
> +
> + /* Remove any notes that are no longer relevant. */
> + bool single_set_p = single_set (new_insn);
> + for (rtx *note_ptr = &REG_NOTES (new_insn); *note_ptr; )
> + {
> + rtx note = *note_ptr;
> + bool keep_p = true;
> + switch (REG_NOTE_KIND (note))
> + {
> + case REG_EQUAL:
> + case REG_EQUIV:
> + case REG_NOALIAS:
> + keep_p = single_set_p;
> + break;
> +
> + case REG_UNUSED:
> + keep_p = false;
> + break;
> +
> + default:
> + break;
> + }
> + if (keep_p)
> + note_ptr = &XEXP (*note_ptr, 1);
> + else
> + {
> + *note_ptr = XEXP (*note_ptr, 1);
> + free_EXPR_LIST_node (note);
> + }
> + }
> +
> + /* Complete the open validate_change group. */
> + confirm_change_group ();
> +
> + /* Decide where the new instruction should go. */
> + unsigned int new_point = attempt.latest_point;
> + if (new_point != attempt.earliest_point
> + && prev_real_insn (new_insn) != old_insn)
> + {
> + /* Prefer the earlier point if the combined instruction reduces
> + register pressure and the latest point if it increases register
> + pressure.
> +
> + The choice isn't obvious in the event of a tie, but picking
> + the earliest point should reduce the number of times that
> + we need to invalidate debug insns. */
> + int delta1 = estimate_reg_pressure_delta (attempt.sequence[0]);
> + int delta2 = estimate_reg_pressure_delta (attempt.sequence[1]);
> + bool move_up_p = (delta1 + delta2 <= 0);
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> + "register pressure delta = %d + %d; using %s position\n",
> + delta1, delta2, move_up_p ? "earliest" : "latest");
> + if (move_up_p)
> + new_point = attempt.earliest_point;
> + }
> +
> + /* Translate inserting at NEW_POINT into inserting before or after
> + a particular insn. */
> + rtx_insn *anchor = NULL;
> + bool before_p = (new_point & 1);
> + if (new_point != attempt.sequence[1]->point
> + && new_point != attempt.sequence[0]->point)
> + {
> + anchor = m_points[(new_point - m_end_of_sequence) / 2];
> + rtx_insn *other_side = (before_p
> + ? prev_real_insn (anchor)
> + : next_real_insn (anchor));
> + /* Inserting next to an insn X and then deleting X is just a
> + roundabout way of using X as the insertion point. */
> + if (anchor == new_insn || other_side == new_insn)
> + new_point = attempt.sequence[1]->point;
> + else if (anchor == old_insn || other_side == old_insn)
> + new_point = attempt.sequence[0]->point;
> + }
> +
> + /* Actually perform the move. */
> + if (new_point == attempt.sequence[1]->point)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "using insn %d to hold the combined pattern\n",
> + INSN_UID (new_insn));
> + set_insn_deleted (old_insn);
> + }
> + else if (new_point == attempt.sequence[0]->point)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "using insn %d to hold the combined pattern\n",
> + INSN_UID (old_insn));
> + PATTERN (old_insn) = PATTERN (new_insn);
> + transfer_insn (old_insn, new_insn);
> + std::swap (old_insn, new_insn);
> + set_insn_deleted (old_insn);
> + }
> + else
> + {
> + /* We need to insert a new instruction. We can't simply move
> + NEW_INSN because it acts as an insertion anchor in m_points. */
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "inserting combined insn %s insn %d\n",
> + before_p ? "before" : "after", INSN_UID (anchor));
> +
> + rtx_insn *added_insn = (before_p
> + ? emit_insn_before (PATTERN (new_insn), anchor)
> + : emit_insn_after (PATTERN (new_insn), anchor));
> + transfer_insn (added_insn, new_insn);
> + set_insn_deleted (old_insn);
> + set_insn_deleted (new_insn);
> + new_insn = added_insn;
> + }
> + df_insn_rescan (new_insn);
> +
> + /* Unlink the old uses. */
> + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> + for (live_range_rec **use = attempt.sequence[i]->uses; *use; ++use)
> + remove_range_use (*use, attempt.sequence[i]);
> +
> + /* Work out which registers the new pattern uses. */
> + bitmap_clear (m_true_deps);
> + df_ref use;
> + FOR_EACH_INSN_USE (use, new_insn)
> + {
> + rtx reg = DF_REF_REAL_REG (use);
> + bitmap_set_range (m_true_deps, REGNO (reg), REG_NREGS (reg));
> + }
> + FOR_EACH_INSN_EQ_USE (use, new_insn)
> + {
> + rtx reg = DF_REF_REAL_REG (use);
> + bitmap_set_range (m_true_deps, REGNO (reg), REG_NREGS (reg));
> + }
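
Reading aid: the df_insn_rescan call just above is what refreshes the
dataflow records, so these two iterators (plain uses plus uses that
only appear in REG_EQUAL/REG_EQUIV notes) already see the combined
pattern.  REG_NREGS gives the number of hard registers the reference
spans, hence the range rather than a single bit.  For example, on an
assumed 32-bit target:

    /* A DImode use of hard register 0 spans registers 0 and 1,  */
    REGNO (reg) == 0 && REG_NREGS (reg) == 2
    /* so bits 0 and 1 of m_true_deps both get set.  */
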
> +
> + /* Describe the combined instruction in NEW_HOME. */
> + new_home->insn = new_insn;
> + new_home->point = new_point;
> + new_home->cost = attempt.new_cost;
> +
> + /* Build up the new instruction's list of definitions
> + and update all the ranges accordingly. It shouldn't matter
> + which order we do this in. */
> + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> + for (live_range_rec **def = attempt.sequence[i]->defs; *def; ++def)
> + if (kept_def_p || *def != attempt.def_use_range)
> + {
> + obstack_ptr_grow (&m_insn_obstack, *def);
> + (*def)->producer = new_home;
> + }
> + obstack_ptr_grow (&m_insn_obstack, NULL);
> + new_home->defs = (live_range_rec **) obstack_finish (&m_insn_obstack);
> +
> + /* Build up the new instruction's list of uses and update
> + all the ranges accordingly. Again, it shouldn't matter which
> + order we do this in. */
> + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> + for (live_range_rec **use = attempt.sequence[i]->uses; *use; ++use)
> + if (*use != attempt.def_use_range
> + && add_range_use (*use, new_home))
> + obstack_ptr_grow (&m_insn_obstack, *use);
> + obstack_ptr_grow (&m_insn_obstack, NULL);
> + new_home->uses = (live_range_rec **) obstack_finish (&m_insn_obstack);
> +
> + /* There shouldn't be any remaining references to other instructions
> + in the combination. Invalidate their contents to make lingering
> + references a noisy failure. */
> + for (unsigned int i = 0; i < MAX_COMBINE_INSNS; ++i)
> + if (attempt.sequence[i] != new_home)
> + {
> + attempt.sequence[i]->insn = NULL;
> + attempt.sequence[i]->point = ~0U;
> + }
> +
> + /* Unlink the def-use range. */
> + if (!kept_def_p && attempt.def_use_range)
> + {
> + live_range_rec *range = attempt.def_use_range;
> + if (range->prev_range)
> + range->prev_range->next_range = range->next_range;
> + else
> + m_reg_info[range->regno].range = range->next_range;
> + if (range->next_range)
> + range->next_range->prev_range = range->prev_range;
> + }
> +
> + /* Record instructions whose new form alters the cfg. */
> + rtx pattern = PATTERN (new_insn);
> + if ((returnjump_p (new_insn)
> + || any_uncondjump_p (new_insn)
> + || (GET_CODE (pattern) == TRAP_IF && XEXP (pattern, 0) == const1_rtx))
> + && bitmap_set_bit (m_cfg_altering_insn_ids, INSN_UID (new_insn)))
> + m_cfg_altering_insns.safe_push (new_insn);
> +}
> +
> +/* Return true if X1 and X2 are memories and if X1 does not have
> + a higher alignment than X2. */
> +
> +static bool
> +dubious_mem_pair_p (rtx x1, rtx x2)
> +{
> + return MEM_P (x1) && MEM_P (x2) && MEM_ALIGN (x1) <= MEM_ALIGN (x2);
> +}
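
A concrete illustration of the alignment test (hypothetical values;
MEM_ALIGN is measured in bits): take an 8-byte-aligned pointer p and
two SImode accesses at p and p + 4:

    mem1 = (mem:SI (reg p))                       ;; MEM_ALIGN == 64
    mem2 = (mem:SI (plus (reg p) (const_int 4)))  ;; MEM_ALIGN == 32

    dubious_mem_pair_p (mem1, mem2)  /* 64 <= 32: false => pair OK   */
    dubious_mem_pair_p (mem2, mem1)  /* 32 <= 64: true  => rejected  */

Note that equal alignments count as dubious too: the test is <=,
not <.
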
> +
> +/* Try to implement ATTEMPT using (parallel [SET1 SET2]). */
> +
> +bool
> +combine2::try_parallel_sets (combination_attempt_rec &attempt,
> + rtx set1, rtx set2)
> +{
> + rtx_insn *insn = attempt.sequence[1]->insn;
> +
> + /* Combining two loads or two stores can be useful on targets that
> + allow them to be treated as a single access. However, we use a
> + very greedy, peephole approach to picking the pairs, so we need to be
> + relatively confident that we're making a good choice.
> +
> + For now just aim for cases in which the memory references are
> + consecutive and the first reference has a higher alignment.
> + We can leave the target to test the consecutive part; whatever test
> + we added here might be different from the target's, and in any case
> + it's fine if the target accepts other well-aligned cases too. */
> + if (dubious_mem_pair_p (SET_DEST (set1), SET_DEST (set2))
> + || dubious_mem_pair_p (SET_SRC (set1), SET_SRC (set2)))
> + return false;
> +
> + /* Cache the PARALLEL rtx between attempts so that we don't generate
> + too much garbage rtl. */
> + if (!m_spare_parallel)
> + {
> + rtvec vec = gen_rtvec (2, set1, set2);
> + m_spare_parallel = gen_rtx_PARALLEL (VOIDmode, vec);
> + }
> + else
> + {
> + XVECEXP (m_spare_parallel, 0, 0) = set1;
> + XVECEXP (m_spare_parallel, 0, 1) = set2;
> + }
> +
> + unsigned int num_changes = num_validated_changes ();
> + validate_change (insn, &PATTERN (insn), m_spare_parallel, true);
> + if (verify_combination (attempt))
> + {
> + m_spare_parallel = NULL_RTX;
> + return true;
> + }
> + cancel_changes (num_changes);
> + return false;
> +}
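
For readers unfamiliar with the validate_change group machinery used
here: passing true as the final argument only queues the change, and
the queue is later either committed wholesale (confirm_change_group,
as commit_combination does above) or rolled back to a checkpoint.  A
minimal sketch of the idiom, with a hypothetical predicate standing in
for the real checks:

    unsigned int checkpoint = num_validated_changes ();
    validate_change (insn, &PATTERN (insn), new_pat, true); /* queue */
    if (combination_ok_p ())        /* hypothetical check            */
      confirm_change_group ();      /* make all queued changes real  */
    else
      cancel_changes (checkpoint);  /* undo back to the checkpoint   */
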
> +
> +/* Try to parallelize the two instructions in ATTEMPT. */
> +
> +bool
> +combine2::try_parallelize_insns (combination_attempt_rec &attempt)
> +{
> + rtx_insn *i1_insn = attempt.sequence[0]->insn;
> + rtx_insn *i2_insn = attempt.sequence[1]->insn;
> +
> + /* Can't parallelize asm statements. */
> + if (asm_noperands (PATTERN (i1_insn)) >= 0
> + || asm_noperands (PATTERN (i2_insn)) >= 0)
> + return false;
> +
> + /* For now, just handle the case in which both instructions are
> + single sets. We could handle more than 2 sets as well, but few
> + targets support that anyway. */
> + rtx set1 = single_set (i1_insn);
> + if (!set1)
> + return false;
> + rtx set2 = single_set (i2_insn);
> + if (!set2)
> + return false;
> +
> + /* Make sure that we have structural proof that the destinations
> + are independent. Things like alias analysis rely on semantic
> + information and assume no undefined behavior, which is rarely a
> + good enough guarantee to allow a useful instruction combination. */
> + rtx dest1 = SET_DEST (set1);
> + rtx dest2 = SET_DEST (set2);
> + if (MEM_P (dest1)
> + ? MEM_P (dest2) && !nonoverlapping_memrefs_p (dest1, dest2, false)
> + : !MEM_P (dest2) && reg_overlap_mentioned_p (dest1, dest2))
> + return false;
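
Spelled out, the structural test above reduces to a small decision
table:

    dest1    dest2    reject the combination when...
    MEM      MEM      the refs are not provably non-overlapping
    non-MEM  non-MEM  the destinations overlap (reg_overlap_mentioned_p)
    MEM      non-MEM  never rejected here
    non-MEM  MEM      never rejected here

i.e. mixed mem/reg destinations are always structurally independent,
and everything else needs positive structural proof.
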
> +
> + /* Try the sets in both orders. */
> + if (try_parallel_sets (attempt, set1, set2)
> + || try_parallel_sets (attempt, set2, set1))
> + {
> + commit_combination (attempt, true);
> + if (MAY_HAVE_DEBUG_BIND_INSNS
> + && attempt.new_home->insn != i1_insn)
> + propagate_for_debug (i1_insn, attempt.new_home->insn,
> +