From: Richard Sandiford <richard.sandiford@arm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc-patches@gcc.gnu.org, Nicholas Krause <xerofoify@gmail.com>
Subject: Re: Add a new combine pass
Date: Mon, 25 Nov 2019 21:25:00 -0000 [thread overview]
Message-ID: <mptftibac6z.fsf@arm.com> (raw)
In-Reply-To: <20191122163911.GE9491@gate.crashing.org> (Segher Boessenkool's message of "Fri, 22 Nov 2019 10:39:11 -0600")
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Thu, Nov 21, 2019 at 08:32:14PM +0000, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > It's not great if every pass invents its own version of some common
>> > infrastructure thing because that common one is not suitable.
>> >
>> > I.e., can this be fixed somehow? Maybe just by having a restricted DU
>> > chains df problem?
>>
>> Well, it'd probably make sense to make fwprop.c's approach available
>> as a "proper" df interface at some point. Hopefully if anyone wants the
>> same thing as fwprop.c, they'd do that rather than copy the code. :-)
>
>> >> * Updating a full, ordered def-use chain after a move is a linear-time
>> >> operation, so whatever happens, we'd need to apply some kind of limit
>> >> on the number of uses we maintain, with something like that integer
>> >> point range for the rest.
>
> Yeah.
>
>> >> * Once we've analysed the insn and built its def-use chains, we don't
>> >> look at the df_refs again until we update the chains after a successful
>> >> combination. So it should be more efficient to maintain a small array
>> >> of insn_info_rec pointers alongside the numerical range, rather than
>> >> walk and pollute chains of df_refs and then link back the insn uids
>> >> to the pass-local info.
>> >
>> > So you need something like combine's LOG_LINKS? Not that handling those
>> > is not quadratic in the worst case, but in practice it works well. And
>> > it *could* be made linear.
>>
>> Not sure why what I've used isn't what I need though :-)
>
> I am wondering the other way around :-) Is what you do for combine2
> something that would be more generally applicable/useful? That's what
> I'm trying to find out :-)
>
> What combine does could use some improvement, if you want to hear a
> more direct motivations. LOG_LINKS just skip references we cannot
> handle (and some more), so we always have to do modified_between etc.,
> which hurts.
The trade-offs behind the choice of representation are very specific
to the pass. You'd only pick this if you wanted both to propagate
definitions into uses and to move insns around. You'd also only pick
it if you were happy with tracking a small number of named uses per
definition. I can't think of any other passes that would prefer this
over what they already use. (Combine itself is an exception, since
the new pass started out as a deliberate attempt to start from scratch.)
>> >> Target Tests Delta Best Worst Median
>> >> avr-elf 1341 -111401 -13824 680 -10
>> >
>> > Things like this are kind of suspicious :-)
>>
>> Yeah. This mostly seems to come from mopping up the extra moves created
>> by make_more_copies. So we have combinations like:
>>
>> 58: r70:SF=r94:SF
>> REG_DEAD r94:SF
>> 60: r22:SF=r70:SF
>> REG_DEAD r70:SF
>
> Why didn't combine do this? A target problem?
Seems to be because combine rejects hard-reg destinations whose classes
are likely spilled (cant_combine_insn_p). This SF argument register
happens to overlap POINTER_X_REGS and POINTER_Y_REGS and so we reject
the combination based on POINTER_X_REGS being likely spilled.
I think the same thing could happen on other targets, e.g. for
TAILCALL_ADDR_REGS on aarch64.
>> So there's only one case in which it isn't a win, but the number of
>> tests is tiny. So I agree there's no justification for trying this in
>> combine proper as things stand (and I wasn't arguing otherwise FWIW).
>> I'd still like to keep it in the new pass because it does help
>> *sometimes* and there's no sign yet that it has a noticeable
>> compile-time cost.
>
> So when does it help? I can only think of cases where there are
> problems elsewhere.
The full list of affected tests (all at -O2 -ftree-vectorize) are:
arc-elf gcc.c-torture/compile/pr67506.c
avr-elf gcc.dg/torture/pr77916.c
bpf-elf gcc.dg/torture/vshuf-v8hi.c
bpf-elf gcc.dg/torture/vshuf-v4si.c
bfin-elf gcc.dg/torture/vshuf-v8qi.c
c6x-elf gcc.c-torture/execute/991118-1.c
cr16-elf gcc.c-torture/compile/pr82052.c
epiphany-elf gcc.c-torture/execute/991118-1.c
epiphany-elf gcc.dg/pr77664.c
epiphany-elf gcc.dg/vect/vect-mult-pattern-2.c
epiphany-elf gcc.dg/torture/vshuf-v8hi.c
epiphany-elf gcc.dg/tree-ssa/pr77664.c
epiphany-elf gcc.dg/tree-ssa/negneg-3.c
fr30-elf gcc.dg/torture/vshuf-v4hi.c
fr30-elf gcc.dg/torture/vshuf-v8hi.c
frv-linux-gnu gcc.dg/torture/vshuf-v4hi.c
frv-linux-gnu gcc.dg/torture/vshuf-v8hi.c
h8300-elf gcc.c-torture/execute/20000422-1.c
h8300-elf gcc.dg/torture/pr77916.c
ia64-linux-gnu gcc.c-torture/execute/ieee/pr30704.c
ia64-linux-gnu gcc.dg/vect/pr49478.c
ia64-linux-gnu gcc.dg/tree-ssa/ldist-16.c
i686-apple-darwin gcc.dg/vect/vect-mult-pattern-2.c
m32r-elf gcc.dg/store_merging_8.c
m32r-elf gcc.dg/torture/vshuf-v4hi.c
m32r-elf gcc.dg/torture/vshuf-v8hi.c
m32r-elf gcc.dg/tree-ssa/vrp61.c
mcore-elf gcc.c-torture/execute/991118-1.c
mcore-elf gcc.dg/torture/vshuf-v4hi.c
mcore-elf gcc.dg/torture/vshuf-v8hi.c
mcore-elf gcc.dg/torture/vshuf-v8qi.c
mmix gcc.dg/torture/20181024-1.c
mn10300-elf g++.dg/warn/Warray-bounds-6.C
moxie-rtems gcc.c-torture/execute/930718-1.c
moxie-rtems gcc.c-torture/compile/pr70263-1.c
moxie-rtems gcc.dg/graphite/scop-5.c
moxie-rtems g++.dg/pr80707.C
nds32le-elf gcc.dg/torture/vshuf-v16qi.c
nios2-linux-gnu gcc.dg/torture/vshuf-v8qi.c
or1k-elf gcc.dg/torture/vshuf-v4hi.c
or1k-elf gcc.dg/torture/vshuf-v8hi.c
or1k-elf gcc.dg/tree-ssa/vrp61.c
powerpc-ibm-aix7.0 g++.dg/warn/Wunused-3.C
powerpc-ibm-aix7.0 g++.dg/lto/pr88049_0.C
powerpc-ibm-aix7.0 g++.dg/other/cxa-atexit1.C
s390-linux-gnu gcc.c-torture/compile/20020304-1.c
s390-linux-gnu gcc.dg/atomic-op-1.c
s390-linux-gnu gcc.dg/atomic/stdatomic-op-1.c
s390-linux-gnu gcc.dg/atomic/c11-atomic-exec-2.c
s390-linux-gnu gcc.dg/atomic/c11-atomic-exec-3.c
s390-linux-gnu gcc.dg/ubsan/float-cast-overflow-atomic.c
s390x-linux-gnu gcc.c-torture/compile/20020304-1.c
sh-linux-gnu gcc.c-torture/execute/991118-1.c
sh-linux-gnu gcc.dg/torture/vshuf-v8qi.c
sparc-linux-gnu gcc.dg/pr56890-2.c
sparc-linux-gnu gcc.dg/torture/vshuf-v4hi.c
sparc-linux-gnu gcc.dg/torture/vshuf-v8hi.c
sparc-linux-gnu gcc.dg/torture/20181024-1.c
sparc64-linux-gnu gcc.dg/torture/20181024-1.c
xstormy16-elf gcc.c-torture/execute/strlen-5.c
xstormy16-elf gcc.c-torture/execute/20080424-1.c
xstormy16-elf gcc.c-torture/compile/pr60655-1.c
xstormy16-elf gcc.c-torture/compile/pr60655-2.c
xstormy16-elf gcc.dg/Wrestrict-9.c
xstormy16-elf gcc.dg/graphite/scop-15.c
xstormy16-elf gcc.dg/guality/pr43051-1.c
xstormy16-elf gcc.dg/torture/pr68955.c
xstormy16-elf gcc.dg/torture/pr58955-2.c
xstormy16-elf gcc.dg/tree-ssa/builtin-sprintf-warn-23.c
The s390x-linux-gnu test is one in which we have:
116: {r167:DI=r86:DI-0x1000;clobber %cc:CC;}
REG_DEAD r86:DI
REG_UNUSED %cc:CC
118: %r2:DI=[r167:DI+r155:DI+0x5]
REG_DEAD r167:DI
REG_DEAD r155:DI
REG_EQUAL [r167:DI+0x1005]
and so the 0x1000s cancel each other out. And yeah, you could
definitely argue that it's a problem elsewhere. :-) Expand has:
;; _32 = BGl_equalzf3zf3zz__r4_equivalence_6_2z00 (_31, 2B);
(insn 113 112 114 (set (reg:DI 164)
(const_int -4096 [0xfffffffffffff000])) "gcc.c-torture/compile/20020304-1.c":161:9 -1
(nil))
(insn 114 113 115 (set (reg:DI 165)
(reg:DI 164)) "gcc.c-torture/compile/20020304-1.c":161:9 -1
(nil))
(insn 115 114 116 (set (reg:DI 166)
(const_int 4096 [0x1000])) "gcc.c-torture/compile/20020304-1.c":161:9 -1
(nil))
(insn 116 115 117 (parallel [
(set (reg:DI 167)
(plus:DI (reg:DI 86 [ BgL_cdrzd21994zd2_959.10_27 ])
(reg:DI 165)))
(clobber (reg:CC 33 %cc))
]) "gcc.c-torture/compile/20020304-1.c":161:9 -1
(nil))
(insn 117 116 118 (set (reg:DI 3 %r3)
(const_int 2 [0x2])) "gcc.c-torture/compile/20020304-1.c":161:9 -1
(nil))
(insn 118 117 119 (set (reg:DI 2 %r2)
(mem/f/j:DI (plus:DI (plus:DI (reg:DI 167)
(reg:DI 166))
(const_int 5 [0x5])) [2 _30->pair_t.cdr+0 S8 A64])) "gcc.c-torture/compile/20020304-1.c":161:9 -1
(nil))
>> It might also be interesting to see how much difference it makes for
>> run-combine=4 (e.g. to see how much it makes up for the current 2-insn
>> limit)...
>
> Numbers are good :-)
FWIW, it does make more of a difference there, but not massively:
Target Tests Delta Best Worst Median
====== ===== ===== ==== ===== ======
aarch64-linux-gnu 5 -15 -5 -1 -3
aarch64_be-linux-gnu 4 -14 -5 -2 -4
arc-elf 1 -4 -4 -4 -4
arm-linux-gnueabi 4 -22 -10 -2 -8
arm-linux-gnueabihf 4 -22 -10 -2 -8
avr-elf 1 -1 -1 -1 -1
bfin-elf 25 -592 -223 3 -5
bpf-elf 47 -508 -95 -1 -3
c6x-elf 26 -388 -74 1 -4
cr16-elf 18 -142 -82 -1 -2
csky-elf 5 -10 -4 -1 -2
epiphany-elf 30 -514 -155 -1 -4
fr30-elf 28 -416 -140 -1 -3
frv-linux-gnu 45 -1274 -209 -1 -4
ft32-elf 7 -17 -6 -1 -2
h8300-elf 3 -7 -5 -1 -1
hppa64-hp-hpux11.23 1 -1 -1 -1 -1
i686-apple-darwin 1 -3 -3 -3 -3
ia64-linux-gnu 8 -86 -26 -5 -10
iq2000-elf 1 -2 -2 -2 -2
m32r-elf 78 -1692 -308 -2 -4
mcore-elf 58 -1117 -174 3 -5
mipsel-linux-gnu 7 -26 -8 -2 -3
mipsisa64-linux-gnu 30 -136 -18 -2 -3
mmix 5 -7 -2 -1 -1
mn10300-elf 1 -2 -2 -2 -2
moxie-rtems 11 -35 -5 -2 -3
msp430-elf 1 -1 -1 -1 -1
nds32le-elf 15 -142 -88 -1 -2
nios2-linux-gnu 22 -259 -110 -1 -4
nvptx-none 2 -8 -4 -4 -4
or1k-elf 34 -592 -160 -1 -3
powerpc64le-linux-gnu 1 -8 -8 -8 -8
riscv32-elf 4 -11 -6 -1 -2
riscv64-elf 2 -7 -6 -1 -6
rl78-elf 1 -7 -7 -7 -7
rx-elf 1 -2 -2 -2 -2
s390-linux-gnu 35 708 -12 292 -1
s390x-linux-gnu 15 -53 -6 -2 -3
sh-linux-gnu 38 -741 -141 2 -6
sparc-linux-gnu 26 -478 -156 -1 -7
sparc64-linux-gnu 10 -86 -28 -2 -4
vax-netbsdelf 1 -4 -4 -4 -4
visium-elf 30 -467 -159 -1 -4
x86_64-darwin 7 -24 -10 -1 -2
x86_64-linux-gnu 7 -26 -12 -1 -2
xstormy16-elf 15 -70 -45 2 -2
xtensa-elf 26 -682 -226 -2 -4
Thanks,
Richard
next prev parent reply other threads:[~2019-11-25 21:16 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-18 0:21 Richard Sandiford
2019-11-18 2:04 ` Andrew Pinski
2019-11-18 17:58 ` Richard Sandiford
2019-11-23 23:01 ` Segher Boessenkool
2019-11-23 23:09 ` Nicholas Krause
2019-11-23 23:43 ` Segher Boessenkool
2019-11-24 0:11 ` Nicholas Krause
2019-11-25 22:09 ` Richard Sandiford
2019-11-25 22:52 ` Segher Boessenkool
2019-12-03 13:33 ` Oleg Endo
2019-12-03 18:05 ` Segher Boessenkool
2019-12-04 10:43 ` Oleg Endo
2019-12-06 22:51 ` Segher Boessenkool
2019-12-06 23:47 ` Oleg Endo
2019-11-19 0:11 ` Segher Boessenkool
2019-11-19 11:36 ` Richard Sandiford
2019-11-19 21:13 ` Segher Boessenkool
2019-11-20 18:29 ` Richard Sandiford
2019-11-20 20:49 ` Segher Boessenkool
2019-11-21 21:12 ` Richard Sandiford
2019-11-22 16:39 ` Segher Boessenkool
2019-11-25 21:25 ` Richard Sandiford [this message]
2019-11-25 22:17 ` Segher Boessenkool
2019-11-25 23:26 ` Richard Sandiford
2019-11-26 1:44 ` Segher Boessenkool
2019-11-27 8:32 ` Richard Biener
2019-11-27 10:18 ` Richard Sandiford
2019-11-27 19:36 ` Segher Boessenkool
2019-11-21 19:02 ` Nicholas Krause
2019-11-21 19:47 ` Richard Sandiford
2019-11-22 16:27 ` Segher Boessenkool
2019-12-05 10:17 ` Richard Sandiford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mptftibac6z.fsf@arm.com \
--to=richard.sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=segher@kernel.crashing.org \
--cc=xerofoify@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).