public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc-patches@gcc.gnu.org, Nicholas Krause <xerofoify@gmail.com>
Subject: Re: Add a new combine pass
Date: Mon, 25 Nov 2019 21:25:00 -0000	[thread overview]
Message-ID: <mptftibac6z.fsf@arm.com> (raw)
In-Reply-To: <20191122163911.GE9491@gate.crashing.org> (Segher Boessenkool's	message of "Fri, 22 Nov 2019 10:39:11 -0600")

Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Thu, Nov 21, 2019 at 08:32:14PM +0000, Richard Sandiford wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > It's not great if every pass invents its own version of some common
>> > infrastructure thing because that common one is not suitable.
>> >
>> > I.e., can this be fixed somehow?  Maybe just by having a restricted DU
>> > chains df problem?
>> 
>> Well, it'd probably make sense to make fwprop.c's approach available
>> as a "proper" df interface at some point.  Hopefully if anyone wants the
>> same thing as fwprop.c, they'd do that rather than copy the code. :-)
>
>> >> * Updating a full, ordered def-use chain after a move is a linear-time
>> >>   operation, so whatever happens, we'd need to apply some kind of limit
>> >>   on the number of uses we maintain, with something like that integer
>> >>   point range for the rest.
>
> Yeah.
>
>> >> * Once we've analysed the insn and built its def-use chains, we don't
>> >>   look at the df_refs again until we update the chains after a successful
>> >>   combination.  So it should be more efficient to maintain a small array
>> >>   of insn_info_rec pointers alongside the numerical range, rather than
>> >>   walk and pollute chains of df_refs and then link back the insn uids
>> >>   to the pass-local info.
>> >
>> > So you need something like combine's LOG_LINKS?  Not that handling those
>> > is not quadratic in the worst case, but in practice it works well.  And
>> > it *could* be made linear.
>> 
>> Not sure why what I've used isn't what I need though :-)
>
> I am wondering the other way around :-)  Is what you do for combine2
> something that would be more generally applicable/useful?  That's what
> I'm trying to find out :-)
>
> What combine does could use some improvement, if you want to hear a
> more direct motivations.  LOG_LINKS just skip references we cannot
> handle (and some more), so we always have to do modified_between etc.,
> which hurts.

The trade-offs behind the choice of representation are very specific
to the pass.  You'd only pick this if you wanted both to propagate
definitions into uses and to move insns around.  You'd also only pick
it if you were happy with tracking a small number of named uses per
definition.  I can't think of any other passes that would prefer this
over what they already use.  (Combine itself is an exception, since
the new pass started out as a deliberate attempt to start from scratch.)

>> >> Target                 Tests   Delta    Best   Worst  Median
>> >> avr-elf                 1341 -111401  -13824     680     -10
>> >
>> > Things like this are kind of suspicious :-)
>> 
>> Yeah.  This mostly seems to come from mopping up the extra moves created
>> by make_more_copies.  So we have combinations like:
>> 
>>    58: r70:SF=r94:SF
>>       REG_DEAD r94:SF
>>    60: r22:SF=r70:SF
>>       REG_DEAD r70:SF
>
> Why didn't combine do this?  A target problem?

Seems to be because combine rejects hard-reg destinations whose classes
are likely spilled (cant_combine_insn_p).  This SF argument register
happens to overlap POINTER_X_REGS and POINTER_Y_REGS and so we reject
the combination based on POINTER_X_REGS being likely spilled.

I think the same thing could happen on other targets, e.g. for
TAILCALL_ADDR_REGS on aarch64.

>> So there's only one case in which it isn't a win, but the number of
>> tests is tiny.  So I agree there's no justification for trying this in
>> combine proper as things stand (and I wasn't arguing otherwise FWIW).
>> I'd still like to keep it in the new pass because it does help
>> *sometimes* and there's no sign yet that it has a noticeable
>> compile-time cost.
>
> So when does it help?  I can only think of cases where there are
> problems elsewhere.

The full list of affected tests (all at -O2 -ftree-vectorize) are:

    arc-elf             gcc.c-torture/compile/pr67506.c
    avr-elf             gcc.dg/torture/pr77916.c
    bpf-elf             gcc.dg/torture/vshuf-v8hi.c
    bpf-elf             gcc.dg/torture/vshuf-v4si.c
    bfin-elf            gcc.dg/torture/vshuf-v8qi.c
    c6x-elf             gcc.c-torture/execute/991118-1.c
    cr16-elf            gcc.c-torture/compile/pr82052.c
    epiphany-elf        gcc.c-torture/execute/991118-1.c
    epiphany-elf        gcc.dg/pr77664.c
    epiphany-elf        gcc.dg/vect/vect-mult-pattern-2.c
    epiphany-elf        gcc.dg/torture/vshuf-v8hi.c
    epiphany-elf        gcc.dg/tree-ssa/pr77664.c
    epiphany-elf        gcc.dg/tree-ssa/negneg-3.c
    fr30-elf            gcc.dg/torture/vshuf-v4hi.c
    fr30-elf            gcc.dg/torture/vshuf-v8hi.c
    frv-linux-gnu       gcc.dg/torture/vshuf-v4hi.c
    frv-linux-gnu       gcc.dg/torture/vshuf-v8hi.c
    h8300-elf           gcc.c-torture/execute/20000422-1.c
    h8300-elf           gcc.dg/torture/pr77916.c
    ia64-linux-gnu      gcc.c-torture/execute/ieee/pr30704.c
    ia64-linux-gnu      gcc.dg/vect/pr49478.c
    ia64-linux-gnu      gcc.dg/tree-ssa/ldist-16.c
    i686-apple-darwin   gcc.dg/vect/vect-mult-pattern-2.c
    m32r-elf            gcc.dg/store_merging_8.c
    m32r-elf            gcc.dg/torture/vshuf-v4hi.c
    m32r-elf            gcc.dg/torture/vshuf-v8hi.c
    m32r-elf            gcc.dg/tree-ssa/vrp61.c
    mcore-elf           gcc.c-torture/execute/991118-1.c
    mcore-elf           gcc.dg/torture/vshuf-v4hi.c
    mcore-elf           gcc.dg/torture/vshuf-v8hi.c
    mcore-elf           gcc.dg/torture/vshuf-v8qi.c
    mmix                gcc.dg/torture/20181024-1.c
    mn10300-elf         g++.dg/warn/Warray-bounds-6.C
    moxie-rtems         gcc.c-torture/execute/930718-1.c
    moxie-rtems         gcc.c-torture/compile/pr70263-1.c
    moxie-rtems         gcc.dg/graphite/scop-5.c
    moxie-rtems         g++.dg/pr80707.C
    nds32le-elf         gcc.dg/torture/vshuf-v16qi.c
    nios2-linux-gnu     gcc.dg/torture/vshuf-v8qi.c
    or1k-elf            gcc.dg/torture/vshuf-v4hi.c
    or1k-elf            gcc.dg/torture/vshuf-v8hi.c
    or1k-elf            gcc.dg/tree-ssa/vrp61.c
    powerpc-ibm-aix7.0  g++.dg/warn/Wunused-3.C
    powerpc-ibm-aix7.0  g++.dg/lto/pr88049_0.C
    powerpc-ibm-aix7.0  g++.dg/other/cxa-atexit1.C
    s390-linux-gnu      gcc.c-torture/compile/20020304-1.c
    s390-linux-gnu      gcc.dg/atomic-op-1.c
    s390-linux-gnu      gcc.dg/atomic/stdatomic-op-1.c
    s390-linux-gnu      gcc.dg/atomic/c11-atomic-exec-2.c
    s390-linux-gnu      gcc.dg/atomic/c11-atomic-exec-3.c
    s390-linux-gnu      gcc.dg/ubsan/float-cast-overflow-atomic.c
    s390x-linux-gnu     gcc.c-torture/compile/20020304-1.c
    sh-linux-gnu        gcc.c-torture/execute/991118-1.c
    sh-linux-gnu        gcc.dg/torture/vshuf-v8qi.c
    sparc-linux-gnu     gcc.dg/pr56890-2.c
    sparc-linux-gnu     gcc.dg/torture/vshuf-v4hi.c
    sparc-linux-gnu     gcc.dg/torture/vshuf-v8hi.c
    sparc-linux-gnu     gcc.dg/torture/20181024-1.c
    sparc64-linux-gnu   gcc.dg/torture/20181024-1.c
    xstormy16-elf       gcc.c-torture/execute/strlen-5.c
    xstormy16-elf       gcc.c-torture/execute/20080424-1.c
    xstormy16-elf       gcc.c-torture/compile/pr60655-1.c
    xstormy16-elf       gcc.c-torture/compile/pr60655-2.c
    xstormy16-elf       gcc.dg/Wrestrict-9.c
    xstormy16-elf       gcc.dg/graphite/scop-15.c
    xstormy16-elf       gcc.dg/guality/pr43051-1.c
    xstormy16-elf       gcc.dg/torture/pr68955.c
    xstormy16-elf       gcc.dg/torture/pr58955-2.c
    xstormy16-elf       gcc.dg/tree-ssa/builtin-sprintf-warn-23.c

The s390x-linux-gnu test is one in which we have:

  116: {r167:DI=r86:DI-0x1000;clobber %cc:CC;}
      REG_DEAD r86:DI
      REG_UNUSED %cc:CC
  118: %r2:DI=[r167:DI+r155:DI+0x5]
      REG_DEAD r167:DI
      REG_DEAD r155:DI
      REG_EQUAL [r167:DI+0x1005]

and so the 0x1000s cancel each other out.  And yeah, you could
definitely argue that it's a problem elsewhere. :-)  Expand has:

;; _32 = BGl_equalzf3zf3zz__r4_equivalence_6_2z00 (_31, 2B);

(insn 113 112 114 (set (reg:DI 164)
        (const_int -4096 [0xfffffffffffff000])) "gcc.c-torture/compile/20020304-1.c":161:9 -1
     (nil))

(insn 114 113 115 (set (reg:DI 165)
        (reg:DI 164)) "gcc.c-torture/compile/20020304-1.c":161:9 -1
     (nil))

(insn 115 114 116 (set (reg:DI 166)
        (const_int 4096 [0x1000])) "gcc.c-torture/compile/20020304-1.c":161:9 -1
     (nil))

(insn 116 115 117 (parallel [
            (set (reg:DI 167)
                (plus:DI (reg:DI 86 [ BgL_cdrzd21994zd2_959.10_27 ])
                    (reg:DI 165)))
            (clobber (reg:CC 33 %cc))
        ]) "gcc.c-torture/compile/20020304-1.c":161:9 -1
     (nil))

(insn 117 116 118 (set (reg:DI 3 %r3)
        (const_int 2 [0x2])) "gcc.c-torture/compile/20020304-1.c":161:9 -1
     (nil))

(insn 118 117 119 (set (reg:DI 2 %r2)
        (mem/f/j:DI (plus:DI (plus:DI (reg:DI 167)
                    (reg:DI 166))
                (const_int 5 [0x5])) [2 _30->pair_t.cdr+0 S8 A64])) "gcc.c-torture/compile/20020304-1.c":161:9 -1
     (nil))

>> It might also be interesting to see how much difference it makes for
>> run-combine=4 (e.g. to see how much it makes up for the current 2-insn
>> limit)...
>
> Numbers are good :-)

FWIW, it does make more of a difference there, but not massively:

Target                 Tests   Delta    Best   Worst  Median
======                 =====   =====    ====   =====  ======
aarch64-linux-gnu          5     -15      -5      -1      -3
aarch64_be-linux-gnu       4     -14      -5      -2      -4
arc-elf                    1      -4      -4      -4      -4
arm-linux-gnueabi          4     -22     -10      -2      -8
arm-linux-gnueabihf        4     -22     -10      -2      -8
avr-elf                    1      -1      -1      -1      -1
bfin-elf                  25    -592    -223       3      -5
bpf-elf                   47    -508     -95      -1      -3
c6x-elf                   26    -388     -74       1      -4
cr16-elf                  18    -142     -82      -1      -2
csky-elf                   5     -10      -4      -1      -2
epiphany-elf              30    -514    -155      -1      -4
fr30-elf                  28    -416    -140      -1      -3
frv-linux-gnu             45   -1274    -209      -1      -4
ft32-elf                   7     -17      -6      -1      -2
h8300-elf                  3      -7      -5      -1      -1
hppa64-hp-hpux11.23        1      -1      -1      -1      -1
i686-apple-darwin          1      -3      -3      -3      -3
ia64-linux-gnu             8     -86     -26      -5     -10
iq2000-elf                 1      -2      -2      -2      -2
m32r-elf                  78   -1692    -308      -2      -4
mcore-elf                 58   -1117    -174       3      -5
mipsel-linux-gnu           7     -26      -8      -2      -3
mipsisa64-linux-gnu       30    -136     -18      -2      -3
mmix                       5      -7      -2      -1      -1
mn10300-elf                1      -2      -2      -2      -2
moxie-rtems               11     -35      -5      -2      -3
msp430-elf                 1      -1      -1      -1      -1
nds32le-elf               15    -142     -88      -1      -2
nios2-linux-gnu           22    -259    -110      -1      -4
nvptx-none                 2      -8      -4      -4      -4
or1k-elf                  34    -592    -160      -1      -3
powerpc64le-linux-gnu      1      -8      -8      -8      -8
riscv32-elf                4     -11      -6      -1      -2
riscv64-elf                2      -7      -6      -1      -6
rl78-elf                   1      -7      -7      -7      -7
rx-elf                     1      -2      -2      -2      -2
s390-linux-gnu            35     708     -12     292      -1
s390x-linux-gnu           15     -53      -6      -2      -3
sh-linux-gnu              38    -741    -141       2      -6
sparc-linux-gnu           26    -478    -156      -1      -7
sparc64-linux-gnu         10     -86     -28      -2      -4
vax-netbsdelf              1      -4      -4      -4      -4
visium-elf                30    -467    -159      -1      -4
x86_64-darwin              7     -24     -10      -1      -2
x86_64-linux-gnu           7     -26     -12      -1      -2
xstormy16-elf             15     -70     -45       2      -2
xtensa-elf                26    -682    -226      -2      -4

Thanks,
Richard

  reply	other threads:[~2019-11-25 21:16 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-18  0:21 Richard Sandiford
2019-11-18  2:04 ` Andrew Pinski
2019-11-18 17:58 ` Richard Sandiford
2019-11-23 23:01   ` Segher Boessenkool
2019-11-23 23:09     ` Nicholas Krause
2019-11-23 23:43       ` Segher Boessenkool
2019-11-24  0:11         ` Nicholas Krause
2019-11-25 22:09     ` Richard Sandiford
2019-11-25 22:52       ` Segher Boessenkool
2019-12-03 13:33         ` Oleg Endo
2019-12-03 18:05           ` Segher Boessenkool
2019-12-04 10:43             ` Oleg Endo
2019-12-06 22:51               ` Segher Boessenkool
2019-12-06 23:47                 ` Oleg Endo
2019-11-19  0:11 ` Segher Boessenkool
2019-11-19 11:36   ` Richard Sandiford
2019-11-19 21:13     ` Segher Boessenkool
2019-11-20 18:29       ` Richard Sandiford
2019-11-20 20:49         ` Segher Boessenkool
2019-11-21 21:12           ` Richard Sandiford
2019-11-22 16:39             ` Segher Boessenkool
2019-11-25 21:25               ` Richard Sandiford [this message]
2019-11-25 22:17                 ` Segher Boessenkool
2019-11-25 23:26                   ` Richard Sandiford
2019-11-26  1:44                     ` Segher Boessenkool
2019-11-27  8:32                       ` Richard Biener
2019-11-27 10:18                         ` Richard Sandiford
2019-11-27 19:36                         ` Segher Boessenkool
2019-11-21 19:02 ` Nicholas Krause
2019-11-21 19:47   ` Richard Sandiford
2019-11-22 16:27     ` Segher Boessenkool
2019-12-05 10:17 ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptftibac6z.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=segher@kernel.crashing.org \
    --cc=xerofoify@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).