From: Segher Boessenkool <segher@kernel.crashing.org>
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Don't combine param and return value copies
Date: Thu, 28 May 2015 15:33:00 -0000 [thread overview]
Message-ID: <20150528152220.GB15453@gate.crashing.org> (raw)
In-Reply-To: <20150526070746.GD14752@bubble.grove.modra.org>
On Tue, May 26, 2015 at 04:37:46PM +0930, Alan Modra wrote:
> On powerpc64le, modifying the way combine treats function parameters
> and call arguments results in some regressions.
>
> For instance, this testcase from varasm.c
>
> extern int foo3 (void *, ...);
> extern void foo4 (void *, const char *);
> int
> emit_tls_common (void *decl,
> const char *name,
> unsigned long size)
> {
> foo3 (0, "\t%s\t", "..");
> foo4 (0, name);
> foo3 (0, ",%lu,%u\n", size, ((unsigned int *)decl)[88] / 8);
> return 1;
> }
>
> at -O2 produces for the prologue and first call
>
> old new
> mflr 0 mflr 0
> std 29,-24(1) std 29,-24(1)
> std 30,-16(1) std 30,-16(1)
> mr 29,4 addis 9,2,.LC0@toc@ha
> std 31,-8(1) std 31,-8(1)
> addis 4,2,.LC1@toc@ha addis 10,2,.LC1@toc@ha
> mr 31,5 addi 9,9,.LC0@toc@l
> addis 5,2,.LC0@toc@ha addi 10,10,.LC1@toc@l
> mr 30,3 mr 30,3
> addi 5,5,.LC0@toc@l mr 29,4
> addi 4,4,.LC1@toc@l mr 31,5
> li 3,0 mr 4,10
> std 0,16(1) mr 5,9
> stdu 1,-128(1) std 0,16(1)
> bl foo3 stdu 1,-128(1)
> nop li 3,0
> bl foo3
> nop
>
> As you can see, we have some extra register shuffling from keeping a
> pseudo for arg setup insns. I guess the pseudos allow sched more
> freedom to mess around..
... and then RA isn't able to move things back. I see this happening
with all three changes (return value, incoming args, outgoing args);
the changes to combine give sched1 and RA more freedom, but those then
end up generating lots of unnecessary register moves.
> On the positive side, I saw cases where keeping parameter pseudos
> allowed shrink-wrap to occur. varasm.c:decode_reg_name_and_count is
> one of them. More shrink-wrapping is a big win.
>
> Here's a case where changes at the return result in poorer code
> int
> decl_readonly_section_1 (int category)
> {
> switch (category)
> {
> case 1:
> case 2:
> case 3:
> case 4:
> case 5:
> return 1;
> default:
> return 0;
> }
> }
> old new
> addi 9,3,-6 addi 9,3,-6
> neg 3,3 neg 3,3
> and 3,9,3 and 3,9,3
> rldicl 3,3,33,63 srwi 3,3,31
> blr rldicl 3,3,0,32
> blr
>
> Previously this:
> (insn 35 34 36 2 (set (reg:SI 161)
> (lshiftrt:SI (reg:SI 164)
> (const_int 31 [0x1f]))) {lshrsi3})
> (insn 36 35 23 2 (set (reg:DI 155 [ D.2441 ])
> (zero_extend:DI (reg:SI 161))) {zero_extendsidi2})
> (insn 23 36 24 2 (set (reg/i:DI 3 3)
> (reg:DI 155 [ D.2441 ])) {*movdi_internal64})
>
> is first combined to
> (insn 35 34 36 2 (set (reg:SI 161)
> (lshiftrt:SI (reg:SI 164)
> (const_int 31 [0x1f]))) {lshrsi3})
> (insn 23 35 24 2 (set (reg/i:DI 3 3)
> (and:DI (subreg:DI (reg:SI 161) 0)
> (const_int 1 [0x1]))))
> which is somewhat surprising, but from my previous forays into
> combine.c I'd say happens due to known zero bits. (Just looking at
> dumps here, rather than in gdb.)
>
> Then the above is further combined to
> (insn 23 34 24 2 (set (reg/i:DI 3 3)
> (zero_extract:DI (subreg:DI (reg:SI 164) 0)
> (const_int 1 [0x1])
> (const_int 32 [0x20])))
>
> Looks to me like a missed optimization opportunity that insns 35 and
> 36 aren't combined without first going through the intermediate step.
The rs6000 backend doesn't have zero_extend variants of many of its
patterns, only some. Well-known problem, long-term project.
> Anyway, here's the rewritten patch. I've left in some knobs I used
> when testing in case you want to see for yourself what happens with
> various options. Bootstrapped etc. powerpc64le-linux and
> x86_64-linux.
> +#define DONT_COMBINE_PARAMS 1
> +#define DONT_COMBINE_CALL_ARGS 1
I tested with all combinations of those knob settings, building Linux
kernels (mostly defconfigs); these are the resulting text sizes:
master alan00 alan10 alan01 alan11
5432728 5432728 5433848 5435472 5436080 alpha
3851131 3851391 3852495 3852567 3853755 arm
2190716 2190716 2190716 2190708 2190708 blackfin
2191439 2191503 2191983 2192335 2192751 c6x
2213186 2213250 2213154 2213482 2213546 cris
3322420 3322420 3322500 3322564 3322692 frv
10898664 10898664 10898664 10898664 10898664 i386
3253459 3253539 3253599 3255235 3255331 m32r
4708528 4708532 4709772 4708660 4709656 microblaze
3949689 3949745 3950269 3952401 3952857 mips
5693823 5693975 5694443 5697971 5698227 mips64
2374664 2374708 2375126 2375485 2375841 mn10300
7488219 7488419 7492299 7489743 7493431 parisc
6195727 6195935 6196367 6221647 6221687 parisc64
8688975 8689111 8691931 8692619 8695279 powerpc
20252077 20252337 20255185 20266897 20269477 powerpc64
11423734 11421858 11423946 11422726 11424838 s390
6488342 6488406 6488534 6489110 6489302 sh64
1545652 1545652 1545776 1545812 1545944 shnommu
3737005 3736973 3737357 3737585 3737945 sparc
6075342 6074426 6074762 6075570 6075834 sparc64
12301607 12301607 12301543 12301787 12301723 x86_64
1954029 1954061 1954441 1948841 1949305 xtensa
As you see, almost everything regresses code size. s390 and sparc and
xtensa like some of it; everything else generates more register moves.
A typical one for powerpc, with the "00" setting:
before after
rlwinm 3,9,N,N,N rlwinm 9,9,N,N,N
mr 3,9
(and then blr etc., no further uses of r3 or r9; nothing special about
rlwinm here, there also are "and" and "addi" cases, etc.)
I think we should at least ameliorate this regression before we can
apply any variant of this :-(
Segher
prev parent reply other threads:[~2015-05-28 15:22 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-23 11:20 Alan Modra
2015-05-23 11:20 ` Andrew Pinski
2015-05-23 19:40 ` Segher Boessenkool
2015-05-25 5:24 ` Alan Modra
2015-05-25 7:30 ` Alan Modra
2015-05-25 20:19 ` Segher Boessenkool
2015-05-26 8:15 ` Alan Modra
2015-05-28 15:33 ` Segher Boessenkool [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150528152220.GB15453@gate.crashing.org \
--to=segher@kernel.crashing.org \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).