From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 127764 invoked by alias); 23 Nov 2019 22:34:58 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 127754 invoked by uid 89); 23 Nov 2019 22:34:58 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 spammy=hour, sparc64, regressed, finalc X-HELO: gate.crashing.org Received: from gate.crashing.org (HELO gate.crashing.org) (63.228.1.57) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 23 Nov 2019 22:34:55 +0000 Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id xANMYokF021227; Sat, 23 Nov 2019 16:34:51 -0600 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id xANMYo7o021225; Sat, 23 Nov 2019 16:34:50 -0600 Date: Sat, 23 Nov 2019 23:01:00 -0000 From: Segher Boessenkool To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: Re: Add a new combine pass Message-ID: <20191123223449.GP9491@gate.crashing.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-IsSubscribed: yes X-SW-Source: 2019-11/txt/msg02267.txt.bz2 Hi! On Mon, Nov 18, 2019 at 05:55:13PM +0000, Richard Sandiford wrote: > Richard Sandiford writes: > > (It's 23:35 local time, so it's still just about stage 1. :-)) > > Or actually, just under 1 day after end of stage 1. Oops. > Could have sworn stage 1 ended on the 17th :-( Only realised > I'd got it wrong when catching up on Saturday's email traffic. > > And inevitably, I introduced a couple of stupid mistakes while > trying to clean the patch up for submission by that (non-)deadline. > Here's a version that fixes an inverted overlapping memref check > and that correctly prunes the use list for combined instructions. > (This last one is just a compile-time saving -- the old code was > correct, just suboptimal.) I've build the Linux kernel with the previous version, as well as this one. R0 is unmodified GCC, R1 is the first patch, R2 is this one: (I've forced --param=run-combine=6 for R1 and R2): (Percentages are relative to R0): R0 R1 R2 R1 R2 alpha 6107088 6101088 6101088 99.902% 99.902% arc 4008224 4006568 4006568 99.959% 99.959% arm 9206728 9200936 9201000 99.937% 99.938% arm64 13056174 13018174 13018194 99.709% 99.709% armhf 0 0 0 0 0 c6x 2337237 2337077 2337077 99.993% 99.993% csky 3356602 0 0 0 0 h8300 1166996 1166776 1166776 99.981% 99.981% i386 11352159 0 0 0 0 ia64 18230640 18167000 18167000 99.651% 99.651% m68k 3714271 0 0 0 0 microblaze 4982749 4979945 4979945 99.944% 99.944% mips 8499309 8495205 8495205 99.952% 99.952% mips64 7042036 7039816 7039816 99.968% 99.968% nds32 4486663 0 0 0 0 nios2 3680001 3679417 3679417 99.984% 99.984% openrisc 4226076 4225868 4225868 99.995% 99.995% parisc 7681895 7680063 7680063 99.976% 99.976% parisc64 8677077 8676581 8676581 99.994% 99.994% powerpc 10687611 10682199 10682199 99.949% 99.949% powerpc64 17671082 17658570 17658570 99.929% 99.929% powerpc64le 17671082 17658570 17658570 99.929% 99.929% riscv32 1554938 1554758 1554758 99.988% 99.988% riscv64 6634342 6632788 6632788 99.977% 99.977% s390 13049643 13014939 13014939 99.734% 99.734% sh 3254743 0 0 0 0 shnommu 1632364 1632124 1632124 99.985% 99.985% sparc 4404993 4399593 4399593 99.877% 99.877% sparc64 6796711 6797491 6797491 100.011% 100.011% x86_64 19713174 19712817 19712817 99.998% 99.998% xtensa 0 0 0 0 0 0 means it didn't build. armhf is probably my own problem, not sure yet. xtensa starts with /tmp/ccmJoY7l.s: Assembler messages: /tmp/ccmJoY7l.s:407: Error: cannot represent `BFD_RELOC_8' relocation in object file and it doesn't get better. My powerpc64 config actually built the powerpc64le config, since the kernel since a while looks what the host system is, for its defconfig. Oh well, fixed now. There are fivew new failures, with either of the combine2 patches. And all five are actually different (different symptoms, at least): - csky fails on libgcc build: /home/segher/src/gcc/libgcc/fp-bit.c: In function '__fixdfsi': /home/segher/src/gcc/libgcc/fp-bit.c:1405:1: error: unable to generate reloads for: 1405 | } | ^ (insn 199 86 87 8 (parallel [ (set (reg:SI 101) (plus:SI (reg:SI 98) (const_int -32 [0xffffffffffffffe0]))) (set (reg:CC 33 c) (lt:CC (plus:SI (reg:SI 98) (const_int -32 [0xffffffffffffffe0])) (const_int 0 [0]))) ]) "/home/segher/src/gcc/libgcc/fp-bit.c":1403:23 207 {*cskyv2_declt} (nil)) during RTL pass: reload Target problem? - i386 goes into an infinite loop compiling, or at least an hour or so... Erm I forgot too record what it was compiling. I did attach a GDB... It is something from lra_create_live_ranges. - m68k: /home/segher/src/kernel/fs/exec.c: In function 'copy_strings': /home/segher/src/kernel/fs/exec.c:590:1: internal compiler error: in final_scan_insn_1, at final.c:3048 590 | } | ^ 0x10408307 final_scan_insn_1 /home/segher/src/gcc/gcc/final.c:3048 0x10408383 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*) /home/segher/src/gcc/gcc/final.c:3152 0x10408797 final_1 /home/segher/src/gcc/gcc/final.c:2020 0x104091f7 rest_of_handle_final /home/segher/src/gcc/gcc/final.c:4658 0x104091f7 execute /home/segher/src/gcc/gcc/final.c:4736 and that line is gcc_assert (prev_nonnote_insn (insn) == last_ignored_compare); - nds32: /tmp/ccC8Czca.s: Assembler messages: /tmp/ccC8Czca.s:3144: Error: Unrecognized operand/register, lmw.bi [$fp+(-60)],[$fp],$r11,0x0. /tmp/ccl8o20c.s: Assembler messages: /tmp/ccl8o20c.s:2449: Error: Unrecognized operand/register, lmw.bi $r9,[$fp],[$fp+(-132)],0x0. /tmp/ccZxjwHd.s: Assembler messages: /tmp/ccZxjwHd.s:4776: Error: Unrecognized operand/register, lmw.bi [$fp+(-52)],[$fp],[$fp+(-56)],0x0. /tmp/cczjOS3d.s: Assembler messages: /tmp/cczjOS3d.s:2336: Error: Unrecognized operand/register, lmw.bi $r16,[$fp],$r7,0x0. and more. All lmw.bi... target issue? - sh (that's sh4-linux): /home/segher/src/kernel/net/ipv4/af_inet.c: In function 'snmp_get_cpu_field': /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to find a register to spill in class 'R0_REGS' 1638 | } | ^ /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the insn: (insn 18 17 19 2 (set (reg:SI 0 r0) (mem:SI (plus:SI (reg:SI 4 r4 [178]) (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i} (expr_list:REG_DEAD (reg:SI 4 r4 [178]) (expr_list:REG_DEAD (reg:SI 6 r6 [171]) (nil)))) /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier errors, bailing out Looking at just binary size, which is a good stand-in for how many insns it combined: R2 arm64 99.709% ia64 99.651% s390 99.734% sparc 99.877% sparc64 100.011% (These are those that are not between 99.9% and 100.0%). So only sparc64 regressed, and just a tiny bit (I can look at what that is, if there is interest). But 32-bit sparc improved, and s390, arm64, and ia64 got actual benefit. Again this is just code size, not analysing the actually changed code. I did look at the powerpc64le changes. It is almost completely load- with-update (and store-with-update) insns that make the difference, but there are also some dot insns. The extra mr. are usually not a good idea, but the extsw. are. Sometimes this causes *more* insns in the end (register move insns), but that is the exception. This mr. problem is there with combine already, btw. In the end it is caused by this just not being something good to do on pseudos, it would be better to do this after RA, in a peephole or similar. OTOH it isn't actually really important for performance either way. Btw, does the new pass use TARGET_LEGITIMATE_COMBINED_INSN? It probably should. (That would be the hook where we would probably want to prevent generating mr. insns). Segher