From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
 id 9ED2A3864C6A; Mon, 24 Jun 2024 07:43:55 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9ED2A3864C6A
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
 s=default; t=1719215035;
 bh=DGiY8sBGWlfC8BXu3fkh2lIBNvXXJ0nsM1jHK41xQV4=;
 h=From:To:Subject:Date:In-Reply-To:References:From;
 b=vUrAzg56SXOns0/e4F1xgJYUS4/OXKyKjgmu0zeqKF67pKALAK2kBQZovKawP/+Hg
 Kh9bT/nsnOd02dxNU4s2SgirgJSkOyUNbRJlpFi67z/Qvg14RyYtZWvF/1gZZbOeFc
 4TabVMl5G52/Qr9uLgTmYRUdPHY0d9z3H6ZbjvgY=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/114575] [15 Regression] SVE addressing modes broken since g:839bc42772ba7af66af3bd16efed4a69511312ae
Date: Mon, 24 Jun 2024 07:43:54 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: aarch64-sve, missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 15.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114575

--- Comment #4 from GCC Commits ---
The trunk branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:792f97b44ffc5e6a967292b3747fd835e99396e7

commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7
Author: Richard Sandiford
Date:   Mon Jun 24 08:43:19 2024 +0100

    Add a late-combine pass [PR106594]

    This patch adds a combine pass that runs late in the pipeline.
    There are two instances: one between combine and split1, and one
    after postreload.

    The pass currently has a single objective: remove definitions by
    substituting into all uses.  The pre-RA version tries to restrict
    itself to cases that are likely to have a neutral or beneficial
    effect on register pressure.

    The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
    in the aarch64 test results, mostly due to making proper use of
    MOVPRFX in cases where we didn't previously.

    This is just a first step.  I'm hoping that the pass could be
    used for other combine-related optimisations in future.  In particular,
    the post-RA version doesn't need to restrict itself to cases where all
    uses are substitutable, since it doesn't have to worry about register
    pressure.  If we did that, and if we extended it to handle multi-register
    REGs, the pass might be a viable replacement for regcprop, which in
    turn might reduce the cost of having a post-RA instance of the new pass.

    On most targets, the pass is enabled by default at -O2 and above.
    However, it has a tendency to undo x86's STV and RPAD passes,
    by folding the more complex post-STV/RPAD form back into the
    simpler pre-pass form.

    Also, running a pass after register allocation means that we can
    now match define_insn_and_splits that were previously only matched
    before register allocation.  This trips things like:

      (define_insn_and_split "..."
        [...pattern...]
        "...cond..."
        "#"
        "&& 1"
        [...pattern...]
        {
          ...unconditional use of gen_reg_rtx ()...;
        }

    because matching and splitting after RA will call gen_reg_rtx when
    pseudos are no longer allowed.  rs6000 has several instances of this.

    xtensa has a variation in which the split condition is:

        "&& can_create_pseudo_p ()"

    The failure then is that, if we match after RA, we'll never be
    able to split the instruction.

    The patch therefore disables the pass by default on i386, rs6000
    and xtensa.  Hopefully we can fix those ports later (if their
    maintainers want).
    It seems better to add the pass first, though, to make it
    easier to test any such fixes.

    gcc.target/aarch64/bitfield-bitint-abi-align{16,8}.c would need
    quite a few updates for the late-combine output.  That might be
    worth doing, but it seems too complex to do as part of this patch.

    I tried compiling at least one target per CPU directory and comparing
    the assembly output for parts of the GCC testsuite.  This is just a way
    of getting a flavour of how the pass performs; it obviously isn't a
    meaningful benchmark.  All targets seemed to improve on average:

    Target                 Tests   Good    Bad   %Good   Delta  Median
    ======                 =====   ====    ===   =====   =====  ======
    aarch64-linux-gnu       2215   1975    240  89.16%   -4159      -1
    aarch64_be-linux-gnu    1569   1483     86  94.52%  -10117      -1
    alpha-linux-gnu         1454   1370     84  94.22%   -9502      -1
    amdgcn-amdhsa           5122   4671    451  91.19%  -35737      -1
    arc-elf                 2166   1932    234  89.20%  -37742      -1
    arm-linux-gnueabi       1953   1661    292  85.05%  -12415      -1
    arm-linux-gnueabihf     1834   1549    285  84.46%  -11137      -1
    avr-elf                 4789   4330    459  90.42% -441276      -4
    bfin-elf                2795   2394    401  85.65%  -19252      -1
    bpf-elf                 3122   2928    194  93.79%   -8785      -1
    c6x-elf                 2227   1929    298  86.62%  -17339      -1
    cris-elf                3464   3270    194  94.40%  -23263      -2
    csky-elf                2915   2591    324  88.89%  -22146      -1
    epiphany-elf            2399   2304     95  96.04%  -28698      -2
    fr30-elf                7712   7299    413  94.64%  -99830      -2
    frv-linux-gnu           3332   2877    455  86.34%  -25108      -1
    ft32-elf                2775   2667    108  96.11%  -25029      -1
    h8300-elf               3176   2862    314  90.11%  -29305      -2
    hppa64-hp-hpux11.23     4287   4247     40  99.07%  -45963      -2
    ia64-linux-gnu          2343   1946    397  83.06%   -9907      -2
    iq2000-elf              9684   9637     47  99.51% -126557      -2
    lm32-elf                2681   2608     73  97.28%  -59884      -3
    loongarch64-linux-gnu   1303   1218     85  93.48%  -13375      -2
    m32r-elf                1626   1517    109  93.30%   -9323      -2
    m68k-linux-gnu          3022   2620    402  86.70%  -21531      -1
    mcore-elf               2315   2085    230  90.06%  -24160      -1
    microblaze-elf          2782   2585    197  92.92%  -16530      -1
    mipsel-linux-gnu        1958   1827    131  93.31%  -15462      -1
    mipsisa64-linux-gnu     1655   1488    167  89.91%  -16592      -2
    mmix                    4914   4814    100  97.96%  -63021      -1
    mn10300-elf             3639   3320    319  91.23%  -34752      -2
    moxie-rtems             3497   3252    245  92.99%  -87305      -3
    msp430-elf              4353   3876    477  89.04%  -23780      -1
    nds32le-elf             3042   2780    262  91.39%  -27320      -1
    nios2-linux-gnu         1683   1355    328  80.51%   -8065      -1
    nvptx-none              2114   1781    333  84.25%  -12589      -2
    or1k-elf                3045   2699    346  88.64%  -14328      -2
    pdp11                   4515   4146    369  91.83%  -26047      -2
    pru-elf                 1585   1245    340  78.55%   -5225      -1
    riscv32-elf             2122   2000    122  94.25% -101162      -2
    riscv64-elf             1841   1726    115  93.75%  -49997      -2
    rl78-elf                2823   2530    293  89.62%  -40742      -4
    rx-elf                  2614   2480    134  94.87%  -18863      -1
    s390-linux-gnu          1591   1393    198  87.55%  -16696      -1
    s390x-linux-gnu         2015   1879    136  93.25%  -21134      -1
    sh-linux-gnu            1870   1507    363  80.59%   -9491      -1
    sparc-linux-gnu         1123   1075     48  95.73%  -14503      -1
    sparc-wrs-vxworks       1121   1073     48  95.72%  -14578      -1
    sparc64-linux-gnu       1096   1021     75  93.16%  -15003      -1
    v850-elf                1897   1728    169  91.09%  -11078      -1
    vax-netbsdelf           3035   2995     40  98.68%  -27642      -1
    visium-elf              1392   1106    286  79.45%   -7984      -2
    xstormy16-elf           2577   2071    506  80.36%  -13061      -1

    gcc/
            PR rtl-optimization/106594
            PR rtl-optimization/114515
            PR rtl-optimization/114575
            PR rtl-optimization/114996
            PR rtl-optimization/115104
            * Makefile.in (OBJS): Add late-combine.o.
            * common.opt (flate-combine-instructions): New option.
            * doc/invoke.texi: Document it.
            * opts.cc (default_options_table): Enable it by default at -O2
            and above.
            * tree-pass.h (make_pass_late_combine): Declare.
            * late-combine.cc: New file.
            * passes.def: Add two instances of late_combine.
            * doc/passes.texi: Document the new passes.
            * config/i386/i386-options.cc (ix86_override_options_after_change):
            Disable late-combine by default.
            * config/rs6000/rs6000.cc (rs6000_option_override_internal): Likewise.
            * config/xtensa/xtensa.cc (xtensa_option_override): Likewise.

    gcc/testsuite/
            PR rtl-optimization/106594
            * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
            targets.
            * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
            * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
            * gcc.target/aarch64/bitfield-bitint-abi-align16.c: Add
            -fno-late-combine-instructions.
            * gcc.target/aarch64/bitfield-bitint-abi-align8.c: Likewise.
            * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
            * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
            * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
            * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
            described in the comment.
            * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
            * gcc.target/aarch64/pr106594_1.c: New test.
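
For readers unfamiliar with the idea, the objective stated in the commit message ("remove definitions by substituting into all uses") can be illustrated with a small toy model. This is only a sketch under loud assumptions: it is not the algorithm in late-combine.cc, it ignores register pressure entirely, and the `Insn` type, the opcode names (`add_imm`, `load`, `load_off`, `mul`) and the `try_fold` matcher are all hypothetical names invented for the illustration. It also assumes straight-line code in which every register is assigned exactly once.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    """Toy instruction: dest = op(srcs). All opcode names are hypothetical."""
    dest: str
    op: str
    srcs: tuple


def try_fold(defn, use):
    """Return the combined instruction, or None if the toy target has no match.

    The only pattern this toy target knows: an address computed by
    add_imm can be folded into a load as a base+offset address.
    """
    if defn.op == "add_imm" and use.op == "load" and use.srcs == (defn.dest,):
        base, imm = defn.srcs
        return Insn(use.dest, "load_off", (base, imm))
    return None


def late_combine(insns):
    """Delete a definition only when it can be substituted into ALL its uses.

    Assumes each register is assigned once, so moving the definition's
    operands down to the use sites cannot change their values.
    """
    insns = list(insns)
    changed = True
    while changed:
        changed = False
        for defn in insns:
            uses = [u for u in insns if defn.dest in u.srcs]
            if not uses:
                continue  # nothing to substitute into
            folded = [try_fold(defn, u) for u in uses]
            if any(f is None for f in folded):
                continue  # all-or-nothing: one failing use keeps the definition
            # Replace every use with its folded form and drop the definition.
            replacement = {id(u): f for u, f in zip(uses, folded)}
            insns = [replacement.get(id(i), i) for i in insns if i is not defn]
            changed = True
            break  # the list changed, so rescan from the start
    return insns
```

In this model, an `add_imm` whose only uses are loads disappears, while one that also feeds a `mul` is left alone together with all of its uses, since substituting into only some uses would not remove the definition.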