From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 9ED2A3864C6A; Mon, 24 Jun 2024 07:43:55 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9ED2A3864C6A
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1719215035;
	bh=DGiY8sBGWlfC8BXu3fkh2lIBNvXXJ0nsM1jHK41xQV4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=vUrAzg56SXOns0/e4F1xgJYUS4/OXKyKjgmu0zeqKF67pKALAK2kBQZovKawP/+Hg
	 Kh9bT/nsnOd02dxNU4s2SgirgJSkOyUNbRJlpFi67z/Qvg14RyYtZWvF/1gZZbOeFc
	 4TabVMl5G52/Qr9uLgTmYRUdPHY0d9z3H6ZbjvgY=
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/114575] [15 Regression] SVE addressing modes
 broken since g:839bc42772ba7af66af3bd16efed4a69511312ae
Date: Mon, 24 Jun 2024 07:43:54 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: aarch64-sve, missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 15.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114575-4-gKv9DVdeI2@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114575-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114575-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D114575
--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.or=
g>:

https://gcc.gnu.org/g:792f97b44ffc5e6a967292b3747fd835e99396e7

commit r15-1579-g792f97b44ffc5e6a967292b3747fd835e99396e7
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon Jun 24 08:43:19 2024 +0100

    Add a late-combine pass [PR106594]

    This patch adds a combine pass that runs late in the pipeline.
    There are two instances: one between combine and split1, and one
    after postreload.

    The pass currently has a single objective: remove definitions by
    substituting into all uses.  The pre-RA version tries to restrict
    itself to cases that are likely to have a neutral or beneficial
    effect on register pressure.

    The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
    in the aarch64 test results, mostly due to making proper use of
    MOVPRFX in cases where we didn't previously.

    This is just a first step.  I'm hoping that the pass could be
    used for other combine-related optimisations in future.  In particular,
    the post-RA version doesn't need to restrict itself to cases where all
    uses are substitutable, since it doesn't have to worry about register
    pressure.  If we did that, and if we extended it to handle multi-regist=
er
    REGs, the pass might be a viable replacement for regcprop, which in
    turn might reduce the cost of having a post-RA instance of the new pass.

    On most targets, the pass is enabled by default at -O2 and above.
    However, it has a tendency to undo x86's STV and RPAD passes,
    by folding the more complex post-STV/RPAD form back into the
    simpler pre-pass form.

    Also, running a pass after register allocation means that we can
    now match define_insn_and_splits that were previously only matched
    before register allocation.  This trips things like:

      (define_insn_and_split "..."
        [...pattern...]
        "...cond..."
        "#"
        "&& 1"
        [...pattern...]
        {
          ...unconditional use of gen_reg_rtx ()...;
        }

    because matching and splitting after RA will call gen_reg_rtx when
    pseudos are no longer allowed.  rs6000 has several instances of this.

    xtensa has a variation in which the split condition is:

        "&& can_create_pseudo_p ()"

    The failure then is that, if we match after RA, we'll never be
    able to split the instruction.

    The patch therefore disables the pass by default on i386, rs6000
    and xtensa.  Hopefully we can fix those ports later (if their
    maintainers want).  It seems better to add the pass first, though,
    to make it easier to test any such fixes.

    gcc.target/aarch64/bitfield-bitint-abi-align{16,8}.c would need
    quite a few updates for the late-combine output.  That might be
    worth doing, but it seems too complex to do as part of this patch.

    I tried compiling at least one target per CPU directory and comparing
    the assembly output for parts of the GCC testsuite.  This is just a way
    of getting a flavour of how the pass performs; it obviously isn't a
    meaningful benchmark.  All targets seemed to improve on average:

    Target                 Tests   Good    Bad   %Good   Delta  Median
    =3D=3D=3D=3D=3D=3D                 =3D=3D=3D=3D=3D   =3D=3D=3D=3D    =
=3D=3D=3D   =3D=3D=3D=3D=3D   =3D=3D=3D=3D=3D  =3D=3D=3D=3D=3D=3D
    aarch64-linux-gnu       2215   1975    240  89.16%   -4159      -1
    aarch64_be-linux-gnu    1569   1483     86  94.52%  -10117      -1
    alpha-linux-gnu         1454   1370     84  94.22%   -9502      -1
    amdgcn-amdhsa           5122   4671    451  91.19%  -35737      -1
    arc-elf                 2166   1932    234  89.20%  -37742      -1
    arm-linux-gnueabi       1953   1661    292  85.05%  -12415      -1
    arm-linux-gnueabihf     1834   1549    285  84.46%  -11137      -1
    avr-elf                 4789   4330    459  90.42% -441276      -4
    bfin-elf                2795   2394    401  85.65%  -19252      -1
    bpf-elf                 3122   2928    194  93.79%   -8785      -1
    c6x-elf                 2227   1929    298  86.62%  -17339      -1
    cris-elf                3464   3270    194  94.40%  -23263      -2
    csky-elf                2915   2591    324  88.89%  -22146      -1
    epiphany-elf            2399   2304     95  96.04%  -28698      -2
    fr30-elf                7712   7299    413  94.64%  -99830      -2
    frv-linux-gnu           3332   2877    455  86.34%  -25108      -1
    ft32-elf                2775   2667    108  96.11%  -25029      -1
    h8300-elf               3176   2862    314  90.11%  -29305      -2
    hppa64-hp-hpux11.23     4287   4247     40  99.07%  -45963      -2
    ia64-linux-gnu          2343   1946    397  83.06%   -9907      -2
    iq2000-elf              9684   9637     47  99.51% -126557      -2
    lm32-elf                2681   2608     73  97.28%  -59884      -3
    loongarch64-linux-gnu   1303   1218     85  93.48%  -13375      -2
    m32r-elf                1626   1517    109  93.30%   -9323      -2
    m68k-linux-gnu          3022   2620    402  86.70%  -21531      -1
    mcore-elf               2315   2085    230  90.06%  -24160      -1
    microblaze-elf          2782   2585    197  92.92%  -16530      -1
    mipsel-linux-gnu        1958   1827    131  93.31%  -15462      -1
    mipsisa64-linux-gnu     1655   1488    167  89.91%  -16592      -2
    mmix                    4914   4814    100  97.96%  -63021      -1
    mn10300-elf             3639   3320    319  91.23%  -34752      -2
    moxie-rtems             3497   3252    245  92.99%  -87305      -3
    msp430-elf              4353   3876    477  89.04%  -23780      -1
    nds32le-elf             3042   2780    262  91.39%  -27320      -1
    nios2-linux-gnu         1683   1355    328  80.51%   -8065      -1
    nvptx-none              2114   1781    333  84.25%  -12589      -2
    or1k-elf                3045   2699    346  88.64%  -14328      -2
    pdp11                   4515   4146    369  91.83%  -26047      -2
    pru-elf                 1585   1245    340  78.55%   -5225      -1
    riscv32-elf             2122   2000    122  94.25% -101162      -2
    riscv64-elf             1841   1726    115  93.75%  -49997      -2
    rl78-elf                2823   2530    293  89.62%  -40742      -4
    rx-elf                  2614   2480    134  94.87%  -18863      -1
    s390-linux-gnu          1591   1393    198  87.55%  -16696      -1
    s390x-linux-gnu         2015   1879    136  93.25%  -21134      -1
    sh-linux-gnu            1870   1507    363  80.59%   -9491      -1
    sparc-linux-gnu         1123   1075     48  95.73%  -14503      -1
    sparc-wrs-vxworks       1121   1073     48  95.72%  -14578      -1
    sparc64-linux-gnu       1096   1021     75  93.16%  -15003      -1
    v850-elf                1897   1728    169  91.09%  -11078      -1
    vax-netbsdelf           3035   2995     40  98.68%  -27642      -1
    visium-elf              1392   1106    286  79.45%   -7984      -2
    xstormy16-elf           2577   2071    506  80.36%  -13061      -1

    gcc/
            PR rtl-optimization/106594
            PR rtl-optimization/114515
            PR rtl-optimization/114575
            PR rtl-optimization/114996
            PR rtl-optimization/115104
            * Makefile.in (OBJS): Add late-combine.o.
            * common.opt (flate-combine-instructions): New option.
            * doc/invoke.texi: Document it.
            * opts.cc (default_options_table): Enable it by default at -O2
            and above.
            * tree-pass.h (make_pass_late_combine): Declare.
            * late-combine.cc: New file.
            * passes.def: Add two instances of late_combine.
            * doc/passes.texi: Document the new passes.
            * config/i386/i386-options.cc (ix86_override_options_after_chan=
ge):
            Disable late-combine by default.
            * config/rs6000/rs6000.cc (rs6000_option_override_internal):
Likewise.
            * config/xtensa/xtensa.cc (xtensa_option_override): Likewise.

    gcc/testsuite/
            PR rtl-optimization/106594
            * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
            targets.
            * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
            * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
            * gcc.target/aarch64/bitfield-bitint-abi-align16.c: Add
            -fno-late-combine-instructions.
            * gcc.target/aarch64/bitfield-bitint-abi-align8.c: Likewise.
            * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
            * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
            * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
            * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /=
Zs
            described in the comment.
            * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
            * gcc.target/aarch64/pr106594_1.c: New test.=