From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id C417C3890431; Fri, 21 Jun 2024 01:27:43 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C417C3890431
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1718933263;
	bh=pV0fm/SPuRKJ8xE9R+WG4B3aQ+rSUBnzG/5jgeJnJWc=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=eUgBl5Bsw7Xuybp4KDbJX5vQ2BgF7L82iBkd1vPiX08CB+0DxM/1Yvx0FsvAOe71O
	 2aLZWE8Bzn5eIzNdmJRZrrENAS6xAhA6IVmcI7puOgtOuATJLbQA2IvT4i0/PHaZ2F
	 xRDLt54FaTRF/N3T8LfcEzbcGHOSetsMb4l2OELU=
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/115355] [12/13/14/15 Regression] vectorization exposes
 wrong code on P9 LE starting from r12-4496
Date: Fri, 21 Jun 2024 01:27:36 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.2.1
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: linkw at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.5
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-115355-4-LICOW0j88J@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-115355-4@http.gcc.gnu.org/bugzilla/>
References: <bug-115355-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355
--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>:

https://gcc.gnu.org/g:52c112800d9f44457c4832309a48c00945811313

commit r15-1504-g52c112800d9f44457c4832309a48c00945811313
Author: Kewen Lin <linkw@linux.ibm.com>
Date:   Thu Jun 20 20:23:56 2024 -0500

    rs6000: Fix wrong RTL patterns for vector merge high/low word on LE

    Commit r12-4496 changed some define_expands and define_insns
    for vector merge high/low word, namely altivec_vmrg[hl]w and
    vsx_xxmrg[hl]w_<VSX_W:mode>.  These patterns mainly serve the
    built-in functions vec_merge{h,l}, __builtin_vsx_xxmrghw and
    __builtin_vsx_xxmrghw_4si, plus some internal gen function
    needs.  These functions have to take endianness into account.
    Taking vec_mergeh as an example, PVIPR defines it as "Merges
    the first halves (in element order) of two vectors"; note
    that it is specified in element order.  So it is mapped to
    vmrghw on BE and to vmrglw on LE.  Although the mapped insns
    differ, as discussed in PR106069 the RTL pattern should still
    be the same.  That held before commit r12-4496, when
    define_expand altivec_vmrghw got expanded into:

      (vec_select:VSX_W
         (vec_concat:<VS_double>
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
            (parallel [(const_int 0) (const_int 4)
                       (const_int 1) (const_int 5)])))]

    on both BE and LE.  But commit r12-4496 changed it to
    expand into:

      (vec_select:VSX_W
         (vec_concat:<VS_double>
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
            (parallel [(const_int 0) (const_int 4)
                       (const_int 1) (const_int 5)])))]

    on BE, and

      (vec_select:VSX_W
         (vec_concat:<VS_double>
            (match_operand:VSX_W 1 "register_operand" "wa,v")
            (match_operand:VSX_W 2 "register_operand" "wa,v"))
            (parallel [(const_int 2) (const_int 6)
                       (const_int 3) (const_int 7)])))]

    on LE.  Although the mapped insns are still vmrghw on BE and
    vmrglw on LE, the associated RTL pattern on LE is wrong and
    inconsistent with the mapped insn.  If optimization passes
    leave this pattern alone, the generated code is still fine
    even though the pattern doesn't represent its mapped insn,
    which is why simple testing on the bif doesn't expose this
    issue.  But once an optimization pass such as combine makes
    changes based on this wrong pattern, the result is
    unexpected, because the pattern doesn't match the semantics
    that the expanded insn is intended to represent.
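
    To make the difference between the two selectors concrete,
    here is a minimal plain C model of vec_select applied to a
    vec_concat of two 4-word vectors.  It is only a sketch and not
    GCC code: the function and array names are made up for
    illustration, and it assumes indices 0..3 name the elements of
    the first operand and 4..7 those of the second.

```c
#include <stddef.h>

/* Illustrative model (not GCC code) of
     (vec_select (vec_concat a b) (parallel [sel...]))
   for two 4-word vectors.  Indices 0..3 pick from a, 4..7 from b.  */
void
model_vec_select (const int a[4], const int b[4],
                  const int sel[4], int out[4])
{
  int cat[8];

  /* vec_concat: a's elements first, then b's.  */
  for (size_t i = 0; i < 4; i++)
    {
      cat[i] = a[i];
      cat[i + 4] = b[i];
    }

  /* vec_select: pick the concatenated elements named by sel.  */
  for (size_t i = 0; i < 4; i++)
    out[i] = cat[sel[i]];
}

/* Selector quoted above for both BE and LE before r12-4496.  */
const int sel_first_halves[4] = { 0, 4, 1, 5 };

/* Selector quoted above for LE after r12-4496.  */
const int sel_second_halves[4] = { 2, 6, 3, 7 };
```

    With a = {a0, a1, a2, a3} and b = {b0, b1, b2, b3}, the first
    selector yields {a0, b0, a1, b1}, i.e. the first halves in
    element order as vec_mergeh is described, while the second
    yields {a2, b2, a3, b3}.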

    So this patch fixes the wrong RTL patterns, ensuring the
    associated RTL patterns are the same as before, so that they
    have the same semantics as their mapped insns.  With this
    patch, expanders like altivec_vmrghw expand into
    altivec_vmrghw_direct_v4si_be or altivec_vmrglw_direct_v4si_le
    depending on endianness.  "direct" makes it easy to see which
    insn is generated, while _be and _le mainly distinguish the
    different RTL patterns per endianness.
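
    The selection described above, together with the ChangeLog
    entries for altivec_vmrghw and altivec_vmrglw, can be
    summarized by the following plain C sketch.  The helper name
    direct_insn_for is hypothetical and only returns pattern names
    as strings; the real expanders call the corresponding gen_*
    functions instead.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: return the name of the direct
   pattern that the altivec_vmrghw (merge_high == true) or
   altivec_vmrglw (merge_high == false) expander picks, following the
   ChangeLog entries of this commit.  */
const char *
direct_insn_for (bool merge_high, bool big_endian)
{
  if (merge_high)
    return big_endian ? "altivec_vmrghw_direct_v4si_be"
                      : "altivec_vmrglw_direct_v4si_le";

  return big_endian ? "altivec_vmrglw_direct_v4si_be"
                    : "altivec_vmrghw_direct_v4si_le";
}
```

    For example, direct_insn_for (true, false) names the pattern
    chosen for altivec_vmrghw on LE, which is the vmrglw one.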

    Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>

            PR target/106069
            PR target/115355

    gcc/ChangeLog:

            * config/rs6000/altivec.md (altivec_vmrghw_direct_<VSX_W:mode>):
            Rename to ...
            (altivec_vmrghw_direct_<VSX_W:mode>_be): ... this.  Add the
            condition BYTES_BIG_ENDIAN.
            (altivec_vmrghw_direct_<VSX_W:mode>_le): New define_insn.
            (altivec_vmrglw_direct_<VSX_W:mode>): Rename to ...
            (altivec_vmrglw_direct_<VSX_W:mode>_be): ... this.  Add the
            condition BYTES_BIG_ENDIAN.
            (altivec_vmrglw_direct_<VSX_W:mode>_le): New define_insn.
            (altivec_vmrghw): Adjust by calling
            gen_altivec_vmrghw_direct_v4si_be for BE and
            gen_altivec_vmrglw_direct_v4si_le for LE.
            (altivec_vmrglw): Adjust by calling
            gen_altivec_vmrglw_direct_v4si_be for BE and
            gen_altivec_vmrghw_direct_v4si_le for LE.
            (vec_widen_umult_hi_v8hi): Adjust the call to
            gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE
            and by gen_altivec_vmrglw for LE.
            (vec_widen_smult_hi_v8hi): Likewise.
            (vec_widen_umult_lo_v8hi): Adjust the call to
            gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE
            and by gen_altivec_vmrghw for LE.
            (vec_widen_smult_lo_v8hi): Likewise.
            * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
            Replace CODE_FOR_altivec_vmrghw_direct_v4si by
            CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and
            CODE_FOR_altivec_vmrghw_direct_v4si_le for LE.  And replace
            CODE_FOR_altivec_vmrglw_direct_v4si by
            CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and
            CODE_FOR_altivec_vmrglw_direct_v4si_le for LE.
            * config/rs6000/vsx.md (vsx_xxmrghw_<VSX_W:mode>): Adjust by
            calling gen_altivec_vmrghw_direct_v4si_be for BE and
            gen_altivec_vmrglw_direct_v4si_le for LE.
            (vsx_xxmrglw_<VSX_W:mode>): Adjust by calling
            gen_altivec_vmrglw_direct_v4si_be for BE and
            gen_altivec_vmrghw_direct_v4si_le for LE.

    gcc/testsuite/ChangeLog:

            * g++.target/powerpc/pr106069.C: New test.
            * gcc.target/powerpc/pr115355.c: New test.