From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 127433858C54; Thu,  2 Mar 2023 13:26:38 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 127433858C54
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1677763598;
	bh=Z4DiVWuYCGcOGPTH+mu4nfcfoBJ0nBt3aN9wu7PhCfY=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=awF1uZyZfLqeecZfo+JHRAayxlVAapngzKPVHNiiz7wTPSfs3XZ5z0Y3OJGVtHzme
	 BUtSJVWkwbWKsYtniYZJtdvsMosLOS3YT9fV7S9dCRJsrvMQwz0LEGtnalJjlN2yKZ
	 pEOmc+EdNxPwKhpjYYJ6frr/RooiL8fxkMxL4v5w=
From: "segher at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/106770] powerpc64le: Unnecessary xxpermdi before mfvsrd
Date: Thu, 02 Mar 2023 13:26:37 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: segher at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jskumari at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-106770-4-OzleuQ0MWc@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-106770-4@http.gcc.gnu.org/bugzilla/>
References: <bug-106770-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770
--- Comment #11 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jens Seifert from comment #6)
> The left part of VSX registers overlaps with floating point registers, that
> is why no register xxpermdi is required and mfvsrd can access all (left)
> parts of VSX registers directly.

The mfvsrd instruction was invented before ELFv2 (at the same time as
mfvsrwz).  Everything in common use was big-endian then.  The insns to move
GPR->VSR that initially existed were mtvsrd and mtvsrw[az], all of which
write to dword 0 of the target VSR.

Dword 0 of vector regs is where 64-bit entities in vector regs are stored in
the ABIs, sure, and that corresponds to the FPRs in the ISA.  mtvsrdd and
mtvsrws were added in ISA 3.0 (p9), together with mfvsrld, to make things
work better with little-endian ELFv2.
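
To make the dword numbering concrete, here is a minimal sketch in C of the
moves named above.  It is only a software model, not GCC code and not a
complete rendering of the ISA: the type vsr_t and the model_* function names
are made up for illustration, and the model zeroes dword 1 after mtvsrd even
though the ISA leaves it undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of one 128-bit VSR as two doublewords, numbered as in
   the ISA: dw[0] is the half that overlaps the FPRs.  (Hypothetical type,
   for illustration only.)  */
typedef struct { uint64_t dw[2]; } vsr_t;

/* mtvsrd (ISA 2.07): GPR -> dword 0.  The ISA leaves dword 1 undefined;
   this model zeroes it only to stay deterministic.  */
static inline vsr_t model_mtvsrd(uint64_t ra)
{
    vsr_t xt = { { ra, 0 } };
    return xt;
}

/* mtvsrdd (ISA 3.0): two GPRs -> dword 0 and dword 1.  */
static inline vsr_t model_mtvsrdd(uint64_t ra, uint64_t rb)
{
    vsr_t xt = { { ra, rb } };
    return xt;
}

/* mfvsrd (ISA 2.07): dword 0 -> GPR.  */
static inline uint64_t model_mfvsrd(vsr_t xs) { return xs.dw[0]; }

/* mfvsrld (ISA 3.0): dword 1 -> GPR.  */
static inline uint64_t model_mfvsrld(vsr_t xs) { return xs.dw[1]; }

/* Small self-check showing which half each move touches.  */
static inline void model_selfcheck(void)
{
    vsr_t a = model_mtvsrd(0x1234);
    assert(model_mfvsrd(a) == 0x1234);      /* round trip through dword 0 */

    vsr_t b = model_mtvsrdd(0xAAAA, 0xBBBB);
    assert(model_mfvsrd(b) == 0xAAAA);      /* dword 0 */
    assert(model_mfvsrld(b) == 0xBBBB);     /* dword 1, needs ISA 3.0 */
}
```

In this model the only direct way to read dword 1 into a GPR is mfvsrld,
which is why code for a target without ISA 3.0 needs some permute first.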

> The xxpermdi x,y,y,3 indicates to me that gcc prefers right part of register
> which might also cause the xxpermdi at the beginning.

And with -mbig you get ,2 here.  It is accidental.

> At the end the mystery
> is why gcc adds 3 xxpermdi to the code.

As I said, this is constructed during expand, to make correct code.  That is
all that expand should do: make correct (and well-optimisable, "open
structured", easy to transform) code.  We should be able to optimise this to
something better in later passes that *are* supposed to make faster code.
Like the p8 swaps pass, which mostly zaps unnecessary pairs of swaps, or the
swiss army bazooka combine, or even many earlier passes if such an xxpermdi
insn is truly superfluous.  It usually is not: we are dealing with the full
128-bit VSRs there, and there is no way of saying we do not care about part
of the register contents.  Making infra for that is big work.

We can make things easier by expressing things as 64 bit earlier.  We can
(and should) also investigate why the mfvsrd is not combined (as in, what the
instruction combiner pass does) with the xxpermdi.  There are many things not
quite perfect here.
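
As a rough illustration of the equivalences such later passes could exploit,
here is a small C model of xxpermdi next to mfvsrd and mfvsrld.  It is a
sketch under assumptions, not what GCC does internally: vsr_t and the model_*
names are hypothetical, the DM operand is taken as the raw 2-bit field from
the ISA, and mfvsrld only exists from ISA 3.0 on.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a 128-bit VSR; dw[0] is the half overlapping the
   FPRs.  (Hypothetical type, for illustration only.)  */
typedef struct { uint64_t dw[2]; } vsr_t;

/* xxpermdi XT,XA,XB,DM: the high bit of DM picks which dword of XA becomes
   dword 0 of XT, the low bit picks which dword of XB becomes dword 1.  */
static inline vsr_t model_xxpermdi(vsr_t xa, vsr_t xb, unsigned dm)
{
    assert(dm <= 3);
    vsr_t xt = { { xa.dw[(dm >> 1) & 1], xb.dw[dm & 1] } };
    return xt;
}

/* mfvsrd: dword 0 -> GPR.  mfvsrld (ISA 3.0): dword 1 -> GPR.  */
static inline uint64_t model_mfvsrd(vsr_t xs)  { return xs.dw[0]; }
static inline uint64_t model_mfvsrld(vsr_t xs) { return xs.dw[1]; }

static inline void model_selfcheck(void)
{
    vsr_t y = { { 0x1111, 0x2222 } };

    /* "xxpermdi x,y,y,3 ; mfvsrd r,x" yields the same GPR value as
       "mfvsrld r,y": only dword 0 of x is ever looked at.  */
    assert(model_mfvsrd(model_xxpermdi(y, y, 3)) == model_mfvsrld(y));

    /* The same holds for DM=2 (a plain dword swap).  */
    assert(model_mfvsrd(model_xxpermdi(y, y, 2)) == model_mfvsrld(y));

    /* Two dword swaps in a row cancel, the kind of pair a swap-removal
       pass can delete.  */
    vsr_t s = model_xxpermdi(y, y, 2);
    vsr_t t = model_xxpermdi(s, s, 2);
    assert(t.dw[0] == y.dw[0] && t.dw[1] == y.dw[1]);
}
```

The first two checks only pass because mfvsrd ignores dword 1 of its input.
A pass that sees the full 128-bit value has no way to know that half is
dead, which is the missing infra mentioned above.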