From: "segher at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/106770] powerpc64le: Unnecessary xxpermdi before mfvsrd
Date: Thu, 02 Mar 2023 13:26:37 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jskumari at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770

--- Comment #11 from Segher Boessenkool ---
(In reply to Jens Seifert from comment #6)
> The left part of VSX registers overlaps with floating point registers, that
> is why no register xxpermdi is required and mfvsrd can access all (left)
> parts of VSX registers directly.

The mfvsrd instruction was invented before ELFv2 (at the same time as
mfvsrwz).  Everything in common use was big-endian then.
The insns to move GPR->VSR that initially existed were mtvsrd and
mtvsrw[az], all of which write to dword 0 of the target VSR.

Dword 0 of vector regs is where 64-bit entities in vector regs are stored
in the ABIs, sure, and that corresponds to the FPRs in the ISA.  mtvsrdd
and mtvsrws were added in ISA 3.0 (p9), together with mfvsrld, to make
things work better with little-endian ELFv2.

> The xxpermdi x,y,y,3 indicates to me that gcc prefers right part of register
> which might also cause the xxpermdi at the beginning.

And with -mbig you get ,2 here.  It is accidental.

> At the end the mystery
> is why gcc adds 3 xxpermdi to the code.

As I said, this is constructed during expand, to make correct code.  That
is all that expand should do: make correct (and well-optimisable, "open
structured", easy to transform) code.

We should be able to optimise this to something better in later passes
that *are* supposed to make faster code.  Like the p8 swaps pass, which
mostly zaps unnecessary pairs of swaps, or the swiss army bazooka combine,
or even many earlier passes if such an xxpermdi insn is truly superfluous.
It usually is not: we are dealing with the full 128-bit VSRs there, and
there is no way of saying we do not care about part of the register
contents.  Making infra for that is big work.

We can make things easier by expressing things as 64 bit earlier.  We can
(and should) also investigate why the mfvsrd is not combined (as in, what
the instruction combiner pass does) with the xxpermdi.  There are many
things not quite perfect here.
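
For readers who want the dword numbering spelled out, here is a rough
sketch in plain C of the instructions named above.  It is only a model,
not GCC code and not real intrinsics: the vsr_t type and the helper
functions are made up for illustration, and the zero that this model puts
in dword 1 after mtvsrd is an assumption of the sketch (the ISA leaves
that dword undefined).  It shows that mfvsrd only ever sees dword 0, so a
value that sits in dword 1 has to be moved there with xxpermdi first, or
read with mfvsrld on p9.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit VSR as two ISA-numbered dwords. */
typedef struct { uint64_t dw[2]; } vsr_t;

/* xxpermdi XT,XA,XB,DM: the high bit of DM picks the dword of XA that
   lands in dword 0, the low bit picks the dword of XB for dword 1. */
static inline vsr_t xxpermdi(vsr_t a, vsr_t b, unsigned dm)
{
    vsr_t t;
    assert(dm < 4);              /* DM is a 2-bit field */
    t.dw[0] = a.dw[(dm >> 1) & 1];
    t.dw[1] = b.dw[dm & 1];
    return t;
}

/* mfvsrd: copy dword 0 of the VSR to a GPR. */
static inline uint64_t mfvsrd(vsr_t s) { return s.dw[0]; }

/* mfvsrld (ISA 3.0): copy dword 1 of the VSR to a GPR. */
static inline uint64_t mfvsrld(vsr_t s) { return s.dw[1]; }

/* mtvsrd: copy a GPR to dword 0.  Dword 1 is undefined in the ISA;
   this model sets it to zero purely as a sketch assumption. */
static inline vsr_t mtvsrd(uint64_t r)
{
    vsr_t t = { { r, 0 } };
    return t;
}
```

In this model xxpermdi x,y,y,2 swaps the two dwords of y, and
xxpermdi x,y,y,3 copies dword 1 of y into both halves, so either one
followed by mfvsrd yields what was in dword 1 of y.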