From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 0A6143858D28
 for <gcc-patches@gcc.gnu.org>; Tue, 14 Dec 2021 10:34:53 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0A6143858D28
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 853D96D;
 Tue, 14 Dec 2021 02:34:51 -0800 (PST)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.88])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0EF9A3F5A1;
 Tue, 14 Dec 2021 02:34:50 -0800 (PST)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Roger Sayle" <roger@nextmovesoftware.com>
Mail-Followup-To: "Roger Sayle" <roger@nextmovesoftware.com>,
 "'GCC Patches'" <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] x86: PR target/103611: Splitter for DST:DI =
 (HI:SI<<32)|LO:SI.
References: <018a01d7f02b$39537630$abfa6290$@nextmovesoftware.com>
Date: Tue, 14 Dec 2021 10:34:49 +0000
In-Reply-To: <018a01d7f02b$39537630$abfa6290$@nextmovesoftware.com> (Roger
 Sayle's message of "Mon, 13 Dec 2021 14:10:45 -0000")
Message-ID: <mptlf0njwza.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 KAM_SHORT, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Dec 2021 10:34:54 -0000

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> A common idiom is to create a DImode value from the "concat" of two SImode
> values, using "(long long)hi << 32 | (long long)lo", where the operation
> may be ior, xor or plus.  On x86, with -m32, the high and low parts of
> a DImode register are actually different SImode registers (typically %edx
> and %eax) so ideally this idiom should reduce to two move instructions
> (or optimally, just clever register allocation).
>
> Unfortunately, GCC currently performs the IOR operation above on -m32,
> and worse allocates DImode registers (split to SImode register pairs)
> for both the zero extended HI and LO values.
>
> Hence, for test1 from the new test case below:
>
> typedef int __v4si __attribute__ ((__vector_size__ (16)));
> long long test1(__v4si v) {
>   unsigned int loVal = (unsigned int)v[0];
>   unsigned int hiVal = (unsigned int)v[1];
>   return (long long)(loVal) | ((long long)(hiVal) << 32);
> }
>
> we currently generate (with -m32 -O2 -msse4.1):
>
> test1:  subl    $28, %esp
>         pextrd  $1, %xmm0, %eax
>         pmovzxdq        %xmm0, %xmm1
>         movq    %xmm1, 8(%esp)
>         movl    %eax, %edx
>         movl    8(%esp), %eax
>         orl     12(%esp), %edx
>         addl    $28, %esp
>         orb     $0, %ah
>         ret
>
> with this patch we now generate:
>
> test1:  pextrd  $1, %xmm0, %edx
>         movd    %xmm0, %eax
>         ret
>
> The fix is to recognize and split the idiom (hi<<32)|zext(lo) prior
> to register allocation on !TARGET_64BIT, simplifying this sequence to
> "highpart(dst) = hi; lowpart(dst) = lo".
>
> The one minor complication is that sse.md's define_insn for
> *vec_extractv4si_0_zext_sse4 can sometimes interfere with this
> optimization.  It turns out that on !TARGET_64BIT, the zero_extend:DI
> following vec_select:SI isn't free, and this insn gets split back
> into multiple instructions during later passes, but too late to
> be optimized away by this patch/reload.  Hence the last hunk of
> this patch is to restrict *vec_extractv4si_0_zext_sse4 to TARGET_64BIT.
> Checking PR target/80286, where *vec_extractv4si_0_zext_sse4 was
> first added, this seems reasonable (but this patch has been tested
> both with and without this last change, if it's consider controversial).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without "--target_board='unix{-m32}'"
> with no new failures.  OK for mainline?

To play devil's advocate: does this optimisation belong in
target-specific code?  It feels pretty generic and could be keyed
off BITS_PER_WORD.

Thanks,
Richard

>
>
> 2021-12-13  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> 	PR target/103611
> 	* config/i386/i386.md (any_or_plus): New code iterator.
> 	(define_split): Split (HI<<32)|zext(LO) into piece-wise
> 	move instructions on !TARGET_64BIT.
> 	* config/i386/sse.md (*vec_extractv4si_0_zext_sse4):
> 	Restrict to TARGET_64BIT.
>
> gcc/testsuite/ChangeLog
> 	PR target/103611
> 	* gcc.target/i386/pr103611-2.c: New test case.
>
>
> Thanks in advance,
> Roger
> --