From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [PATCH] x86: PR target/103611: Splitter for DST:DI = (HI:SI<<32)|LO:SI.
Date: Mon, 13 Dec 2021 14:10:45 -0000

A common idiom is to create a DImode value from the "concat" of two SImode values, using "(long long)hi << 32 | (long long)lo", where the operation may be ior, xor or plus. On x86, with -m32, the high and low parts of a DImode register are actually different SImode registers (typically %edx and %eax) so ideally this idiom should reduce to two move instructions (or optimally, just clever register allocation).
Unfortunately, GCC currently performs the IOR operation above on -m32 and, worse, allocates DImode registers (split into SImode register pairs) for both the zero-extended HI and LO values. Hence, for test1 from the new test case below:

    typedef int __v4si __attribute__ ((__vector_size__ (16)));

    long long test1(__v4si v) {
      unsigned int loVal = (unsigned int)v[0];
      unsigned int hiVal = (unsigned int)v[1];
      return (long long)(loVal) | ((long long)(hiVal) << 32);
    }

we currently generate (with -m32 -O2 -msse4.1):

    test1:
            subl    $28, %esp
            pextrd  $1, %xmm0, %eax
            pmovzxdq        %xmm0, %xmm1
            movq    %xmm1, 8(%esp)
            movl    %eax, %edx
            movl    8(%esp), %eax
            orl     12(%esp), %edx
            addl    $28, %esp
            orb     $0, %ah
            ret

with this patch we now generate:

    test1:
            pextrd  $1, %xmm0, %edx
            movd    %xmm0, %eax
            ret

The fix is to recognize and split the idiom (hi<<32)|zext(lo) prior to register allocation on !TARGET_64BIT, simplifying this sequence to "highpart(dst) = hi; lowpart(dst) = lo".

The one minor complication is that sse.md's define_insn for *vec_extractv4si_0_zext_sse4 can sometimes interfere with this optimization. It turns out that on !TARGET_64BIT, the zero_extend:DI following vec_select:SI isn't free, and this insn gets split back into multiple instructions by later passes, too late for this patch or reload to optimize them away. Hence the last hunk of this patch restricts *vec_extractv4si_0_zext_sse4 to TARGET_64BIT. Judging from PR target/80286, where *vec_extractv4si_0_zext_sse4 was first added, this seems reasonable (though this patch has been tested both with and without this last change, in case it is considered controversial).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without "--target_board='unix{-m32}'", with no new failures. OK for mainline?


2021-12-13  Roger Sayle

gcc/ChangeLog
        PR target/103611
        * config/i386/i386.md (any_or_plus): New code iterator.
        (define_split): Split (HI<<32)|zext(LO) into piece-wise move
        instructions on !TARGET_64BIT.
        * config/i386/sse.md (*vec_extractv4si_0_zext_sse4): Restrict
        to TARGET_64BIT.

gcc/testsuite/ChangeLog
        PR target/103611
        * gcc.target/i386/pr103611-2.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9d7d116..8ecf169 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10620,6 +10620,38 @@
   [(set_attr "isa" "*,nox64")
    (set_attr "type" "alu")
    (set_attr "mode" "QI")])
+
+;; Split DST = (HI<<32)|LO early to minimize register usage.
+(define_code_iterator any_or_plus [plus ior xor])
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+        (any_or_plus:DI
+          (ashift:DI (match_operand:DI 1 "register_operand")
+                     (const_int 32))
+          (zero_extend:DI (match_operand:SI 2 "register_operand"))))]
+  "!TARGET_64BIT"
+  [(set (match_dup 3) (match_dup 4))
+   (set (match_dup 5) (match_dup 2))]
+{
+  operands[3] = gen_highpart (SImode, operands[0]);
+  operands[4] = gen_lowpart (SImode, operands[1]);
+  operands[5] = gen_lowpart (SImode, operands[0]);
+})
+
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+        (any_or_plus:DI
+          (zero_extend:DI (match_operand:SI 1 "register_operand"))
+          (ashift:DI (match_operand:DI 2 "register_operand")
+                     (const_int 32))))]
+  "!TARGET_64BIT"
+  [(set (match_dup 3) (match_dup 4))
+   (set (match_dup 5) (match_dup 1))]
+{
+  operands[3] = gen_highpart (SImode, operands[0]);
+  operands[4] = gen_lowpart (SImode, operands[2]);
+  operands[5] = gen_lowpart (SImode, operands[0]);
+})
 ^L
 ;; Negation instructions
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5421fb5..fba0250 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18700,7 +18700,7 @@
          (vec_select:SI
            (match_operand:V4SI 1 "register_operand" "v,x,v")
            (parallel [(const_int 0)]))))]
-  "TARGET_SSE4_1"
+  "TARGET_64BIT && TARGET_SSE4_1"
   "#"
   [(set_attr "isa" "x64,*,avx512f")
    (set (attr "preferred_for_speed")
diff --git a/gcc/testsuite/gcc.target/i386/pr103611-2.c b/gcc/testsuite/gcc.target/i386/pr103611-2.c
new file mode 100644
index 0000000..1555e99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103611-2.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-m32 -O2 -msse4" } */
+typedef int __v4si __attribute__ ((__vector_size__ (16)));
+
+long long test1(__v4si v) {
+  unsigned int loVal = (unsigned int)v[0];
+  unsigned int hiVal = (unsigned int)v[1];
+  return (long long)(loVal) | ((long long)(hiVal) << 32);
+}
+
+long long test2(__v4si v) {
+  unsigned int loVal = (unsigned int)v[2];
+  unsigned int hiVal = (unsigned int)v[3];
+  return (long long)(loVal) | ((long long)(hiVal) << 32);
+}
+
+long long test3(__v4si v) {
+  unsigned int loVal = (unsigned int)v[0];
+  unsigned int hiVal = (unsigned int)v[1];
+  return (long long)(loVal) ^ ((long long)(hiVal) << 32);
+}
+
+long long test4(__v4si v) {
+  unsigned int loVal = (unsigned int)v[2];
+  unsigned int hiVal = (unsigned int)v[3];
+  return (long long)(loVal) ^ ((long long)(hiVal) << 32);
+}
+
+long long test5(__v4si v) {
+  unsigned int loVal = (unsigned int)v[0];
+  unsigned int hiVal = (unsigned int)v[1];
+  return (long long)(loVal) + ((long long)(hiVal) << 32);
+}
+
+long long test6(__v4si v) {
+  unsigned int loVal = (unsigned int)v[2];
+  unsigned int hiVal = (unsigned int)v[3];
+  return (long long)(loVal) + ((long long)(hiVal) << 32);
+}
+
+/* { dg-final { scan-assembler-not "\tor" } } */
+/* { dg-final { scan-assembler-not "\txor" } } */
+/* { dg-final { scan-assembler-not "\tadd" } } */