Date: Tue, 14 Mar 2023 17:09:19 +0100
From: Jakub Jelinek
To: Uros Bizjak
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] i386: Fix up split_double_concat [PR109109]

Hi!

In my PR107627 change I missed one important case, which causes
miscompilation of f4 and f6 in the following tests.

Combine matches the *concatsidi3_3 define_insn_and_split there (as with
all other f* functions in those tests), and RA ends up with:
(insn 11 10 17 2 (set (reg:DI 0 ax [89])
        (ior:DI (ashift:DI (zero_extend:DI (mem:SI (plus:SI (mult:SI (reg:SI 0 ax [94])
                                (const_int 4 [0x4]))
                            (symbol_ref:SI ("arr") [flags 0x2] )) [1 arr[ax_6(D)]+0 S4 A32]))
                (const_int 32 [0x20]))
            (zero_extend:DI (reg:SI 1 dx [95])))) "pr109109-6.c":24:49 681 {*concatsidi3_3}
     (nil))
split_double_concat turned that into:
        movl    arr(,%eax,4), %edx
        movl    %edx, %eax
which is incorrect, because the first instruction overwrites the input
%edx value that should be put into output %eax; the two insns can't be
swapped because the MEM's address uses %eax.

The following patch fixes that case to emit
        movl    arr(,%eax,4), %eax
        xchgl   %edx, %eax
instead.

Bootstrap/regtest on x86_64-linux and i686-linux pending, ok for trunk
if it passes on both?

2023-03-14  Jakub Jelinek

        PR target/109109
        * config/i386/i386-expand.cc (split_double_concat): Fix splitting
        when lo is equal to dhi and hi is a MEM which uses dlo register.

        * gcc.target/i386/pr109109-1.c: New test.
        * gcc.target/i386/pr109109-2.c: New test.

--- gcc/config/i386/i386-expand.cc.jj	2023-02-18 12:39:58.334768946 +0100
+++ gcc/config/i386/i386-expand.cc	2023-03-14 15:07:38.672919652 +0100
@@ -197,9 +197,20 @@ split_double_concat (machine_mode mode,
     {
       /* In this case, code below would first emit_move_insn (dlo, lo)
          and then emit_move_insn (dhi, hi).  But the former would
-         invalidate hi's address.  Load into dhi first.  */
-      emit_move_insn (dhi, hi);
-      hi = dhi;
+         invalidate hi's address.  */
+      if (rtx_equal_p (dhi, lo))
+        {
+          /* We can't load into dhi first, so load into dlo
+             first and we'll swap.  */
+          emit_move_insn (dlo, hi);
+          hi = dlo;
+        }
+      else
+        {
+          /* Load into dhi first.  */
+          emit_move_insn (dhi, hi);
+          hi = dhi;
+        }
     }
   if (!rtx_equal_p (dlo, hi))
     {
--- gcc/testsuite/gcc.target/i386/pr109109-1.c.jj	2023-03-14 15:51:35.104926863 +0100
+++ gcc/testsuite/gcc.target/i386/pr109109-1.c	2023-03-14 15:51:16.715191961 +0100
@@ -0,0 +1,139 @@
+/* PR target/109109 */
+/* { dg-do run { target ia32 } } */
+/* { dg-options "-O2" } */
+
+unsigned int arr[64];
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f1 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) arr[ax]) << 32) | ax;
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f2 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) arr[dx]) << 32) | ax;
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f3 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) ((unsigned int *) (((char *) arr) + ax))[dx]) << 32) | ax;
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f4 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) arr[ax]) << 32) | dx;
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f5 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) arr[dx]) << 32) | dx;
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f6 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) ((unsigned int *) (((char *) arr) + ax))[dx]) << 32) | dx;
+}
+
+__attribute__((noipa, regparm (3))) unsigned long long
+f7 (unsigned int ax, unsigned int dx, unsigned int cx)
+{
+  return (((unsigned long long) arr[ax]) << 32) | cx;
+}
+
+__attribute__((noipa, regparm (3))) unsigned long long
+f8 (unsigned int ax, unsigned int dx, unsigned int cx)
+{
+  return (((unsigned long long) arr[dx]) << 32) | cx;
+}
+
+__attribute__((noipa, regparm (3))) unsigned long long
+f9 (unsigned int ax, unsigned int dx, unsigned int cx)
+{
+  return (((unsigned long long) ((unsigned int *) (((char *) arr) + ax))[dx]) << 32) | cx;
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f10 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) ax) << 32) | arr[ax];
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f11 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) ax) << 32) | arr[dx];
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f12 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) ax) << 32) | ((unsigned int *) (((char *) arr) + ax))[dx];
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f13 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) dx) << 32) | arr[ax];
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f14 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) dx) << 32) | arr[dx];
+}
+
+__attribute__((noipa, regparm (2))) unsigned long long
+f15 (unsigned int ax, unsigned int dx)
+{
+  return (((unsigned long long) dx) << 32) | ((unsigned int *) (((char *) arr) + ax))[dx];
+}
+
+__attribute__((noipa, regparm (3))) unsigned long long
+f16 (unsigned int ax, unsigned int dx, unsigned int cx)
+{
+  return (((unsigned long long) cx) << 32) | arr[ax];
+}
+
+__attribute__((noipa, regparm (3))) unsigned long long
+f17 (unsigned int ax, unsigned int dx, unsigned int cx)
+{
+  return (((unsigned long long) cx) << 32) | arr[dx];
+}
+
+__attribute__((noipa, regparm (3))) unsigned long long
+f18 (unsigned int ax, unsigned int dx, unsigned int cx)
+{
+  return (((unsigned long long) cx) << 32) | ((unsigned int *) (((char *) arr) + ax))[dx];
+}
+
+int
+main ()
+{
+  for (int i = 0; i < 64; i++)
+    arr[i] = 64 + i;
+#define CHECK_EQ(x, y) do { if (x != y) __builtin_abort (); } while (0)
+  CHECK_EQ (f1 (8, 9), 0x4800000008ULL);
+  CHECK_EQ (f2 (8, 9), 0x4900000008ULL);
+  CHECK_EQ (f3 (8, 9), 0x4b00000008ULL);
+  CHECK_EQ (f4 (8, 9), 0x4800000009ULL);
+  CHECK_EQ (f5 (8, 9), 0x4900000009ULL);
+  CHECK_EQ (f6 (8, 9), 0x4b00000009ULL);
+  CHECK_EQ (f7 (8, 9, 10), 0x480000000aULL);
+  CHECK_EQ (f8 (8, 9, 10), 0x490000000aULL);
+  CHECK_EQ (f9 (8, 9, 10), 0x4b0000000aULL);
+  CHECK_EQ (f10 (8, 9), 0x800000048ULL);
+  CHECK_EQ (f11 (8, 9), 0x800000049ULL);
+  CHECK_EQ (f12 (8, 9), 0x80000004bULL);
+  CHECK_EQ (f13 (8, 9), 0x900000048ULL);
+  CHECK_EQ (f14 (8, 9), 0x900000049ULL);
+  CHECK_EQ (f15 (8, 9), 0x90000004bULL);
+  CHECK_EQ (f16 (8, 9, 10), 0xa00000048ULL);
+  CHECK_EQ (f17 (8, 9, 10), 0xa00000049ULL);
+  CHECK_EQ (f18 (8, 9, 10), 0xa0000004bULL);
+}
--- gcc/testsuite/gcc.target/i386/pr109109-2.c.jj	2023-03-14 15:53:08.619578782 +0100
+++ gcc/testsuite/gcc.target/i386/pr109109-2.c	2023-03-14 16:05:22.675995934 +0100
@@ -0,0 +1,175 @@
+/* PR target/109109 */
+/* { dg-do run { target lp64 } } */
+/* { dg-options "-O2" } */
+
+unsigned long arr[64];
+
+__attribute__((noipa)) unsigned __int128
+f1 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) arr[ax]) << 64) | ax;
+}
+
+__attribute__((noipa)) unsigned __int128
+f2 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) arr[dx]) << 64) | ax;
+}
+
+__attribute__((noipa)) unsigned __int128
+f3 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) ((unsigned long *) (((char *) arr) + ax))[dx]) << 64) | ax;
+}
+
+__attribute__((noipa)) unsigned __int128
+f4 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) arr[ax]) << 64) | dx;
+}
+
+__attribute__((noipa)) unsigned __int128
+f5 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) arr[dx]) << 64) | dx;
+}
+
+__attribute__((noipa)) unsigned __int128
+f6 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) ((unsigned long *) (((char *) arr) + ax))[dx]) << 64) | dx;
+}
+
+__attribute__((noipa)) unsigned __int128
+f7 (unsigned long di, unsigned long si, unsigned long dx, unsigned long cx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) arr[ax]) << 64) | cx;
+}
+
+__attribute__((noipa)) unsigned __int128
+f8 (unsigned long di, unsigned long si, unsigned long dx, unsigned long cx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) arr[dx]) << 64) | cx;
+}
+
+__attribute__((noipa)) unsigned __int128
+f9 (unsigned long di, unsigned long si, unsigned long dx, unsigned long cx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) ((unsigned long *) (((char *) arr) + ax))[dx]) << 64) | cx;
+}
+
+__attribute__((noipa)) unsigned __int128
+f10 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) ax) << 64) | arr[ax];
+}
+
+__attribute__((noipa)) unsigned __int128
+f11 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) ax) << 64) | arr[dx];
+}
+
+__attribute__((noipa)) unsigned __int128
+f12 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) ax) << 64) | ((unsigned long *) (((char *) arr) + ax))[dx];
+}
+
+__attribute__((noipa)) unsigned __int128
+f13 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) dx) << 64) | arr[ax];
+}
+
+__attribute__((noipa)) unsigned __int128
+f14 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) dx) << 64) | arr[dx];
+}
+
+__attribute__((noipa)) unsigned __int128
+f15 (unsigned long di, unsigned long si, unsigned long dx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) dx) << 64) | ((unsigned long *) (((char *) arr) + ax))[dx];
+}
+
+__attribute__((noipa)) unsigned __int128
+f16 (unsigned long di, unsigned long si, unsigned long dx, unsigned long cx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) cx) << 64) | arr[ax];
+}
+
+__attribute__((noipa)) unsigned __int128
+f17 (unsigned long di, unsigned long si, unsigned long dx, unsigned long cx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) cx) << 64) | arr[dx];
+}
+
+__attribute__((noipa)) unsigned __int128
+f18 (unsigned long di, unsigned long si, unsigned long dx, unsigned long cx)
+{
+  unsigned long ax;
+  asm ("" : "=a" (ax) : "0" (di));
+  return (((unsigned __int128) cx) << 64) | ((unsigned long *) (((char *) arr) + ax))[dx];
+}
+
+int
+main ()
+{
+  for (int i = 0; i < 64; i++)
+    arr[i] = 64 + i;
+#define CHECK_EQ(x, y1, y2) do { unsigned __int128 y = y1; y <<= 64; y += y2; if (x != y) __builtin_abort (); } while (0)
+  CHECK_EQ (f1 (8, 0, 9), 0x48, 0x8);
+  CHECK_EQ (f2 (8, 0, 9), 0x49, 0x8);
+  CHECK_EQ (f3 (8, 0, 9), 0x4a, 0x8);
+  CHECK_EQ (f4 (8, 0, 9), 0x48, 0x9);
+  CHECK_EQ (f5 (8, 0, 9), 0x49, 0x9);
+  CHECK_EQ (f6 (8, 0, 9), 0x4a, 0x9);
+  CHECK_EQ (f7 (8, 0, 9, 10), 0x48, 0xa);
+  CHECK_EQ (f8 (8, 0, 9, 10), 0x49, 0xa);
+  CHECK_EQ (f9 (8, 0, 9, 10), 0x4a, 0xa);
+  CHECK_EQ (f10 (8, 0, 9), 0x8, 0x48);
+  CHECK_EQ (f11 (8, 0, 9), 0x8, 0x49);
+  CHECK_EQ (f12 (8, 0, 9), 0x8, 0x4a);
+  CHECK_EQ (f13 (8, 0, 9), 0x9, 0x48);
+  CHECK_EQ (f14 (8, 0, 9), 0x9, 0x49);
+  CHECK_EQ (f15 (8, 0, 9), 0x9, 0x4a);
+  CHECK_EQ (f16 (8, 0, 9, 10), 0xa, 0x48);
+  CHECK_EQ (f17 (8, 0, 9, 10), 0xa, 0x49);
+  CHECK_EQ (f18 (8, 0, 9, 10), 0xa, 0x4a);
+}

        Jakub
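
For illustration only (not part of the patch): a minimal C sketch that models the two instruction sequences discussed in the description for the f4 case of pr109109-1.c, where the DImode result lives in %edx:%eax. The names struct regs, buggy_split and fixed_split are hypothetical and exist only in this sketch; arr mirrors the array in the test, and the caller is assumed to fill it with arr[i] = 64 + i as the test's main does.

```c
#include <stdint.h>

/* Hypothetical model of the two 32-bit registers involved.  */
struct regs { uint32_t eax, edx; };

/* Mirrors the array from the test; the caller is assumed to
   initialize it with arr[i] = 64 + i.  */
uint32_t arr[64];

/* Models the sequence emitted before the fix:
     movl arr(,%eax,4), %edx
     movl %edx, %eax
   Returns the resulting %edx:%eax pair as a 64-bit value.  */
uint64_t
buggy_split (struct regs r)
{
  r.edx = arr[r.eax];   /* Clobbers the incoming %edx value.  */
  r.eax = r.edx;        /* Copies the loaded value, not the input.  */
  return ((uint64_t) r.edx << 32) | r.eax;
}

/* Models the sequence emitted after the fix:
     movl arr(,%eax,4), %eax
     xchgl %edx, %eax  */
uint64_t
fixed_split (struct regs r)
{
  r.eax = arr[r.eax];   /* Load while %eax still holds the index.  */
  uint32_t tmp = r.eax; /* xchgl: swap the two registers.  */
  r.eax = r.edx;
  r.edx = tmp;
  return ((uint64_t) r.edx << 32) | r.eax;
}
```

With %eax = 8 and %edx = 9 on entry, fixed_split yields 0x4800000009, the value f4 (8, 9) is checked against in the test, while buggy_split yields 0x4800000048 because the low half ends up holding the loaded value instead of the original %edx.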