From: Thomas Schwinge
To: Andrew Stubbs, gcc-patches@gcc.gnu.org, Richard Biener
Subject: Re: [committed] amdgcn: Adjust GFX10/GFX11 cache coherency
In-Reply-To: <20240322155449.747518-2-ams@baylibre.com>
References: <20240322155449.747518-2-ams@baylibre.com>
Date: Thu, 04 Apr 2024 12:04:34 +0200
Message-ID: <87cyr5a1fx.fsf@euler.schwinge.ddns.net>

Hi!

To again state this in public:

On 2024-03-22T15:54:49+0000, Andrew Stubbs wrote:
> The RDNA devices have different cache architectures to the CDNA devices, and
> the differences go deeper than just the assembler mnemonics, so we
> probably need to generate different code to maintain coherency across
> the whole device.
>
> I believe this patch is correct according to the documentation in the LLVM
> AMDGPU user guide (the ISA manual is less instructive), but I hadn't observed
> any real problems before (or after).
>
> Committed to mainline.

Thanks!  This commit does repair a lot of the GCN offloading damage
noted in "libgomp GCN gfx1030/gfx1100 offloading status" and
thereabouts, that is, it recovers to PASS a lot of twinkling
libgomp/OpenMP/GCN execution test cases, and gets rid of their even more
annoying random timeouts.  (The commit doesn't affect GCN target
testing.)

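For orientation, the instruction choices made by the quoted patch can be summarized in a small standalone C sketch. This is an illustration only, not code from GCC: the enum and the function names are hypothetical, and GCC itself tests the TARGET_RDNA2, TARGET_RDNA3 and TARGET_RDNA2_PLUS macros inside the gcn.md output templates.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device classes for this sketch.  GCC itself tests the
   TARGET_RDNA2, TARGET_RDNA3 and TARGET_RDNA2_PLUS macros instead.  */
enum gpu_family { FAMILY_PRE_RDNA, FAMILY_RDNA2, FAMILY_RDNA3 };

/* Cache-invalidate sequence emitted for "*memory_barrier" after the
   patch: RDNA2 and later issue buffer_gl1_inv followed by
   buffer_gl0_inv; all other devices keep buffer_wbinvl1_vol.  */
const char *
barrier_insns (enum gpu_family f)
{
  return (f == FAMILY_PRE_RDNA
          ? "buffer_wbinvl1_vol"
          : "buffer_gl1_inv\nbuffer_gl0_inv");
}

/* "length" attribute of the two "*memory_barrier" patterns, in bytes.  */
int
barrier_length (enum gpu_family f)
{
  return f == FAMILY_PRE_RDNA ? 4 : 8;
}

/* Cache-policy modifiers on flat/global atomic loads: only RDNA2
   ("Not GFX11" in the patch) gains dlc next to glc.  */
const char *
atomic_load_modifiers (enum gpu_family f)
{
  return f == FAMILY_RDNA2 ? "glc dlc" : "glc";
}

/* Small self-check of the table above; returns 0 when it holds.  */
int
sketch_self_check (void)
{
  assert (strcmp (barrier_insns (FAMILY_PRE_RDNA), "buffer_wbinvl1_vol") == 0);
  assert (barrier_length (FAMILY_RDNA3) == 8);
  assert (strcmp (atomic_load_modifiers (FAMILY_RDNA3), "glc") == 0);
  return 0;
}
```

Calling sketch_self_check () from a test driver is enough to exercise it. The sketch models only the barrier split and the dlc choice; the per-memory-model ordering of invalidates around loads, stores and exchanges is left to the patch itself.
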
I still have a number of stabilization hacks applied to my sources -- but
I've, for example, not seen any 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' or
random timeouts in my current GCN offloading test results.


Grüße
 Thomas


> gcc/ChangeLog:
>
>         * config/gcn/gcn.md (*memory_barrier): Split into RDNA and !RDNA.
>         (atomic_load): Adjust RDNA cache settings.
>         (atomic_store): Likewise.
>         (atomic_exchange): Likewise.
> ---
>  gcc/config/gcn/gcn.md | 86 +++++++++++++++++++++++++++----------------
>  1 file changed, 55 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
> index 3b51453aaca..574c2f87e8c 100644
> --- a/gcc/config/gcn/gcn.md
> +++ b/gcc/config/gcn/gcn.md
> @@ -1960,11 +1960,19 @@
>  (define_insn "*memory_barrier"
>    [(set (match_operand:BLK 0)
>          (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
> -  ""
> -  "{buffer_wbinvl1_vol|buffer_gl0_inv}"
> +  "!TARGET_RDNA2_PLUS"
> +  "buffer_wbinvl1_vol"
>    [(set_attr "type" "mubuf")
>     (set_attr "length" "4")])
>
> +(define_insn "*memory_barrier"
> +  [(set (match_operand:BLK 0)
> +        (unspec:BLK [(match_dup 0)] UNSPEC_MEMORY_BARRIER))]
> +  "TARGET_RDNA2_PLUS"
> +  "buffer_gl1_inv\;buffer_gl0_inv"
> +  [(set_attr "type" "mult")
> +   (set_attr "length" "8")])
> +
>  ; FIXME: These patterns have been disabled as they do not seem to work
>  ;        reliably - they can cause hangs or incorrect results.
>  ; TODO: flush caches according to memory model
> @@ -2094,9 +2102,13 @@
>            case 0:
>              return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)";
>            case 1:
> -            return "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0";
> +            return (TARGET_RDNA2 /* Not GFX11. */
> +                    ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0"
> +                    : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0");
>            case 2:
> -            return "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)";
> +            return (TARGET_RDNA2 /* Not GFX11. */
> +                    ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)"
> +                    : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)");
>            }
>          break;
>        case MEMMODEL_CONSUME:
> @@ -2108,15 +2120,21 @@
>              return "s_load%o0\t%0, %A1 glc\;s_waitcnt\tlgkmcnt(0)\;"
>                     "s_dcache_wb_vol";
>            case 1:
> -            return (TARGET_RDNA2_PLUS
> +            return (TARGET_RDNA2
> +                    ? "flat_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\t0\;"
> +                      "buffer_gl1_inv\;buffer_gl0_inv"
> +                    : TARGET_RDNA3
>                      ? "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
> -                      "buffer_gl0_inv"
> +                      "buffer_gl1_inv\;buffer_gl0_inv"
>                      : "flat_load%o0\t%0, %A1%O1 glc\;s_waitcnt\t0\;"
>                        "buffer_wbinvl1_vol");
>            case 2:
> -            return (TARGET_RDNA2_PLUS
> +            return (TARGET_RDNA2
> +                    ? "global_load%o0\t%0, %A1%O1 glc dlc\;s_waitcnt\tvmcnt(0)\;"
> +                      "buffer_gl1_inv\;buffer_gl0_inv"
> +                    : TARGET_RDNA3
>                      ? "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
> -                      "buffer_gl0_inv"
> +                      "buffer_gl1_inv\;buffer_gl0_inv"
>                      : "global_load%o0\t%0, %A1%O1 glc\;s_waitcnt\tvmcnt(0)\;"
>                        "buffer_wbinvl1_vol");
>            }
> @@ -2130,15 +2148,21 @@
>              return "s_dcache_wb_vol\;s_load%o0\t%0, %A1 glc\;"
>                     "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
>            case 1:
> -            return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;"
> -                      "s_waitcnt\t0\;buffer_gl0_inv"
> +            return (TARGET_RDNA2
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc dlc\;"
> +                      "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
> +                    : TARGET_RDNA3
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_load%o0\t%0, %A1%O1 glc\;"
> +                      "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
>                      : "buffer_wbinvl1_vol\;flat_load%o0\t%0, %A1%O1 glc\;"
>                        "s_waitcnt\t0\;buffer_wbinvl1_vol");
>            case 2:
> -            return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc\;"
> -                      "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
> +            return (TARGET_RDNA2
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc dlc\;"
> +                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
> +                    : TARGET_RDNA3
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;global_load%o0\t%0, %A1%O1 glc\;"
> +                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
>                      : "buffer_wbinvl1_vol\;global_load%o0\t%0, %A1%O1 glc\;"
>                        "s_waitcnt\tvmcnt(0)\;buffer_wbinvl1_vol");
>            }
> @@ -2147,7 +2171,7 @@
>      gcc_unreachable ();
>    }
>    [(set_attr "type" "smem,flat,flat")
> -   (set_attr "length" "20")
> +   (set_attr "length" "28")
>     (set_attr "gcn_version" "gcn5,*,gcn5")
>     (set_attr "rdna" "no,*,*")])
>
> @@ -2180,11 +2204,11 @@
>              return "s_dcache_wb_vol\;s_store%o1\t%1, %A0 glc";
>            case 1:
>              return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;flat_store%o1\t%A0, %1%O0 glc"
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_store%o1\t%A0, %1%O0 glc"
>                      : "buffer_wbinvl1_vol\;flat_store%o1\t%A0, %1%O0 glc");
>            case 2:
>              return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;global_store%o1\t%A0, %1%O0 glc"
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;global_store%o1\t%A0, %1%O0 glc"
>                      : "buffer_wbinvl1_vol\;global_store%o1\t%A0, %1%O0 glc");
>            }
>          break;
> @@ -2198,14 +2222,14 @@
>                     "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
>            case 1:
>              return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;flat_store%o1\t%A0, %1%O0 glc\;"
> -                      "s_waitcnt\t0\;buffer_gl0_inv"
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_store%o1\t%A0, %1%O0 glc\;"
> +                      "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
>                      : "buffer_wbinvl1_vol\;flat_store%o1\t%A0, %1%O0 glc\;"
>                        "s_waitcnt\t0\;buffer_wbinvl1_vol");
>            case 2:
>              return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;global_store%o1\t%A0, %1%O0 glc\;"
> -                      "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;global_store%o1\t%A0, %1%O0 glc\;"
> +                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
>                      : "buffer_wbinvl1_vol\;global_store%o1\t%A0, %1%O0 glc\;"
>                        "s_waitcnt\tvmcnt(0)\;buffer_wbinvl1_vol");
>            }
> @@ -2214,7 +2238,7 @@
>      gcc_unreachable ();
>    }
>    [(set_attr "type" "smem,flat,flat")
> -   (set_attr "length" "20")
> +   (set_attr "length" "28")
>     (set_attr "gcn_version" "gcn5,*,gcn5")
>     (set_attr "rdna" "no,*,*")])
>
> @@ -2253,13 +2277,13 @@
>            case 1:
>              return (TARGET_RDNA2_PLUS
>                      ? "flat_atomic_swap\t%0, %1, %2 glc\;s_waitcnt\t0\;"
> -                      "buffer_gl0_inv"
> +                      "buffer_gl1_inv\;buffer_gl0_inv"
>                      : "flat_atomic_swap\t%0, %1, %2 glc\;s_waitcnt\t0\;"
>                        "buffer_wbinvl1_vol");
>            case 2:
>              return (TARGET_RDNA2_PLUS
>                      ? "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
> -                      "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
> +                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
>                      : "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
>                        "s_waitcnt\tvmcnt(0)\;buffer_wbinvl1_vol");
>            }
> @@ -2273,13 +2297,13 @@
>                     "s_waitcnt\tlgkmcnt(0)";
>            case 1:
>              return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;flat_atomic_swap\t%0, %1, %2 glc\;"
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_atomic_swap\t%0, %1, %2 glc\;"
>                        "s_waitcnt\t0"
>                      : "buffer_wbinvl1_vol\;flat_atomic_swap\t%0, %1, %2 glc\;"
>                        "s_waitcnt\t0");
>            case 2:
>              return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;"
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;"
>                        "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
>                        "s_waitcnt\tvmcnt(0)"
>                      : "buffer_wbinvl1_vol\;"
> @@ -2297,15 +2321,15 @@
>                     "s_waitcnt\tlgkmcnt(0)\;s_dcache_inv_vol";
>            case 1:
>              return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;flat_atomic_swap\t%0, %1, %2 glc\;"
> -                      "s_waitcnt\t0\;buffer_gl0_inv"
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;flat_atomic_swap\t%0, %1, %2 glc\;"
> +                      "s_waitcnt\t0\;buffer_gl1_inv\;buffer_gl0_inv"
>                      : "buffer_wbinvl1_vol\;flat_atomic_swap\t%0, %1, %2 glc\;"
>                        "s_waitcnt\t0\;buffer_wbinvl1_vol");
>            case 2:
>              return (TARGET_RDNA2_PLUS
> -                    ? "buffer_gl0_inv\;"
> +                    ? "buffer_gl1_inv\;buffer_gl0_inv\;"
>                        "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
> -                      "s_waitcnt\tvmcnt(0)\;buffer_gl0_inv"
> +                      "s_waitcnt\tvmcnt(0)\;buffer_gl1_inv\;buffer_gl0_inv"
>                      : "buffer_wbinvl1_vol\;"
>                        "global_atomic_swap\t%0, %A1, %2%O1 glc\;"
>                        "s_waitcnt\tvmcnt(0)\;buffer_wbinvl1_vol");
> @@ -2315,7 +2339,7 @@
>      gcc_unreachable ();
>    }
>    [(set_attr "type" "smem,flat,flat")
> -   (set_attr "length" "20")
> +   (set_attr "length" "28")
>     (set_attr "gcn_version" "gcn5,*,gcn5")
>     (set_attr "rdna" "no,*,*")])
>
> --
> 2.41.0