Message-ID: <0fa5e4e5ce325a8e432e9e0bd2e598aa48666501.camel@xry111.site>
Subject: Re: [PATCH] libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]
From: Xi Ruoyao
To: Uros Bizjak, Jakub Jelinek, Mayshao-oc
Cc: Richard Biener, Jeff Law, gcc-patches@gcc.gnu.org, Florian Weimer, "H.J. Lu"
Date: Mon, 14 Nov 2022 16:19:48 +0800

On Mon, 2022-11-14 at 08:55 +0100, Uros Bizjak via Gcc-patches wrote:
> On Mon, Nov 14, 2022 at 8:48 AM Jakub Jelinek wrote:
> >
> > Hi!
> >
> > Working virtually out of Baker Island.
> >
> > We got a response from AMD in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
> > so the following patch starts treating AMD with AVX and CMPXCHG16B
> > ISAs like Intel by using vmovdqa for atomic load/store in libatomic.
> >
> > Ok for trunk if it passes bootstrap/regtest?
> >
> > 2022-11-13  Jakub Jelinek
> >
> >         PR target/104688
> >         * config/x86/init.c (__libat_feat1_init): Revert 2022-03-17 change
> >         - on x86_64 no longer clear bit_AVX if CPU vendor is not Intel.
> >
> > --- libatomic/config/x86/init.c.jj      2022-03-17 18:48:56.708723194 +0100
> > +++ libatomic/config/x86/init.c 2022-11-13 18:23:26.315440071 -1200
> > @@ -34,18 +34,6 @@ __libat_feat1_init (void)
> >    unsigned int eax, ebx, ecx, edx;
> >    FEAT1_REGISTER = 0;
> >    __get_cpuid (1, &eax, &ebx, &ecx, &edx);
> > -#ifdef __x86_64__
> > -  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> > -      == (bit_AVX | bit_CMPXCHG16B))
> > -    {
> > -      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
> > -        is atomic, but so far we don't have this guarantee from AMD.  */
> > -      unsigned int ecx2 = 0;
> > -      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
> > -      if (ecx2 != signature_INTEL_ecx)
> > -       FEAT1_REGISTER &= ~bit_AVX;
>
> We still need this, but also bypass it for AMD signature. There are
> other vendors than Intel and AMD.

Mayshao: how about the status of this feature on Zhaoxin product lines?
IIRC they support AVX (but disabled by default in GCC for Lujiazui), but
we don't know if they make the guarantee about atomicity of 16B aligned
access.

>
> OK with the above addition.
>
> Thanks,
> Uros.
>
> > -    }
> > -#endif
> >    /* See the load in load_feat1.  */
> >    __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
> >    return FEAT1_REGISTER;
> >
> >         Jakub
> >

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
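
For reference, the adjustment Uros asks for (keep the vendor check, but let AMD through as well as Intel) could look roughly like the sketch below. This is only an illustration and not the patch that was committed. The helper sketch_filter_feat1, the self-check, and the SKETCH_* macros are hypothetical names. The macros copy the values of bit_AVX, bit_CMPXCHG16B, signature_INTEL_ecx and signature_AMD_ecx from GCC's <cpuid.h> so the snippet builds on non-x86 hosts too. Real code would call __get_cpuid and use the <cpuid.h> names directly.

```c
#include <assert.h>

/* CPUID leaf 1 ECX feature bits.  Same values as bit_CMPXCHG16B and
   bit_AVX in GCC's <cpuid.h>; renamed so the sketch is host-independent.  */
#define SKETCH_BIT_CMPXCHG16B (1u << 13)
#define SKETCH_BIT_AVX        (1u << 28)

/* CPUID leaf 0 ECX vendor words: "ntel" (GenuineIntel) and "cAMD"
   (AuthenticAMD), matching signature_INTEL_ecx and signature_AMD_ecx.  */
#define SKETCH_SIG_INTEL_ECX 0x6c65746eu
#define SKETCH_SIG_AMD_ECX   0x444d4163u

/* Hypothetical helper, not part of libatomic.  Given the leaf 1 ECX
   feature word and the leaf 0 ECX vendor word, clear the AVX bit when
   both AVX and CMPXCHG16B are present but the vendor is neither Intel
   nor AMD, i.e. when no 16-byte atomicity guarantee is known.  */
static unsigned int
sketch_filter_feat1 (unsigned int feat1, unsigned int vendor_ecx)
{
  const unsigned int need = SKETCH_BIT_AVX | SKETCH_BIT_CMPXCHG16B;

  if ((feat1 & need) == need
      && vendor_ecx != SKETCH_SIG_INTEL_ECX
      && vendor_ecx != SKETCH_SIG_AMD_ECX)
    feat1 &= ~SKETCH_BIT_AVX;
  return feat1;
}

/* Small self-check: Intel and AMD keep AVX, an unknown vendor loses it.
   Returns 1 if all assertions hold.  */
static int
sketch_self_check (void)
{
  const unsigned int both = SKETCH_BIT_AVX | SKETCH_BIT_CMPXCHG16B;

  assert (sketch_filter_feat1 (both, SKETCH_SIG_INTEL_ECX) == both);
  assert (sketch_filter_feat1 (both, SKETCH_SIG_AMD_ECX) == both);
  assert (sketch_filter_feat1 (both, 0u) == SKETCH_BIT_CMPXCHG16B);
  return 1;
}
```

With this shape, any vendor that later documents the same guarantee (the Zhaoxin question above, for instance) would only need one more comparison in the condition.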