Date: Fri, 14 Jun 2024 20:01:05 +0200
From: Borislav Petkov
To: Noah Goldstein
Cc: "H.J. Lu", libc-alpha@sourceware.org, Michael Matz
Subject: Re: [PATCH v2 2/2] x86: Add seperate non-temporal tunable for memset
Message-ID: <20240614180105.GDZmyFYc42xBeRL99p@fat_crate.local>
References: <20240519004347.2759850-1-goldstein.w.n@gmail.com>
 <20240524173851.2483952-1-goldstein.w.n@gmail.com>
 <20240524173851.2483952-2-goldstein.w.n@gmail.com>
 <20240614104040.GAZmweKPkLa5YuGOG7@fat_crate.local>

On Fri, Jun 14, 2024 at 11:39:07AM -0500, Noah Goldstein wrote:
> On Fri, Jun 14, 2024 at 5:41 AM Borislav Petkov wrote:
> >
> > Hi,
> >
> > I'm not subscribed to the glibc list - pls CC me directly on replies.
> >
> > On Wed, May 29, 2024 at 03:53:20PM -0700, H.J. Lu wrote:
> > > On Fri, May 24, 2024 at 10:39 AM Noah Goldstein wrote:
> > > >
> > > > The tuning for non-temporal stores for memset vs memcpy is not always
> > > > the same. This includes both the exact value and whether non-temporal
> > > > stores are profitable at all for a given arch.
> > > >
> > > > This patch add `x86_memset_non_temporal_threshold`. Currently we
> > > > disable non-temporal stores for non Intel vendors as the only
> > > > benchmarks showing its benefit have been on Intel hardware.
> > > > ---
> > > >  manual/tunables.texi                             | 16 +++++++++++++++-
> > > >  sysdeps/x86/cacheinfo.h                          |  8 +++++++-
> > > >  sysdeps/x86/dl-cacheinfo.h                       | 16 ++++++++++++++++
> > > >  sysdeps/x86/dl-diagnostics-cpu.c                 |  2 ++
> > > >  sysdeps/x86/dl-tunables.list                     |  3 +++
> > > >  sysdeps/x86/include/cpu-features.h               |  4 +++-
> > > >  .../x86_64/multiarch/memset-vec-unaligned-erms.S |  6 +++---
> > > >  7 files changed, 49 insertions(+), 6 deletions(-)
> >
> > ...
> >
> > > > +  /* Non-temporal stores in memset have only been tested on Intel hardware.
> > > > +     Until we benchmark data on other x86 processor, disable non-temporal
> > > > +     stores in memset.  */
> >
> > Well, something's fishy here:
> >
> > $ ./elf/ld.so --list-tunables | grep threshold
> > glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> > glibc.cpu.x86_rep_movsb_threshold: 0x600000 (min: 0x100, max: 0xffffffffffffffff)
> > glibc.cpu.x86_non_temporal_threshold: 0x600000 (min: 0x4040, max: 0xfffffffffffffff)
> > glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> > glibc.cpu.x86_rep_stosb_threshold: 0xffffffffffffffff (min: 0x1, max: 0xffffffffffffffff)
> > glibc.cpu.x86_memset_non_temporal_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> >                                              ^^^^^^^^^
> >
> > on glibc-2.39.9000-300-g54c1efdac55b from git.
> >
> > That's on a AMD Zen1 so I'd expect that memset NT threshold to be
> > 0xffffffffffffffff by default...
> >
> > Thx.
> >
>
> Thanks for bringing this up, looking into it.

Thx, so Michael did debug it yesterday to the ranges mismatching:

diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index 147cc4cf23f5..ecf3c1d3736e 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -110,8 +110,11 @@ do_tunable_update_val (tunable_t *cur, const tunable_val_t *valp,
 
   /* Bail out if the bounds are not valid.  */
   if (tunable_val_lt (val, min, unsigned_cmp)
-      || tunable_val_lt (max, val, unsigned_cmp))
+      || tunable_val_lt (max, val, unsigned_cmp)) {
+    _dl_printf("bail out due to: 0x%lx, min: 0x%lx, max: 0x%lx\n",
+               val, min, max);
     return;
+  }
 
   cur->val.numval = val;
   cur->type.min = min;

$ ./elf/ld.so --list-tunables | grep -E "(threshold|bail)"
dl_init_cacheinfo: memset_non_temporal_threshold: 0xffffffffffffffff
dl_init_cacheinfo: memset_non_temporal_threshold, tunable_size: 0xffffffffffffffff
bail out due to: 0xffffffffffffffff, min: 0x4040, max: 0xfffffffffffffff
                                                       ^^^^^^^
dl_init_cacheinfo: memset_non_temporal_threshold, tunable set: 0xffffffffffffffff, min: 0x4040, max: 0xfffffffffffffff
glibc.cpu.x86_memset_non_temporal_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)

but you guys probably should do the right fix here.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
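
For anyone following along, the range mismatch in the log above can be modelled in isolation. What follows is a minimal standalone sketch and not glibc code: `struct tunable_model`, `update_val` and `demo_zen1_case` are made-up names, and the plain `uint64_t` comparisons only approximate what `tunable_val_lt (..., unsigned_cmp)` does inside `do_tunable_update_val`. Under those assumptions it shows how a default of 0xffffffffffffffff gets dropped when the caller passes a max of 0xfffffffffffffff (15 f's), which would leave the tunable at its initial 0x0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a tunable; NOT glibc's tunable_t.  */
struct tunable_model
{
  uint64_t val;
  uint64_t min;
  uint64_t max;
};

/* Models the bounds check from the debug hunk: if VAL lies outside
   [MIN, MAX] the update is dropped without any error, and the tunable
   keeps its previous value and range.  Returns 1 if the update was
   applied, 0 if it was dropped.  */
int
update_val (struct tunable_model *t, uint64_t val, uint64_t min, uint64_t max)
{
  assert (t != NULL);

  if (val < min || max < val)
    return 0;

  t->val = val;
  t->min = min;
  t->max = max;
  return 1;
}

/* Replays the numbers from the "bail out due to" line in the log.  */
int
demo_zen1_case (void)
{
  /* Initial state as printed by --list-tunables: 0x0, range [0x0, 2^64-1].  */
  struct tunable_model t = { 0, 0, UINT64_MAX };

  /* Requested default is 2^64-1, but the max passed in has only 15 f's.  */
  int applied = update_val (&t, UINT64_MAX, 0x4040, 0x0fffffffffffffffULL);

  printf ("applied=%d val=0x%llx\n", applied, (unsigned long long) t.val);
  return applied;
}
```

Calling `demo_zen1_case ()` from a `main` reports that the update was not applied and the value is still 0x0. Passing `UINT64_MAX` as the max instead lets the same call go through, which matches the reading that the two ranges simply disagree by one hex digit.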