From: "H.J. Lu"
Date: Tue, 14 Jun 2022 18:02:40 -0700
Subject: Re: [PATCH v1 1/3] x86: Fix misordered logic for setting `rep_movsb_stop_threshold`
To: Noah Goldstein
Cc: GNU C Library, "Carlos O'Donell"
References: <20220615002533.1741934-1-goldstein.w.n@gmail.com>
In-Reply-To: <20220615002533.1741934-1-goldstein.w.n@gmail.com>

On Tue, Jun 14, 2022 at 5:25 PM Noah Goldstein wrote:
>
> Move the setting of `rep_movsb_stop_threshold` to after the tunables
> have been collected so that the `rep_movsb_stop_threshold` (which
> is used to redirect control flow to the non_temporal case) will
> use any user value for `non_temporal_threshold` (set using
> glibc.cpu.x86_non_temporal_threshold)
> ---
>  sysdeps/x86/dl-cacheinfo.h | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index f64a2fb0ba..cc3b840f9c 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -898,18 +898,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>    if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
>      rep_movsb_threshold = 2112;
>
> -  unsigned long int rep_movsb_stop_threshold;
> -  /* ERMS feature is implemented from AMD Zen3 architecture and it is
> -     performing poorly for data above L2 cache size. Henceforth, adding
> -     an upper bound threshold parameter to limit the usage of Enhanced
> -     REP MOVSB operations and setting its value to L2 cache size. */
> -  if (cpu_features->basic.kind == arch_kind_amd)
> -    rep_movsb_stop_threshold = core;
> -  /* Setting the upper bound of ERMS to the computed value of
> -     non-temporal threshold for architectures other than AMD. */
> -  else
> -    rep_movsb_stop_threshold = non_temporal_threshold;
> -
>    /* The default threshold to use Enhanced REP STOSB. */
>    unsigned long int rep_stosb_threshold = 2048;
>
> @@ -951,6 +939,18 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>                             SIZE_MAX);
>  #endif
>
> +  unsigned long int rep_movsb_stop_threshold;
> +  /* ERMS feature is implemented from AMD Zen3 architecture and it is
> +     performing poorly for data above L2 cache size. Henceforth, adding
> +     an upper bound threshold parameter to limit the usage of Enhanced
> +     REP MOVSB operations and setting its value to L2 cache size. */
> +  if (cpu_features->basic.kind == arch_kind_amd)
> +    rep_movsb_stop_threshold = core;
> +  /* Setting the upper bound of ERMS to the computed value of
> +     non-temporal threshold for architectures other than AMD. */
> +  else
> +    rep_movsb_stop_threshold = non_temporal_threshold;
> +
>    cpu_features->data_cache_size = data;
>    cpu_features->shared_cache_size = shared;
>    cpu_features->non_temporal_threshold = non_temporal_threshold;
> --
> 2.34.1
>

LGTM.

Thanks.

--
H.J.