References: <20230103193715.2549830-1-hjl.tools@gmail.com>
From: "H.J. Lu"
Date: Tue, 3 Jan 2023 12:49:46 -0800
Subject: Re: [PATCH] x86: Check the minimum non_temporal_threshold [BZ #29953]
To: Noah Goldstein
Cc: libc-alpha@sourceware.org

On Tue, Jan 3, 2023 at 12:24 PM Noah Goldstein wrote:
>
> On Tue, Jan 3, 2023 at 11:37 AM H.J. Lu wrote:
> >
> > The minimum non_temporal_threshold is 0x4040. non_temporal_threshold may
> > be set to less than the minimum value when the shared cache size isn't
> > available (e.g., in an emulator) or by the tunable. Add a check for
> > the minimum non_temporal_threshold.
> >
> > This fixes BZ #29953.
> > ---
> >  sysdeps/x86/dl-cacheinfo.h | 23 ++++++++++++++---------
> >  1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> > index e9f3382108..92e8e40752 100644
> > --- a/sysdeps/x86/dl-cacheinfo.h
> > +++ b/sysdeps/x86/dl-cacheinfo.h
> > @@ -861,8 +861,18 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
> >       share of the cache, it has a substantial risk of negatively
> >       impacting the performance of other threads running on the chip. */
> >    unsigned long int non_temporal_threshold = shared * 3 / 4;
> > +  /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value of
> > +     'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is best
> > +     if that operation cannot overflow. Minimum of 0x4040 (16448) because the
> > +     L(large_memset_4x) loops need 64-byte to cache align and enough space for
> > +     at least 1 iteration of 4x PAGE_SIZE unrolled loop. Both values are
> > +     reflected in the manual. */
> > +  unsigned long int minimum_non_temporal_threshold = 0x4040;
> > +  if (non_temporal_threshold < minimum_non_temporal_threshold)
> > +    non_temporal_threshold = minimum_non_temporal_threshold;
> >
>
> Should we have equivalent logic for max in case shared is somehow
> SIZE_MAX / 12?

Good point. Will be updated in v2.

> >  #if HAVE_TUNABLES
> > +  unsigned long int maximum_non_temporal_threshold = SIZE_MAX >> 4;
> >    /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8. */
> >    unsigned int minimum_rep_movsb_threshold;
> >  #endif
> > @@ -915,8 +925,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
> >      shared = tunable_size;
> >
> >    tunable_size = TUNABLE_GET (x86_non_temporal_threshold, long int, NULL);
> > -  /* NB: Ignore the default value 0. */
> > -  if (tunable_size != 0)
> > +  if (tunable_size > minimum_non_temporal_threshold
> > +      && tunable_size <= maximum_non_temporal_threshold)
> >      non_temporal_threshold = tunable_size;
> >
> >    tunable_size = TUNABLE_GET (x86_rep_movsb_threshold, long int, NULL);
> > @@ -931,14 +941,9 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
> >
> >    TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
> >    TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
> > -  /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value of
> > -     'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is best
> > -     if that operation cannot overflow. Minimum of 0x4040 (16448) because the
> > -     L(large_memset_4x) loops need 64-byte to cache align and enough space for
> > -     at least 1 iteration of 4x PAGE_SIZE unrolled loop. Both values are
> > -     reflected in the manual. */
> >    TUNABLE_SET_WITH_BOUNDS (x86_non_temporal_threshold, non_temporal_threshold,
> > -                           0x4040, SIZE_MAX >> 4);
> > +                           minimum_non_temporal_threshold,
> > +                           maximum_non_temporal_threshold);
> >    TUNABLE_SET_WITH_BOUNDS (x86_rep_movsb_threshold, rep_movsb_threshold,
> >                             minimum_rep_movsb_threshold, SIZE_MAX);
> >    TUNABLE_SET_WITH_BOUNDS (x86_rep_stosb_threshold, rep_stosb_threshold, 1,
> > --
> > 2.39.0
> >

--
H.J.