From: "H.J. Lu"
Date: Tue, 9 May 2023 14:58:31 -0700
Subject: Re: [PATCH v5 2/3] x86: Refactor Intel `init_cpu_features`
To: Noah Goldstein
Cc: libc-alpha@sourceware.org, carlos@systemhalted.org
In-Reply-To: <20230509031313.3497001-2-goldstein.w.n@gmail.com>
References: <20230424050329.1501348-1-goldstein.w.n@gmail.com> <20230509031313.3497001-1-goldstein.w.n@gmail.com> <20230509031313.3497001-2-goldstein.w.n@gmail.com>

On Mon, May 8, 2023 at 8:13 PM Noah Goldstein wrote:
>
> This patch should have no effect on existing functionality.
>
> The current code, which has a single switch for model detection and
> setting preferred features, is difficult to follow/extend. The cases
> use magic numbers and many microarchitectures are missing. This makes
> it difficult to reason about what is implemented so far and/or
> how/where to add support for new features.
>
> This patch splits the model detection and preference setting stages so
> that CPU preferences can be set based on a complete list of available
> microarchitectures, rather than based on model magic numbers.
> ---
>  sysdeps/x86/cpu-features.c | 401 +++++++++++++++++++++++++++++--------
>  1 file changed, 316 insertions(+), 85 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 5bff8ec0b4..bec70c3c49 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -417,6 +417,217 @@ _Static_assert (((index_arch_Fast_Unaligned_Load
>                      == index_arch_Fast_Copy_Backward)),
>                  "Incorrect index_arch_Fast_Unaligned_Load");
>
> +
> +/* Intel Family-6 microarch list. */
> +enum
> +{
> +  /* Atom processors. */
> +  INTEL_ATOM_BONNELL,
> +  INTEL_ATOM_SALTWELL,
> +  INTEL_ATOM_SILVERMONT,
> +  INTEL_ATOM_AIRMONT,
> +  INTEL_ATOM_GOLDMONT,
> +  INTEL_ATOM_GOLDMONT_PLUS,
> +  INTEL_ATOM_SIERRAFOREST,
> +  INTEL_ATOM_GRANDRIDGE,
> +  INTEL_ATOM_TREMONT,
> +
> +  /* Bigcore processors. */
> +  INTEL_BIGCORE_MEROM,
> +  INTEL_BIGCORE_PENRYN,
> +  INTEL_BIGCORE_DUNNINGTON,
> +  INTEL_BIGCORE_NEHALEM,
> +  INTEL_BIGCORE_WESTMERE,
> +  INTEL_BIGCORE_SANDYBRIDGE,
> +  INTEL_BIGCORE_IVYBRIDGE,
> +  INTEL_BIGCORE_HASWELL,
> +  INTEL_BIGCORE_BROADWELL,
> +  INTEL_BIGCORE_SKYLAKE,
> +  INTEL_BIGCORE_AMBERLAKE,
> +  INTEL_BIGCORE_COFFEELAKE,
> +  INTEL_BIGCORE_WHISKEYLAKE,
> +  INTEL_BIGCORE_KABYLAKE,
> +  INTEL_BIGCORE_COMETLAKE,
> +  INTEL_BIGCORE_SKYLAKE_AVX512,
> +  INTEL_BIGCORE_CANNONLAKE,
> +  INTEL_BIGCORE_CASCADELAKE,
> +  INTEL_BIGCORE_COOPERLAKE,
> +  INTEL_BIGCORE_ICELAKE,
> +  INTEL_BIGCORE_TIGERLAKE,
> +  INTEL_BIGCORE_ROCKETLAKE,
> +  INTEL_BIGCORE_SAPPHIRERAPIDS,
> +  INTEL_BIGCORE_RAPTORLAKE,
> +  INTEL_BIGCORE_EMERALDRAPIDS,
> +  INTEL_BIGCORE_METEORLAKE,
> +  INTEL_BIGCORE_LUNARLAKE,
> +  INTEL_BIGCORE_ARROWLAKE,
> +  INTEL_BIGCORE_GRANITERAPIDS,
> +
> +  /* Mixed (bigcore + atom SOC). */
> +  INTEL_MIXED_LAKEFIELD,
> +  INTEL_MIXED_ALDERLAKE,
> +
> +  /* KNL. */
> +  INTEL_KNIGHTS_MILL,
> +  INTEL_KNIGHTS_LANDING,
> +
> +  /* Unknown. */
> +  INTEL_UNKNOWN,
> +};
> +
> +static unsigned int
> +intel_get_fam6_microarch (unsigned int model, unsigned int stepping)
> +{
> +  switch (model)
> +    {
> +    case 0x1C:
> +    case 0x26:
> +      return INTEL_ATOM_BONNELL;
> +    case 0x27:
> +    case 0x35:
> +    case 0x36:
> +      return INTEL_ATOM_SALTWELL;
> +    case 0x37:
> +    case 0x4A:
> +    case 0x4D:
> +    case 0x5D:
> +      return INTEL_ATOM_SILVERMONT;
> +    case 0x4C:
> +    case 0x5A:
> +    case 0x75:
> +      return INTEL_ATOM_AIRMONT;
> +    case 0x5C:
> +    case 0x5F:
> +      return INTEL_ATOM_GOLDMONT;
> +    case 0x7A:
> +      return INTEL_ATOM_GOLDMONT_PLUS;
> +    case 0xAF:
> +      return INTEL_ATOM_SIERRAFOREST;
> +    case 0xB6:
> +      return INTEL_ATOM_GRANDRIDGE;
> +    case 0x86:
> +    case 0x96:
> +    case 0x9C:
> +      return INTEL_ATOM_TREMONT;
> +    case 0x0F:
> +    case 0x16:
> +      return INTEL_BIGCORE_MEROM;
> +    case 0x17:
> +      return INTEL_BIGCORE_PENRYN;
> +    case 0x1D:
> +      return INTEL_BIGCORE_DUNNINGTON;
> +    case 0x1A:
> +    case 0x1E:
> +    case 0x1F:
> +    case 0x2E:
> +      return INTEL_BIGCORE_NEHALEM;
> +    case 0x25:
> +    case 0x2C:
> +    case 0x2F:
> +      return INTEL_BIGCORE_WESTMERE;
> +    case 0x2A:
> +    case 0x2D:
> +      return INTEL_BIGCORE_SANDYBRIDGE;
> +    case 0x3A:
> +    case 0x3E:
> +      return INTEL_BIGCORE_IVYBRIDGE;
> +    case 0x3C:
> +    case 0x3F:
> +    case 0x45:
> +    case 0x46:
> +      return INTEL_BIGCORE_HASWELL;
> +    case 0x3D:
> +    case 0x47:
> +    case 0x4F:
> +    case 0x56:
> +      return INTEL_BIGCORE_BROADWELL;
> +    case 0x4E:
> +    case 0x5E:
> +      return INTEL_BIGCORE_SKYLAKE;
> +    case 0x8E:
> +      switch (stepping)
> +        {
> +        case 0x09:
> +          return INTEL_BIGCORE_AMBERLAKE;
> +        case 0x0A:
> +          return INTEL_BIGCORE_COFFEELAKE;
> +        case 0x0B:
> +        case 0x0C:
> +          return INTEL_BIGCORE_WHISKEYLAKE;
> +        default:
> +          return INTEL_BIGCORE_KABYLAKE;
> +        }
> +    case 0x9E:
> +      switch (stepping)
> +        {
> +        case 0x0A:
> +        case 0x0B:
> +        case 0x0C:
> +        case 0x0D:
> +          return INTEL_BIGCORE_COFFEELAKE;
> +        default:
> +          return INTEL_BIGCORE_KABYLAKE;
> +        }
> +    case 0xA5:
> +    case 0xA6:
> +      return INTEL_BIGCORE_COMETLAKE;
> +    case 0x66:
> +      return INTEL_BIGCORE_CANNONLAKE;
> +    case 0x55:
> +      switch (stepping)
> +        {
> +        case 0x06:
> +        case 0x07:
> +          return INTEL_BIGCORE_CASCADELAKE;
> +        case 0x0b:
> +          return INTEL_BIGCORE_COOPERLAKE;
> +        default:
> +          return INTEL_BIGCORE_SKYLAKE_AVX512;
> +        }
> +    case 0x6A:
> +    case 0x6C:
> +    case 0x7D:
> +    case 0x7E:
> +    case 0x9D:
> +      return INTEL_BIGCORE_ICELAKE;
> +    case 0x8C:
> +    case 0x8D:
> +      return INTEL_BIGCORE_TIGERLAKE;
> +    case 0xA7:
> +      return INTEL_BIGCORE_ROCKETLAKE;
> +    case 0x8F:
> +      return INTEL_BIGCORE_SAPPHIRERAPIDS;
> +    case 0xB7:
> +    case 0xBA:
> +    case 0xBF:
> +      return INTEL_BIGCORE_RAPTORLAKE;
> +    case 0xCF:
> +      return INTEL_BIGCORE_EMERALDRAPIDS;
> +    case 0xAA:
> +    case 0xAC:
> +      return INTEL_BIGCORE_METEORLAKE;
> +    case 0xbd:
> +      return INTEL_BIGCORE_LUNARLAKE;
> +    case 0xc6:
> +      return INTEL_BIGCORE_ARROWLAKE;
> +    case 0xAD:
> +    case 0xAE:
> +      return INTEL_BIGCORE_GRANITERAPIDS;
> +    case 0x8A:
> +      return INTEL_MIXED_LAKEFIELD;
> +    case 0x97:
> +    case 0x9A:
> +    case 0xBE:
> +      return INTEL_MIXED_ALDERLAKE;
> +    case 0x85:
> +      return INTEL_KNIGHTS_MILL;
> +    case 0x57:
> +      return INTEL_KNIGHTS_LANDING;
> +    default:
> +      return INTEL_UNKNOWN;
> +    }
> +}
> +
>  static inline void
>  init_cpu_features (struct cpu_features *cpu_features)
>  {
> @@ -453,129 +664,149 @@ init_cpu_features (struct cpu_features *cpu_features)
>        if (family == 0x06)
>          {
>            model += extended_model;
> -          switch (model)
> +          unsigned int microarch
> +            = intel_get_fam6_microarch (model, stepping);
> +
> +          switch (microarch)
>              {
> -            case 0x1c:
> -            case 0x26:
> -              /* BSF is slow on Atom. */
> +              /* Atom / KNL tuning. */
> +            case INTEL_ATOM_BONNELL:

Since Saltwell is a shrink of Bonnell, INTEL_ATOM_SALTWELL should
be added here.

> +              /* BSF is slow on Bonnell. */
>                cpu_features->preferred[index_arch_Slow_BSF]
> -                |= bit_arch_Slow_BSF;
> +                  |= bit_arch_Slow_BSF;
>                break;
>
> -            case 0x57:
> -              /* Knights Landing. Enable Silvermont optimizations. */
> -
> -            case 0x7a:
> -              /* Unaligned load versions are faster than SSSE3
> -                 on Goldmont Plus. */
> -
> -            case 0x5c:
> -            case 0x5f:
>                /* Unaligned load versions are faster than SSSE3
> -                 on Goldmont. */
> +                 on Airmont, Silvermont, Goldmont, and Goldmont Plus. */
> +            case INTEL_ATOM_AIRMONT:
> +            case INTEL_ATOM_SILVERMONT:
> +            case INTEL_ATOM_GOLDMONT:
> +            case INTEL_ATOM_GOLDMONT_PLUS:
>
> -            case 0x4c:
> -            case 0x5a:
> -            case 0x75:
> -              /* Airmont is a die shrink of Silvermont. */
> +              /* Knights Landing. Enable Silvermont optimizations. */
> +            case INTEL_KNIGHTS_LANDING:
>
> -            case 0x37:
> -            case 0x4a:
> -            case 0x4d:
> -            case 0x5d:
> -              /* Unaligned load versions are faster than SSSE3
> -                 on Silvermont. */
>                cpu_features->preferred[index_arch_Fast_Unaligned_Load]
> -                |= (bit_arch_Fast_Unaligned_Load
> -                    | bit_arch_Fast_Unaligned_Copy
> -                    | bit_arch_Prefer_PMINUB_for_stringop
> -                    | bit_arch_Slow_SSE4_2);
> +                  |= (bit_arch_Fast_Unaligned_Load
> +                      | bit_arch_Fast_Unaligned_Copy
> +                      | bit_arch_Prefer_PMINUB_for_stringop
> +                      | bit_arch_Slow_SSE4_2);
>                break;
>
> -            case 0x86:
> -            case 0x96:
> -            case 0x9c:
> +            case INTEL_ATOM_TREMONT:
>                /* Enable rep string instructions, unaligned load, unaligned
> -                 copy, pminub and avoid SSE 4.2 on Tremont. */
> +                   copy, pminub and avoid SSE 4.2 on Tremont. */
>                cpu_features->preferred[index_arch_Fast_Rep_String]
> -                |= (bit_arch_Fast_Rep_String
> -                    | bit_arch_Fast_Unaligned_Load
> -                    | bit_arch_Fast_Unaligned_Copy
> -                    | bit_arch_Prefer_PMINUB_for_stringop
> -                    | bit_arch_Slow_SSE4_2);
> +                  |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Unaligned_Load
> +                      | bit_arch_Fast_Unaligned_Copy
> +                      | bit_arch_Prefer_PMINUB_for_stringop
> +                      | bit_arch_Slow_SSE4_2);
> +              break;
> +
> +              /* Untuned KNL microarch. */
> +            case INTEL_KNIGHTS_MILL:
> +              /* Untuned atom microarch. */
> +            case INTEL_ATOM_SIERRAFOREST:
> +            case INTEL_ATOM_GRANDRIDGE:
> +            case INTEL_ATOM_SALTWELL:
>                break;

"break" should be removed to enable the optimizations for processors
with AVX.

>
> +              /* Bigcore Tuning. */
> +            case INTEL_UNKNOWN:
>              default:
>                /* Unknown family 0x06 processors. Assuming this is one
>                   of Core i3/i5/i7 processors if AVX is available. */
>                if (!CPU_FEATURES_CPU_P (cpu_features, AVX))
>                  break;
> -              /* Fall through. */
> -
> -            case 0x1a:
> -            case 0x1e:
> -            case 0x1f:
> -            case 0x25:
> -            case 0x2c:
> -            case 0x2e:
> -            case 0x2f:
> +            case INTEL_BIGCORE_NEHALEM:
> +            case INTEL_BIGCORE_WESTMERE:
>                /* Rep string instructions, unaligned load, unaligned copy,
>                   and pminub are fast on Intel Core i3, i5 and i7. */
>                cpu_features->preferred[index_arch_Fast_Rep_String]
> -                |= (bit_arch_Fast_Rep_String
> -                    | bit_arch_Fast_Unaligned_Load
> -                    | bit_arch_Fast_Unaligned_Copy
> -                    | bit_arch_Prefer_PMINUB_for_stringop);
> +                  |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Unaligned_Load
> +                      | bit_arch_Fast_Unaligned_Copy
> +                      | bit_arch_Prefer_PMINUB_for_stringop);
> +              break;
> +
> +              /* Untuned Bigcore microarch. */
> +            case INTEL_BIGCORE_SANDYBRIDGE:
> +            case INTEL_BIGCORE_IVYBRIDGE:
> +            case INTEL_BIGCORE_HASWELL:
> +            case INTEL_BIGCORE_BROADWELL:
> +            case INTEL_BIGCORE_SKYLAKE:
> +            case INTEL_BIGCORE_AMBERLAKE:
> +            case INTEL_BIGCORE_COFFEELAKE:
> +            case INTEL_BIGCORE_WHISKEYLAKE:
> +            case INTEL_BIGCORE_KABYLAKE:
> +            case INTEL_BIGCORE_COMETLAKE:
> +            case INTEL_BIGCORE_SKYLAKE_AVX512:
> +            case INTEL_BIGCORE_CASCADELAKE:
> +            case INTEL_BIGCORE_COOPERLAKE:
> +            case INTEL_BIGCORE_CANNONLAKE:
> +            case INTEL_BIGCORE_ICELAKE:
> +            case INTEL_BIGCORE_TIGERLAKE:
> +            case INTEL_BIGCORE_ROCKETLAKE:
> +            case INTEL_BIGCORE_RAPTORLAKE:
> +            case INTEL_BIGCORE_METEORLAKE:
> +            case INTEL_BIGCORE_LUNARLAKE:
> +            case INTEL_BIGCORE_ARROWLAKE:
> +            case INTEL_BIGCORE_SAPPHIRERAPIDS:
> +            case INTEL_BIGCORE_EMERALDRAPIDS:
> +            case INTEL_BIGCORE_GRANITERAPIDS:
> +              break;
> +
> +              /* Untuned Mixed (bigcore + atom SOC). */
> +            case INTEL_MIXED_LAKEFIELD:
> +            case INTEL_MIXED_ALDERLAKE:

All these processors should be treated as default.

>                break;
>              }
>
> -          /* Disable TSX on some processors to avoid TSX on kernels that
> -             weren't updated with the latest microcode package (which
> -             disables broken feature by default). */
> -          switch (model)
> +            /* Disable TSX on some processors to avoid TSX on kernels that
> +               weren't updated with the latest microcode package (which
> +               disables broken feature by default). */
> +          switch (microarch)
>              {
> -            case 0x55:
> -              if (stepping <= 5)
> +            case INTEL_BIGCORE_SKYLAKE_AVX512:
> +              /* 0x55 && stepping <= 5 is SKYLAKE_AVX512. Cascadelake and
> +                 Cooperlake also have model == 0x55 so double check the
> +                 stepping to be safe. */
> +              if (model == 0x55 && stepping <= 5)

No need to check model == 0x55.
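To illustrate the point: in intel_get_fam6_microarch above, INTEL_BIGCORE_SKYLAKE_AVX512 is returned only from the model 0x55 case, so a switch on microarch already implies the model and only the stepping test carries information. A minimal standalone sketch of that argument follows. The demo_* names are hypothetical stand-ins, not part of the patch, and only the model 0x55 branch is reproduced.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-ins for the patch's enum and lookup.
   Every model other than 0x55 is lumped into DEMO_UNKNOWN here. */
enum demo_microarch
{
  DEMO_SKYLAKE_AVX512,
  DEMO_CASCADELAKE,
  DEMO_COOPERLAKE,
  DEMO_UNKNOWN
};

static enum demo_microarch
demo_get_microarch (unsigned int model, unsigned int stepping)
{
  if (model != 0x55)
    return DEMO_UNKNOWN;
  /* Same stepping split as the model 0x55 case in the patch. */
  switch (stepping)
    {
    case 0x06:
    case 0x07:
      return DEMO_CASCADELAKE;
    case 0x0b:
      return DEMO_COOPERLAKE;
    default:
      return DEMO_SKYLAKE_AVX512;
    }
}

/* TSX decision keyed on the microarch and stepping only, with no
   separate model test. */
static int
demo_should_disable_tsx (unsigned int model, unsigned int stepping)
{
  return (demo_get_microarch (model, stepping) == DEMO_SKYLAKE_AVX512
          && stepping <= 5);
}

/* Walk every 8-bit model and 4-bit stepping and confirm that
   DEMO_SKYLAKE_AVX512 is only ever produced for model 0x55.
   Returns how many (model, stepping) pairs map to it. */
static unsigned int
demo_check_model_implied (void)
{
  unsigned int hits = 0;
  for (unsigned int model = 0; model <= 0xff; model++)
    for (unsigned int stepping = 0; stepping <= 0xf; stepping++)
      if (demo_get_microarch (model, stepping) == DEMO_SKYLAKE_AVX512)
        {
          assert (model == 0x55);
          hits++;
        }
  return hits;
}
```

Under these assumptions, demo_should_disable_tsx (0x55, 4) is true and demo_should_disable_tsx (0x55, 7) is false (stepping 7 is Cascadelake), which matches what the "stepping <= 5" test alone gives inside the SKYLAKE_AVX512 case.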
>                goto disable_tsx;
>              break;
> -            case 0x8e:
> -              /* NB: Although the errata documents that for model == 0x8e,
> -                 only 0xb stepping or lower are impacted, the intention of
> -                 the errata was to disable TSX on all client processors on
> -                 all steppings. Include 0xc stepping which is an Intel
> -                 Core i7-8665U, a client mobile processor. */
> -            case 0x9e:
> -              if (stepping > 0xc)
> +
> +            case INTEL_BIGCORE_SKYLAKE:
> +            case INTEL_BIGCORE_AMBERLAKE:
> +            case INTEL_BIGCORE_COFFEELAKE:
> +            case INTEL_BIGCORE_WHISKEYLAKE:
> +            case INTEL_BIGCORE_KABYLAKE:
> +              /* NB: Although the errata documents that for model == 0x8e
> +                 (skylake client), only 0xb stepping or lower are impacted,
> +                 the intention of the errata was to disable TSX on all client
> +                 processors on all steppings. Include 0xc stepping which is
> +                 an Intel Core i7-8665U, a client mobile processor. */
> +              if ((model == 0x8e || model == 0x9e) && stepping > 0xc)
>                  break;
> -              /* Fall through. */
> -            case 0x4e:
> -            case 0x5e:
> -              {
> +
>                /* Disable Intel TSX and enable RTM_ALWAYS_ABORT for
>                   processors listed in:
>
>  https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html
>                 */
> -disable_tsx:
> +            disable_tsx:
>                CPU_FEATURE_UNSET (cpu_features, HLE);
>                CPU_FEATURE_UNSET (cpu_features, RTM);
>                CPU_FEATURE_SET (cpu_features, RTM_ALWAYS_ABORT);
> -              }
> -              break;
> -            case 0x3f:
> -              /* Xeon E7 v3 with stepping >= 4 has working TSX. */
> -              if (stepping >= 4)
>                break;
> -              /* Fall through. */
> -            case 0x3c:
> -            case 0x45:
> -            case 0x46:
> -              /* Disable Intel TSX on Haswell processors (except Xeon E7 v3
> -                 with stepping >= 4) to avoid TSX on kernels that weren't
> -                 updated with the latest microcode package (which disables
> -                 broken feature by default). */
> -              CPU_FEATURE_UNSET (cpu_features, RTM);
> -              break;
> +
> +            case INTEL_BIGCORE_HASWELL:
> +              /* Xeon E7 v3 (model == 0x3f) with stepping >= 4 has working
> +                 TSX. Haswell also include other model numbers that have
> +                 working TSX. */
> +              if (model == 0x3f && stepping >= 4)
> +                break;
> +
> +              CPU_FEATURE_UNSET (cpu_features, RTM);
> +              break;
>              }
>          }
>
> --
> 2.34.1
>

--
H.J.