From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x34.google.com (mail-oa1-x34.google.com [IPv6:2001:4860:4864:20::34]) by sourceware.org (Postfix) with ESMTPS id 8C76C3858D38 for ; Sat, 13 May 2023 05:18:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8C76C3858D38 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-oa1-x34.google.com with SMTP id 586e51a60fabf-1929818d7faso61667071fac.0 for ; Fri, 12 May 2023 22:18:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1683955127; x=1686547127; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=CD0y9YnVa/wf6TFkbRQ6f/qZ7vJU3X4ozxdFrTLStCk=; b=DKGpNBDQBeHwcaN40UPWomyKBCO9dh7m7S3mgbTu5QpP77dj9qTPHsPHVSxKsCfIYA qPlq1YtE3iV6PmL7HSM8MMRktod1YwIE+qP6aauTqwV+1wNcT60omjm9s63ZT6jNwLj6 MXPbKBSF6ITo61jP8F9UHi/kheSrgrNiTlDjm7olx/Bb4GbZEGdGuma2O35J2H7WzcAE qXTbfex7IjUFRzqnvZIYdgpzG9yIXbrAkEmr6og5T/U9IHomMgo60Zfo7NM27aVI008I o8WbhIP6zVLS6/lop6gvQgJrRZzZ/HwL2ISnuBHWZWmwWffDvdtQb7cHd6Jxb8Wid6Rv aL0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683955127; x=1686547127; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CD0y9YnVa/wf6TFkbRQ6f/qZ7vJU3X4ozxdFrTLStCk=; b=QEFcKRoKx+vxDzqtTTpDm9S5ES6YhGOa9rtUzQlbA+FGs9VR1xvZv0SX61Jrv9Izms kBdCP31mjEMQ27fVC/TeNJ9IXzcT7fIj07Bc4jBybe5L8XjwNRZyANTOU3TopDy1faYp IxmUM6PHH5XY6/E9slp8SKh17o4SmDKgPoUn5jU31xllniRX+SP34EWxIp3tOGvwP4I0 Zf6Tyglf/2vzWFPt5tpRnpMiyOTR+IGstyA2zA7gVxe58Sf6XAM1itTKk5Ql0zi9sM7M A375KbYNMBN2dfCgmjjSDYsKnSt9kq9Jn6kIVdO8NIMIrCDvO9sqogafqyTr3WeERA0+ mZFA== X-Gm-Message-State: AC+VfDxotVszKRBw0rqt6EPTbmZBp8iun0hdap1+O1ar1d2n12XEDFlk wgYrv3l6tZCwK5e7nGZ3VqGsVWANkvA3hZ5Lnf6Wvksh X-Google-Smtp-Source: ACHHUZ7yBoAMdJSQVz9oB8lK//ffTzF/+UJSjtQwlAFG7X0ELa/dY8aw4ImS10R19RSKva2KIQsYTun/SKeKIskUv4w= X-Received: by 2002:aca:f287:0:b0:389:5120:9506 with SMTP id q129-20020acaf287000000b0038951209506mr6409222oih.8.1683955126605; Fri, 12 May 2023 22:18:46 -0700 (PDT) MIME-Version: 1.0 References: <20230424050329.1501348-1-goldstein.w.n@gmail.com> <20230512051024.1905173-1-goldstein.w.n@gmail.com> <20230512051024.1905173-2-goldstein.w.n@gmail.com> In-Reply-To: From: Noah Goldstein Date: Sat, 13 May 2023 00:18:34 -0500 Message-ID: Subject: Re: [PATCH v8 2/3] x86: Refactor Intel `init_cpu_features` To: "H.J. Lu" Cc: libc-alpha@sourceware.org, carlos@systemhalted.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, May 12, 2023 at 5:18=E2=80=AFPM H.J. Lu wrote= : > > On Thu, May 11, 2023 at 10:11=E2=80=AFPM Noah Goldstein wrote: > > > > This patch should have no affect on existing functionality. > > > > The current code, which has a single switch for model detection and > > setting prefered features, is difficult to follow/extend. The cases > > use magic numbers and many microarchitectures are missing. This makes > > it difficult to reason about what is implemented so far and/or > > how/where to add support for new features. > > > > This patch splits the model detection and preference setting stages so > > that CPU preferences can be set based on a complete list of available > > microarchitectures, rather than based on model magic numbers. > > --- > > sysdeps/x86/cpu-features.c | 400 +++++++++++++++++++++++++++++-------- > > 1 file changed, 316 insertions(+), 84 deletions(-) > > > > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c > > index 5bff8ec0b4..264d309dd7 100644 > > --- a/sysdeps/x86/cpu-features.c > > +++ b/sysdeps/x86/cpu-features.c > > @@ -417,6 +417,218 @@ _Static_assert (((index_arch_Fast_Unaligned_Load > > =3D=3D index_arch_Fast_Copy_Backward)), > > "Incorrect index_arch_Fast_Unaligned_Load"); > > > > + > > +/* Intel Family-6 microarch list. */ > > +enum > > +{ > > + /* Atom processors. */ > > + INTEL_ATOM_BONNELL, > > + INTEL_ATOM_SILVERMONT, > > + INTEL_ATOM_AIRMONT, > > + INTEL_ATOM_GOLDMONT, > > + INTEL_ATOM_GOLDMONT_PLUS, > > + INTEL_ATOM_SIERRAFOREST, > > + INTEL_ATOM_GRANDRIDGE, > > + INTEL_ATOM_TREMONT, > > + > > + /* Bigcore processors. */ > > + INTEL_BIGCORE_MEROM, > > + INTEL_BIGCORE_PENRYN, > > + INTEL_BIGCORE_DUNNINGTON, > > + INTEL_BIGCORE_NEHALEM, > > + INTEL_BIGCORE_WESTMERE, > > + INTEL_BIGCORE_SANDYBRIDGE, > > + INTEL_BIGCORE_IVYBRIDGE, > > + INTEL_BIGCORE_HASWELL, > > + INTEL_BIGCORE_BROADWELL, > > + INTEL_BIGCORE_SKYLAKE, > > + INTEL_BIGCORE_AMBERLAKE, > > + INTEL_BIGCORE_COFFEELAKE, > > + INTEL_BIGCORE_WHISKEYLAKE, > > + INTEL_BIGCORE_KABYLAKE, > > + INTEL_BIGCORE_COMETLAKE, > > + INTEL_BIGCORE_SKYLAKE_AVX512, > > + INTEL_BIGCORE_CANNONLAKE, > > + INTEL_BIGCORE_ICELAKE, > > + INTEL_BIGCORE_TIGERLAKE, > > + INTEL_BIGCORE_ROCKETLAKE, > > + INTEL_BIGCORE_SAPPHIRERAPIDS, > > + INTEL_BIGCORE_RAPTORLAKE, > > + INTEL_BIGCORE_EMERALDRAPIDS, > > + INTEL_BIGCORE_METEORLAKE, > > + INTEL_BIGCORE_LUNARLAKE, > > + INTEL_BIGCORE_ARROWLAKE, > > + INTEL_BIGCORE_GRANITERAPIDS, > > + > > + /* Mixed (bigcore + atom SOC). */ > > + INTEL_MIXED_LAKEFIELD, > > + INTEL_MIXED_ALDERLAKE, > > + > > + /* KNL. */ > > + INTEL_KNIGHTS_MILL, > > + INTEL_KNIGHTS_LANDING, > > + > > + /* Unknown. */ > > + INTEL_UNKNOWN, > > +}; > > + > > +static unsigned int > > +intel_get_fam6_microarch (unsigned int model, unsigned int stepping) > > +{ > > + switch (model) > > + { > > + case 0x1C: > > + case 0x26: > > + return INTEL_ATOM_BONNELL; > > + case 0x27: > > + case 0x35: > > + case 0x36: > > + /* Really Saltwell, but Saltwell is just a die shrink of Bonnell > > + (microarchitecturally identical). */ > > + return INTEL_ATOM_BONNELL; > > + case 0x37: > > + case 0x4A: > > + case 0x4D: > > + case 0x5D: > > + return INTEL_ATOM_SILVERMONT; > > + case 0x4C: > > + case 0x5A: > > + case 0x75: > > + return INTEL_ATOM_AIRMONT; > > + case 0x5C: > > + case 0x5F: > > + return INTEL_ATOM_GOLDMONT; > > + case 0x7A: > > + return INTEL_ATOM_GOLDMONT_PLUS; > > + case 0xAF: > > + return INTEL_ATOM_SIERRAFOREST; > > + case 0xB6: > > + return INTEL_ATOM_GRANDRIDGE; > > + case 0x86: > > + case 0x96: > > + case 0x9C: > > + return INTEL_ATOM_TREMONT; > > + case 0x0F: > > + case 0x16: > > + return INTEL_BIGCORE_MEROM; > > + case 0x17: > > + return INTEL_BIGCORE_PENRYN; > > + case 0x1D: > > + return INTEL_BIGCORE_DUNNINGTON; > > + case 0x1A: > > + case 0x1E: > > + case 0x1F: > > + case 0x2E: > > + return INTEL_BIGCORE_NEHALEM; > > + case 0x25: > > + case 0x2C: > > + case 0x2F: > > + return INTEL_BIGCORE_WESTMERE; > > + case 0x2A: > > + case 0x2D: > > + return INTEL_BIGCORE_SANDYBRIDGE; > > + case 0x3A: > > + case 0x3E: > > + return INTEL_BIGCORE_IVYBRIDGE; > > + case 0x3C: > > + case 0x3F: > > + case 0x45: > > + case 0x46: > > + return INTEL_BIGCORE_HASWELL; > > + case 0x3D: > > + case 0x47: > > + case 0x4F: > > + case 0x56: > > + return INTEL_BIGCORE_BROADWELL; > > + case 0x4E: > > + case 0x5E: > > + return INTEL_BIGCORE_SKYLAKE; > > + case 0x8E: > > + switch (stepping) > > + { > > + case 0x09: > > + return INTEL_BIGCORE_AMBERLAKE; > > + case 0x0A: > > + return INTEL_BIGCORE_COFFEELAKE; > > + case 0x0B: > > + case 0x0C: > > + return INTEL_BIGCORE_WHISKEYLAKE; > > + default: > > + return INTEL_BIGCORE_KABYLAKE; > > + } > > + case 0x9E: > > + switch (stepping) > > + { > > + case 0x0A: > > + case 0x0B: > > + case 0x0C: > > + case 0x0D: > > + return INTEL_BIGCORE_COFFEELAKE; > > + default: > > + return INTEL_BIGCORE_KABYLAKE; > > + } > > All these stepping checks can be put in comments. Okay. Should we drop commetlake / cannonlake / kabylake identifiers as well and only use skylake/skylake-avx512? > > > + case 0xA5: > > + case 0xA6: > > + return INTEL_BIGCORE_COMETLAKE; > > + case 0x66: > > + return INTEL_BIGCORE_CANNONLAKE; > > + case 0x55: > > + /* > > + Stepping =3D {6, 7} > > + -> Cascadelake > > + Stepping =3D {11} > > + -> Cooperlake > > + else > > + -> Skylake-avx512 > > + > > + These are all microarchitecturally indentical, so use > > + Skylake-avx512 for all of them. > > + */ > > + return INTEL_BIGCORE_SKYLAKE_AVX512; > > + case 0x6A: > > + case 0x6C: > > + case 0x7D: > > + case 0x7E: > > + case 0x9D: > > + return INTEL_BIGCORE_ICELAKE; > > + case 0x8C: > > + case 0x8D: > > + return INTEL_BIGCORE_TIGERLAKE; > > + case 0xA7: > > + return INTEL_BIGCORE_ROCKETLAKE; > > + case 0x8F: > > + return INTEL_BIGCORE_SAPPHIRERAPIDS; > > + case 0xB7: > > + case 0xBA: > > + case 0xBF: > > + return INTEL_BIGCORE_RAPTORLAKE; > > + case 0xCF: > > + return INTEL_BIGCORE_EMERALDRAPIDS; > > + case 0xAA: > > + case 0xAC: > > + return INTEL_BIGCORE_METEORLAKE; > > + case 0xbd: > > + return INTEL_BIGCORE_LUNARLAKE; > > + case 0xc6: > > + return INTEL_BIGCORE_ARROWLAKE; > > + case 0xAD: > > + case 0xAE: > > + return INTEL_BIGCORE_GRANITERAPIDS; > > + case 0x8A: > > + return INTEL_MIXED_LAKEFIELD; > > + case 0x97: > > + case 0x9A: > > + case 0xBE: > > + return INTEL_MIXED_ALDERLAKE; > > + case 0x85: > > + return INTEL_KNIGHTS_MILL; > > + case 0x57: > > + return INTEL_KNIGHTS_LANDING; > > + default: > > + return INTEL_UNKNOWN; > > + } > > +} > > + > > static inline void > > init_cpu_features (struct cpu_features *cpu_features) > > { > > @@ -453,129 +665,149 @@ init_cpu_features (struct cpu_features *cpu_fea= tures) > > if (family =3D=3D 0x06) > > { > > model +=3D extended_model; > > - switch (model) > > + unsigned int microarch > > + =3D intel_get_fam6_microarch (model, stepping); > > + > > + switch (microarch) > > { > > - case 0x1c: > > - case 0x26: > > - /* BSF is slow on Atom. */ > > + /* Atom / KNL tuning. */ > > + case INTEL_ATOM_BONNELL: > > + /* BSF is slow on Bonnell. */ > > cpu_features->preferred[index_arch_Slow_BSF] > > - |=3D bit_arch_Slow_BSF; > > + |=3D bit_arch_Slow_BSF; > > break; > > > > - case 0x57: > > - /* Knights Landing. Enable Silvermont optimizations. */ > > - > > - case 0x7a: > > - /* Unaligned load versions are faster than SSSE3 > > - on Goldmont Plus. */ > > - > > - case 0x5c: > > - case 0x5f: > > /* Unaligned load versions are faster than SSSE3 > > - on Goldmont. */ > > + on Airmont, Silvermont, Goldmont, and Goldmont Plu= s. */ > > + case INTEL_ATOM_AIRMONT: > > + case INTEL_ATOM_SILVERMONT: > > + case INTEL_ATOM_GOLDMONT: > > + case INTEL_ATOM_GOLDMONT_PLUS: > > > > - case 0x4c: > > - case 0x5a: > > - case 0x75: > > - /* Airmont is a die shrink of Silvermont. */ > > + /* Knights Landing. Enable Silvermont optimizations. */ > > + case INTEL_KNIGHTS_LANDING: > > > > - case 0x37: > > - case 0x4a: > > - case 0x4d: > > - case 0x5d: > > - /* Unaligned load versions are faster than SSSE3 > > - on Silvermont. */ > > cpu_features->preferred[index_arch_Fast_Unaligned_Load] > > - |=3D (bit_arch_Fast_Unaligned_Load > > - | bit_arch_Fast_Unaligned_Copy > > - | bit_arch_Prefer_PMINUB_for_stringop > > - | bit_arch_Slow_SSE4_2); > > + |=3D (bit_arch_Fast_Unaligned_Load > > + | bit_arch_Fast_Unaligned_Copy > > + | bit_arch_Prefer_PMINUB_for_stringop > > + | bit_arch_Slow_SSE4_2); > > break; > > > > - case 0x86: > > - case 0x96: > > - case 0x9c: > > + case INTEL_ATOM_TREMONT: > > /* Enable rep string instructions, unaligned load, unalig= ned > > - copy, pminub and avoid SSE 4.2 on Tremont. */ > > + copy, pminub and avoid SSE 4.2 on Tremont. */ > > cpu_features->preferred[index_arch_Fast_Rep_String] > > - |=3D (bit_arch_Fast_Rep_String > > - | bit_arch_Fast_Unaligned_Load > > - | bit_arch_Fast_Unaligned_Copy > > - | bit_arch_Prefer_PMINUB_for_stringop > > - | bit_arch_Slow_SSE4_2); > > + |=3D (bit_arch_Fast_Rep_String | bit_arch_Fast_Unalig= ned_Load > > + | bit_arch_Fast_Unaligned_Copy > > + | bit_arch_Prefer_PMINUB_for_stringop > > + | bit_arch_Slow_SSE4_2); > > break; > > > > + /* > > + Default tuned KNL microarch. > KNM Err, trying to refer to the whole "Knights_..." lineup. Changed to "Knights= ". > > + case INTEL_KNIGHTS_MILL: > > + */ > > + > > + /* > > + Default tuned atom microarch. > > + case INTEL_ATOM_SIERRAFOREST: > > + case INTEL_ATOM_GRANDRIDGE: > > + */ > > + > > + /* Bigcore/Default Tuning. */ > > default: > > /* Unknown family 0x06 processors. Assuming this is one > > of Core i3/i5/i7 processors if AVX is available. */ > > if (!CPU_FEATURES_CPU_P (cpu_features, AVX)) > > break; > > - /* Fall through. */ > > - > > - case 0x1a: > > - case 0x1e: > > - case 0x1f: > > - case 0x25: > > - case 0x2c: > > - case 0x2e: > > - case 0x2f: > > + case INTEL_BIGCORE_NEHALEM: > > + case INTEL_BIGCORE_WESTMERE: > > > > /* Rep string instructions, unaligned load, unaligned cop= y, > > and pminub are fast on Intel Core i3, i5 and i7. */ > > cpu_features->preferred[index_arch_Fast_Rep_String] > > - |=3D (bit_arch_Fast_Rep_String > > - | bit_arch_Fast_Unaligned_Load > > - | bit_arch_Fast_Unaligned_Copy > > - | bit_arch_Prefer_PMINUB_for_stringop); > > + |=3D (bit_arch_Fast_Rep_String | bit_arch_Fast_Unalig= ned_Load > > + | bit_arch_Fast_Unaligned_Copy > > + | bit_arch_Prefer_PMINUB_for_stringop); > > break; > > + > > + /* > > + Default tuned Bigcore microarch. > > + case INTEL_BIGCORE_SANDYBRIDGE: > > + case INTEL_BIGCORE_IVYBRIDGE: > > + case INTEL_BIGCORE_HASWELL: > > + case INTEL_BIGCORE_BROADWELL: > > + case INTEL_BIGCORE_SKYLAKE: > > + case INTEL_BIGCORE_AMBERLAKE: > > + case INTEL_BIGCORE_COFFEELAKE: > > + case INTEL_BIGCORE_WHISKEYLAKE: > > + case INTEL_BIGCORE_KABYLAKE: > > + case INTEL_BIGCORE_COMETLAKE: > > + case INTEL_BIGCORE_SKYLAKE_AVX512: > > + case INTEL_BIGCORE_CANNONLAKE: > > + case INTEL_BIGCORE_ICELAKE: > > + case INTEL_BIGCORE_TIGERLAKE: > > + case INTEL_BIGCORE_ROCKETLAKE: > > + case INTEL_BIGCORE_RAPTORLAKE: > > + case INTEL_BIGCORE_METEORLAKE: > > + case INTEL_BIGCORE_LUNARLAKE: > > + case INTEL_BIGCORE_ARROWLAKE: > > + case INTEL_BIGCORE_SAPPHIRERAPIDS: > > + case INTEL_BIGCORE_EMERALDRAPIDS: > > + case INTEL_BIGCORE_GRANITERAPIDS: > > + */ > > + > > + /* > > + Default tuned Mixed (bigcore + atom SOC). > > + case INTEL_MIXED_LAKEFIELD: > > + case INTEL_MIXED_ALDERLAKE: > > + */ > > } > > > > - /* Disable TSX on some processors to avoid TSX on kernels that > > - weren't updated with the latest microcode package (which > > - disables broken feature by default). */ > > - switch (model) > > + /* Disable TSX on some processors to avoid TSX on kernels= that > > + weren't updated with the latest microcode package (whi= ch > > + disables broken feature by default). */ > > + switch (microarch) > > { > > - case 0x55: > > + case INTEL_BIGCORE_SKYLAKE_AVX512: > > + /* 0x55 (Skylake-avx512) && stepping <=3D 5 disable TSX. = */ > > if (stepping <=3D 5) > > goto disable_tsx; > > break; > > - case 0x8e: > > - /* NB: Although the errata documents that for model =3D= =3D 0x8e, > > - only 0xb stepping or lower are impacted, the intention= of > > - the errata was to disable TSX on all client processors= on > > - all steppings. Include 0xc stepping which is an Intel > > - Core i7-8665U, a client mobile processor. */ > > - case 0x9e: > > - if (stepping > 0xc) > > + > > + case INTEL_BIGCORE_SKYLAKE: > > + case INTEL_BIGCORE_AMBERLAKE: > > + case INTEL_BIGCORE_COFFEELAKE: > > + case INTEL_BIGCORE_WHISKEYLAKE: > > + case INTEL_BIGCORE_KABYLAKE: > > + /* NB: Although the errata documents that for model =3D= =3D 0x8e > > + (skylake client), only 0xb stepping or lower are imp= acted, > > + the intention of the errata was to disable TSX on al= l client > > + processors on all steppings. Include 0xc stepping w= hich is > > + an Intel Core i7-8665U, a client mobile processor. = */ > > + if ((model =3D=3D 0x8e || model =3D=3D 0x9e) && steppin= g > 0xc) > > break; > > - /* Fall through. */ > > - case 0x4e: > > - case 0x5e: > > - { > > + > > /* Disable Intel TSX and enable RTM_ALWAYS_ABORT for > > processors listed in: > > > > https://www.intel.com/content/www/us/en/support/articles/000059422/pro= cessors.html > > */ > > -disable_tsx: > > + disable_tsx: > > CPU_FEATURE_UNSET (cpu_features, HLE); > > CPU_FEATURE_UNSET (cpu_features, RTM); > > CPU_FEATURE_SET (cpu_features, RTM_ALWAYS_ABORT); > > - } > > - break; > > - case 0x3f: > > - /* Xeon E7 v3 with stepping >=3D 4 has working TSX. */ > > - if (stepping >=3D 4) > > break; > > - /* Fall through. */ > > - case 0x3c: > > - case 0x45: > > - case 0x46: > > - /* Disable Intel TSX on Haswell processors (except Xeon E= 7 v3 > > - with stepping >=3D 4) to avoid TSX on kernels that wer= en't > > - updated with the latest microcode package (which disab= les > > - broken feature by default). */ > > - CPU_FEATURE_UNSET (cpu_features, RTM); > > - break; > > + > > + case INTEL_BIGCORE_HASWELL: > > + /* Xeon E7 v3 (model =3D=3D 0x3f) with stepping >=3D 4 = has working > > + TSX. Haswell also include other model numbers that = have > > + working TSX. */ > > + if (model =3D=3D 0x3f && stepping >=3D 4) > > + break; > > + > > + CPU_FEATURE_UNSET (cpu_features, RTM); > > + break; > > } > > } > > > > -- > > 2.34.1 > > > > > -- > H.J.