public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: DJ Delorie <dj@redhat.com>
To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: libc-alpha@sourceware.org, goldstein.w.n@gmail.com,
	hjl.tools@gmail.com, carlos@systemhalted.org
Subject: Re: [PATCH v9 2/3] x86: Refactor Intel `init_cpu_features`
Date: Thu, 25 May 2023 23:34:47 -0400	[thread overview]
Message-ID: <xnv8gf7oko.fsf@greed.delorie.com> (raw)
In-Reply-To: <20230513051906.1287611-2-goldstein.w.n@gmail.com>


LGTM with a few nits about comments that I don't really care about, but
mention for completeness.

Reviewed-by: DJ Delorie <dj@redhat.com>

Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org> writes:
> This patch should have no affect on existing functionality.

> +/* Intel Family-6 microarch list.  */
> +enum
> +{
> +  /* Atom processors.  */
> +  INTEL_ATOM_BONNELL,
> +  INTEL_ATOM_SILVERMONT,
> +  INTEL_ATOM_AIRMONT,
> +  INTEL_ATOM_GOLDMONT,
> +  INTEL_ATOM_GOLDMONT_PLUS,
> +  INTEL_ATOM_SIERRAFOREST,
> +  INTEL_ATOM_GRANDRIDGE,
> +  INTEL_ATOM_TREMONT,
> +
> +  /* Bigcore processors.  */
> +  INTEL_BIGCORE_MEROM,
> +  INTEL_BIGCORE_PENRYN,
> +  INTEL_BIGCORE_DUNNINGTON,
> +  INTEL_BIGCORE_NEHALEM,
> +  INTEL_BIGCORE_WESTMERE,
> +  INTEL_BIGCORE_SANDYBRIDGE,
> +  INTEL_BIGCORE_IVYBRIDGE,
> +  INTEL_BIGCORE_HASWELL,
> +  INTEL_BIGCORE_BROADWELL,
> +  INTEL_BIGCORE_SKYLAKE,
> +  INTEL_BIGCORE_KABYLAKE,
> +  INTEL_BIGCORE_COMETLAKE,
> +  INTEL_BIGCORE_SKYLAKE_AVX512,
> +  INTEL_BIGCORE_CANNONLAKE,
> +  INTEL_BIGCORE_ICELAKE,
> +  INTEL_BIGCORE_TIGERLAKE,
> +  INTEL_BIGCORE_ROCKETLAKE,
> +  INTEL_BIGCORE_SAPPHIRERAPIDS,
> +  INTEL_BIGCORE_RAPTORLAKE,
> +  INTEL_BIGCORE_EMERALDRAPIDS,
> +  INTEL_BIGCORE_METEORLAKE,
> +  INTEL_BIGCORE_LUNARLAKE,
> +  INTEL_BIGCORE_ARROWLAKE,
> +  INTEL_BIGCORE_GRANITERAPIDS,
> +
> +  /* Mixed (bigcore + atom SOC).  */
> +  INTEL_MIXED_LAKEFIELD,
> +  INTEL_MIXED_ALDERLAKE,
> +
> +  /* KNL.  */

(Honestly, not a useful comment... ;-)

> +  INTEL_KNIGHTS_MILL,
> +  INTEL_KNIGHTS_LANDING,
> +
> +  /* Unknown.  */
> +  INTEL_UNKNOWN,
> +};

Ok.

> +static unsigned int
> +intel_get_fam6_microarch (unsigned int model, unsigned int stepping)

stepping is never used, could it be marked notused or something?

> +{
> +  switch (model)
> +    {
 . . .
> +    }
> +}

Ok.


> @@ -453,129 +662,143 @@ init_cpu_features (struct cpu_features *cpu_features)
>        if (family == 0x06)
>  	{
>  	  model += extended_model;
> -	  switch (model)
> +	  unsigned int microarch
> +	      = intel_get_fam6_microarch (model, stepping);
> +
> +	  switch (microarch)

Ok.

>  	    {
> -	    case 0x1c:
> -	    case 0x26:
> -	      /* BSF is slow on Atom.  */
> +	      /* Atom / KNL tuning.  */
> +	    case INTEL_ATOM_BONNELL:
> +	      /* BSF is slow on Bonnell.  */

Ok.

> -	    case 0x57:
> -	      /* Knights Landing.  Enable Silvermont optimizations.  */
> -
> -	    case 0x7a:
> -	      /* Unaligned load versions are faster than SSSE3
> -		 on Goldmont Plus.  */
> -
> -	    case 0x5c:
> -	    case 0x5f:
>  	      /* Unaligned load versions are faster than SSSE3
> -		 on Goldmont.  */
> +		     on Airmont, Silvermont, Goldmont, and Goldmont Plus.  */
> +	    case INTEL_ATOM_AIRMONT:
> +	    case INTEL_ATOM_SILVERMONT:
> +	    case INTEL_ATOM_GOLDMONT:
> +	    case INTEL_ATOM_GOLDMONT_PLUS:

Ok.

> -	    case 0x4c:
> -	    case 0x5a:
> -	    case 0x75:
> -	      /* Airmont is a die shrink of Silvermont.  */
> +          /* Knights Landing.  Enable Silvermont optimizations.  */
> +	    case INTEL_KNIGHTS_LANDING:
>  
> -	    case 0x37:
> -	    case 0x4a:
> -	    case 0x4d:
> -	    case 0x5d:
> -	      /* Unaligned load versions are faster than SSSE3
> -		 on Silvermont.  */

Ok.

>  	      cpu_features->preferred[index_arch_Fast_Unaligned_Load]
> -		|= (bit_arch_Fast_Unaligned_Load
> -		    | bit_arch_Fast_Unaligned_Copy
> -		    | bit_arch_Prefer_PMINUB_for_stringop
> -		    | bit_arch_Slow_SSE4_2);
> +		  |= (bit_arch_Fast_Unaligned_Load
> +		      | bit_arch_Fast_Unaligned_Copy
> +		      | bit_arch_Prefer_PMINUB_for_stringop
> +		      | bit_arch_Slow_SSE4_2);
>  	      break;

Ok.

> -	    case 0x86:
> -	    case 0x96:
> -	    case 0x9c:
> +	    case INTEL_ATOM_TREMONT:

Ok.

>  	      cpu_features->preferred[index_arch_Fast_Rep_String]
> -		|= (bit_arch_Fast_Rep_String
> -		    | bit_arch_Fast_Unaligned_Load
> -		    | bit_arch_Fast_Unaligned_Copy
> -		    | bit_arch_Prefer_PMINUB_for_stringop
> -		    | bit_arch_Slow_SSE4_2);
> +		  |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Unaligned_Load

Minor nit: I think the one-per-line is easier to follow.  Ok though.

> +		      | bit_arch_Fast_Unaligned_Copy
> +		      | bit_arch_Prefer_PMINUB_for_stringop
> +		      | bit_arch_Slow_SSE4_2);

Ok.

>  
> +	   /*
> +	    Default tuned Knights microarch.
> +	    case INTEL_KNIGHTS_MILL:
> +        */
> +
> +	   /*
> +	    Default tuned atom microarch.
> +	    case INTEL_ATOM_SIERRAFOREST:
> +	    case INTEL_ATOM_GRANDRIDGE:
> +	   */
> +
> +	      /* Bigcore/Default Tuning.  */
>  	    default:

Ok.

>  	      /* Unknown family 0x06 processors.  Assuming this is one
>  		 of Core i3/i5/i7 processors if AVX is available.  */
>  	      if (!CPU_FEATURES_CPU_P (cpu_features, AVX))
>  		break;

> -	      /* Fall through.  */

This should stay, as we're still falling through.

> -	    case 0x1a:
> -	    case 0x1e:
> -	    case 0x1f:
> -	    case 0x25:
> -	    case 0x2c:
> -	    case 0x2e:
> -	    case 0x2f:
> +	    case INTEL_BIGCORE_NEHALEM:
> +	    case INTEL_BIGCORE_WESTMERE:

Ok.

>  	      /* Rep string instructions, unaligned load, unaligned copy,
>  		 and pminub are fast on Intel Core i3, i5 and i7.  */
>  	      cpu_features->preferred[index_arch_Fast_Rep_String]
> -		|= (bit_arch_Fast_Rep_String
> -		    | bit_arch_Fast_Unaligned_Load
> -		    | bit_arch_Fast_Unaligned_Copy
> -		    | bit_arch_Prefer_PMINUB_for_stringop);
> +		  |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Unaligned_Load
> +		      | bit_arch_Fast_Unaligned_Copy
> +		      | bit_arch_Prefer_PMINUB_for_stringop);

Same nit as before.

>  	      break;
> +
> +	   /*
> +	    Default tuned Bigcore microarch.

This is a bit confusing, because these cases are not actually active
here.  It seems to mean "these CPUs will end up here due to logic above"
but why not make that explicit, either by saying so, or activating these
cases?
[future me: they're used in 3/3, so ok]

> +	    case INTEL_BIGCORE_SANDYBRIDGE:
> +	    case INTEL_BIGCORE_IVYBRIDGE:
> +	    case INTEL_BIGCORE_HASWELL:
> +	    case INTEL_BIGCORE_BROADWELL:
> +	    case INTEL_BIGCORE_SKYLAKE:
> +	    case INTEL_BIGCORE_KABYLAKE:
> +	    case INTEL_BIGCORE_COMETLAKE:
> +	    case INTEL_BIGCORE_SKYLAKE_AVX512:
> +	    case INTEL_BIGCORE_CANNONLAKE:
> +	    case INTEL_BIGCORE_ICELAKE:
> +	    case INTEL_BIGCORE_TIGERLAKE:
> +	    case INTEL_BIGCORE_ROCKETLAKE:
> +	    case INTEL_BIGCORE_RAPTORLAKE:
> +	    case INTEL_BIGCORE_METEORLAKE:
> +	    case INTEL_BIGCORE_LUNARLAKE:
> +	    case INTEL_BIGCORE_ARROWLAKE:
> +	    case INTEL_BIGCORE_SAPPHIRERAPIDS:
> +	    case INTEL_BIGCORE_EMERALDRAPIDS:
> +	    case INTEL_BIGCORE_GRANITERAPIDS:
> +	    */
> +
> +	   /*
> +	    Default tuned Mixed (bigcore + atom SOC).
> +	    case INTEL_MIXED_LAKEFIELD:
> +	    case INTEL_MIXED_ALDERLAKE:
> +	    */
>  	    }

Ok.

> -	 /* Disable TSX on some processors to avoid TSX on kernels that
> -	    weren't updated with the latest microcode package (which
> -	    disables broken feature by default).  */
> -	 switch (model)
> +	      /* Disable TSX on some processors to avoid TSX on kernels that
> +		 weren't updated with the latest microcode package (which
> +		 disables broken feature by default).  */
> +	  switch (microarch)

Ok.

> -	    case 0x55:
> +	    case INTEL_BIGCORE_SKYLAKE_AVX512:
> +	      /* 0x55 (Skylake-avx512) && stepping <= 5 disable TSX. */
>  	      if (stepping <= 5)
>  		goto disable_tsx;

Ok.

> -	    case 0x8e:
> -	      /* NB: Although the errata documents that for model == 0x8e,
> -		 only 0xb stepping or lower are impacted, the intention of
> -		 the errata was to disable TSX on all client processors on
> -		 all steppings.  Include 0xc stepping which is an Intel
> -		 Core i7-8665U, a client mobile processor.  */
> -	    case 0x9e:
> -	      if (stepping > 0xc)
> +
> +	    case INTEL_BIGCORE_SKYLAKE:
> +	    case INTEL_BIGCORE_KABYLAKE:
> +		/* NB: Although the errata documents that for model == 0x8e
> +		   (skylake client), only 0xb stepping or lower are impacted,
> +		   the intention of the errata was to disable TSX on all client
> +		   processors on all steppings.  Include 0xc stepping which is
> +		   an Intel Core i7-8665U, a client mobile processor.  */
> +		if ((model == 0x8e || model == 0x9e) && stepping > 0xc)

Could have just used INTEL_BIGCORE_KABYLAKE instead of model numbers
here, but ok.

>  		break;
> -	      /* Fall through.  */
> -	    case 0x4e:
> -	    case 0x5e:
> -	      {
> +
Ok.

>  		   processors listed in:
>  
>  https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html
>  		 */
> -disable_tsx:
> +	    disable_tsx:
>  		CPU_FEATURE_UNSET (cpu_features, HLE);
>  		CPU_FEATURE_UNSET (cpu_features, RTM);
>  		CPU_FEATURE_SET (cpu_features, RTM_ALWAYS_ABORT);
> -	      }
> -	      break;

Matches brace removal above.  Ok.

> -	    case 0x3f:
> -	      /* Xeon E7 v3 with stepping >= 4 has working TSX.  */
> -	      if (stepping >= 4)
>  		break;
> -	      /* Fall through.  */
> -	    case 0x3c:
> -	    case 0x45:
> -	    case 0x46:
> -	      /* Disable Intel TSX on Haswell processors (except Xeon E7 v3
> -		 with stepping >= 4) to avoid TSX on kernels that weren't
> -		 updated with the latest microcode package (which disables
> -		 broken feature by default).  */
> -	      CPU_FEATURE_UNSET (cpu_features, RTM);
> -	      break;
> +
> +	    case INTEL_BIGCORE_HASWELL:
> +		/* Xeon E7 v3 (model == 0x3f) with stepping >= 4 has working
> +		   TSX.  Haswell also include other model numbers that have
> +		   working TSX.  */
> +		if (model == 0x3f && stepping >= 4)
> +		break;
> +
> +		CPU_FEATURE_UNSET (cpu_features, RTM);
> +		break;
>  	    }
>  	}

Ok.


  parent reply	other threads:[~2023-05-26  3:34 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-24  5:03 [PATCH v1] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 2` Noah Goldstein
2023-04-24 18:09 ` H.J. Lu
2023-04-24 18:34   ` Noah Goldstein
2023-04-24 20:44     ` H.J. Lu
2023-04-24 22:30       ` Noah Goldstein
2023-04-24 22:30 ` [PATCH v2] " Noah Goldstein
2023-04-24 22:48   ` H.J. Lu
2023-04-25  2:05     ` Noah Goldstein
2023-04-25  2:55       ` H.J. Lu
2023-04-25  3:43         ` Noah Goldstein
2023-04-25  3:43 ` [PATCH v3] " Noah Goldstein
2023-04-25 17:42   ` H.J. Lu
2023-04-25 21:45     ` Noah Goldstein
2023-04-25 21:45 ` [PATCH v4] " Noah Goldstein
2023-04-26 15:59   ` H.J. Lu
2023-04-26 17:15     ` Noah Goldstein
2023-05-04  3:28       ` Noah Goldstein
2023-05-05 18:06         ` H.J. Lu
2023-05-09  3:14           ` Noah Goldstein
2023-05-09  3:13 ` [PATCH v5 1/3] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Noah Goldstein
2023-05-09  3:13   ` [PATCH v5 2/3] x86: Refactor Intel `init_cpu_features` Noah Goldstein
2023-05-09 21:58     ` H.J. Lu
2023-05-10  0:33       ` Noah Goldstein
2023-05-09  3:13   ` [PATCH v5 3/3] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
2023-05-10  0:33 ` [PATCH v6 1/4] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Noah Goldstein
2023-05-10  0:33   ` [PATCH v6 2/4] x86: Refactor Intel `init_cpu_features` Noah Goldstein
2023-05-10 22:13     ` H.J. Lu
2023-05-10 23:17       ` Noah Goldstein
2023-05-11 21:36         ` H.J. Lu
2023-05-12  5:11           ` Noah Goldstein
2023-05-10  0:33   ` [PATCH v6 3/4] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
2023-05-10  0:33   ` [PATCH v6 4/4] x86: Tune 'Saltwell' microarch the same was a 'Bonnell' Noah Goldstein
2023-05-10 22:04     ` H.J. Lu
2023-05-10 22:12       ` Noah Goldstein
2023-05-10 15:55   ` [PATCH v6 1/4] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` H.J. Lu
2023-05-10 16:07     ` Noah Goldstein
2023-05-10 22:12 ` [PATCH v7 2/4] x86: Refactor Intel `init_cpu_features` Noah Goldstein
2023-05-10 22:12   ` [PATCH v7 3/4] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
2023-05-10 22:12   ` [PATCH v7 4/4] x86: Tune 'Saltwell' microarch the same was a 'Bonnell' Noah Goldstein
2023-05-12  5:12     ` Noah Goldstein
2023-05-12  5:10 ` [PATCH v8 1/3] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Noah Goldstein
2023-05-12  5:10   ` [PATCH v8 2/3] x86: Refactor Intel `init_cpu_features` Noah Goldstein
2023-05-12 22:17     ` H.J. Lu
2023-05-13  5:18       ` Noah Goldstein
2023-05-12 22:03 ` [PATCH v8 3/3] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
2023-05-13  5:19 ` [PATCH v9 1/3] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Noah Goldstein
2023-05-13  5:19   ` [PATCH v9 2/3] x86: Refactor Intel `init_cpu_features` Noah Goldstein
2023-05-15 20:57     ` H.J. Lu
2023-05-26  3:34     ` DJ Delorie [this message]
2023-05-27 18:46       ` Noah Goldstein
2023-05-13  5:19   ` [PATCH v9 3/3] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
2023-05-26  3:34     ` DJ Delorie
2023-05-27 18:46       ` Noah Goldstein
2023-05-15 18:29   ` [PATCH v9 1/3] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Noah Goldstein
2023-05-17 12:00     ` Carlos O'Donell
2023-05-26  3:34   ` DJ Delorie
2023-05-27 18:46 ` [PATCH v10 " Noah Goldstein
2023-05-27 18:46   ` [PATCH v10 2/3] x86: Refactor Intel `init_cpu_features` Noah Goldstein
2023-05-27 18:46   ` [PATCH v10 3/3] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
2023-05-31  2:33     ` DJ Delorie
2023-07-10  5:23     ` Sajan Karumanchi
2023-07-10 15:58       ` Noah Goldstein
2023-07-14  2:21         ` Re: Noah Goldstein
2023-07-14  7:39         ` Re: sajan karumanchi
2023-06-07  0:15   ` [PATCH v10 1/3] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Carlos O'Donell
2023-06-07 18:18     ` Noah Goldstein
2023-06-07 18:18 ` [PATCH v11 " Noah Goldstein
2023-06-07 18:18   ` [PATCH v11 2/3] x86: Refactor Intel `init_cpu_features` Noah Goldstein
2023-06-07 18:18   ` [PATCH v11 3/3] x86: Make the divisor in setting `non_temporal_threshold` cpu specific Noah Goldstein
2023-06-07 18:19   ` [PATCH v11 1/3] x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Noah Goldstein
2023-08-14 23:00   ` Noah Goldstein
2023-08-22 15:11     ` Noah Goldstein
2023-08-24 17:06       ` Noah Goldstein
2023-08-28 20:02         ` Noah Goldstein
2023-09-05 15:37           ` Noah Goldstein
2023-09-12  3:50             ` Noah Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xnv8gf7oko.fsf@greed.delorie.com \
    --to=dj@redhat.com \
    --cc=carlos@systemhalted.org \
    --cc=goldstein.w.n@gmail.com \
    --cc=hjl.tools@gmail.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).