public inbox for libc-alpha@sourceware.org
* Commit 27d3ce1467990f89126e228559dec8f84b96c60e?
From: Carlos O'Donell @ 2019-11-26 15:01 UTC
  To: H.J. Lu, libc-alpha; +Cc: Florian Weimer

HJ,

In commit 27d3ce1467990f89126e228559dec8f84b96c60e we stop
setting bit_arch_Fast_Copy_Backward for Intel Core processors
as an optimization to improve performance.

It turns out that this change also improves performance for
Haswell servers. Was it the intent of this change to *also*
improve performance for Haswell? The comments don't indicate
this and I was worried that it might be an unintentional change
in this case. The particular CPU was an E5-2650 v3 [1].

If we step back and look at the overall sequence of changes and
performance it looks like this:

The performance regression is between this change:
c3d8dc45c9df199b8334599a6cbd98c9950dba62 - Triggers default: handling + TSX handling.
- Causes a 21% lmbench regression for an E5-2650 v3.

and this change (the one we are discussing):
27d3ce1467990f89126e228559dec8f84b96c60e - Removes bit_arch_Fast_Copy_Backward.
- Recovers the lost performance.

My worry is that the two are unrelated, and that we've only
just made back performance at the expense of the other change
and we could be doing better.

As our Intel expert, what do you think is going on here?

-- 
Cheers,
Carlos.

[1] https://ark.intel.com/content/www/us/en/ark/products/81705/intel-xeon-processor-e5-2650-v3-25m-cache-2-30-ghz.html

* Re: Commit 27d3ce1467990f89126e228559dec8f84b96c60e?
From: H.J. Lu @ 2019-12-03 17:43 UTC
  To: Carlos O'Donell; +Cc: libc-alpha, Florian Weimer

On Tue, Nov 26, 2019 at 7:01 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> HJ,
>
> In commit 27d3ce1467990f89126e228559dec8f84b96c60e we stop
> setting bit_arch_Fast_Copy_Backward for Intel Core processors
> as an optimization to improve performance.
>
> It turns out that this change also improves performance for
> Haswell servers. Was it the intent of this change to *also*
> improve performance for Haswell? The comments don't indicate
> this and I was worried that it might be an unintentional change
> in this case. The particular CPU was an E5-2650 v3 [1].
>
> If we step back and look at the overall sequence of changes and
> performance it looks like this:
>
> The performance regression is between this change:
> c3d8dc45c9df199b8334599a6cbd98c9950dba62 - Triggers default: handling + TSX handling.
> - Causes a 21% lmbench regression for an E5-2650 v3.
>
> and this change (the one we are discussing):
> 27d3ce1467990f89126e228559dec8f84b96c60e - Removes bit_arch_Fast_Copy_Backward.
> - Recovers the lost performance.
>
> My worry is that the two are unrelated, and that we've only
> just made back performance at the expense of the other change
> and we could be doing better.
>
> As our Intel expert, what do you think is going on here?

My change should be a NOP on Haswell since Fast_Copy_Backward is
used only in x86_64/multiarch/ifunc-memmove.h:

  if (CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
    {
      if (CPU_FEATURES_CPU_P (cpu_features, ERMS))
        return OPTIMIZE (avx_unaligned_erms);

      return OPTIMIZE (avx_unaligned);
    }

  if (!CPU_FEATURES_CPU_P (cpu_features, SSSE3)
      || CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Copy))
    {
      if (CPU_FEATURES_CPU_P (cpu_features, ERMS))
        return OPTIMIZE (sse2_unaligned_erms);

      return OPTIMIZE (sse2_unaligned);
    }

  if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Copy_Backward))
    return OPTIMIZE (ssse3_back);

  return OPTIMIZE (ssse3);

and AVX_Fast_Unaligned_Load is set on Haswell, so the selector
returns from the first branch before Fast_Copy_Backward is ever
tested.
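
To make the ordering argument concrete, here is a minimal standalone
sketch of the same decision chain. It is not glibc code: struct
cpu_flags, select_memmove and fast_copy_backward_matters are
hypothetical names, the flags are plain booleans rather than
cpu_features bits, and the returned strings only stand in for the
OPTIMIZE (...) variants quoted above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the cpu_features bits tested by the
   selector; this is not glibc's real data structure.  */
struct cpu_flags
{
  bool avx_fast_unaligned_load;
  bool erms;
  bool ssse3;
  bool fast_unaligned_copy;
  bool fast_copy_backward;
};

/* Mirrors the order of the tests in the quoted selector and returns
   the name of the variant that would be chosen.  */
const char *
select_memmove (const struct cpu_flags *f)
{
  if (f->avx_fast_unaligned_load)
    return f->erms ? "avx_unaligned_erms" : "avx_unaligned";

  if (!f->ssse3 || f->fast_unaligned_copy)
    return f->erms ? "sse2_unaligned_erms" : "sse2_unaligned";

  /* Only reached when both earlier tests fail.  */
  if (f->fast_copy_backward)
    return "ssse3_back";

  return "ssse3";
}

/* Returns true if toggling fast_copy_backward changes the selection
   for the given combination of the other flags.  */
bool
fast_copy_backward_matters (struct cpu_flags f)
{
  f.fast_copy_backward = false;
  const char *without = select_memmove (&f);
  f.fast_copy_backward = true;
  const char *with = select_memmove (&f);
  return strcmp (without, with) != 0;
}
```

In this model, fast_copy_backward_matters returns false for any flag
set with avx_fast_unaligned_load enabled, whatever the other flags
are. It returns true only when AVX_Fast_Unaligned_Load is clear,
SSSE3 is set and Fast_Unaligned_Copy is clear.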


-- 
H.J.
