public inbox for libc-alpha@sourceware.org
From: Sunil Pandey <skpgkp2@gmail.com>
To: abush wang <abushwangs@gmail.com>
Cc: "H.J. Lu" <hjl.tools@gmail.com>,
	Noah Goldstein <goldstein.w.n@gmail.com>,
	 abushwang via Libc-alpha <libc-alpha@sourceware.org>
Subject: Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
Date: Sun, 28 Apr 2024 09:12:23 -0700
Message-ID: <CAMAf5_cDFEg__3a1ZT4zC864WurO0dPnBP2o427nbbfjeNZQDg@mail.gmail.com>
In-Reply-To: <CAMLoAPYq1LFgAKD7OG-4ne8yXKXL=6QusUNtYjgg_3CTb9Rk7Q@mail.gmail.com>


On Sat, Apr 27, 2024 at 7:13 PM abush wang <abushwangs@gmail.com> wrote:

> Actually, I was investigating a performance issue with libmicro on our distro OS.
> I found that the performance degradation of the localtime_r benchmark from
> libmicro is caused by strlen.
> So I extracted this test case.
>
>
Can you consistently reproduce the strlen perf behaviour by running the test
multiple times back-to-back?

You can see a high swing from run to run.
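
To make the run-to-run swing visible, here is a minimal sketch (not the glibc
microbenchmark, just an illustration) that times many batches of strlen calls
and reports min/median/max instead of a single reading. The names ticks_now,
sample_strlen, summarize and cycle_stats are hypothetical helpers made up for
this example; the clock_gettime fallback on non-x86 targets is also my
assumption and reports nanoseconds rather than TSC ticks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: TSC on x86 (as in the quoted test), otherwise a
   monotonic clock in nanoseconds. */
static inline uint64_t ticks_now(void) {
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}

typedef struct { uint64_t min, median, max; } cycle_stats;

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the samples in place and return min, median (upper median for an
   even count) and max. */
cycle_stats summarize(uint64_t *samples, size_t n) {
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    cycle_stats st = { samples[0], samples[n / 2], samples[n - 1] };
    return st;
}

/* Fill out[0..runs-1] with the ticks taken by `inner` strlen calls each.
   Returns the length strlen reported, as a sanity check. */
size_t sample_strlen(const char *s, size_t runs, size_t inner, uint64_t *out) {
    /* Calling through a volatile pointer keeps the compiler from folding
       or hoisting the strlen call out of the timed loop. */
    size_t (*volatile fn)(const char *) = strlen;
    size_t len = 0;

    /* Warm-up pass: resolve the symbol and warm caches/branch predictors. */
    for (size_t i = 0; i < inner; ++i)
        len = fn(s);

    for (size_t r = 0; r < runs; ++r) {
        uint64_t t0 = ticks_now();
        for (size_t i = 0; i < inner; ++i)
            len = fn(s);
        out[r] = ticks_now() - t0;
    }
    return len;
}
```

For example, filling a 1000-entry buffer with sample_strlen("str1", 1000, 1000,
buf) and passing it to summarize(buf, 1000) gives the spread across 1000
batches; dividing each figure by the inner count gives a per-call estimate. If
min and max differ widely, a single 5-call measurement is not enough to compare
two glibc versions.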

> On Sat, Apr 27, 2024 at 12:54 AM Sunil Pandey <skpgkp2@gmail.com> wrote:
>
>>
>>
>> On Fri, Apr 26, 2024 at 6:30 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>>> On Thu, Apr 25, 2024 at 9:03 PM abush wang <abushwangs@gmail.com> wrote:
>>> >
>>> > Hi, H.J.
>>> > When I tested glibc performance between 2.28 and 2.38,
>>> > I found a performance degradation in strlen.
>>> > In fact, the difference comes from __strlen_avx2 versus __strlen_evex.
>>> >
>>> > ```
>>> > 2.28
>>> > __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
>>> > 42 ENTRY (STRLEN)
>>> >
>>> >
>>> > 2.38
>>> > __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
>>> > 79 ENTRY_P2ALIGN (STRLEN, 6)
>>> > ```
>>> >
>>> > This is my test:
>>> > ```
>>> > #include <stdio.h>
>>> > #include <stdlib.h>
>>> > #include <stdint.h>
>>> > #include <string.h>
>>> >
>>> > #define MAX_STRINGS 100
>>> >
>>> > uint64_t rdtsc() {
>>> >     uint32_t lo, hi;
>>> >     __asm__ __volatile__ (
>>> >         "rdtsc" : "=a"(lo), "=d"(hi)
>>> >     );
>>> >     return ((uint64_t)hi << 32) | lo;
>>> > }
>>> >
>>> > int main(int argc, char *argv[]) {
>>> >     char *input_str[MAX_STRINGS];
>>> >     size_t lengths[MAX_STRINGS];
>>> >     int num_strings = 0; // Number of input strings
>>> >     uint64_t start_cycles, end_cycles;
>>> >
>>> >     // Parse command line arguments and store pointers in input_str array
>>> >     for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
>>> >         input_str[num_strings] = argv[i];
>>> >         num_strings++;
>>> >     }
>>> >
>>> >     // Measure the strlen operation for each string
>>> >     start_cycles = rdtsc();
>>> >     for (int i = 0; i < num_strings; ++i) {
>>> >         lengths[i] = strlen(input_str[i]);
>>> >     }
>>> >     end_cycles = rdtsc();
>>> >
>>> >     unsigned long long total_cycle = end_cycles - start_cycles;
>>> >     // Avoid dividing by zero when no strings are passed on the command line
>>> >     unsigned long long av_cycle = num_strings ? total_cycle / num_strings : 0;
>>> >     // Print the total cycles taken for the strlen operations
>>> >     printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);
>>> >
>>> >     // Print the recorded lengths
>>> >     printf("Lengths of the input strings:\n");
>>> >     for (int i = 0; i < num_strings; ++i) {
>>> >         printf("String %d length: %zu\n", i, lengths[i]);
>>> >     }
>>> >
>>> >     return 0;
>>> > }
>>> > ```
>>> >
>>> > This is the result:
>>> > ```
>>> > 2.28
>>> > ./strlen_test str1 str2 str3 str4 str5
>>> > Total cycles: 1468 av cycle: 293
>>> > Lengths of the input strings:
>>> > String 0 length: 4
>>> > String 1 length: 4
>>> > String 2 length: 4
>>> > String 3 length: 4
>>> > String 4 length: 4
>>> >
>>> > 2.38
>>> > ./strlen_test str1 str2 str3 str4 str5
>>> > Total cycles: 1814 av cycle: 362
>>> > Lengths of the input strings:
>>> > String 0 length: 4
>>> > String 1 length: 4
>>> > String 2 length: 4
>>> > String 3 length: 4
>>> > String 4 length: 4
>>> > ```
>>> >
>>> > Thanks,
>>> > abush
>>>
>>
>> I'm not sure how you are measuring the performance of the strlen function.
>> Are you drawing a performance conclusion from these 2 runs?
>>
>> 2.28
>> Total cycles: 1468 av cycle: 293
>>
>> 2.38
>> Total cycles: 1814 av cycle: 362
>>
>> Please use the glibc microbenchmarks to see if you can reproduce the perf drop.
>>
>>
>>>
>>> Which processors did you use?  Sunil, Noah, can we reproduce it?
>>>
>>> --
>>> H.J.
>>>
>>


Thread overview: 12+ messages
2024-04-26  4:03 abush wang
2024-04-26 13:30 ` H.J. Lu
2024-04-26 16:53   ` Sunil Pandey
2024-04-28  2:13     ` abush wang
2024-04-28 16:12       ` Sunil Pandey [this message]
2024-04-28 16:16         ` H.J. Lu
2024-04-29 17:41           ` Sunil Pandey
2024-04-29 20:19             ` H.J. Lu
2024-04-30  0:54               ` Sunil Pandey
2024-04-30  2:51                 ` H.J. Lu
2024-04-30 20:16                   ` Sunil Pandey
2024-04-28  2:06   ` abush wang
