public inbox for libc-alpha@sourceware.org
* x86-64: strlen-evex performance performance degradation compared to strlen-avx2
@ 2024-04-26  4:03 abush wang
  2024-04-26 13:30 ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: abush wang @ 2024-04-26  4:03 UTC (permalink / raw)
  To: H.J. Lu, abushwang via Libc-alpha

Hi, H.J.
When I compared glibc performance between 2.28 and 2.38,
I found a performance degradation in strlen.
The difference comes from __strlen_avx2 (2.28) versus __strlen_evex (2.38):

```
2.28
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
42 ENTRY (STRLEN)


2.38
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
79 ENTRY_P2ALIGN (STRLEN, 6)
```

This is my test:
```
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#define MAX_STRINGS 100

static inline uint64_t rdtsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__ (
        "rdtsc" : "=a"(lo), "=d"(hi)
    );
    return ((uint64_t)hi << 32) | lo;
}

int main(int argc, char *argv[]) {
    char *input_str[MAX_STRINGS];
    size_t lengths[MAX_STRINGS];
    int num_strings = 0; // Number of input strings
    uint64_t start_cycles, end_cycles;

    // Parse command line arguments and store pointers in input_str array
    for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
        input_str[num_strings] = argv[i];
        num_strings++;
    }

    // Measure the strlen operation for each string
    start_cycles = rdtsc();
    for (int i = 0; i < num_strings; ++i) {
        lengths[i] = strlen(input_str[i]);
    }
    end_cycles = rdtsc();

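    // Avoid dividing by zero when no strings were passed on the command line
    if (num_strings == 0) {
        fprintf(stderr, "usage: strlen_test STRING...\n");
        return 1;
    }

    unsigned long long total_cycle = end_cycles - start_cycles;
    unsigned long long av_cycle = total_cycle / num_strings;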

    // Print the total cycles taken for the strlen operations
    printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);

    // Print the recorded lengths
    printf("Lengths of the input strings:\n");
    for (int i = 0; i < num_strings; ++i) {
        printf("String %d length: %zu\n", i, lengths[i]);
    }

    return 0;
}
```

This is the result:
```
2.28
./strlen_test str1 str2 str3 str4 str5
Total cycles: 1468 av cycle: 293
Lengths of the input strings:
String 0 length: 4
String 1 length: 4
String 2 length: 4
String 3 length: 4
String 4 length: 4

2.38
./strlen_test str1 str2 str3 str4 str5
Total cycles: 1814 av cycle: 362
Lengths of the input strings:
String 0 length: 4
String 1 length: 4
String 2 length: 4
String 3 length: 4
String 4 length: 4
```

Thanks,
abush


* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-26  4:03 x86-64: strlen-evex performance performance degradation compared to strlen-avx2 abush wang
@ 2024-04-26 13:30 ` H.J. Lu
  2024-04-26 16:53   ` Sunil Pandey
  2024-04-28  2:06   ` abush wang
  0 siblings, 2 replies; 12+ messages in thread
From: H.J. Lu @ 2024-04-26 13:30 UTC (permalink / raw)
  To: abush wang, Sunil K Pandey, Noah Goldstein; +Cc: abushwang via Libc-alpha

Which processors did you use?  Sunil, Noah, can we reproduce it?

-- 
H.J.


* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-26 13:30 ` H.J. Lu
@ 2024-04-26 16:53   ` Sunil Pandey
  2024-04-28  2:13     ` abush wang
  2024-04-28  2:06   ` abush wang
  1 sibling, 1 reply; 12+ messages in thread
From: Sunil Pandey @ 2024-04-26 16:53 UTC (permalink / raw)
  To: H.J. Lu; +Cc: abush wang, Noah Goldstein, abushwang via Libc-alpha

I'm not sure how you are measuring the performance of the strlen function.
Are you drawing a performance conclusion from these 2 runs?

2.28
Total cycles: 1468 av cycle: 293

2.38
Total cycles: 1814 av cycle: 362

Please use the glibc microbenchmark to see if you can reproduce the perf drop.



* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-26 13:30 ` H.J. Lu
  2024-04-26 16:53   ` Sunil Pandey
@ 2024-04-28  2:06   ` abush wang
  1 sibling, 0 replies; 12+ messages in thread
From: abush wang @ 2024-04-28  2:06 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Sunil K Pandey, Noah Goldstein, abushwang via Libc-alpha

This is my environment:
lscpu
...
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz
    BIOS Model name:     Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz  CPU @ 2.5GHz
...
I think you can run my demo in this environment to reproduce it.



* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-26 16:53   ` Sunil Pandey
@ 2024-04-28  2:13     ` abush wang
  2024-04-28 16:12       ` Sunil Pandey
  0 siblings, 1 reply; 12+ messages in thread
From: abush wang @ 2024-04-28  2:13 UTC (permalink / raw)
  To: Sunil Pandey; +Cc: H.J. Lu, Noah Goldstein, abushwang via Libc-alpha

Actually, I was investigating a performance issue in libmicro on our distro OS.
I found that the performance degradation of the localtime_r benchmark from
libmicro is caused by strlen,
so I extracted this test case.


* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-28  2:13     ` abush wang
@ 2024-04-28 16:12       ` Sunil Pandey
  2024-04-28 16:16         ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Sunil Pandey @ 2024-04-28 16:12 UTC (permalink / raw)
  To: abush wang; +Cc: H.J. Lu, Noah Goldstein, abushwang via Libc-alpha

On Sat, Apr 27, 2024 at 7:13 PM abush wang <abushwangs@gmail.com> wrote:

> Actually, I was handling performance issue from  libmicro in our distro OS.
> I found that the performance degradation of localtime_r benchmark from
> libmicro is blame to strlen.
> So I abstracted this test case.
>
>
Can you consistently reproduce the strlen perf behaviour by running it multiple
times back-to-back?

You can see a high swing from run to run.


* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-28 16:12       ` Sunil Pandey
@ 2024-04-28 16:16         ` H.J. Lu
  2024-04-29 17:41           ` Sunil Pandey
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2024-04-28 16:16 UTC (permalink / raw)
  To: Sunil Pandey; +Cc: abush wang, Noah Goldstein, abushwang via Libc-alpha

On Sun, Apr 28, 2024 at 9:13 AM Sunil Pandey <skpgkp2@gmail.com> wrote:
>
>
>
> On Sat, Apr 27, 2024 at 7:13 PM abush wang <abushwangs@gmail.com> wrote:
>>
>> Actually, I was handling performance issue from  libmicro in our distro OS.
>> I found that the performance degradation of localtime_r benchmark from libmicro is blame to strlen.
>> So I abstracted this test case.
>>
>
> Can you consistently reproduce strlen perf behaviour by running multiple times back-to-back?
>
> You can see high swing from run

Hi Sunil,

Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz is SKX.  Please add this test to
benchtests/bench-strlen.c and check its performance on SKX.

-- 
H.J.


* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-28 16:16         ` H.J. Lu
@ 2024-04-29 17:41           ` Sunil Pandey
  2024-04-29 20:19             ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Sunil Pandey @ 2024-04-29 17:41 UTC (permalink / raw)
  To: H.J. Lu; +Cc: abush wang, Noah Goldstein, abushwang via Libc-alpha

On Sun, Apr 28, 2024 at 9:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:

> On Sun, Apr 28, 2024 at 9:13 AM Sunil Pandey <skpgkp2@gmail.com> wrote:
> >
> >
> >
> > On Sat, Apr 27, 2024 at 7:13 PM abush wang <abushwangs@gmail.com> wrote:
> >>
> >> Actually, I was handling performance issue from  libmicro in our distro
> OS.
> >> I found that the performance degradation of localtime_r benchmark from
> libmicro is blame to strlen.
> >> So I abstracted this test case.
> >>
> >
> > Can you consistently reproduce strlen perf behaviour by running multiple
> times back-to-back?
> >
> > You can see high swing from run
>
> Hi Sunil,
>
> Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz is SKX.  Please add this test to
> benchtests/bench-strlen.c and check its performance on SKX.
>
> --
> H.J.
>

I collected the glibc micro-benchmark data for the string length in
question.

2.38 evex data:

length=4, alignment=4:         4.40
length=4, alignment=0:         4.29
length=4, alignment=0:         3.64
length=4, alignment=7:         3.64
length=4, alignment=2:         3.64

2.28 evex data:

Length    4, alignment  4: 6.46875
Length    4, alignment  0: 6.5
Length    4, alignment  0: 6.53125
Length    4, alignment  7: 6.46875
Length    4, alignment  2: 6.53125

Data collected on Machine: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

2.38 perf numbers are better than 2.28 as expected.

--Sunil


* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-29 17:41           ` Sunil Pandey
@ 2024-04-29 20:19             ` H.J. Lu
  2024-04-30  0:54               ` Sunil Pandey
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2024-04-29 20:19 UTC (permalink / raw)
  To: Sunil Pandey; +Cc: abush wang, Noah Goldstein, abushwang via Libc-alpha

On Mon, Apr 29, 2024 at 10:42 AM Sunil Pandey <skpgkp2@gmail.com> wrote:
>
>
>
> On Sun, Apr 28, 2024 at 9:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Sun, Apr 28, 2024 at 9:13 AM Sunil Pandey <skpgkp2@gmail.com> wrote:
>> >
>> >
>> >
>> > On Sat, Apr 27, 2024 at 7:13 PM abush wang <abushwangs@gmail.com> wrote:
>> >>
>> >> Actually, I was handling performance issue from  libmicro in our distro OS.
>> >> I found that the performance degradation of localtime_r benchmark from libmicro is blame to strlen.
>> >> So I abstracted this test case.
>> >>
>> >
>> > Can you consistently reproduce strlen perf behaviour by running multiple times back-to-back?
>> >
>> > You can see high swing from run
>>
>> Hi Sunil,
>>
>> Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz is SKX.  Please add this test to
>> benchtests/bench-strlen.c and check its performance on SKX.
>>
>> --
>> H.J.
>
>
> I collected the glibc micro-benchmark data for the string length in question.
>
> 2.38 evex data:
>
> length=4, alignment=4:         4.40
> length=4, alignment=0:         4.29
> length=4, alignment=0:         3.64
> length=4, alignment=7:         3.64
> length=4, alignment=2:         3.64
>
> 2.28 evex data:
>
> Length    4, alignment  4:  6.46875
> Length    4, alignment  0:  6.5
> Length    4, alignment  0:  6.53125
> Length    4, alignment  7:  6.46875
> Length    4, alignment  2:  6.53125
>
> Data collected on Machine: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
>
> 2.38 perf numbers are better than 2.28 as expected.

1. Please compare AVX2 vs EVEX strlen on glibc master branch.
2. Please check strlen on strings of length == 4 and alignments = 0, 1, 2, 3.

-- 
H.J.


* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-29 20:19             ` H.J. Lu
@ 2024-04-30  0:54               ` Sunil Pandey
  2024-04-30  2:51                 ` H.J. Lu
  0 siblings, 1 reply; 12+ messages in thread
From: Sunil Pandey @ 2024-04-30  0:54 UTC (permalink / raw)
  To: H.J. Lu; +Cc: abush wang, Noah Goldstein, abushwang via Libc-alpha

On Mon, Apr 29, 2024 at 1:20 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> On Mon, Apr 29, 2024 at 10:42 AM Sunil Pandey <skpgkp2@gmail.com> wrote:
> >
> >
> >
> > On Sun, Apr 28, 2024 at 9:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Sun, Apr 28, 2024 at 9:13 AM Sunil Pandey <skpgkp2@gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On Sat, Apr 27, 2024 at 7:13 PM abush wang <abushwangs@gmail.com>
> wrote:
> >> >>
> >> >> Actually, I was handling performance issue from  libmicro in our
> distro OS.
> >> >> I found that the performance degradation of localtime_r benchmark
> from libmicro is blame to strlen.
> >> >> So I abstracted this test case.
> >> >>
> >> >
> >> > Can you consistently reproduce strlen perf behaviour by running
> multiple times back-to-back?
> >> >
> >> > You can see high swing from run
> >>
> >> Hi Sunil,
> >>
> >> Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz is SKX.  Please add this test
> to
> >> benchtests/bench-strlen.c and check its performance on SKX.
> >>
> >> --
> >> H.J.
> >
> >
> > I collected the glibc micro-benchmark data for the string length in
> question.
> >
> > 2.38 evex data:
> >
> > length=4, alignment=4:         4.40
> > length=4, alignment=0:         4.29
> > length=4, alignment=0:         3.64
> > length=4, alignment=7:         3.64
> > length=4, alignment=2:         3.64
> >
> > 2.28 evex data:
> >
> > Length    4, alignment  4: 6.46875
> > Length    4, alignment  0: 6.5
> > Length    4, alignment  0: 6.53125
> > Length    4, alignment  7: 6.46875
> > Length    4, alignment  2: 6.53125
> >
> > Data collected on Machine: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
> >
> > 2.38 perf numbers are better than 2.28 as expected.
>
> 1. Please compare AVX2 vs EVEX strlen on glibc master branch.
> 2. Please check strlen on strings of length == 4 and alignments = 0, 1, 2,
> 3.
>
> --
> H.J.
>

Data from master branch:

Data collected on Machine: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

                      __strlen_evex __strlen_avx2
=======================================================
 length=4, alignment=0:         5.00        5.11
 length=4, alignment=1:         4.92        4.80
 length=4, alignment=2:         4.82        4.62
 length=4, alignment=3:         4.62        4.92
 length=4, alignment=4:         4.44        4.44
 length=4, alignment=5:         4.59        4.29
 length=4, alignment=6:         4.39        4.29
 length=4, alignment=7:         4.14        4.14
 length=4, alignment=8:         4.19        4.00
 length=4, alignment=9:         4.00        4.00
length=4, alignment=10:         4.31        3.87
length=4, alignment=11:         3.96        3.87
length=4, alignment=12:         3.86        3.75
length=4, alignment=13:         3.75        3.75
length=4, alignment=14:         3.64        3.64
length=4, alignment=15:         3.64        3.72
length=4, alignment=16:         3.64        3.53
length=4, alignment=17:         3.63        3.53
length=4, alignment=18:         4.12        3.53
length=4, alignment=19:         3.43        3.43
length=4, alignment=20:         3.43        3.43
length=4, alignment=21:         3.33        3.33
length=4, alignment=22:         3.33        3.42
length=4, alignment=23:         3.33        3.33
length=4, alignment=24:         3.33        3.33
length=4, alignment=25:         3.33        3.33
length=4, alignment=26:         3.96        3.33
length=4, alignment=27:         3.33        3.41
length=4, alignment=28:         3.33        3.33
length=4, alignment=29:         3.41        3.33
length=4, alignment=30:         3.33        3.41
length=4, alignment=31:         3.33        3.33

--Sunil


* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-30  0:54               ` Sunil Pandey
@ 2024-04-30  2:51                 ` H.J. Lu
  2024-04-30 20:16                   ` Sunil Pandey
  0 siblings, 1 reply; 12+ messages in thread
From: H.J. Lu @ 2024-04-30  2:51 UTC (permalink / raw)
  To: Sunil Pandey; +Cc: abush wang, Noah Goldstein, abushwang via Libc-alpha

On Mon, Apr 29, 2024 at 5:55 PM Sunil Pandey <skpgkp2@gmail.com> wrote:
> Data from master branch:
>
> Data collected on Machine: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
>
>                       __strlen_evex __strlen_avx2
> =======================================================
>  length=4, alignment=0:         5.00        5.11
>  length=4, alignment=1:         4.92        4.80
>  length=4, alignment=2:         4.82        4.62
>  length=4, alignment=3:         4.62        4.92
>  length=4, alignment=4:         4.44        4.44
>  length=4, alignment=5:         4.59        4.29
>  length=4, alignment=6:         4.39        4.29
>  length=4, alignment=7:         4.14        4.14
>  length=4, alignment=8:         4.19        4.00
>  length=4, alignment=9:         4.00        4.00
> length=4, alignment=10:         4.31        3.87
> length=4, alignment=11:         3.96        3.87
> length=4, alignment=12:         3.86        3.75
> length=4, alignment=13:         3.75        3.75
> length=4, alignment=14:         3.64        3.64
> length=4, alignment=15:         3.64        3.72
> length=4, alignment=16:         3.64        3.53
> length=4, alignment=17:         3.63        3.53
> length=4, alignment=18:         4.12        3.53
> length=4, alignment=19:         3.43        3.43
> length=4, alignment=20:         3.43        3.43
> length=4, alignment=21:         3.33        3.33
> length=4, alignment=22:         3.33        3.42
> length=4, alignment=23:         3.33        3.33
> length=4, alignment=24:         3.33        3.33
> length=4, alignment=25:         3.33        3.33
> length=4, alignment=26:         3.96        3.33
> length=4, alignment=27:         3.33        3.41
> length=4, alignment=28:         3.33        3.33
> length=4, alignment=29:         3.41        3.33
> length=4, alignment=30:         3.33        3.41
> length=4, alignment=31:         3.33        3.33
>
> --Sunil

Hi Sunil,

strlen-avx2.S in the glibc 2.28 release (tag glibc-2.28) is
different from strlen-avx2.S on the master branch.  Please
compare their performance.

-- 
H.J.

* Re: x86-64: strlen-evex performance performance degradation compared to strlen-avx2
  2024-04-30  2:51                 ` H.J. Lu
@ 2024-04-30 20:16                   ` Sunil Pandey
  0 siblings, 0 replies; 12+ messages in thread
From: Sunil Pandey @ 2024-04-30 20:16 UTC (permalink / raw)
  To: H.J. Lu; +Cc: abush wang, Noah Goldstein, abushwang via Libc-alpha

On Mon, Apr 29, 2024 at 7:52 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> Hi Sunil,
>
> strlen-avx2.S in glibc 2.28 release (tag glibc-2.28) is
> different from strlen-avx2.S on master branch.   Please
> compare their performances.
>
> --
> H.J.
>

I tested the strlen implementations at length 4 across alignments 0 to 31.

                        __strlen_evex(master) __strlen_avx2(master) __strlen_avx2(2.28)
=======================================================================================
 length=4, alignment=0:         5.00        5.09        8.00
 length=4, alignment=1:         4.80        4.80        7.78
 length=4, alignment=2:         4.71        4.62        7.46
 length=4, alignment=3:         4.44        4.55        7.11
 length=4, alignment=4:         4.44        4.45        7.23
 length=4, alignment=5:         4.29        4.29        6.86
 length=4, alignment=6:         4.14        4.14        6.76
 length=4, alignment=7:         4.00        4.00        6.40
 length=4, alignment=8:         4.00        4.00        6.50
 length=4, alignment=9:         3.87        3.87        6.29
length=4, alignment=10:         3.75        3.85        6.00
length=4, alignment=11:         3.75        3.75        6.00
length=4, alignment=12:         3.76        3.64        5.82
length=4, alignment=13:         3.64        3.64        6.08
length=4, alignment=14:         3.53        3.53        5.74
length=4, alignment=15:         3.53        3.53        5.74
length=4, alignment=16:         3.43        3.43        5.57
length=4, alignment=17:         3.43        3.43        5.67
length=4, alignment=18:         3.33        3.33        5.41
length=4, alignment=19:         3.33        3.33        5.44
length=4, alignment=20:         3.33        3.33        5.41
length=4, alignment=21:         3.33        3.33        5.43
length=4, alignment=22:         3.33        3.33        5.41
length=4, alignment=23:         3.33        3.33        5.41
length=4, alignment=24:         3.33        3.33        5.33
length=4, alignment=25:         3.41        3.33        5.33
length=4, alignment=26:         3.86        3.33        5.33
length=4, alignment=27:         3.42        3.33        5.33
length=4, alignment=28:         3.33        3.33        5.33
length=4, alignment=29:         3.33        3.33        5.33
length=4, alignment=30:         3.33        3.33        5.33
length=4, alignment=31:         3.33        3.33        5.33

Based on the data:

- Both the avx2 and evex versions in master are faster than the avx2 version
in glibc-2.28 at every alignment tested, as expected.
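
For anyone who wants a rough cross-check outside of benchtests, here is a minimal sketch of an alignment sweep. It is not the benchtests/bench-strlen.c harness: the helper names (make_string, time_strlen, sweep_alignments), the 4096-byte buffer alignment and the iteration count are all assumptions made for illustration, and it times whichever strlen the ifunc resolver picks on the machine it runs on. Absolute values will not match the table above; only the relative shape is comparable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_ALIGN 4096u   /* assumed buffer alignment (one page) */
#define ITERS 1000000u    /* assumed iteration count per data point */

/* Place a NUL-terminated string of `len` bytes at offset `align` in `buf`
   and return a pointer to its first byte.  */
static char *make_string(char *buf, size_t buf_size, size_t align, size_t len)
{
    assert(align + len + 1 <= buf_size);
    memset(buf + align, 'a', len);
    buf[align + len] = '\0';
    return buf + align;
}

/* Return the average wall-clock nanoseconds per strlen call.  */
static double time_strlen(const char *s, size_t iters)
{
    /* Calling through a volatile function pointer keeps the compiler from
       folding the call or hoisting it out of the loop.  */
    size_t (*volatile fn)(const char *) = strlen;
    struct timespec t0, t1;
    size_t sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; ++i)
        sum += fn(s);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = sum;   /* keep the accumulated result live */
    (void)sink;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}

/* Time strlen on a string of `len` bytes at alignments 0 .. n_align-1.
   Writes one average per alignment into ns_out; returns 0 on success.  */
static int sweep_alignments(size_t len, size_t n_align, double *ns_out)
{
    void *mem = NULL;
    if (posix_memalign(&mem, BUF_ALIGN, 2 * BUF_ALIGN) != 0)
        return -1;
    char *buf = mem;
    memset(buf, 0, 2 * BUF_ALIGN);

    for (size_t a = 0; a < n_align; ++a) {
        const char *s = make_string(buf, 2 * BUF_ALIGN, a, len);
        (void)time_strlen(s, ITERS / 10);   /* short warm-up pass */
        ns_out[a] = time_strlen(s, ITERS);
    }
    free(mem);
    return 0;
}
```

Calling sweep_alignments(4, 32, ns) from main and printing ns[a] for each a gives one column per run. The loop includes call overhead and is sensitive to frequency scaling, so repeated runs are needed before reading anything into small differences.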

--Sunil

end of thread, other threads:[~2024-04-30 20:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
2024-04-26  4:03 x86-64: strlen-evex performance performance degradation compared to strlen-avx2 abush wang
2024-04-26 13:30 ` H.J. Lu
2024-04-26 16:53   ` Sunil Pandey
2024-04-28  2:13     ` abush wang
2024-04-28 16:12       ` Sunil Pandey
2024-04-28 16:16         ` H.J. Lu
2024-04-29 17:41           ` Sunil Pandey
2024-04-29 20:19             ` H.J. Lu
2024-04-30  0:54               ` Sunil Pandey
2024-04-30  2:51                 ` H.J. Lu
2024-04-30 20:16                   ` Sunil Pandey
2024-04-28  2:06   ` abush wang