public inbox for libc-alpha@sourceware.org
* x86-64: strlen-evex performance degradation compared to strlen-avx2
From: abush wang @ 2024-04-26  4:03 UTC
  To: H.J. Lu, abushwang via Libc-alpha

Hi, H.J.
While comparing glibc performance between 2.28 and 2.38,
I found a performance degradation in strlen.
The difference is between __strlen_avx2, which 2.28 uses,
and __strlen_evex, which 2.38 uses:

```
2.28
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:42
42 ENTRY (STRLEN)


2.38
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:79
79 ENTRY_P2ALIGN (STRLEN, 6)
```

This is my test:
```
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#define MAX_STRINGS 100

uint64_t rdtsc() {
    uint32_t lo, hi;
    __asm__ __volatile__ (
        "rdtsc" : "=a"(lo), "=d"(hi)
    );
    return ((uint64_t)hi << 32) | lo;
}

int main(int argc, char *argv[]) {
    char *input_str[MAX_STRINGS];
    size_t lengths[MAX_STRINGS];
    int num_strings = 0; // Number of input strings
    uint64_t start_cycles, end_cycles;

    // Parse command line arguments and store pointers in input_str array
    for (int i = 1; i < argc && num_strings < MAX_STRINGS; ++i) {
        input_str[num_strings] = argv[i];
        num_strings++;
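    }

    // Nothing to measure; also avoids dividing by zero below
    if (num_strings == 0) {
        fprintf(stderr, "usage: %s string...\n", argv[0]);
        return 1;
    }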

    // Measure the strlen operation for each string
    start_cycles = rdtsc();
    for (int i = 0; i < num_strings; ++i) {
        lengths[i] = strlen(input_str[i]);
    }
    end_cycles = rdtsc();

    unsigned long long total_cycle = end_cycles - start_cycles;
    unsigned long long av_cycle = total_cycle / num_strings;

    // Print the total cycles taken for the strlen operations
    printf("Total cycles: %llu av cycle: %llu \n", total_cycle, av_cycle);

    // Print the recorded lengths
    printf("Lengths of the input strings:\n");
    for (int i = 0; i < num_strings; ++i) {
        printf("String %d length: %zu\n", i, lengths[i]);
    }

    return 0;
}
```
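
A variant worth trying, offered as a sketch rather than as part of the original test: the loop above times only a handful of calls, and the first strlen call in a dynamically linked program can include one-time costs such as lazy symbol binding and cold caches. The helper below makes one untimed warm-up call and then reports the smallest cycle count over many repetitions. The names `rdtsc_fenced`, `min_strlen_cycles` and `reps` are hypothetical, and the code assumes x86-64 with GCC or Clang inline asm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the time-stamp counter. The lfence keeps earlier instructions from
   being reordered past the read; the "memory" clobber stops the compiler
   from moving loads and stores across it. x86-64 only. */
static inline uint64_t rdtsc_fenced(void) {
    uint32_t lo, hi;
    __asm__ __volatile__ ("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: smallest cycle count observed for a single
   strlen(s) call over `reps` timed repetitions, after one untimed
   warm-up call. Returns UINT64_MAX if reps <= 0. */
static uint64_t min_strlen_cycles(const char *s, int reps, size_t *len_out) {
    volatile size_t sink = strlen(s);   /* warm-up call, not timed */
    uint64_t best = UINT64_MAX;

    for (int r = 0; r < reps; ++r) {
        const char *p = s;
        /* Hide the pointer from the optimizer so the call is not
           folded to a constant or hoisted out of the loop. */
        __asm__ __volatile__ ("" : "+r"(p) : : "memory");

        uint64_t t0 = rdtsc_fenced();
        sink = strlen(p);
        uint64_t t1 = rdtsc_fenced();

        if (t1 - t0 < best)
            best = t1 - t0;
    }

    if (len_out)
        *len_out = sink;
    return best;
}
```

Calling `min_strlen_cycles(argv[i], 1000, &lengths[i])` for each argument would give one figure per string. Taking the minimum rather than the average is a design choice that discards interrupts and other outliers; the result still includes the overhead of the two counter reads.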

This is the result:
```
2.28
./strlen_test str1 str2 str3 str4 str5
Total cycles: 1468 av cycle: 293
Lengths of the input strings:
String 0 length: 4
String 1 length: 4
String 2 length: 4
String 3 length: 4
String 4 length: 4

2.38
./strlen_test str1 str2 str3 str4 str5
Total cycles: 1814 av cycle: 362
Lengths of the input strings:
String 0 length: 4
String 1 length: 4
String 2 length: 4
String 3 length: 4
String 4 length: 4
```
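
To put the two runs side by side, the relative increase can be computed directly from the numbers printed above. This is only arithmetic on the reported figures; `percent_increase` is a hypothetical helper name.

```c
/* Relative increase of `after` over `before`, in percent.
   `before` must be non-zero. */
static double percent_increase(unsigned long long before,
                               unsigned long long after) {
    return 100.0 * ((double)after - (double)before) / (double)before;
}
```

With the totals, `percent_increase(1468, 1814)` is about 23.6, and with the per-call averages, `percent_increase(293, 362)` is about 23.5, so 2.38 takes a bit under a quarter more cycles than 2.28 in this particular run.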

Thanks,
abush
