public inbox for libc-alpha@sourceware.org
* [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
@ 2024-04-01 11:47 abush wang
  2024-04-01 13:12 ` Florian Weimer
  2024-04-02 14:15 ` Adhemerval Zanella Netto
  0 siblings, 2 replies; 14+ messages in thread
From: abush wang @ 2024-04-01 11:47 UTC (permalink / raw)
  To: abushwang via Libc-alpha; +Cc: adhemerval.zanella

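This is the test:
```
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Despite its name, this returns the raw TSC value (cycles), not nanoseconds. */
uint64_t getnsecs(void) {
    uint32_t lo, hi;
    __asm__ __volatile__ (
        "rdtsc" : "=a"(lo), "=d"(hi)
    );
    return ((uint64_t)hi << 32) | lo;
}

int main(void) {
    const int num_iterations = 1;
    uint64_t start, end, total_time = 0;

    start = getnsecs();
    for (int i = 0; i < num_iterations; i++) {
        (void) lrand48();
    }
    end = getnsecs();
    total_time += (end - start);

    /* PRIu64 is the portable format for uint64_t; %lu is only right on LP64. */
    printf("Average time for lrand48: %" PRIu64 " cycles\n",
           total_time / num_iterations);
    return 0;
}
```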
before:
Average time for lrand48: 21418 cycles

after:
Average time for lrand48: 9892 cycles


* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-01 11:47 [PATCH] stdlib: reorganize stdlib Makefile routines by functionality abush wang
@ 2024-04-01 13:12 ` Florian Weimer
  2024-04-01 13:17   ` H.J. Lu
  2024-04-02  2:17   ` abush wang
  2024-04-02 14:15 ` Adhemerval Zanella Netto
  1 sibling, 2 replies; 14+ messages in thread
From: Florian Weimer @ 2024-04-01 13:12 UTC (permalink / raw)
  To: abush wang; +Cc: abushwang via Libc-alpha, adhemerval.zanella

* abush wang:

> This is test:
> ```
> uint64_t getnsecs() {
>     uint32_t lo, hi; 
>     __asm__ __volatile__ (
>         "rdtsc" : "=a"(lo), "=d"(hi)
>     );  
>     return ((uint64_t)hi << 32) | lo;
> }
>
> int main() {
>     const int num_iterations = 1;
>     uint64_t start, end, total_time = 0;
>
>     start = getnsecs();
>     for (int i = 0; i < num_iterations; i++) {
>         (void) lrand48();
>     }
>     end = getnsecs();
>     total_time += (end - start);
>
>     printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
>     return 0;
> }
> ```
> before:
> Average time for lrand48: 21418 cycles
>
> after:
> Average time for lrand48: 9892 cycles

Do you see this on x86-64?  So this isn't a displacement range issue?

It could be that this is a random performance change due to code
alignment, and not actually caused by the direct call distance.

Thanks,
Florian



* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-01 13:12 ` Florian Weimer
@ 2024-04-01 13:17   ` H.J. Lu
  2024-04-01 13:46     ` Adhemerval Zanella Netto
  2024-04-02  3:54     ` abush wang
  2024-04-02  2:17   ` abush wang
  1 sibling, 2 replies; 14+ messages in thread
From: H.J. Lu @ 2024-04-01 13:17 UTC (permalink / raw)
  To: Florian Weimer; +Cc: abush wang, abushwang via Libc-alpha, adhemerval.zanella

On Mon, Apr 1, 2024 at 6:13 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * abush wang:
>
> > This is test:
> > ```
> > uint64_t getnsecs() {
> >     uint32_t lo, hi;
> >     __asm__ __volatile__ (
> >         "rdtsc" : "=a"(lo), "=d"(hi)
> >     );
> >     return ((uint64_t)hi << 32) | lo;
> > }
> >
> > int main() {
> >     const int num_iterations = 1;
> >     uint64_t start, end, total_time = 0;
> >
> >     start = getnsecs();
> >     for (int i = 0; i < num_iterations; i++) {
> >         (void) lrand48();
> >     }
> >     end = getnsecs();
> >     total_time += (end - start);
> >
> >     printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
> >     return 0;
> > }
> > ```
> > before:
> > Average time for lrand48: 21418 cycles
> >
> > after:
> > Average time for lrand48: 9892 cycles
>
> Do you see this on x86-64?  So this isn't a displacement range issue?
>
> It could be that this is a random performance change due to code
> alignment, and not actually caused by the direct call distance.
>

I have a linker patch to control section layout:

https://patchwork.sourceware.org/project/binutils/list/?series=29973

It can

1. Reduce gaps between text sections.
2. Put hot text sections close to each other.

If it can solve this issue, we should add this feature to ld.

-- 
H.J.


* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-01 13:17   ` H.J. Lu
@ 2024-04-01 13:46     ` Adhemerval Zanella Netto
  2024-04-02  3:54     ` abush wang
  1 sibling, 0 replies; 14+ messages in thread
From: Adhemerval Zanella Netto @ 2024-04-01 13:46 UTC (permalink / raw)
  To: H.J. Lu, Florian Weimer; +Cc: abush wang, abushwang via Libc-alpha



On 01/04/24 10:17, H.J. Lu wrote:
> On Mon, Apr 1, 2024 at 6:13 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * abush wang:
>>
>>> This is test:
>>> ```
>>> uint64_t getnsecs() {
>>>     uint32_t lo, hi;
>>>     __asm__ __volatile__ (
>>>         "rdtsc" : "=a"(lo), "=d"(hi)
>>>     );
>>>     return ((uint64_t)hi << 32) | lo;
>>> }
>>>
>>> int main() {
>>>     const int num_iterations = 1;
>>>     uint64_t start, end, total_time = 0;
>>>
>>>     start = getnsecs();
>>>     for (int i = 0; i < num_iterations; i++) {
>>>         (void) lrand48();
>>>     }
>>>     end = getnsecs();
>>>     total_time += (end - start);
>>>
>>>     printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
>>>     return 0;
>>> }
>>> ```
>>> before:
>>> Average time for lrand48: 21418 cycles
>>>
>>> after:
>>> Average time for lrand48: 9892 cycles
>>
>> Do you see this on x86-64?  So this isn't a displacement range issue?
>>
>> It could be that this is a random performance change due to code
>> alignment, and not actually caused by the direct call distance.
>>
> 
> I have a linker patch to control section layout:
> 
> https://patchwork.sourceware.org/project/binutils/list/?series=29973
> 
> It can
> 
> 1. Reduce gaps between text sections.
> 2. Put hot text sections close to each other.
> 
> If it can solve this issue, we should add this feature to ld.
> 

Another possibility, if this is related to a displacement range due to some
ISA limitation, would be to move the lrand48 entry points into the same TU
(at least the ones that are simple wrappers that end up being tail calls).


* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-01 13:12 ` Florian Weimer
  2024-04-01 13:17   ` H.J. Lu
@ 2024-04-02  2:17   ` abush wang
  2024-04-02  2:28     ` abush wang
  1 sibling, 1 reply; 14+ messages in thread
From: abush wang @ 2024-04-02  2:17 UTC (permalink / raw)
  To: Florian Weimer; +Cc: abushwang via Libc-alpha, adhemerval.zanella

Yes, on x86-64.
I compared the objdump disassembly before and after d275970ab.
__drand48_iterate ends up farther away after d275970ab, so I reverted
that commit and found that the performance recovers a little.

Thanks,
abush




* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-02  2:17   ` abush wang
@ 2024-04-02  2:28     ` abush wang
  2024-04-02  3:13       ` H.J. Lu
  0 siblings, 1 reply; 14+ messages in thread
From: abush wang @ 2024-04-02  2:28 UTC (permalink / raw)
  To: Florian Weimer; +Cc: abushwang via Libc-alpha, adhemerval.zanella

Actually, it is not just d275970ab.
I found that performance also degrades on x86-64 after a91bf4e0ff,
even though that commit has nothing to do with lrand48.
This is my test data:
before a91bf4e0ff:
Average time for lrand48: 1940 cycles

after:
Average time for lrand48: 5626 cycles

So lrand48 seems to have been getting gradually slower across commits.




* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-02  2:28     ` abush wang
@ 2024-04-02  3:13       ` H.J. Lu
  2024-04-02  6:18         ` abush wang
  0 siblings, 1 reply; 14+ messages in thread
From: H.J. Lu @ 2024-04-02  3:13 UTC (permalink / raw)
  To: abush wang; +Cc: Florian Weimer, abushwang via Libc-alpha, adhemerval.zanella

On Mon, Apr 1, 2024 at 7:28 PM abush wang <abushwangs@gmail.com> wrote:
>
> Actually, not just d275970ab
> I found after a91bf4e0ff, there is also performance degradation on x86-64,
> even if this commit has nothing  to do with lrand48.
> This is my test data:
> before  a91bf4e0ff:
> Average time for lrand48: 1940 cycles
>
> after:
> Average time for lrand48: 5626 cycles

Please compare alignments of 2 versions of lrand48.



-- 
H.J.


* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-01 13:17   ` H.J. Lu
  2024-04-01 13:46     ` Adhemerval Zanella Netto
@ 2024-04-02  3:54     ` abush wang
  2024-04-08  2:48       ` abush wang
  1 sibling, 1 reply; 14+ messages in thread
From: abush wang @ 2024-04-02  3:54 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Florian Weimer, abushwang via Libc-alpha, adhemerval.zanella

Hi Lu,
There seems to be a build issue:
```
/usr/bin/ld: /tmp/cc1NV6qZ.ltrans2.ltrans.o: warning: relocation against `seen_eof_include_file' in read-only section `.text'
/usr/bin/ld: /tmp/cc1NV6qZ.ltrans1.ltrans.o: in function `lang_add_wild':
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8518:(.text+0xa108): undefined reference to `in_text_section_ordering_file'
/usr/bin/ld: /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8552:(.text+0xa3aa): undefined reference to `seen_eof_include_file'
/usr/bin/ld: /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8553:(.text+0xa3b8): undefined reference to `in_text_section_ordering_file'
/usr/bin/ld: /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8551:(.text+0xa48f): undefined reference to `in_text_section_ordering_file'
/usr/bin/ld: /tmp/cc1NV6qZ.ltrans2.ltrans.o: in function `ldfile_open_command_file_1.lto_priv.0':
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:939:(.text+0x9fc2): undefined reference to `in_text_section_ordering_file'
/usr/bin/ld: /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:965:(.text+0x9ff3): undefined reference to `in_text_section_ordering_file'
/usr/bin/ld: /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:998:(.text+0xa0e7): undefined reference to `in_text_section_ordering_file'
/usr/bin/ld: /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:995:(.text+0xa0f7): undefined reference to `seen_eof_include_file'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:1265: ld-new] Error 1
make[3]: *** [Makefile:1903: all-recursive] Error 1
make[2]: *** [Makefile:1092: all] Error 2
make[1]: *** [Makefile:8046: all-ld] Error 2
```

I tested with binutils-2.42.50-6.fc41.src.rpm from this repository:
https://mirrors.aliyun.com/fedora/development/rawhide/Everything/x86_64/os/
I have verified that the reported errors are caused by these patches.




* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-02  3:13       ` H.J. Lu
@ 2024-04-02  6:18         ` abush wang
  0 siblings, 0 replies; 14+ messages in thread
From: abush wang @ 2024-04-02  6:18 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Florian Weimer, abushwang via Libc-alpha, adhemerval.zanella

I compared the addresses using nm, readelf and objdump;
there seems to be no difference for lrand48.

My shared objects are here:
https://github.com/wswsmao/glibc_so

7a7229de1d:
Average time for lrand48: 1940 cycles

a91bf4e0ff:
Average time for lrand48: 5626 cycles




* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-01 11:47 [PATCH] stdlib: reorganize stdlib Makefile routines by functionality abush wang
  2024-04-01 13:12 ` Florian Weimer
@ 2024-04-02 14:15 ` Adhemerval Zanella Netto
  2024-04-03  1:57   ` abush wang
  1 sibling, 1 reply; 14+ messages in thread
From: Adhemerval Zanella Netto @ 2024-04-02 14:15 UTC (permalink / raw)
  To: abush wang, abushwang via Libc-alpha, H.J. Lu, Florian Weimer



On 01/04/24 08:47, abush wang wrote:
> This is test:
> ```
> uint64_t getnsecs() {
>     uint32_t lo, hi;
>     __asm__ __volatile__ (
>         "rdtsc" : "=a"(lo), "=d"(hi)
>     );  
>     return ((uint64_t)hi << 32) | lo;
> }
> 
> int main() {
>     const int num_iterations = 1;

Such a low iteration count makes the benchmark pretty much useless
on modern hardware with frequency scaling.  After raising it to something
like 1000000000, I see no variation on my workstation (Ryzen 5900).

>     uint64_t start, end, total_time = 0;
> 
>     start = getnsecs();
>     for (int i = 0; i < num_iterations; i++) {
>         (void) lrand48();
>     }
>     end = getnsecs();
>     total_time += (end - start);
> 
>     printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
>     return 0;
> }
> ```
> before:
> Average time for lrand48: 21418 cycles
> 
> after:
> Average time for lrand48: 9892 cycles


* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-02 14:15 ` Adhemerval Zanella Netto
@ 2024-04-03  1:57   ` abush wang
  0 siblings, 0 replies; 14+ messages in thread
From: abush wang @ 2024-04-03  1:57 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: abushwang via Libc-alpha, H.J. Lu, Florian Weimer

I tried raising the iteration count like this:
const int num_iterations = 100;

and I get:
Average time for lrand48: 37 cycles

That is orders of magnitude fewer cycles per call than with a single
iteration. It seems the first call to lrand48 does more work than
subsequent calls.





* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-02  3:54     ` abush wang
@ 2024-04-08  2:48       ` abush wang
  0 siblings, 0 replies; 14+ messages in thread
From: abush wang @ 2024-04-08  2:48 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Florian Weimer, abushwang via Libc-alpha, adhemerval.zanella

Hi H.J.,
Is there an updated version of the patches that solves this problem?



* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
  2024-04-01 11:44 abushwang
@ 2024-04-01 12:03 ` Xi Ruoyao
  0 siblings, 0 replies; 14+ messages in thread
From: Xi Ruoyao @ 2024-04-01 12:03 UTC (permalink / raw)
  To: abushwang, libc-alpha; +Cc: adhemerval.zanella, Shuo Wang

On Mon, 2024-04-01 at 19:44 +0800, abushwang wrote:
> +routines        :=                                                                           \
> +        atof atoi atol atoll                                                                 \

Don't make lines exceed 80 characters.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University


* [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
@ 2024-04-01 11:44 abushwang
  2024-04-01 12:03 ` Xi Ruoyao
  0 siblings, 1 reply; 14+ messages in thread
From: abushwang @ 2024-04-01 11:44 UTC (permalink / raw)
  To: libc-alpha; +Cc: adhemerval.zanella, Shuo Wang

From: Shuo Wang <abushwang@tencent.com>

Commit d275970ab sorted all routines in lexicographic order,
which can hurt performance (for example, of 'lrand48') because
related functions end up farther apart in the compiled output.

Signed-off-by: Shuo Wang <abushwang@tencent.com>
---
 stdlib/Makefile | 215 ++++++++++++------------------------------------
 1 file changed, 51 insertions(+), 164 deletions(-)

diff --git a/stdlib/Makefile b/stdlib/Makefile
index 8b0ac63ddb..d2d912db27 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -51,170 +51,57 @@ headers := \
   ucontext.h \
   # headers
 
-routines := \
-  a64l \
-  abort \
-  abs \
-  arc4random \
-  arc4random_uniform \
-  at_quick_exit \
-  atof \
-  atoi \
-  atol\
-  atoll \
-  bsearch \
-  canonicalize \
-  cxa_at_quick_exit \
-  cxa_atexit \
-  cxa_finalize \
-  cxa_thread_atexit_impl \
-  div \
-  drand48 \
-  drand48-iter \
-  drand48_r \
-  erand48 \
-  erand48_r \
-  exit \
-  fmtmsg \
-  getcontext \
-  getentropy \
-  getenv \
-  getrandom \
-  getsubopt \
-  jrand48 \
-  jrand48_r \
-  l64a \
-  labs \
-  lcong48 \
-  lcong48_r \
-  ldiv \
-  llabs \
-  lldiv \
-  lrand48 \
-  lrand48_r \
-  makecontext \
-  mblen \
-  mbstowcs \
-  mbtowc \
-  mrand48 \
-  mrand48_r \
-  nrand48 \
-  nrand48_r \
-  old_atexit  \
-  on_exit atexit \
-  putenv \
-  qsort \
-  quick_exit \
-  rand \
-  rand_r \
-  random \
-  random_r \
-  rpmatch \
-  secure-getenv \
-  seed48 \
-  seed48_r \
-  setcontext \
-  setenv \
-  srand48 \
-  srand48_r \
-  stdc_bit_ceil_uc \
-  stdc_bit_ceil_ui \
-  stdc_bit_ceil_ul \
-  stdc_bit_ceil_ull \
-  stdc_bit_ceil_us \
-  stdc_bit_floor_uc \
-  stdc_bit_floor_ui \
-  stdc_bit_floor_ul \
-  stdc_bit_floor_ull \
-  stdc_bit_floor_us \
-  stdc_bit_width_uc \
-  stdc_bit_width_ui \
-  stdc_bit_width_ul \
-  stdc_bit_width_ull \
-  stdc_bit_width_us \
-  stdc_count_ones_uc \
-  stdc_count_ones_ui \
-  stdc_count_ones_ul \
-  stdc_count_ones_ull \
-  stdc_count_ones_us \
-  stdc_count_zeros_uc \
-  stdc_count_zeros_ui \
-  stdc_count_zeros_ul \
-  stdc_count_zeros_ull \
-  stdc_count_zeros_us \
-  stdc_first_leading_one_uc \
-  stdc_first_leading_one_ui \
-  stdc_first_leading_one_ul \
-  stdc_first_leading_one_ull \
-  stdc_first_leading_one_us \
-  stdc_first_leading_zero_uc \
-  stdc_first_leading_zero_ui \
-  stdc_first_leading_zero_ul \
-  stdc_first_leading_zero_ull \
-  stdc_first_leading_zero_us \
-  stdc_first_trailing_one_uc \
-  stdc_first_trailing_one_ui \
-  stdc_first_trailing_one_ul \
-  stdc_first_trailing_one_ull \
-  stdc_first_trailing_one_us \
-  stdc_first_trailing_zero_uc \
-  stdc_first_trailing_zero_ui \
-  stdc_first_trailing_zero_ul \
-  stdc_first_trailing_zero_ull \
-  stdc_first_trailing_zero_us \
-  stdc_has_single_bit_uc \
-  stdc_has_single_bit_ui \
-  stdc_has_single_bit_ul \
-  stdc_has_single_bit_ull \
-  stdc_has_single_bit_us \
-  stdc_leading_ones_uc \
-  stdc_leading_ones_ui \
-  stdc_leading_ones_ul \
-  stdc_leading_ones_ull \
-  stdc_leading_ones_us \
-  stdc_leading_zeros_uc \
-  stdc_leading_zeros_ui \
-  stdc_leading_zeros_ul \
-  stdc_leading_zeros_ull \
-  stdc_leading_zeros_us \
-  stdc_trailing_ones_uc \
-  stdc_trailing_ones_ui \
-  stdc_trailing_ones_ul \
-  stdc_trailing_ones_ull \
-  stdc_trailing_ones_us \
-  stdc_trailing_zeros_uc \
-  stdc_trailing_zeros_ui \
-  stdc_trailing_zeros_ul \
-  stdc_trailing_zeros_ull \
-  stdc_trailing_zeros_us \
-  strfmon \
-  strfmon_l \
-  strfromd \
-  strfromf \
-  strfroml \
-  strtod \
-  strtod_l \
-  strtod_nan \
-  strtof \
-  strtof_l \
-  strtof_nan \
-  strtol \
-  strtol_l \
-  strtold \
-  strtold_l \
-  strtold_nan \
-  strtoll \
-  strtoll_l \
-  strtoul \
-  strtoul_l \
-  strtoull \
-  strtoull_l \
-  swapcontext \
-  system \
-  wcstombs \
-  wctomb  \
-  xpg_basename \
-  # routines
+routines        :=                                                                           \
+        atof atoi atol atoll                                                                 \
+        abort                                                                                \
+        bsearch qsort                                                                        \
+        getenv putenv setenv secure-getenv                                                   \
+        exit on_exit atexit cxa_atexit cxa_finalize old_atexit                               \
+        quick_exit at_quick_exit cxa_at_quick_exit cxa_thread_atexit_impl                    \
+        abs labs llabs                                                                       \
+        div ldiv lldiv                                                                       \
+        mblen mbstowcs mbtowc wcstombs wctomb                                                \
+        arc4random arc4random_uniform                                                        \
+        random random_r rand rand_r                                                          \
+        drand48 erand48 lrand48 nrand48 mrand48 jrand48                                      \
+        srand48 seed48 lcong48                                                               \
+        drand48_r erand48_r lrand48_r nrand48_r mrand48_r jrand48_r                          \
+        srand48_r seed48_r lcong48_r                                                         \
+        drand48-iter getrandom getentropy                                                    \
+        strfromf strfromd strfroml                                                           \
+        strtol strtoul strtoll strtoull                                                      \
+        strtol_l strtoul_l strtoll_l strtoull_l                                              \
+        strtof strtod strtold                                                                \
+        strtof_l strtod_l strtold_l                                                          \
+        strtof_nan strtod_nan strtold_nan                                                    \
+        system canonicalize                                                                  \
+        stdc_bit_ceil_uc stdc_bit_ceil_ui stdc_bit_ceil_ul                                   \
+        stdc_bit_ceil_ull stdc_bit_ceil_us stdc_bit_floor_uc                                 \
+        stdc_bit_floor_ui stdc_bit_floor_ul stdc_bit_floor_ull                               \
+        stdc_bit_floor_us stdc_bit_width_uc stdc_bit_width_ui                                \
+        stdc_bit_width_ul stdc_bit_width_ull stdc_bit_width_us                               \
+        stdc_count_ones_uc stdc_count_ones_ui stdc_count_ones_ul                             \
+        stdc_count_ones_ull stdc_count_ones_us stdc_count_zeros_uc                           \
+        stdc_count_zeros_ui stdc_count_zeros_ul stdc_count_zeros_ull                         \
+        stdc_count_zeros_us stdc_first_leading_one_uc stdc_first_leading_one_ui              \
+        stdc_first_leading_one_ul stdc_first_leading_one_ull stdc_first_leading_one_us       \
+        stdc_first_leading_zero_uc stdc_first_leading_zero_ui stdc_first_leading_zero_ul     \
+        stdc_first_leading_zero_ull stdc_first_leading_zero_us stdc_first_trailing_one_uc    \
+        stdc_first_trailing_one_ui stdc_first_trailing_one_ul stdc_first_trailing_one_ull    \
+        stdc_first_trailing_one_us stdc_first_trailing_zero_uc stdc_first_trailing_zero_ui   \
+        stdc_first_trailing_zero_ul stdc_first_trailing_zero_ull stdc_first_trailing_zero_us \
+        stdc_has_single_bit_uc stdc_has_single_bit_ui stdc_has_single_bit_ul                 \
+        stdc_has_single_bit_ull stdc_has_single_bit_us stdc_leading_ones_uc                  \
+        stdc_leading_ones_ui stdc_leading_ones_ul stdc_leading_ones_ull                      \
+        stdc_leading_ones_us stdc_leading_zeros_uc stdc_leading_zeros_ui                     \
+        stdc_leading_zeros_ul stdc_leading_zeros_ull stdc_leading_zeros_us                   \
+        stdc_trailing_ones_uc stdc_trailing_ones_ui stdc_trailing_ones_ul                    \
+        stdc_trailing_ones_ull stdc_trailing_ones_us stdc_trailing_zeros_uc                  \
+        stdc_trailing_zeros_ui stdc_trailing_zeros_ul stdc_trailing_zeros_ull                \
+        stdc_trailing_zeros_us                                                               \
+        a64l l64a                                                                            \
+        rpmatch strfmon strfmon_l getsubopt xpg_basename fmtmsg                              \
+        getcontext setcontext makecontext swapcontext
 
 # Exclude fortified routines from being built with _FORTIFY_SOURCE
 routines_no_fortify += \
-- 
2.37.3
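
Since the hunk above rewrites the whole `routines` list, a reordering like this is easy to sanity-check mechanically. The following is a minimal sketch, not part of the patch: it assumes the hunk has been saved as plain diff lines, and the helper names `routine_names` and `compare` are hypothetical. It only reports which names appear on one side of the diff and not the other, plus any name listed twice on the new side.

```python
def routine_names(diff_lines, marker):
    """Collect routine names from diff lines starting with `marker` ('-' or '+')."""
    names = []
    for line in diff_lines:
        # Skip file headers ('--- a/...', '+++ b/...') and the '-- ' mail signature line.
        if not line.startswith(marker) or line.startswith(marker * 3):
            continue
        if line.rstrip() == "--":
            continue
        body = line[1:].split("#", 1)[0]   # drop Makefile comments such as '# routines'
        body = body.rstrip().rstrip("\\")  # drop the trailing line continuation
        for token in body.split():
            # Ignore the variable name and assignment operators.
            if token not in ("routines", ":=", "+=", "="):
                names.append(token)
    return names


def compare(diff_lines):
    """Report names dropped, added, or duplicated by the rewritten list."""
    old = routine_names(diff_lines, "-")
    new = routine_names(diff_lines, "+")
    return {
        "dropped": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "duplicated": sorted({n for n in new if new.count(n) > 1}),
    }


if __name__ == "__main__":
    # Tiny made-up hunk in the same shape as the patch, for illustration only.
    sample = [
        "-routines := \\",
        "-  old_atexit  \\",
        "-  on_exit atexit \\",
        "-  qsort \\",
        "-  # routines",
        "+routines        :=      \\",
        "+        bsearch qsort   \\",
        "+        on_exit atexit old_atexit",
    ]
    print(compare(sample))
```

If the script is fed only the `routines` hunk, an empty "dropped" list would suggest that no routine was lost in the reshuffle; entries under "added" or "duplicated" would be worth a second look before applying.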



end of thread, other threads:[~2024-04-08  2:49 UTC | newest]

Thread overview: 14+ messages
2024-04-01 11:47 [PATCH] stdlib: reorganize stdlib Makefile routines by functionality abush wang
2024-04-01 13:12 ` Florian Weimer
2024-04-01 13:17   ` H.J. Lu
2024-04-01 13:46     ` Adhemerval Zanella Netto
2024-04-02  3:54     ` abush wang
2024-04-08  2:48       ` abush wang
2024-04-02  2:17   ` abush wang
2024-04-02  2:28     ` abush wang
2024-04-02  3:13       ` H.J. Lu
2024-04-02  6:18         ` abush wang
2024-04-02 14:15 ` Adhemerval Zanella Netto
2024-04-03  1:57   ` abush wang
  -- strict thread matches above, loose matches on Subject: below --
2024-04-01 11:44 abushwang
2024-04-01 12:03 ` Xi Ruoyao
