* [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
@ 2024-04-01 11:47 abush wang
2024-04-01 13:12 ` Florian Weimer
2024-04-02 14:15 ` Adhemerval Zanella Netto
0 siblings, 2 replies; 14+ messages in thread
From: abush wang @ 2024-04-01 11:47 UTC (permalink / raw)
To: abushwang via Libc-alpha; +Cc: adhemerval.zanella
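This is the test:
```
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Despite the name, this returns raw TSC ticks (x86 only), not nanoseconds. */
uint64_t getnsecs(void) {
    uint32_t lo, hi;
    __asm__ __volatile__ (
        "rdtsc" : "=a"(lo), "=d"(hi)
    );
    return ((uint64_t)hi << 32) | lo;
}

int main(void) {
    const int num_iterations = 1;
    uint64_t start, end, total_time = 0;

    start = getnsecs();
    for (int i = 0; i < num_iterations; i++) {
        (void) lrand48();
    }
    end = getnsecs();
    total_time += (end - start);

    /* PRIu64 matches uint64_t on every ABI; %lu only does on LP64. */
    printf("Average time for lrand48: %" PRIu64 " cycles\n",
           total_time / num_iterations);
    return 0;
}
```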
before:
Average time for lrand48: 21418 cycles
after:
Average time for lrand48: 9892 cycles
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-01 11:47 [PATCH] stdlib: reorganize stdlib Makefile routines by functionality abush wang
@ 2024-04-01 13:12 ` Florian Weimer
2024-04-01 13:17 ` H.J. Lu
2024-04-02 2:17 ` abush wang
2024-04-02 14:15 ` Adhemerval Zanella Netto
1 sibling, 2 replies; 14+ messages in thread
From: Florian Weimer @ 2024-04-01 13:12 UTC (permalink / raw)
To: abush wang; +Cc: abushwang via Libc-alpha, adhemerval.zanella
* abush wang:
> This is test:
> ```
> uint64_t getnsecs() {
> uint32_t lo, hi;
> __asm__ __volatile__ (
> "rdtsc" : "=a"(lo), "=d"(hi)
> );
> return ((uint64_t)hi << 32) | lo;
> }
>
> int main() {
> const int num_iterations = 1;
> uint64_t start, end, total_time = 0;
>
> start = getnsecs();
> for (int i = 0; i < num_iterations; i++) {
> (void) lrand48();
> }
> end = getnsecs();
> total_time += (end - start);
>
> printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
> return 0;
> }
> ```
> before:
> Average time for lrand48: 21418 cycles
>
> after:
> Average time for lrand48: 9892 cycles
Do you see this on x86-64? So this isn't a displacement range issue?
It could be that this is a random performance change due to code
alignment, and not actually caused by the direct call distance.
Thanks,
Florian
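
One way to check the alignment hypothesis is to look at the low bits of each function's start address in the two builds. The helper names below are made up for illustration, and the address in the usage note is a placeholder, not a value taken from either libc.so; a minimal sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides addr, i.e. the alignment the
   address happens to have.  Hypothetical helper, not a glibc API.  */
static uint64_t
natural_alignment (uint64_t addr)
{
  assert (addr != 0);
  return addr & (~addr + 1);    /* isolate the lowest set bit */
}

/* Offset of addr inside its cache line.  line_size is assumed to be a
   power of two (64 bytes on most current x86-64 parts).  */
static uint64_t
cache_line_offset (uint64_t addr, uint64_t line_size)
{
  assert (line_size != 0 && (line_size & (line_size - 1)) == 0);
  return addr & (line_size - 1);
}
```

Feeding it the symbol values that nm prints for lrand48 and __drand48_iterate in each build shows whether an entry point moved relative to a 16-, 32- or 64-byte boundary. For the placeholder address 0x4a730, natural_alignment gives 16 and cache_line_offset with a 64-byte line gives 48.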
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-01 13:12 ` Florian Weimer
@ 2024-04-01 13:17 ` H.J. Lu
2024-04-01 13:46 ` Adhemerval Zanella Netto
2024-04-02 3:54 ` abush wang
2024-04-02 2:17 ` abush wang
1 sibling, 2 replies; 14+ messages in thread
From: H.J. Lu @ 2024-04-01 13:17 UTC (permalink / raw)
To: Florian Weimer; +Cc: abush wang, abushwang via Libc-alpha, adhemerval.zanella
On Mon, Apr 1, 2024 at 6:13 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * abush wang:
>
> > This is test:
> > ```
> > uint64_t getnsecs() {
> > uint32_t lo, hi;
> > __asm__ __volatile__ (
> > "rdtsc" : "=a"(lo), "=d"(hi)
> > );
> > return ((uint64_t)hi << 32) | lo;
> > }
> >
> > int main() {
> > const int num_iterations = 1;
> > uint64_t start, end, total_time = 0;
> >
> > start = getnsecs();
> > for (int i = 0; i < num_iterations; i++) {
> > (void) lrand48();
> > }
> > end = getnsecs();
> > total_time += (end - start);
> >
> > printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
> > return 0;
> > }
> > ```
> > before:
> > Average time for lrand48: 21418 cycles
> >
> > after:
> > Average time for lrand48: 9892 cycles
>
> Do you see this on x86-64? So this isn't a displacement range issue?
>
> It could be that this is a random performance change due to code
> alignment, and not actually caused by the direct call distance.
>
I have a linker patch to control section layout:
https://patchwork.sourceware.org/project/binutils/list/?series=29973
It can
1. Reduce gaps between text sections.
2. Put hot text sections close to each other.
If it can solve this issue, we should add this feature to ld.
--
H.J.
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-01 13:17 ` H.J. Lu
@ 2024-04-01 13:46 ` Adhemerval Zanella Netto
2024-04-02 3:54 ` abush wang
1 sibling, 0 replies; 14+ messages in thread
From: Adhemerval Zanella Netto @ 2024-04-01 13:46 UTC (permalink / raw)
To: H.J. Lu, Florian Weimer; +Cc: abush wang, abushwang via Libc-alpha
On 01/04/24 10:17, H.J. Lu wrote:
> On Mon, Apr 1, 2024 at 6:13 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * abush wang:
>>
>>> This is test:
>>> ```
>>> uint64_t getnsecs() {
>>> uint32_t lo, hi;
>>> __asm__ __volatile__ (
>>> "rdtsc" : "=a"(lo), "=d"(hi)
>>> );
>>> return ((uint64_t)hi << 32) | lo;
>>> }
>>>
>>> int main() {
>>> const int num_iterations = 1;
>>> uint64_t start, end, total_time = 0;
>>>
>>> start = getnsecs();
>>> for (int i = 0; i < num_iterations; i++) {
>>> (void) lrand48();
>>> }
>>> end = getnsecs();
>>> total_time += (end - start);
>>>
>>> printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
>>> return 0;
>>> }
>>> ```
>>> before:
>>> Average time for lrand48: 21418 cycles
>>>
>>> after:
>>> Average time for lrand48: 9892 cycles
>>
>> Do you see this on x86-64? So this isn't a displacement range issue?
>>
>> It could be that this is a random performance change due to code
>> alignment, and not actually caused by the direct call distance.
>>
>
> I have a linker patch to control section layout:
>
> https://patchwork.sourceware.org/project/binutils/list/?series=29973
>
> It can
>
> 1. Reduce gaps between text sections.
> 2. Put hot text sections close to each other.
>
> If it can solve this issue, we should add this feature to ld.
>
Another possibility, if this is related to a displacement range due to some
ISA limitation, would be to move the lrand48 entry points into the same TU
(at least the ones that are simple wrappers that end up being tail calls).
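
To illustrate the same-TU idea: when the iteration step and its wrappers live in one file, the compiler can inline the step or emit a short local call, so the distance chosen by the linker no longer matters. The names below are hypothetical and this is not glibc's code; the multiplier and addend are the ones POSIX specifies for the drand48 family, and the zero initial state is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

/* 48-bit linear congruential step X' = (a * X + c) mod 2^48 with the
   multiplier and addend POSIX specifies for the drand48 family.  */
#define SKETCH_A    UINT64_C (0x5DEECE66D)
#define SKETCH_C    UINT64_C (0xB)
#define SKETCH_MASK ((UINT64_C (1) << 48) - 1)

static_assert (SKETCH_MASK == UINT64_C (0xFFFFFFFFFFFF),
               "state is 48 bits wide");

/* Generator state; starting from zero is an assumption of this sketch.  */
static uint64_t sketch_state;

/* Static and in the same translation unit as its callers, so it can
   be inlined or reached with a short local call.  */
static uint64_t
sketch_iterate (void)
{
  sketch_state = (SKETCH_A * sketch_state + SKETCH_C) & SKETCH_MASK;
  return sketch_state;
}

/* lrand48-like wrapper: the high 31 bits of the new 48-bit state.  */
long
sketch_lrand48 (void)
{
  return (long) (sketch_iterate () >> 17);
}

/* drand48-like wrapper: the new state scaled into [0.0, 1.0).  */
double
sketch_drand48 (void)
{
  return (double) sketch_iterate () * 0x1p-48;
}
```

Whether this layout helps in practice would still have to be measured; the sketch only shows why the wrappers stop depending on where the linker places a separate object file.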
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-01 13:12 ` Florian Weimer
2024-04-01 13:17 ` H.J. Lu
@ 2024-04-02 2:17 ` abush wang
2024-04-02 2:28 ` abush wang
1 sibling, 1 reply; 14+ messages in thread
From: abush wang @ 2024-04-02 2:17 UTC (permalink / raw)
To: Florian Weimer; +Cc: abushwang via Libc-alpha, adhemerval.zanella
Yes, on x86-64.
I compared the disassembly of d275970ab and of its parent commit with
objdump.
__drand48_iterate ends up farther away after d275970ab, so I reverted
this commit and found that the performance recovers a little.
Thanks,
abush
On Mon, Apr 1, 2024 at 9:12 PM Florian Weimer <fweimer@redhat.com> wrote:
> * abush wang:
>
> > This is test:
> > ```
> > uint64_t getnsecs() {
> > uint32_t lo, hi;
> > __asm__ __volatile__ (
> > "rdtsc" : "=a"(lo), "=d"(hi)
> > );
> > return ((uint64_t)hi << 32) | lo;
> > }
> >
> > int main() {
> > const int num_iterations = 1;
> > uint64_t start, end, total_time = 0;
> >
> > start = getnsecs();
> > for (int i = 0; i < num_iterations; i++) {
> > (void) lrand48();
> > }
> > end = getnsecs();
> > total_time += (end - start);
> >
> > printf("Average time for lrand48: %lu cycles\n", total_time /
> num_iterations);
> > return 0;
> > }
> > ```
> > before:
> > Average time for lrand48: 21418 cycles
> >
> > after:
> > Average time for lrand48: 9892 cycles
>
> Do you see this on x86-64? So this isn't a displacement range issue?
>
> It could be that this is a random performance change due to code
> alignment, and not actually caused by the direct call distance.
>
> Thanks,
> Florian
>
>
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-02 2:17 ` abush wang
@ 2024-04-02 2:28 ` abush wang
2024-04-02 3:13 ` H.J. Lu
0 siblings, 1 reply; 14+ messages in thread
From: abush wang @ 2024-04-02 2:28 UTC (permalink / raw)
To: Florian Weimer; +Cc: abushwang via Libc-alpha, adhemerval.zanella
Actually, it is not just d275970ab.
I found that there is also a performance degradation on x86-64 after
a91bf4e0ff, even though this commit has nothing to do with lrand48.
This is my test data:
before a91bf4e0ff:
Average time for lrand48: 1940 cycles
after:
Average time for lrand48: 5626 cycles
It seems that the performance of lrand48 has been degrading gradually.
On Tue, Apr 2, 2024 at 10:17 AM abush wang <abushwangs@gmail.com> wrote:
> Yes,on x86-64.
> I just compare the disassemble between d275970ab and before commit by
> objdump.
> And __drand48_iterate will be more long distance after d275970ab, so I
> revert this
> commit and found the performance will recover a little.
>
> Thanks,
> abush
>
>
> On Mon, Apr 1, 2024 at 9:12 PM Florian Weimer <fweimer@redhat.com> wrote:
>
>> * abush wang:
>>
>> > This is test:
>> > ```
>> > uint64_t getnsecs() {
>> > uint32_t lo, hi;
>> > __asm__ __volatile__ (
>> > "rdtsc" : "=a"(lo), "=d"(hi)
>> > );
>> > return ((uint64_t)hi << 32) | lo;
>> > }
>> >
>> > int main() {
>> > const int num_iterations = 1;
>> > uint64_t start, end, total_time = 0;
>> >
>> > start = getnsecs();
>> > for (int i = 0; i < num_iterations; i++) {
>> > (void) lrand48();
>> > }
>> > end = getnsecs();
>> > total_time += (end - start);
>> >
>> > printf("Average time for lrand48: %lu cycles\n", total_time /
>> num_iterations);
>> > return 0;
>> > }
>> > ```
>> > before:
>> > Average time for lrand48: 21418 cycles
>> >
>> > after:
>> > Average time for lrand48: 9892 cycles
>>
>> Do you see this on x86-64? So this isn't a displacement range issue?
>>
>> It could be that this is a random performance change due to code
>> alignment, and not actually caused by the direct call distance.
>>
>> Thanks,
>> Florian
>>
>>
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-02 2:28 ` abush wang
@ 2024-04-02 3:13 ` H.J. Lu
2024-04-02 6:18 ` abush wang
0 siblings, 1 reply; 14+ messages in thread
From: H.J. Lu @ 2024-04-02 3:13 UTC (permalink / raw)
To: abush wang; +Cc: Florian Weimer, abushwang via Libc-alpha, adhemerval.zanella
On Mon, Apr 1, 2024 at 7:28 PM abush wang <abushwangs@gmail.com> wrote:
>
> Actually, not just d275970ab
> I found after a91bf4e0ff, there is also performance degradation on x86-64,
> even if this commit has nothing to do with lrand48.
> This is my test data:
> before a91bf4e0ff:
> Average time for lrand48: 1940 cycles
>
> after:
> Average time for lrand48: 5626 cycles
Please compare alignments of 2 versions of lrand48.
> It seems like there is a gradual performance degradation for lrand48.
>
>
> On Tue, Apr 2, 2024 at 10:17 AM abush wang <abushwangs@gmail.com> wrote:
>>
>> Yes,on x86-64.
>> I just compare the disassemble between d275970ab and before commit by objdump.
>> And __drand48_iterate will be more long distance after d275970ab, so I revert this
>> commit and found the performance will recover a little.
>>
>> Thanks,
>> abush
>>
>>
>> On Mon, Apr 1, 2024 at 9:12 PM Florian Weimer <fweimer@redhat.com> wrote:
>>>
>>> * abush wang:
>>>
>>> > This is test:
>>> > ```
>>> > uint64_t getnsecs() {
>>> > uint32_t lo, hi;
>>> > __asm__ __volatile__ (
>>> > "rdtsc" : "=a"(lo), "=d"(hi)
>>> > );
>>> > return ((uint64_t)hi << 32) | lo;
>>> > }
>>> >
>>> > int main() {
>>> > const int num_iterations = 1;
>>> > uint64_t start, end, total_time = 0;
>>> >
>>> > start = getnsecs();
>>> > for (int i = 0; i < num_iterations; i++) {
>>> > (void) lrand48();
>>> > }
>>> > end = getnsecs();
>>> > total_time += (end - start);
>>> >
>>> > printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
>>> > return 0;
>>> > }
>>> > ```
>>> > before:
>>> > Average time for lrand48: 21418 cycles
>>> >
>>> > after:
>>> > Average time for lrand48: 9892 cycles
>>>
>>> Do you see this on x86-64? So this isn't a displacement range issue?
>>>
>>> It could be that this is a random performance change due to code
>>> alignment, and not actually caused by the direct call distance.
>>>
>>> Thanks,
>>> Florian
>>>
--
H.J.
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-01 13:17 ` H.J. Lu
2024-04-01 13:46 ` Adhemerval Zanella Netto
@ 2024-04-02 3:54 ` abush wang
2024-04-08 2:48 ` abush wang
1 sibling, 1 reply; 14+ messages in thread
From: abush wang @ 2024-04-02 3:54 UTC (permalink / raw)
To: H.J. Lu; +Cc: Florian Weimer, abushwang via Libc-alpha, adhemerval.zanella
Hi Lu,
There seems to be a build issue with the patches:
```
/usr/bin/ld: /tmp/cc1NV6qZ.ltrans2.ltrans.o: warning: relocation against
`seen_eof_include_file' in read-only section `.text'
/usr/bin/ld: /tmp/cc1NV6qZ.ltrans1.ltrans.o: in function `lang_add_wild':
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8518:(.text+0xa108):
undefined reference to `in_text_section_ordering_file'
/usr/bin/ld:
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8552:(.text+0xa3aa):
undefined reference to `seen_eof_include_file'
/usr/bin/ld:
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8553:(.text+0xa3b8):
undefined reference to `in_text_section_ordering_file'
/usr/bin/ld:
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8551:(.text+0xa48f):
undefined reference to `in_text_section_ordering_file'
/usr/bin/ld: /tmp/cc1NV6qZ.ltrans2.ltrans.o: in function
`ldfile_open_command_file_1.lto_priv.0':
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:939:(.text+0x9fc2):
undefined reference to `in_text_section_ordering_file'
/usr/bin/ld:
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:965:(.text+0x9ff3):
undefined reference to `in_text_section_ordering_file'
/usr/bin/ld:
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:998:(.text+0xa0e7):
undefined reference to `in_text_section_ordering_file'
/usr/bin/ld:
/builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:995:(.text+0xa0f7):
undefined reference to `seen_eof_include_file'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:1265: ld-new] Error 1
make[3]: *** [Makefile:1903: all-recursive] Error 1
make[2]: *** [Makefile:1092: all] Error 2
make[1]: *** [Makefile:8046: all-ld] Error 2
```
Tested with binutils-2.42.50-6.fc41.src.rpm.
This is my repo:
https://mirrors.aliyun.com/fedora/development/rawhide/Everything/x86_64/os/
I have verified that the reported error is caused by these patches.
On Mon, Apr 1, 2024 at 9:17 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Apr 1, 2024 at 6:13 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * abush wang:
> >
> > > This is test:
> > > ```
> > > uint64_t getnsecs() {
> > > uint32_t lo, hi;
> > > __asm__ __volatile__ (
> > > "rdtsc" : "=a"(lo), "=d"(hi)
> > > );
> > > return ((uint64_t)hi << 32) | lo;
> > > }
> > >
> > > int main() {
> > > const int num_iterations = 1;
> > > uint64_t start, end, total_time = 0;
> > >
> > > start = getnsecs();
> > > for (int i = 0; i < num_iterations; i++) {
> > > (void) lrand48();
> > > }
> > > end = getnsecs();
> > > total_time += (end - start);
> > >
> > > printf("Average time for lrand48: %lu cycles\n", total_time /
> num_iterations);
> > > return 0;
> > > }
> > > ```
> > > before:
> > > Average time for lrand48: 21418 cycles
> > >
> > > after:
> > > Average time for lrand48: 9892 cycles
> >
> > Do you see this on x86-64? So this isn't a displacement range issue?
> >
> > It could be that this is a random performance change due to code
> > alignment, and not actually caused by the direct call distance.
> >
>
> I have a linker patch to control section layout:
>
> https://patchwork.sourceware.org/project/binutils/list/?series=29973
>
> It can
>
> 1. Reduce gaps between text sections.
> 2. Put hot text sections close to each other.
>
> If it can solve this issue, we should add this feature to ld.
>
> --
> H.J.
>
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-02 3:13 ` H.J. Lu
@ 2024-04-02 6:18 ` abush wang
0 siblings, 0 replies; 14+ messages in thread
From: abush wang @ 2024-04-02 6:18 UTC (permalink / raw)
To: H.J. Lu; +Cc: Florian Weimer, abushwang via Libc-alpha, adhemerval.zanella
I have compared the addresses with nm, readelf and objdump;
there seems to be no difference for lrand48.
These are my shared objects:
https://github.com/wswsmao/glibc_so
7a7229de1d:
Average time for lrand48: 1940 cycles
a91bf4e0ff:
Average time for lrand48: 5626 cycles
On Tue, Apr 2, 2024 at 11:14 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Apr 1, 2024 at 7:28 PM abush wang <abushwangs@gmail.com> wrote:
> >
> > Actually, not just d275970ab
> > I found after a91bf4e0ff, there is also performance degradation on
> x86-64,
> > even if this commit has nothing to do with lrand48.
> > This is my test data:
> > before a91bf4e0ff:
> > Average time for lrand48: 1940 cycles
> >
> > after:
> > Average time for lrand48: 5626 cycles
>
> Please compare alignments of 2 versions of lrand48.
>
> > It seems like there is a gradual performance degradation for lrand48.
> >
> >
> > On Tue, Apr 2, 2024 at 10:17 AM abush wang <abushwangs@gmail.com> wrote:
> >>
> >> Yes,on x86-64.
> >> I just compare the disassemble between d275970ab and before commit by
> objdump.
> >> And __drand48_iterate will be more long distance after d275970ab, so I
> revert this
> >> commit and found the performance will recover a little.
> >>
> >> Thanks,
> >> abush
> >>
> >>
> >> On Mon, Apr 1, 2024 at 9:12 PM Florian Weimer <fweimer@redhat.com>
> wrote:
> >>>
> >>> * abush wang:
> >>>
> >>> > This is test:
> >>> > ```
> >>> > uint64_t getnsecs() {
> >>> > uint32_t lo, hi;
> >>> > __asm__ __volatile__ (
> >>> > "rdtsc" : "=a"(lo), "=d"(hi)
> >>> > );
> >>> > return ((uint64_t)hi << 32) | lo;
> >>> > }
> >>> >
> >>> > int main() {
> >>> > const int num_iterations = 1;
> >>> > uint64_t start, end, total_time = 0;
> >>> >
> >>> > start = getnsecs();
> >>> > for (int i = 0; i < num_iterations; i++) {
> >>> > (void) lrand48();
> >>> > }
> >>> > end = getnsecs();
> >>> > total_time += (end - start);
> >>> >
> >>> > printf("Average time for lrand48: %lu cycles\n", total_time /
> num_iterations);
> >>> > return 0;
> >>> > }
> >>> > ```
> >>> > before:
> >>> > Average time for lrand48: 21418 cycles
> >>> >
> >>> > after:
> >>> > Average time for lrand48: 9892 cycles
> >>>
> >>> Do you see this on x86-64? So this isn't a displacement range issue?
> >>>
> >>> It could be that this is a random performance change due to code
> >>> alignment, and not actually caused by the direct call distance.
> >>>
> >>> Thanks,
> >>> Florian
> >>>
>
>
> --
> H.J.
>
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-01 11:47 [PATCH] stdlib: reorganize stdlib Makefile routines by functionality abush wang
2024-04-01 13:12 ` Florian Weimer
@ 2024-04-02 14:15 ` Adhemerval Zanella Netto
2024-04-03 1:57 ` abush wang
1 sibling, 1 reply; 14+ messages in thread
From: Adhemerval Zanella Netto @ 2024-04-02 14:15 UTC (permalink / raw)
To: abush wang, abushwang via Libc-alpha, H.J. Lu, Florian Weimer
On 01/04/24 08:47, abush wang wrote:
> This is test:
> ```
> uint64_t getnsecs() {
> uint32_t lo, hi;
> __asm__ __volatile__ (
> "rdtsc" : "=a"(lo), "=d"(hi)
> );
> return ((uint64_t)hi << 32) | lo;
> }
>
> int main() {
> const int num_iterations = 1;
Such a low number of iterations makes the benchmark pretty much useless
on modern hardware with frequency scaling. By raising it to something
like 1000000000 I see no variation on my workstation (Ryzen 5900).
> uint64_t start, end, total_time = 0;
>
> start = getnsecs();
> for (int i = 0; i < num_iterations; i++) {
> (void) lrand48();
> }
> end = getnsecs();
> total_time += (end - start);
>
> printf("Average time for lrand48: %lu cycles\n", total_time / num_iterations);
> return 0;
> }
> ```
> before:
> Average time for lrand48: 21418 cycles
>
> after:
> Average time for lrand48: 9892 cycles
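
A sketch of the kind of measurement suggested above: warm the function up first, time a large batch, repeat, and keep the best run. It assumes x86 for the rdtsc path and falls back to clock_gettime (nanoseconds, not cycles) elsewhere; the helper names, the warm-up count and the batch sizes are illustrative.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Timestamp in TSC ticks on x86, in nanoseconds elsewhere.  */
static uint64_t
read_ticks (void)
{
#if defined (__x86_64__) || defined (__i386__)
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t) hi << 32) | lo;
#else
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
#endif
}

/* Average ticks per lrand48 call over ITERATIONS calls, taking the
   fastest of RUNS batches to filter out interrupts and frequency ramps.  */
static uint64_t
steady_state_ticks (long iterations, int runs)
{
  assert (iterations > 0 && runs > 0);

  /* Warm-up: resolve the symbol, fault in pages, fill the caches.  */
  for (int i = 0; i < 1000; i++)
    (void) lrand48 ();

  uint64_t best = UINT64_MAX;
  for (int r = 0; r < runs; r++)
    {
      uint64_t start = read_ticks ();
      for (long i = 0; i < iterations; i++)
        (void) lrand48 ();
      uint64_t elapsed = read_ticks () - start;
      if (elapsed < best)
        best = elapsed;
    }
  return best / (uint64_t) iterations;
}
```

Calling steady_state_ticks (1000000, 5) against each glibc build gives numbers that are far less sensitive to one-off startup costs than a single timed call.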
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-02 14:15 ` Adhemerval Zanella Netto
@ 2024-04-03 1:57 ` abush wang
0 siblings, 0 replies; 14+ messages in thread
From: abush wang @ 2024-04-03 1:57 UTC (permalink / raw)
To: Adhemerval Zanella Netto
Cc: abushwang via Libc-alpha, H.J. Lu, Florian Weimer
I tried raising the iteration count like this:
const int num_iterations = 100;
and I get:
Average time for lrand48: 37 cycles
That is a gap of orders of magnitude in the cycle count.
It seems that the first call to lrand48 does more work than subsequent calls.
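
The difference can be made visible by timing the first call on its own and comparing it with a later one. A plausible explanation, stated here as an assumption rather than something measured in this thread, is that the first call pays for lazy symbol resolution through the PLT and for cold caches and TLB entries. The sketch assumes x86 for rdtsc (with a clock_gettime fallback that reports nanoseconds) and uses made-up helper names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Timestamp in TSC ticks on x86, in nanoseconds elsewhere.  */
static uint64_t
read_ticks (void)
{
#if defined (__x86_64__) || defined (__i386__)
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((uint64_t) hi << 32) | lo;
#else
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
#endif
}

struct call_times
{
  uint64_t first;   /* cost of the first timed call */
  uint64_t later;   /* cost of one call after the warm-up */
};

/* Time exactly one lrand48 call.  */
static uint64_t
time_one_call (void)
{
  uint64_t start = read_ticks ();
  (void) lrand48 ();
  return read_ticks () - start;
}

/* The "first" value is only a true first call if nothing in the
   process has called lrand48 before this function runs.  */
static struct call_times
first_versus_later (int warmup_calls)
{
  assert (warmup_calls >= 0);
  struct call_times t;
  t.first = time_one_call ();
  for (int i = 0; i < warmup_calls; i++)
    (void) lrand48 ();
  t.later = time_one_call ();
  return t;
}
```

Running the same program with LD_BIND_NOW=1 in the environment makes the dynamic linker resolve symbols at startup, which is one way to see how much of the first-call cost comes from symbol resolution.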
On Tue, Apr 2, 2024 at 10:16 PM Adhemerval Zanella Netto <
adhemerval.zanella@linaro.org> wrote:
>
>
> On 01/04/24 08:47, abush wang wrote:
> > This is test:
> > ```
> > uint64_t getnsecs() {
> > uint32_t lo, hi;
> > __asm__ __volatile__ (
> > "rdtsc" : "=a"(lo), "=d"(hi)
> > );
> > return ((uint64_t)hi << 32) | lo;
> > }
> >
> > int main() {
> > const int num_iterations = 1;
>
> This low number of iteration makes the benchmark pretty much useless
> on modern hardware with frequency scaling. By raising to something
> like 1000000000 I see no variation on my workstation (Ryzen 5900).
>
> > uint64_t start, end, total_time = 0;
> >
> > start = getnsecs();
> > for (int i = 0; i < num_iterations; i++) {
> > (void) lrand48();
> > }
> > end = getnsecs();
> > total_time += (end - start);
> >
> > printf("Average time for lrand48: %lu cycles\n", total_time /
> num_iterations);
> > return 0;
> > }
> > ```
> > before:
> > Average time for lrand48: 21418 cycles
> >
> > after:
> > Average time for lrand48: 9892 cycles
>
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-02 3:54 ` abush wang
@ 2024-04-08 2:48 ` abush wang
0 siblings, 0 replies; 14+ messages in thread
From: abush wang @ 2024-04-08 2:48 UTC (permalink / raw)
To: H.J. Lu; +Cc: Florian Weimer, abushwang via Libc-alpha, adhemerval.zanella
Hi H.J.
Is there an updated version to solve this problem?
On Tue, Apr 2, 2024 at 11:54 AM abush wang <abushwangs@gmail.com> wrote:
> Hi, Lu
> it seems like there is some build issue:
> ```
> /usr/bin/ld: /tmp/cc1NV6qZ.ltrans2.ltrans.o: warning: relocation against
> `seen_eof_include_file' in read-only section `.text'
> /usr/bin/ld: /tmp/cc1NV6qZ.ltrans1.ltrans.o: in function `lang_add_wild':
> /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8518:(.text+0xa108):
> undefined reference to `in_text_section_ordering_file'
> /usr/bin/ld:
> /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8552:(.text+0xa3aa):
> undefined reference to `seen_eof_include_file'
> /usr/bin/ld:
> /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8553:(.text+0xa3b8):
> undefined reference to `in_text_section_ordering_file'
> /usr/bin/ld:
> /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldlang.c:8551:(.text+0xa48f):
> undefined reference to `in_text_section_ordering_file'
> /usr/bin/ld: /tmp/cc1NV6qZ.ltrans2.ltrans.o: in function
> `ldfile_open_command_file_1.lto_priv.0':
> /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:939:(.text+0x9fc2):
> undefined reference to `in_text_section_ordering_file'
> /usr/bin/ld:
> /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:965:(.text+0x9ff3):
> undefined reference to `in_text_section_ordering_file'
> /usr/bin/ld:
> /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:998:(.text+0xa0e7):
> undefined reference to `in_text_section_ordering_file'
> /usr/bin/ld:
> /builddir/build/BUILD/binutils-2.42.50/build-x86_64-redhat-linux/ld/../../ld/ldfile.c:995:(.text+0xa0f7):
> undefined reference to `seen_eof_include_file'
> /usr/bin/ld: warning: creating DT_TEXTREL in a PIE
> collect2: error: ld returned 1 exit status
> make[4]: *** [Makefile:1265: ld-new] Error 1
> make[3]: *** [Makefile:1903: all-recursive] Error 1
> make[2]: *** [Makefile:1092: all] Error 2
> make[1]: *** [Makefile:8046: all-ld] Error 2
> ```
>
> test by binutils-2.42.50-6.fc41.src.rpm
> this is my repo
> https://mirrors.aliyun.com/fedora/development/rawhide/Everything/x86_64/os/
> I have verified that the error reported is caused by these patches.
>
>
> On Mon, Apr 1, 2024 at 9:17 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> On Mon, Apr 1, 2024 at 6:13 AM Florian Weimer <fweimer@redhat.com> wrote:
>> >
>> > * abush wang:
>> >
>> > > This is test:
>> > > ```
>> > > uint64_t getnsecs() {
>> > > uint32_t lo, hi;
>> > > __asm__ __volatile__ (
>> > > "rdtsc" : "=a"(lo), "=d"(hi)
>> > > );
>> > > return ((uint64_t)hi << 32) | lo;
>> > > }
>> > >
>> > > int main() {
>> > > const int num_iterations = 1;
>> > > uint64_t start, end, total_time = 0;
>> > >
>> > > start = getnsecs();
>> > > for (int i = 0; i < num_iterations; i++) {
>> > > (void) lrand48();
>> > > }
>> > > end = getnsecs();
>> > > total_time += (end - start);
>> > >
>> > > printf("Average time for lrand48: %lu cycles\n", total_time /
>> num_iterations);
>> > > return 0;
>> > > }
>> > > ```
>> > > before:
>> > > Average time for lrand48: 21418 cycles
>> > >
>> > > after:
>> > > Average time for lrand48: 9892 cycles
>> >
>> > Do you see this on x86-64? So this isn't a displacement range issue?
>> >
>> > It could be that this is a random performance change due to code
>> > alignment, and not actually caused by the direct call distance.
>> >
>>
>> I have a linker patch to control section layout:
>>
>> https://patchwork.sourceware.org/project/binutils/list/?series=29973
>>
>> It can
>>
>> 1. Reduce gaps between text sections.
>> 2. Put hot text sections close to each other.
>>
>> If it can solve this issue, we should add this feature to ld.
>>
>> --
>> H.J.
>>
>
* Re: [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
2024-04-01 11:44 abushwang
@ 2024-04-01 12:03 ` Xi Ruoyao
0 siblings, 0 replies; 14+ messages in thread
From: Xi Ruoyao @ 2024-04-01 12:03 UTC (permalink / raw)
To: abushwang, libc-alpha; +Cc: adhemerval.zanella, Shuo Wang
On Mon, 2024-04-01 at 19:44 +0800, abushwang wrote:
> +routines := \
> + atof atoi atol atoll \
Don't make lines exceed 80 characters.
--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
* [PATCH] stdlib: reorganize stdlib Makefile routines by functionality
@ 2024-04-01 11:44 abushwang
2024-04-01 12:03 ` Xi Ruoyao
0 siblings, 1 reply; 14+ messages in thread
From: abushwang @ 2024-04-01 11:44 UTC (permalink / raw)
To: libc-alpha; +Cc: adhemerval.zanella, Shuo Wang
From: Shuo Wang <abushwang@tencent.com>
Commit d275970ab sorted all routines in lexicographic order,
which potentially impacts the performance of some functions
(such as 'lrand48') due to the increased distance between related
functions in the compiled output.
Signed-off-by: Shuo Wang <abushwang@tencent.com>
---
stdlib/Makefile | 215 ++++++++++++------------------------------------
1 file changed, 51 insertions(+), 164 deletions(-)
diff --git a/stdlib/Makefile b/stdlib/Makefile
index 8b0ac63ddb..d2d912db27 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -51,170 +51,57 @@ headers := \
ucontext.h \
# headers
-routines := \
- a64l \
- abort \
- abs \
- arc4random \
- arc4random_uniform \
- at_quick_exit \
- atof \
- atoi \
- atol\
- atoll \
- bsearch \
- canonicalize \
- cxa_at_quick_exit \
- cxa_atexit \
- cxa_finalize \
- cxa_thread_atexit_impl \
- div \
- drand48 \
- drand48-iter \
- drand48_r \
- erand48 \
- erand48_r \
- exit \
- fmtmsg \
- getcontext \
- getentropy \
- getenv \
- getrandom \
- getsubopt \
- jrand48 \
- jrand48_r \
- l64a \
- labs \
- lcong48 \
- lcong48_r \
- ldiv \
- llabs \
- lldiv \
- lrand48 \
- lrand48_r \
- makecontext \
- mblen \
- mbstowcs \
- mbtowc \
- mrand48 \
- mrand48_r \
- nrand48 \
- nrand48_r \
- old_atexit \
- on_exit atexit \
- putenv \
- qsort \
- quick_exit \
- rand \
- rand_r \
- random \
- random_r \
- rpmatch \
- secure-getenv \
- seed48 \
- seed48_r \
- setcontext \
- setenv \
- srand48 \
- srand48_r \
- stdc_bit_ceil_uc \
- stdc_bit_ceil_ui \
- stdc_bit_ceil_ul \
- stdc_bit_ceil_ull \
- stdc_bit_ceil_us \
- stdc_bit_floor_uc \
- stdc_bit_floor_ui \
- stdc_bit_floor_ul \
- stdc_bit_floor_ull \
- stdc_bit_floor_us \
- stdc_bit_width_uc \
- stdc_bit_width_ui \
- stdc_bit_width_ul \
- stdc_bit_width_ull \
- stdc_bit_width_us \
- stdc_count_ones_uc \
- stdc_count_ones_ui \
- stdc_count_ones_ul \
- stdc_count_ones_ull \
- stdc_count_ones_us \
- stdc_count_zeros_uc \
- stdc_count_zeros_ui \
- stdc_count_zeros_ul \
- stdc_count_zeros_ull \
- stdc_count_zeros_us \
- stdc_first_leading_one_uc \
- stdc_first_leading_one_ui \
- stdc_first_leading_one_ul \
- stdc_first_leading_one_ull \
- stdc_first_leading_one_us \
- stdc_first_leading_zero_uc \
- stdc_first_leading_zero_ui \
- stdc_first_leading_zero_ul \
- stdc_first_leading_zero_ull \
- stdc_first_leading_zero_us \
- stdc_first_trailing_one_uc \
- stdc_first_trailing_one_ui \
- stdc_first_trailing_one_ul \
- stdc_first_trailing_one_ull \
- stdc_first_trailing_one_us \
- stdc_first_trailing_zero_uc \
- stdc_first_trailing_zero_ui \
- stdc_first_trailing_zero_ul \
- stdc_first_trailing_zero_ull \
- stdc_first_trailing_zero_us \
- stdc_has_single_bit_uc \
- stdc_has_single_bit_ui \
- stdc_has_single_bit_ul \
- stdc_has_single_bit_ull \
- stdc_has_single_bit_us \
- stdc_leading_ones_uc \
- stdc_leading_ones_ui \
- stdc_leading_ones_ul \
- stdc_leading_ones_ull \
- stdc_leading_ones_us \
- stdc_leading_zeros_uc \
- stdc_leading_zeros_ui \
- stdc_leading_zeros_ul \
- stdc_leading_zeros_ull \
- stdc_leading_zeros_us \
- stdc_trailing_ones_uc \
- stdc_trailing_ones_ui \
- stdc_trailing_ones_ul \
- stdc_trailing_ones_ull \
- stdc_trailing_ones_us \
- stdc_trailing_zeros_uc \
- stdc_trailing_zeros_ui \
- stdc_trailing_zeros_ul \
- stdc_trailing_zeros_ull \
- stdc_trailing_zeros_us \
- strfmon \
- strfmon_l \
- strfromd \
- strfromf \
- strfroml \
- strtod \
- strtod_l \
- strtod_nan \
- strtof \
- strtof_l \
- strtof_nan \
- strtol \
- strtol_l \
- strtold \
- strtold_l \
- strtold_nan \
- strtoll \
- strtoll_l \
- strtoul \
- strtoul_l \
- strtoull \
- strtoull_l \
- swapcontext \
- system \
- wcstombs \
- wctomb \
- xpg_basename \
- # routines
+routines := \
+ atof atoi atol atoll \
+ abort \
+ bsearch qsort \
+ getenv putenv setenv secure-getenv \
+ exit on_exit atexit cxa_atexit cxa_finalize old_atexit \
+ quick_exit at_quick_exit cxa_at_quick_exit cxa_thread_atexit_impl \
+ abs labs llabs \
+ div ldiv lldiv \
+ mblen mbstowcs mbtowc wcstombs wctomb \
+ arc4random arc4random_uniform \
+ random random_r rand rand_r \
+ drand48 erand48 lrand48 nrand48 mrand48 jrand48 \
+ srand48 seed48 lcong48 \
+ drand48_r erand48_r lrand48_r nrand48_r mrand48_r jrand48_r \
+ srand48_r seed48_r lcong48_r \
+ drand48-iter getrandom getentropy \
+ strfromf strfromd strfroml \
+ strtol strtoul strtoll strtoull \
+ strtol_l strtoul_l strtoll_l strtoull_l \
+ strtof strtod strtold \
+ strtof_l strtod_l strtold_l \
+ strtof_nan strtod_nan strtold_nan \
+ system canonicalize \
+ stdc_bit_ceil_uc stdc_bit_ceil_ui stdc_bit_ceil_ul \
+ stdc_bit_ceil_ull stdc_bit_ceil_us stdc_bit_floor_uc \
+ stdc_bit_floor_ui stdc_bit_floor_ul stdc_bit_floor_ull \
+ stdc_bit_floor_us stdc_bit_width_uc stdc_bit_width_ui \
+ stdc_bit_width_ul stdc_bit_width_ull stdc_bit_width_us \
+ stdc_count_ones_uc stdc_count_ones_ui stdc_count_ones_ul \
+ stdc_count_ones_ull stdc_count_ones_us stdc_count_zeros_uc \
+ stdc_count_zeros_ui stdc_count_zeros_ul stdc_count_zeros_ull \
+ stdc_count_zeros_us stdc_first_leading_one_uc stdc_first_leading_one_ui \
+ stdc_first_leading_one_ul stdc_first_leading_one_ull stdc_first_leading_one_us \
+ stdc_first_leading_zero_uc stdc_first_leading_zero_ui stdc_first_leading_zero_ul \
+ stdc_first_leading_zero_ull stdc_first_leading_zero_us stdc_first_trailing_one_uc \
+ stdc_first_trailing_one_ui stdc_first_trailing_one_ul stdc_first_trailing_one_ull \
+ stdc_first_trailing_one_us stdc_first_trailing_zero_uc stdc_first_trailing_zero_ui \
+ stdc_first_trailing_zero_ul stdc_first_trailing_zero_ull stdc_first_trailing_zero_us \
+ stdc_has_single_bit_uc stdc_has_single_bit_ui stdc_has_single_bit_ul \
+ stdc_has_single_bit_ull stdc_has_single_bit_us stdc_leading_ones_uc \
+ stdc_leading_ones_ui stdc_leading_ones_ul stdc_leading_ones_ull \
+ stdc_leading_ones_us stdc_leading_zeros_uc stdc_leading_zeros_ui \
+ stdc_leading_zeros_ul stdc_leading_zeros_ull stdc_leading_zeros_us \
+ stdc_trailing_ones_uc stdc_trailing_ones_ui stdc_trailing_ones_ul \
+ stdc_trailing_ones_ull stdc_trailing_ones_us stdc_trailing_zeros_uc \
+ stdc_trailing_zeros_ui stdc_trailing_zeros_ul stdc_trailing_zeros_ull \
+ stdc_trailing_zeros_us \
+ a64l l64a \
+ rpmatch strfmon strfmon_l getsubopt xpg_basename fmtmsg \
+ getcontext setcontext makecontext swapcontext
# Exclude fortified routines from being built with _FORTIFY_SOURCE
routines_no_fortify += \
--
2.37.3