public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* Re: minor patch to improve gprofng performance (re: Bug 30898)
       [not found] <8a8c5ffa-f15d-c7a6-ea64-9afe3d42bdb1@gnu.org>
@ 2023-09-28  2:47 ` Vladimir Mezentsev
  2024-01-12 22:31   ` Simon Sobisch
  0 siblings, 1 reply; 2+ messages in thread
From: Vladimir Mezentsev @ 2023-09-28  2:47 UTC (permalink / raw)
  To: Simon Sobisch; +Cc: binutils

hi Simon,
Thank you for your report.
See comments below.

On 9/26/23 04:25, Simon Sobisch wrote:
> Inspecting bug #30898 [1] showed that there is an issue when using the 
> disassembly option with huge (generated) functions.
>
> I gave this a test and found, via
>
>   perf record -o perf.data.gpdisplay --call-graph dwarf,38192 --aio -z \
>     --sample-cpu --mmap-pages 16M \
>   gprofng display text -name short:soname -metrics e.%totalcpu:name \
>     -disasm prog_ test.1.er > /dev/null
>
> That the problem is the disassembly handling.
> Checking the generated perf recording shows that the **burning hot** 
> place is DbeInstr::mapPCtoLine(SourceFile*), called by 
> Module::set_dis_data(Function*, int, int, int, bool, bool, int); 
> taking more than 93.3% of all instructions.

This is a little surprise for me.
It's likely that gcc inlines functions and generates Dwarf that gprofng 
interprets poorly.

If you run:
   gprofng display src -dis prog_ 
<YOUR_EXECUTION_OR_LIBRARY_WHERE_FUNC_IS_LOCATED>
Do you see the same performance problem ?
If yes, may I get this binary ?

>
> Running that took around 5 minutes. Redirecting the output to a file 
> leads to a file with 4,124,497 lines, so: this _really_ is about huge 
> disassembly.


  I generated the big function (~ 1000000 lines). The  disassembly is  
10000037 lines.  This took 38 sec.
But my test is trivial and gcc generates a trivial Dwarf.


>
> I've tinkered a bit with the burning hot function, the result is a 
> minor decrease when using C++2017 invalid code, you find it in the 
> attached patch.
>
> Also attached is the recorded output for the hot function, 
> interestingly the patched version showed quite clearly that over 60 %  
> of the complete run's cpu instructions goes to Hist_data.cc line 1380:
>
>       if (p->level == 0)
>
>
> For huge (GnuCOBOL) generated functions the attached patch drops the 
> perf stat reported counters by 10%.
> Reported counters (median of 3 runs - code generated with default 
> options -O2 -g using g++ (GCC) 11.3) - are as follows:
>
> Original version:
>
>         270,060.53 msec task-clock             #    0.999 CPUs utilized
>  1,023,551,049,245      cycles                 #    3.790 GHz
>  2,160,049,675,779      instructions           #    2.11  insn per cycle
>
> adjusted version using the C++2017 removed "register" storage class 
> specifier for the pointer (there is possibly a better way), decreasing 
> everything:
>
>         260,284.41 msec task-clock             #    0.999 CPUs utilized
>    986,393,903,158      cycles                 #    3.790 GHz
>  1,815,443,360,713      instructions           #    1.84  insn per cycle
>
> adjusted version that abides to C++2017, only instructions decreased:
>
>         280,430.13 msec task-clock             #    0.999 CPUs utilized
>  1,062,479,713,621      cycles                 #    3.789 GHz
>  1,815,698,269,043      instructions           #    1.71  insn per cycle
>
>
>
> Along to this change a short-term _option_ to drop most of those 60% 
> (and, if it drops the amount of entries in there, a good portion of 
> walking the pointers) could be to have a copy of func->inlinedSubr 
> _once_ that _only_ contains level 0 entries.
>
>
> But in the long-term it seems more reasonable to recheck if that 
> function should be rewritten/replaced for better supporting "huge 
> disassembly".
>
>
>
> Another note: the reserved memory use for gp-display-text topped 1.6 
> GB, there may be a way to improve that, too.

It is not normal.
It looks like  gprofng generates always a new DbeLine in 
DbeInstr::mapPCtoLine().

-Vladimir

>
>
> [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=30898


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: minor patch to improve gprofng performance (re: Bug 30898)
  2023-09-28  2:47 ` minor patch to improve gprofng performance (re: Bug 30898) Vladimir Mezentsev
@ 2024-01-12 22:31   ` Simon Sobisch
  0 siblings, 0 replies; 2+ messages in thread
From: Simon Sobisch @ 2024-01-12 22:31 UTC (permalink / raw)
  To: Vladimir Mezentsev; +Cc: binutils


Am 28.09.2023 um 04:47 schrieb Vladimir Mezentsev:
> hi Simon,
> Thank you for your report.

Sure, just checking back now.

> See comments below.
> 
> On 9/26/23 04:25, Simon Sobisch wrote:
>> Inspecting bug #30898 [1] showed that there is an issue when using the 
>> disassembly option with huge (generated) functions.
>>
>> I gave this a test and found, via
>>
>>   perf record -o perf.data.gpdisplay --call-graph dwarf,38192 --aio -z \
>>     --sample-cpu --mmap-pages 16M \
>>   gprofng display text -name short:soname -metrics e.%totalcpu:name \
>>     -disasm prog_ test.1.er > /dev/null
>>
>> That the problem is the disassembly handling.
>> Checking the generated perf recording shows that the **burning hot** 
>> place is DbeInstr::mapPCtoLine(SourceFile*), called by 
>> Module::set_dis_data(Function*, int, int, int, bool, bool, int); 
>> taking more than 93.3% of all instructions.
> 
> This is a little surprise for me.
> It's likely that gcc inlines functions and generates Dwarf that gprofng 
> interprets poorly.

Is there a bug open on this?
If not can you please do so?

> 
> If you run:
>    gprofng display src -dis prog_ 
> <YOUR_EXECUTION_OR_LIBRARY_WHERE_FUNC_IS_LOCATED>
> Do you see the same performance problem ?
> If yes, may I get this binary ?
> 
>>
>> Running that took around 5 minutes. Redirecting the output to a file 
>> leads to a file with 4,124,497 lines, so: this _really_ is about huge 
>> disassembly.
> 
> 
>   I generated the big function (~ 1000000 lines). The  disassembly is 
> 10,000,037 lines.  This took 38 sec.
> But my test is trivial and gcc generates a trivial Dwarf.

Sadly I don't have access to that environment any more and also not to 
the exact setup, but I could try with a similar one if this is likely to 
help. But this would likely to be more reasonable if the two points 
below are inspected first.

>> I've tinkered a bit with the burning hot function, the result is a 
>> minor decrease when using C++2017 invalid code, you find it in the 
>> attached patch.
>>
>> Also attached is the recorded output for the hot function, 
>> interestingly the patched version showed quite clearly that over 60 % 
>> of the complete run's cpu instructions goes to Hist_data.cc line 1380:
>>
>>       if (p->level == 0)
>>
>>
>> For huge (GnuCOBOL) generated functions the attached patch drops the 
>> perf stat reported counters by 10%.
>>
>> [...]

Is there anything to be happen with that patch?

>>
>> Along to this change a short-term _option_ to drop most of those 60% 
>> (and, if it drops the amount of entries in there, a good portion of 
>> walking the pointers) could be to have a copy of func->inlinedSubr 
>> _once_ that _only_ contains level 0 entries.

At least from reading that nearly 4 months later again, that sounds like 
an easy to implement change with a huge benefit, no?

>> But in the long-term it seems more reasonable to recheck if that 
>> function should be rewritten/replaced for better supporting "huge 
>> disassembly".
>>
>>
>> Another note: the reserved memory use for gp-display-text topped 1.6 
>> GB, there may be a way to improve that, too.
> 
> It is not normal.
> It looks like  gprofng generates always a new DbeLine in 
> DbeInstr::mapPCtoLine().

Is this happen to be tracked with a separate bug, or possibly already 
solved?

> -Vladimir

Thank you for working on gprofng,
Simon

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-01-12 22:31 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <8a8c5ffa-f15d-c7a6-ea64-9afe3d42bdb1@gnu.org>
2023-09-28  2:47 ` minor patch to improve gprofng performance (re: Bug 30898) Vladimir Mezentsev
2024-01-12 22:31   ` Simon Sobisch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).