* RFC: Patch for gcc-5/changes.html
@ 2015-03-26 19:25 Vladimir Makarov
2015-03-26 19:54 ` Jan Hubicka
2015-03-26 20:39 ` Sandra Loosemore
0 siblings, 2 replies; 4+ messages in thread
From: Vladimir Makarov @ 2015-03-26 19:25 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 258 bytes --]
Hi, I neglected to write about RA changes for the previous releases
and people asked me to write about RA changes for GCC-5. So here is what
I'd like to add to gcc-5/changes.html. I'll do it tomorrow. So any
comments will be appreciated.
Thanks.
[-- Attachment #2: changes.patch --]
[-- Type: text/x-patch, Size: 2678 bytes --]
Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.91
diff -U 5 -r1.91 changes.html
--- changes.html 23 Mar 2015 10:12:23 -0000 1.91
+++ changes.html 26 Mar 2015 19:24:32 -0000
@@ -95,10 +95,40 @@
<li>The new <code>gcov-tool</code> utility allows manipulating
profiles.</li>
<li>Profiles are now more tolerant to source file changes (this can be
controlled by <code>--param profile-func-internal-id</code>).</li>
</ul></li>
+ <li>Register allocation improvements:
+ <ul>
+ <li>A new local register allocator (LRA) sub-pass was added.
+ The sub-pass implements control-flow sensitive global
+ register rematerialization (controlled via
+ <code>-flra-remat</code>). Instead of spilling and
+ restoring register value, it is recalculated if it is
+ profitable. The sub-pass improved SPEC2000 generated code
+ by 1% and 0.5% correspondingly on ARM and x86-64.</li>
+ <li>In GCC-4.9 and earlier releases PIC hard register was fixed
+ and was not used for other purposes when PIC code was
+ generated. Reuse of PIC hard register was implemented in RA
+ for GCC-5.0. It improves generated PIC code performance as
+ more hard registers can be used. As an example, shared
+ libraries and OS Android would significantly benefit from
+ such optimization. Currently it is switched on only for
+ x86/x86-64 targets. As RA infrastructure is already
+ implemented for PIC register reuse, other targets might
+ follow this in the future.</li>
+ <li>A simple form of inter-procedural RA was implemented. When
+ it is known that a called function does not use caller saved
+ registers, save/restore code is not generated around the
+ call for such registers. This optimization can be controlled
+ by <code>-fipa-ra</code></li>
+ <li>On some architectures (e.g. modern Intel processors),
+ spilling general registers into vector registers can be more
+ profitable than spilling into memory. LRA had already such
+ optimization. It was significantly improved for GCC-5.0,
+ permitting more 85% such spills than in GCC-4.9.</li>
+ </ul></li>
<li>UndefinedBehaviorSanitizer gained a few new sanitization options:
<ul>
<li><code>-fsanitize=float-divide-by-zero</code>: detect floating-point
division by zero;</li>
<li><code>-fsanitize=float-cast-overflow</code>: check that the result
* Re: RFC: Patch for gcc-5/changes.html
2015-03-26 19:25 RFC: Patch for gcc-5/changes.html Vladimir Makarov
@ 2015-03-26 19:54 ` Jan Hubicka
2015-03-26 20:39 ` Sandra Loosemore
1 sibling, 0 replies; 4+ messages in thread
From: Jan Hubicka @ 2015-03-26 19:54 UTC (permalink / raw)
To: Vladimir Makarov; +Cc: gcc-patches
> + <li>In GCC-4.9 and earlier releases PIC hard register was fixed
> + and was not used for other purposes when PIC code was
> + generated. Reuse of PIC hard register was implemented in RA
> + for GCC-5.0. It improves generated PIC code performance as
> + more hard registers can be used. As an example, shared
> + libraries and OS Android would significantly benefit from
> + such optimization. Currently it is switched on only for
> + x86/x86-64 targets. As RA infrastructure is already
> + implemented for PIC register reuse, other targets might
> + follow this in the future.</li>
PIC was always a huge performance disaster on x86; I suppose with this patch in,
-fPIC benchmarks should improve quite noticeably ;)
Thanks!
Honza
* Re: RFC: Patch for gcc-5/changes.html
2015-03-26 19:25 RFC: Patch for gcc-5/changes.html Vladimir Makarov
2015-03-26 19:54 ` Jan Hubicka
@ 2015-03-26 20:39 ` Sandra Loosemore
2015-03-26 20:56 ` Vladimir Makarov
1 sibling, 1 reply; 4+ messages in thread
From: Sandra Loosemore @ 2015-03-26 20:39 UTC (permalink / raw)
To: Vladimir Makarov; +Cc: gcc-patches
On 03/26/2015 01:25 PM, Vladimir Makarov wrote:
> Hi, I neglected to write about RA changes for the previous releases
> and people asked me to write about RA changes for GCC-5. So here is what
> I'd like to add to gcc-5/changes.html. I'll do it tomorrow. So any
> comments will be appreciated.
>
> Index: changes.html
> ===================================================================
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
> retrieving revision 1.91
> diff -U 5 -r1.91 changes.html
> --- changes.html 23 Mar 2015 10:12:23 -0000 1.91
> +++ changes.html 26 Mar 2015 19:24:32 -0000
> @@ -95,10 +95,40 @@
> <li>The new <code>gcov-tool</code> utility allows manipulating
> profiles.</li>
> <li>Profiles are now more tolerant to source file changes (this can be
> controlled by <code>--param profile-func-internal-id</code>).</li>
> </ul></li>
> + <li>Register allocation improvements:
> + <ul>
> + <li>A new local register allocator (LRA) sub-pass was added.
> + The sub-pass implements control-flow sensitive global
> + register rematerialization (controlled via
> + <code>-flra-remat</code>). Instead of spilling and
How about rewriting the first two sentences as:
A new local register allocator (LRA) sub-pass, controlled by
<code>-flra-remat</code>, implements control-flow sensitive global
register rematerialization.
> + restoring register value, it is recalculated if it is
s/register value/a register value/
> + profitable. The sub-pass improved SPEC2000 generated code
> + by 1% and 0.5% correspondingly on ARM and x86-64.</li>
> + <li>In GCC-4.9 and earlier releases PIC hard register was fixed
> + and was not used for other purposes when PIC code was
> + generated. Reuse of PIC hard register was implemented in RA
> + for GCC-5.0. It improves generated PIC code performance as
> + more hard registers can be used. As an example, shared
> + libraries and OS Android would significantly benefit from
> + such optimization. Currently it is switched on only for
> + x86/x86-64 targets. As RA infrastructure is already
> + implemented for PIC register reuse, other targets might
> + follow this in the future.</li>
How about making this less verbose and repetitive:
Reuse of the PIC hard register, instead of using a fixed register, was
implemented on x86/x86-64 targets. This improves generated PIC code
performance as more hard registers can be used. Shared libraries can
significantly benefit from this optimization.
> + <li>A simple form of inter-procedural RA was implemented. When
> + it is known that a called function does not use caller saved
s/caller saved/caller-saved/
> + registers, save/restore code is not generated around the
> + call for such registers. This optimization can be controlled
> + by <code>-fipa-ra</code></li>
> + <li>On some architectures (e.g. modern Intel processors),
> + spilling general registers into vector registers can be more
> + profitable than spilling into memory. LRA had already such
> + optimization. It was significantly improved for GCC-5.0,
> + permitting more 85% such spills than in GCC-4.9.</li>
I don't understand the last sentence. How about just dropping that 85%
bit and making the whole thing less wordy and more focused on the actual
improvement:
LRA is now much more effective at generating spills of general registers
into vector registers instead of memory on architectures (e.g., modern
Intel processors) where this is profitable.
> + </ul></li>
> <li>UndefinedBehaviorSanitizer gained a few new sanitization options:
> <ul>
> <li><code>-fsanitize=float-divide-by-zero</code>: detect floating-point
> division by zero;</li>
> <li><code>-fsanitize=float-cast-overflow</code>: check that the result
-Sandra
* Re: RFC: Patch for gcc-5/changes.html
2015-03-26 20:39 ` Sandra Loosemore
@ 2015-03-26 20:56 ` Vladimir Makarov
0 siblings, 0 replies; 4+ messages in thread
From: Vladimir Makarov @ 2015-03-26 20:56 UTC (permalink / raw)
To: Sandra Loosemore; +Cc: gcc-patches
On 03/26/2015 04:38 PM, Sandra Loosemore wrote:
> On 03/26/2015 01:25 PM, Vladimir Makarov wrote:
>> Hi, I neglected to write about RA changes for the previous releases
>> and people asked me to write about RA changes for GCC-5. So here is what
>> I'd like to add to gcc-5/changes.html. I'll do it tomorrow. So any
>> comments will be appreciated.
>>
>> Index: changes.html
>> ===================================================================
>> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
>> retrieving revision 1.91
>> diff -U 5 -r1.91 changes.html
>> --- changes.html 23 Mar 2015 10:12:23 -0000 1.91
>> +++ changes.html 26 Mar 2015 19:24:32 -0000
>> @@ -95,10 +95,40 @@
>> <li>The new <code>gcov-tool</code> utility allows manipulating
>> profiles.</li>
>> <li>Profiles are now more tolerant to source file changes
>> (this can be
>> controlled by <code>--param
>> profile-func-internal-id</code>).</li>
>> </ul></li>
>> + <li>Register allocation improvements:
>> + <ul>
>> + <li>A new local register allocator (LRA) sub-pass was added.
>> + The sub-pass implements control-flow sensitive global
>> + register rematerialization (controlled via
>> + <code>-flra-remat</code>). Instead of spilling and
>
> How about rewriting the first two sentences as:
>
> A new local register allocator (LRA) sub-pass, controlled by
> <code>-flra-remat</code>, implements control-flow sensitive global
> register rematerialization.
>
That is better. Thanks.
>> + restoring register value, it is recalculated if it is
>
> s/register value/a register value/
>
Fixed.
>> + profitable. The sub-pass improved SPEC2000 generated code
>> + by 1% and 0.5% correspondingly on ARM and x86-64.</li>
>> + <li>In GCC-4.9 and earlier releases PIC hard register was fixed
>> + and was not used for other purposes when PIC code was
>> + generated. Reuse of PIC hard register was implemented in RA
>> + for GCC-5.0. It improves generated PIC code performance as
>> + more hard registers can be used. As an example, shared
>> + libraries and OS Android would significantly benefit from
>> + such optimization. Currently it is switched on only for
>> + x86/x86-64 targets. As RA infrastructure is already
>> + implemented for PIC register reuse, other targets might
>> + follow this in the future.</li>
>
> How about making this less verbose and repetitive:
>
> Reuse of the PIC hard register, instead of using a fixed register, was
> implemented on x86/x86-64 targets. This improves generated PIC code
> performance as more hard registers can be used. Shared libraries can
> significantly benefit from this optimization.
Fixed. Thanks.
>
>> + <li>A simple form of inter-procedural RA was implemented. When
>> + it is known that a called function does not use caller saved
>
> s/caller saved/caller-saved/
>
Done.
>> + registers, save/restore code is not generated around the
>> + call for such registers. This optimization can be controlled
>> + by <code>-fipa-ra</code></li>
>> + <li>On some architectures (e.g. modern Intel processors),
>> + spilling general registers into vector registers can be more
>> + profitable than spilling into memory. LRA had already such
>> + optimization. It was significantly improved for GCC-5.0,
>> + permitting more 85% such spills than in GCC-4.9.</li>
>
> I don't understand the last sentence. How about just dropping that
> 85% bit and making the whole thing less wordy and more focused on the
> actual improvement:
>
> LRA is now much more effective at generating spills of general
> registers into vector registers instead of memory on architectures
> (e.g., modern Intel processors) where this is profitable.
>
Ok. Fixed.
Thanks, Sandra. That was very helpful as English is not my native language.
>> + </ul></li>
>> <li>UndefinedBehaviorSanitizer gained a few new sanitization
>> options:
>> <ul>
>> <li><code>-fsanitize=float-divide-by-zero</code>: detect floating-point
>> division by zero;</li>
>> <li><code>-fsanitize=float-cast-overflow</code>: check that the result
>
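For readers who want to see what the -fipa-ra item in the patch describes, here is a minimal sketch in C. It is not taken from the patch or from the GCC test suite: the file name and the functions scale() and sum_scaled() are made up for illustration. The values total, i, v and n are live across the call to scale(); when the compiler can see that scale() clobbers only a few registers, it may keep those values in caller-saved registers without saving and restoring them around the call. Whether that actually happens depends on the target, the GCC version and the optimization level, so treat the comparison as an experiment rather than a guaranteed result.

```c
/* Hypothetical file ipa_ra_demo.c.  To compare the generated code:
     gcc -O2 -fipa-ra    -S ipa_ra_demo.c -o with.s
     gcc -O2 -fno-ipa-ra -S ipa_ra_demo.c -o without.s
   then diff the code around the call in sum_scaled().  */

/* Small callee kept out of line so that a real call remains.
   It needs very few registers, which is what inter-procedural RA
   can take advantage of.  */
static int __attribute__((noinline)) scale(int x)
{
    return x * 3 + 1;
}

/* total, i, v and n are all live across the call to scale().  */
int sum_scaled(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += scale(v[i]);
    return total;
}
```

The noinline attribute is only there to stop the call from being inlined away, which would leave nothing to compare.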