public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Set inline-unit-growth to 40
@ 2019-01-12 18:32 Jan Hubicka
  2019-01-14 16:28 ` Christophe Lyon
  2019-01-14 16:53 ` Qing Zhao
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Hubicka @ 2019-01-12 18:32 UTC (permalink / raw)
  To: gcc-patches

Hello,
this patch sets inline-unit-growth to 40.  The performance changes are
- Firefox, LTO
  https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f7bd026e1a931b9a284d1c85c2577a72dd592820&newProject=try&newRevision=74889968abcc688b8d161863566ed273c0401ee4&framework=1&filter=opt&showOnlyComparable=1&showOnlyImportant=1
  After fixes to inlining priorities this makes difference without
  profile feedback only.

  Code size growth is about 9.15% with LTO and 3.95 with LTO and profile
  feedback.
- Firefox noLTO
  https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c902b72340a3dca3114f58578c1c8f3e6a1cd89c&newProject=try&newRevision=4974da6f92c144a9c09765b56a564a640069ddb9&framework=1&showOnlyComparable=1&showOnlyImportant=1
  With about 7% code size growth
- SPEC
  https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
- C++ benchmarks
  https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=10&all_changes=on&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f

I am not entirely happy about the code-size/performance tradeoffs but it
is concerned only for programs built with -O3 or having too many inline
keywords.  I have looked into inlining decisions for Firefox, HHVM and
Clang and inliner gets out of growt bounds way too early and some of
more performance aware projects already sets the limit up.

I will tune other metrics down to handle some of the code size problems.

Honza

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 267882)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2019-01-05  Jan Hubicka  <hubicka@ucw.cz>
+
+	* params.def (inline-unit-growth): Set to 40.
+
 2019-01-12  Jakub Jelinek  <jakub@redhat.com>
 
 	* tree-ssa-loop-ivopts.c (find_inv_vars): Fix a comment typo.
Index: params.def
===================================================================
--- params.def	(revision 267882)
+++ params.def	(working copy)
@@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
 DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
 	 "inline-unit-growth",
 	 "How much can given compilation unit grow because of the inlining (in percent).",
-	 20, 0, 0)
+	 40, 0, 0)
 DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
 	 "ipcp-unit-growth",
 	 "How much can given compilation unit grow because of the interprocedural constant propagation (in percent).",

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Set inline-unit-growth to 40
  2019-01-12 18:32 Set inline-unit-growth to 40 Jan Hubicka
@ 2019-01-14 16:28 ` Christophe Lyon
  2019-01-14 16:40   ` Jan Hubicka
  2019-01-14 16:53 ` Qing Zhao
  1 sibling, 1 reply; 7+ messages in thread
From: Christophe Lyon @ 2019-01-14 16:28 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc Patches

Hi Honza,

On Sat, 12 Jan 2019 at 19:32, Jan Hubicka <hubicka@ucw.cz> wrote:
>
> Hello,
> this patch sets inline-unit-growth to 40.  The performance changes are
> - Firefox, LTO
>   https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f7bd026e1a931b9a284d1c85c2577a72dd592820&newProject=try&newRevision=74889968abcc688b8d161863566ed273c0401ee4&framework=1&filter=opt&showOnlyComparable=1&showOnlyImportant=1
>   After fixes to inlining priorities this makes difference without
>   profile feedback only.
>
>   Code size growth is about 9.15% with LTO and 3.95 with LTO and profile
>   feedback.
> - Firefox noLTO
>   https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c902b72340a3dca3114f58578c1c8f3e6a1cd89c&newProject=try&newRevision=4974da6f92c144a9c09765b56a564a640069ddb9&framework=1&showOnlyComparable=1&showOnlyImportant=1
>   With about 7% code size growth
> - SPEC
>   https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
> - C++ benchmarks
>   https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=10&all_changes=on&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
>
> I am not entirely happy about the code-size/performance tradeoffs but it
> is concerned only for programs built with -O3 or having too many inline
> keywords.  I have looked into inlining decisions for Firefox, HHVM and
> Clang and inliner gets out of growt bounds way too early and some of
> more performance aware projects already sets the limit up.
>
> I will tune other metrics down to handle some of the code size problems.
>
> Honza
>
> Index: ChangeLog
> ===================================================================
> --- ChangeLog   (revision 267882)
> +++ ChangeLog   (working copy)
> @@ -1,3 +1,7 @@
> +2019-01-05  Jan Hubicka  <hubicka@ucw.cz>
> +
> +       * params.def (inline-unit-growth): Set to 40.
> +
>  2019-01-12  Jakub Jelinek  <jakub@redhat.com>
>
>         * tree-ssa-loop-ivopts.c (find_inv_vars): Fix a comment typo.
> Index: params.def
> ===================================================================
> --- params.def  (revision 267882)
> +++ params.def  (working copy)
> @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
>  DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
>          "inline-unit-growth",
>          "How much can given compilation unit grow because of the inlining (in percent).",
> -        20, 0, 0)
> +        40, 0, 0)
>  DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
>          "ipcp-unit-growth",
>          "How much can given compilation unit grow because of the interprocedural constant propagation (in percent).",

This patch introduces a regression in libstdc++:
FAIL: ext/pb_ds/regression/list_update_map_rand.cc execution test
on a few arm targets.

For instance:
arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a5
--with-fpu vfpv3-d16-fp16

Using --with-mode thumb and the same other configure options makes the
test pass.
I'm seeing this with other configurations --with-mode arm and
--with-fpu vfp* (as opposed to neon*)

The .log file has:
<?xml version = "1.0"?>
<test>
<sd value = "1547347280">
</sd>
<n value = "50">
</n>
<m value = "10">
</m>
<tp value = "0.2">
</tp>
<ip value = "0.6">
</ip>
<ep value = "0.2">
</ep>
<cp value = "0.001">
</cp>
<mp value = "0.25">
</mp>
</test>
<cntnr name = "lu_mtf_map">
<desc>
<type value = "list_update">
<Update_Policy value = "lu_move_to_front_policy">
</Update_Policy>
</type>
</desc>
<progress>
----------------------------------------
************qemu: uncaught target signal 11 (Segmentation fault) - core dumped

The (incomplete?) qemu execution trace ends with:
----------------
IN:
0x40ada6b8:  e5910008  ldr      r0, [r1, #8]
0x40ada6bc:  e1560000  cmp      r6, r0
0x40ada6c0:  1a00004f  bne      #0x40ada804
----------------
IN:
0x40ada6c4:  e5960004  ldr      r0, [r6, #4]
0x40ada6c8:  e582100c  str      r1, [r2, #0xc]
0x40ada6cc:  e3500c02  cmp      r0, #0x200
0x40ada6d0:  e5812008  str      r2, [r1, #8]
0x40ada6d4:  3a000002  blo      #0x40ada6e4
----------------
IN:
0x40adb880:  eaffff3e  b        #0x40adb580
----------------
IN: _ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_EEEEE13subscript_impENSt3tr117integral_constantIiLi0EEE
0x0001ffc4:  e3a06000  mov      r6, #0
0x0001ffc8:  eaffff88  b        #0x1fdf0
----------------
IN: _ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_EEEEE13subscript_impENSt3tr117integral_constantIiLi0EEE
0x0001fdf0:  ee180a10  vmov     r0, s16
0x0001fdf4:  ebffcd12  bl       #0x13244

Christophe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Set inline-unit-growth to 40
  2019-01-14 16:28 ` Christophe Lyon
@ 2019-01-14 16:40   ` Jan Hubicka
  2019-01-17 13:22     ` Christophe Lyon
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Hubicka @ 2019-01-14 16:40 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc Patches

Hello,
> > Index: params.def
> > ===================================================================
> > --- params.def  (revision 267882)
> > +++ params.def  (working copy)
> > @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
> >  DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
> >          "inline-unit-growth",
> >          "How much can given compilation unit grow because of the inlining (in percent).",
> > -        20, 0, 0)
> > +        40, 0, 0)
> >  DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
> >          "ipcp-unit-growth",
> >          "How much can given compilation unit grow because of the interprocedural constant propagation (in percent).",
> 
> This patch introduces a regression in libstdc++:
> FAIL: ext/pb_ds/regression/list_update_map_rand.cc execution test
> on a few arm targets.
> 
> For instance:
> arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a5
> --with-fpu vfpv3-d16-fp16

Adjusting inliner heuiristics should not trigger correcness issues, so
this seems like a bug that was previously latent.  I guess it may
legally break correct code only if stack usage gets too large.

Do you have any idea what breaks in this testcase?

Honza
> 
> Using --with-mode thumb and the same other configure options makes the
> test pass.
> I'm seeing this with other configurations --with-mode arm and
> --with-fpu vfp* (as opposed to neon*)
> 
> The .log file has:
> <?xml version = "1.0"?>
> <test>
> <sd value = "1547347280">
> </sd>
> <n value = "50">
> </n>
> <m value = "10">
> </m>
> <tp value = "0.2">
> </tp>
> <ip value = "0.6">
> </ip>
> <ep value = "0.2">
> </ep>
> <cp value = "0.001">
> </cp>
> <mp value = "0.25">
> </mp>
> </test>
> <cntnr name = "lu_mtf_map">
> <desc>
> <type value = "list_update">
> <Update_Policy value = "lu_move_to_front_policy">
> </Update_Policy>
> </type>
> </desc>
> <progress>
> ----------------------------------------
> ************qemu: uncaught target signal 11 (Segmentation fault) - core dumped
> 
> The (incomplete?) qemu execution trace ends with:
> ----------------
> IN:
> 0x40ada6b8:  e5910008  ldr      r0, [r1, #8]
> 0x40ada6bc:  e1560000  cmp      r6, r0
> 0x40ada6c0:  1a00004f  bne      #0x40ada804
> ----------------
> IN:
> 0x40ada6c4:  e5960004  ldr      r0, [r6, #4]
> 0x40ada6c8:  e582100c  str      r1, [r2, #0xc]
> 0x40ada6cc:  e3500c02  cmp      r0, #0x200
> 0x40ada6d0:  e5812008  str      r2, [r1, #8]
> 0x40ada6d4:  3a000002  blo      #0x40ada6e4
> ----------------
> IN:
> 0x40adb880:  eaffff3e  b        #0x40adb580
> ----------------
> IN: _ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_EEEEE13subscript_impENSt3tr117integral_constantIiLi0EEE
> 0x0001ffc4:  e3a06000  mov      r6, #0
> 0x0001ffc8:  eaffff88  b        #0x1fdf0
> ----------------
> IN: _ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_EEEEE13subscript_impENSt3tr117integral_constantIiLi0EEE
> 0x0001fdf0:  ee180a10  vmov     r0, s16
> 0x0001fdf4:  ebffcd12  bl       #0x13244
> 
> Christophe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Set inline-unit-growth to 40
  2019-01-12 18:32 Set inline-unit-growth to 40 Jan Hubicka
  2019-01-14 16:28 ` Christophe Lyon
@ 2019-01-14 16:53 ` Qing Zhao
  2019-01-15 15:14   ` Jan Hubicka
  1 sibling, 1 reply; 7+ messages in thread
From: Qing Zhao @ 2019-01-14 16:53 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

Hi, Honza,

in addition to the code size problems, there are several runtime regression for the SPEC: (If I read the table correctly, if not, let me know)

SPEC/SPEC2006/INT/483.xalancbmk <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=183.290.0>	146.131	4.89%

SPEC/SPEC2006/FP/436.cactusADM <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=171.100.0>	130.967	8.07%	

SPEC/SPEC2006/FP/435.gromacs <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=177.90.0>	182.555	11.73%	

SPEC/SPEC2017/INT/541.leela_r <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=174.397.0>	452.333	4.17%	

SPEC/SPEC2017/INT/520.omnetpp_r <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=172.357.0>	395.582	4.98%	

do we have plan to study and fix these run-time regression?

thanks.

Qing

> On Jan 12, 2019, at 12:32 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
> Hello,
> this patch sets inline-unit-growth to 40.  The performance changes are
> - Firefox, LTO
>  https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f7bd026e1a931b9a284d1c85c2577a72dd592820&newProject=try&newRevision=74889968abcc688b8d161863566ed273c0401ee4&framework=1&filter=opt&showOnlyComparable=1&showOnlyImportant=1
>  After fixes to inlining priorities this makes difference without
>  profile feedback only.
> 
>  Code size growth is about 9.15% with LTO and 3.95 with LTO and profile
>  feedback.
> - Firefox noLTO
>  https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c902b72340a3dca3114f58578c1c8f3e6a1cd89c&newProject=try&newRevision=4974da6f92c144a9c09765b56a564a640069ddb9&framework=1&showOnlyComparable=1&showOnlyImportant=1
>  With about 7% code size growth
> - SPEC
>  https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
> - C++ benchmarks
>  https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=10&all_changes=on&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
> 
> I am not entirely happy about the code-size/performance tradeoffs but it
> is concerned only for programs built with -O3 or having too many inline
> keywords.  I have looked into inlining decisions for Firefox, HHVM and
> Clang and inliner gets out of growt bounds way too early and some of
> more performance aware projects already sets the limit up.
> 
> I will tune other metrics down to handle some of the code size problems.
> 
> Honza
> 
> Index: ChangeLog
> ===================================================================
> --- ChangeLog	(revision 267882)
> +++ ChangeLog	(working copy)
> @@ -1,3 +1,7 @@
> +2019-01-05  Jan Hubicka  <hubicka@ucw.cz>
> +
> +	* params.def (inline-unit-growth): Set to 40.
> +
> 2019-01-12  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* tree-ssa-loop-ivopts.c (find_inv_vars): Fix a comment typo.
> Index: params.def
> ===================================================================
> --- params.def	(revision 267882)
> +++ params.def	(working copy)
> @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
> DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
> 	 "inline-unit-growth",
> 	 "How much can given compilation unit grow because of the inlining (in percent).",
> -	 20, 0, 0)
> +	 40, 0, 0)
> DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
> 	 "ipcp-unit-growth",
> 	 "How much can given compilation unit grow because of the interprocedural constant propagation (in percent).",

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Set inline-unit-growth to 40
  2019-01-14 16:53 ` Qing Zhao
@ 2019-01-15 15:14   ` Jan Hubicka
  2019-01-15 16:18     ` Qing Zhao
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Hubicka @ 2019-01-15 15:14 UTC (permalink / raw)
  To: Qing Zhao; +Cc: gcc-patches

> Hi, Honza,
> 
> in addition to the code size problems, there are several runtime regression for the SPEC: (If I read the table correctly, if not, let me know)
> 
> SPEC/SPEC2006/INT/483.xalancbmk <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=183.290.0>	146.131	4.89%

This test seems to be just a noise, if you look on the mainline plots
there is no noticeable regression and you can see differences +- 4% in
last 10 runs.
> 
> SPEC/SPEC2006/FP/436.cactusADM <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=171.100.0>	130.967	8.07%	
> 
> SPEC/SPEC2017/INT/520.omnetpp_r <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=172.357.0>	395.582	4.98%	

Here you can see in the graph both boxes to be yellow which means that
binary did not changed nor the size changed. It is just measurement error it seems.
> 
> SPEC/SPEC2006/FP/435.gromacs <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=177.90.0>	182.555	11.73%	

I plan to check on gromacs
This is LTO+PGO which will be rerun only in day or two so I will see if
the same regression stays on mainline.
> 
> SPEC/SPEC2017/INT/541.leela_r <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=174.397.0>	452.333	4.17%	

This was already taken by daily testers and regression does not
reproduce, so it seems to be noise too.

Honza
> 
> do we have plan to study and fix these run-time regression?

> 
> thanks.
> 
> Qing
> 
> > On Jan 12, 2019, at 12:32 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> > 
> > Hello,
> > this patch sets inline-unit-growth to 40.  The performance changes are
> > - Firefox, LTO
> >  https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f7bd026e1a931b9a284d1c85c2577a72dd592820&newProject=try&newRevision=74889968abcc688b8d161863566ed273c0401ee4&framework=1&filter=opt&showOnlyComparable=1&showOnlyImportant=1
> >  After fixes to inlining priorities this makes difference without
> >  profile feedback only.
> > 
> >  Code size growth is about 9.15% with LTO and 3.95 with LTO and profile
> >  feedback.
> > - Firefox noLTO
> >  https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c902b72340a3dca3114f58578c1c8f3e6a1cd89c&newProject=try&newRevision=4974da6f92c144a9c09765b56a564a640069ddb9&framework=1&showOnlyComparable=1&showOnlyImportant=1
> >  With about 7% code size growth
> > - SPEC
> >  https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
> > - C++ benchmarks
> >  https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=10&all_changes=on&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
> > 
> > I am not entirely happy about the code-size/performance tradeoffs but it
> > is concerned only for programs built with -O3 or having too many inline
> > keywords.  I have looked into inlining decisions for Firefox, HHVM and
> > Clang and inliner gets out of growt bounds way too early and some of
> > more performance aware projects already sets the limit up.
> > 
> > I will tune other metrics down to handle some of the code size problems.
> > 
> > Honza
> > 
> > Index: ChangeLog
> > ===================================================================
> > --- ChangeLog	(revision 267882)
> > +++ ChangeLog	(working copy)
> > @@ -1,3 +1,7 @@
> > +2019-01-05  Jan Hubicka  <hubicka@ucw.cz>
> > +
> > +	* params.def (inline-unit-growth): Set to 40.
> > +
> > 2019-01-12  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	* tree-ssa-loop-ivopts.c (find_inv_vars): Fix a comment typo.
> > Index: params.def
> > ===================================================================
> > --- params.def	(revision 267882)
> > +++ params.def	(working copy)
> > @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
> > DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
> > 	 "inline-unit-growth",
> > 	 "How much can given compilation unit grow because of the inlining (in percent).",
> > -	 20, 0, 0)
> > +	 40, 0, 0)
> > DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
> > 	 "ipcp-unit-growth",
> > 	 "How much can given compilation unit grow because of the interprocedural constant propagation (in percent).",
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Set inline-unit-growth to 40
  2019-01-15 15:14   ` Jan Hubicka
@ 2019-01-15 16:18     ` Qing Zhao
  0 siblings, 0 replies; 7+ messages in thread
From: Qing Zhao @ 2019-01-15 16:18 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

Okay, I see.
thanks.

Qing
> On Jan 15, 2019, at 9:14 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
>> Hi, Honza,
>> 
>> in addition to the code size problems, there are several runtime regression for the SPEC: (If I read the table correctly, if not, let me know)
>> 
>> SPEC/SPEC2006/INT/483.xalancbmk <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=183.290.0>	146.131	4.89%
> 
> This test seems to be just a noise, if you look on the mainline plots
> there is no noticeable regression and you can see differences +- 4% in
> last 10 runs.
>> 
>> SPEC/SPEC2006/FP/436.cactusADM <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=171.100.0>	130.967	8.07%	
>> 
>> SPEC/SPEC2017/INT/520.omnetpp_r <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=172.357.0>	395.582	4.98%	
> 
> Here you can see in the graph both boxes to be yellow which means that
> binary did not changed nor the size changed. It is just measurement error it seems.
>> 
>> SPEC/SPEC2006/FP/435.gromacs <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=177.90.0>	182.555	11.73%	
> 
> I plan to check on gromacs
> This is LTO+PGO which will be rerun only in day or two so I will see if
> the same regression stays on mainline.
>> 
>> SPEC/SPEC2017/INT/541.leela_r <https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=174.397.0>	452.333	4.17%	
> 
> This was already taken by daily testers and regression does not
> reproduce, so it seems to be noise too.
> 
> Honza
>> 
>> do we have plan to study and fix these run-time regression?
> 
>> 
>> thanks.
>> 
>> Qing
>> 
>>> On Jan 12, 2019, at 12:32 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> 
>>> Hello,
>>> this patch sets inline-unit-growth to 40.  The performance changes are
>>> - Firefox, LTO
>>> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f7bd026e1a931b9a284d1c85c2577a72dd592820&newProject=try&newRevision=74889968abcc688b8d161863566ed273c0401ee4&framework=1&filter=opt&showOnlyComparable=1&showOnlyImportant=1
>>> After fixes to inlining priorities this makes difference without
>>> profile feedback only.
>>> 
>>> Code size growth is about 9.15% with LTO and 3.95 with LTO and profile
>>> feedback.
>>> - Firefox noLTO
>>> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c902b72340a3dca3114f58578c1c8f3e6a1cd89c&newProject=try&newRevision=4974da6f92c144a9c09765b56a564a640069ddb9&framework=1&showOnlyComparable=1&showOnlyImportant=1
>>> With about 7% code size growth
>>> - SPEC
>>> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
>>> - C++ benchmarks
>>> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=10&all_changes=on&min_percentage_change=0.02&revisions=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
>>> 
>>> I am not entirely happy about the code-size/performance tradeoffs but it
>>> is concerned only for programs built with -O3 or having too many inline
>>> keywords.  I have looked into inlining decisions for Firefox, HHVM and
>>> Clang and inliner gets out of growt bounds way too early and some of
>>> more performance aware projects already sets the limit up.
>>> 
>>> I will tune other metrics down to handle some of the code size problems.
>>> 
>>> Honza
>>> 
>>> Index: ChangeLog
>>> ===================================================================
>>> --- ChangeLog	(revision 267882)
>>> +++ ChangeLog	(working copy)
>>> @@ -1,3 +1,7 @@
>>> +2019-01-05  Jan Hubicka  <hubicka@ucw.cz>
>>> +
>>> +	* params.def (inline-unit-growth): Set to 40.
>>> +
>>> 2019-01-12  Jakub Jelinek  <jakub@redhat.com>
>>> 
>>> 	* tree-ssa-loop-ivopts.c (find_inv_vars): Fix a comment typo.
>>> Index: params.def
>>> ===================================================================
>>> --- params.def	(revision 267882)
>>> +++ params.def	(working copy)
>>> @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
>>> DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
>>> 	 "inline-unit-growth",
>>> 	 "How much can given compilation unit grow because of the inlining (in percent).",
>>> -	 20, 0, 0)
>>> +	 40, 0, 0)
>>> DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
>>> 	 "ipcp-unit-growth",
>>> 	 "How much can given compilation unit grow because of the interprocedural constant propagation (in percent).",
>> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Set inline-unit-growth to 40
  2019-01-14 16:40   ` Jan Hubicka
@ 2019-01-17 13:22     ` Christophe Lyon
  0 siblings, 0 replies; 7+ messages in thread
From: Christophe Lyon @ 2019-01-17 13:22 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc Patches

On Mon, 14 Jan 2019 at 17:40, Jan Hubicka <hubicka@ucw.cz> wrote:
>
> Hello,
> > > Index: params.def
> > > ===================================================================
> > > --- params.def  (revision 267882)
> > > +++ params.def  (working copy)
> > > @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
> > >  DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
> > >          "inline-unit-growth",
> > >          "How much can given compilation unit grow because of the inlining (in percent).",
> > > -        20, 0, 0)
> > > +        40, 0, 0)
> > >  DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
> > >          "ipcp-unit-growth",
> > >          "How much can given compilation unit grow because of the interprocedural constant propagation (in percent).",
> >
> > This patch introduces a regression in libstdc++:
> > FAIL: ext/pb_ds/regression/list_update_map_rand.cc execution test
> > on a few arm targets.
> >
> > For instance:
> > arm-none-linux-gnueabihf
> > --with-mode arm
> > --with-cpu cortex-a5
> > --with-fpu vfpv3-d16-fp16
>
> Adjusting inliner heuiristics should not trigger correcness issues, so
> this seems like a bug that was previously latent.  I guess it may
> legally break correct code only if stack usage gets too large.
>

Indeed I didn't expect such a change to involve correctness issues.


> Do you have any idea what breaks in this testcase?
>

Hmm, I've just seen this test fail then pass again in validations
that did not involve any change related to it.
So there's some "randomness", and the logs are not very helpful
(no message saying memory exhaustion or similar)


> Honza
> >
> > Using --with-mode thumb and the same other configure options makes the
> > test pass.
> > I'm seeing this with other configurations --with-mode arm and
> > --with-fpu vfp* (as opposed to neon*)
> >
> > The .log file has:
> > <?xml version = "1.0"?>
> > <test>
> > <sd value = "1547347280">
> > </sd>
> > <n value = "50">
> > </n>
> > <m value = "10">
> > </m>
> > <tp value = "0.2">
> > </tp>
> > <ip value = "0.6">
> > </ip>
> > <ep value = "0.2">
> > </ep>
> > <cp value = "0.001">
> > </cp>
> > <mp value = "0.25">
> > </mp>
> > </test>
> > <cntnr name = "lu_mtf_map">
> > <desc>
> > <type value = "list_update">
> > <Update_Policy value = "lu_move_to_front_policy">
> > </Update_Policy>
> > </type>
> > </desc>
> > <progress>
> > ----------------------------------------
> > ************qemu: uncaught target signal 11 (Segmentation fault) - core dumped
> >
> > The (incomplete?) qemu execution trace ends with:
> > ----------------
> > IN:
> > 0x40ada6b8:  e5910008  ldr      r0, [r1, #8]
> > 0x40ada6bc:  e1560000  cmp      r6, r0
> > 0x40ada6c0:  1a00004f  bne      #0x40ada804
> > ----------------
> > IN:
> > 0x40ada6c4:  e5960004  ldr      r0, [r6, #4]
> > 0x40ada6c8:  e582100c  str      r1, [r2, #0xc]
> > 0x40ada6cc:  e3500c02  cmp      r0, #0x200
> > 0x40ada6d0:  e5812008  str      r2, [r1, #8]
> > 0x40ada6d4:  3a000002  blo      #0x40ada6e4
> > ----------------
> > IN:
> > 0x40adb880:  eaffff3e  b        #0x40adb580
> > ----------------
> > IN: _ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_EEEEE13subscript_impENSt3tr117integral_constantIiLi0EEE
> > 0x0001ffc4:  e3a06000  mov      r6, #0
> > 0x0001ffc8:  eaffff88  b        #0x1fdf0
> > ----------------
> > IN: _ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_EEEEE13subscript_impENSt3tr117integral_constantIiLi0EEE
> > 0x0001fdf0:  ee180a10  vmov     r0, s16
> > 0x0001fdf4:  ebffcd12  bl       #0x13244
> >
> > Christophe

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-01-17 13:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-12 18:32 Set inline-unit-growth to 40 Jan Hubicka
2019-01-14 16:28 ` Christophe Lyon
2019-01-14 16:40   ` Jan Hubicka
2019-01-17 13:22     ` Christophe Lyon
2019-01-14 16:53 ` Qing Zhao
2019-01-15 15:14   ` Jan Hubicka
2019-01-15 16:18     ` Qing Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).