[PATCH AutoFDO]Restoring indirect call value profile transformation

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH AutoFDO]Restoring indirect call value profile transformation
@ 2018-12-13  3:51 bin.cheng
  2018-12-13 18:48 ` Jeff Law
  2018-12-16  1:11 ` Andi Kleen
  0 siblings, 2 replies; 20+ messages in thread
From: bin.cheng @ 2018-12-13  3:51 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]

Hi,

Due to ICE and mal-functional bugs, indirect call value profile transformation
is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
main issue is AutoFDO should store cgraph_node's profile_id of callee func in
the first histogram value's counter, rather than pointer to callee's name string
as it is now.
With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
others are unstable.  I think the instability is caused by poor perf data collected
during regrets run, and can confirm these tests pass if good perf data could be
collected in manual experiments.

Bootstrap and test along with previous patches.  Is it OK?

FYI, an update about AutoFDO status: 
All AutoFDO ICEs in regtest are fixed, while several tests still failing fall in below
three categories:

Unstable indirect call value profile transformation:
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-ipa-dump afdo "Indirect call -> direct call.* AA transformation on insn"
FAIL: g++.dg/tree-prof/morefunc.C scan-ipa-dump-times afdo "Indirect call -> direct call" 2
FAIL: g++.dg/tree-prof/pr35545.C scan-ipa-dump profile_estimate "Indirect call -> direct call"

loop peeling case because we don't honor autofdo profile count as reliable:
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"

cold/hot partition cases:
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ \ta-zA-Z0-0]+foo[._]+cold
FAIL: gcc.dg/tree-prof/section-attr-1.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
FAIL: gcc.dg/tree-prof/section-attr-2.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
FAIL: gcc.dg/tree-prof/section-attr-3.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
These are more difficult to enable because we can't simply treat autofdo::zero
count as cold, it's just too many.

Besides regtest, I run autofdo with kernel/mysql-server, the build and performance
match expectations now, but I haven't run autofdo with any spec yet.

Thanks,
bin

2018-12-13  Bin Cheng  <bin.cheng@linux.alibaba.com>

        * auto-profile.c (afdo_indirect_call): Skip generating histogram
        value if we can't find cgraph_node for then indirected callee.  Save
        profile_id of the cgraph_node in histogram value's first counter.
        * value-prof.c (gimple_value_profile_transformations): Don't skip
        for flag_auto_profile.

[-- Attachment #2: 0007-Fix-autofdo-indirect-call-value-prof-transformation.patch --]
[-- Type: application/octet-stream, Size: 2909 bytes --]

From e5ccb88cb8ba5d3a7686b1b2d2371751e64b5756 Mon Sep 17 00:00:00 2001
From: chengbin <bin.cheng@linux.alibaba.com>
Date: Tue, 11 Dec 2018 19:02:12 +0800
Subject: [PATCH 7/7] Fix autofdo indirect call value-prof transformation.

---
 gcc/auto-profile.c | 25 +++++++++++++------------
 gcc/value-prof.c   |  5 -----
 2 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index d337cbc7200..f22395ec8f5 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -992,14 +992,6 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map,
       || gimple_call_fndecl (stmt) != NULL_TREE)
     return;
 
-  callee = gimple_call_fn (stmt);
-
-  histogram_value hist = gimple_alloc_histogram_value (
-      cfun, HIST_TYPE_INDIR_CALL, stmt, callee);
-  hist->n_counters = 3;
-  hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters);
-  gimple_add_histogram_value (cfun, stmt, hist);
-
   gcov_type total = 0;
   icall_target_map::const_iterator max_iter = map.end ();
 
@@ -1010,9 +1002,20 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map,
       if (max_iter == map.end () || max_iter->second < iter->second)
         max_iter = iter;
     }
+  struct cgraph_node *direct_call = cgraph_node::get_for_asmname (
+      get_identifier (afdo_string_table->get_name (max_iter->first)));
+  if (direct_call == NULL || !direct_call->profile_id)
+    return;
 
-  hist->hvalue.counters[0]
-      = (unsigned long long)afdo_string_table->get_name (max_iter->first);
+  callee = gimple_call_fn (stmt);
+
+  histogram_value hist = gimple_alloc_histogram_value (
+      cfun, HIST_TYPE_INDIR_CALL, stmt, callee);
+  hist->n_counters = 3;
+  hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters);
+  gimple_add_histogram_value (cfun, stmt, hist);
+
+  hist->hvalue.counters[0] = direct_call->profile_id;
   hist->hvalue.counters[1] = max_iter->second;
   hist->hvalue.counters[2] = total;
 
@@ -1021,8 +1024,6 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map,
 
   struct cgraph_edge *indirect_edge
       = cgraph_node::get (current_function_decl)->get_edge (stmt);
-  struct cgraph_node *direct_call = cgraph_node::get_for_asmname (
-      get_identifier ((const char *) hist->hvalue.counters[0]));
 
   if (dump_file)
     {
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index f3be9ff8747..2f39ede806a 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -627,11 +627,6 @@ gimple_value_profile_transformations (void)
   gimple_stmt_iterator gsi;
   bool changed = false;
 
-  /* Autofdo does its own transformations for indirect calls,
-     and otherwise does not support value profiling.  */
-  if (flag_auto_profile)
-    return false;
-
   FOR_EACH_BB_FN (bb, cfun)
     {
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-- 
2.14.4.44.g2045bb6


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-13  3:51 [PATCH AutoFDO]Restoring indirect call value profile transformation bin.cheng
@ 2018-12-13 18:48 ` Jeff Law
  2018-12-16  1:11 ` Andi Kleen
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff Law @ 2018-12-13 18:48 UTC (permalink / raw)
  To: bin.cheng, GCC Patches

On 12/12/18 8:50 PM, bin.cheng wrote:
> Hi,
> 
> Due to ICE and mal-functional bugs, indirect call value profile transformation
> is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> the first histogram value's counter, rather than pointer to callee's name string
> as it is now.
> With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> others are unstable.  I think the instability is caused by poor perf data collected
> during regrets run, and can confirm these tests pass if good perf data could be
> collected in manual experiments.
> 
> Bootstrap and test along with previous patches.  Is it OK?
> 
> FYI, an update about AutoFDO status: 
> All AutoFDO ICEs in regtest are fixed, while several tests still failing fall in below
> three categories:
> 
> Unstable indirect call value profile transformation:
> FAIL: g++.dg/tree-prof/indir-call-prof.C scan-ipa-dump afdo "Indirect call -> direct call.* AA transformation on insn"
> FAIL: g++.dg/tree-prof/morefunc.C scan-ipa-dump-times afdo "Indirect call -> direct call" 2
> FAIL: g++.dg/tree-prof/pr35545.C scan-ipa-dump profile_estimate "Indirect call -> direct call"
> 
> loop peeling case because we don't honor autofdo profile count as reliable:
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
> 
> cold/hot partition cases:
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ \ta-zA-Z0-0]+foo[._]+cold
> FAIL: gcc.dg/tree-prof/section-attr-1.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
> FAIL: gcc.dg/tree-prof/section-attr-2.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
> FAIL: gcc.dg/tree-prof/section-attr-3.c scan-assembler .section[\t ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
> These are more difficult to enable because we can't simply treat autofdo::zero
> count as cold, it's just too many.
> 
> Besides regtest, I run autofdo with kernel/mysql-server, the build and performance
> match expectations now, but I haven't run autofdo with any spec yet.
> 
> Thanks,
> bin
> 
> 2018-12-13  Bin Cheng  <bin.cheng@linux.alibaba.com>
> 
>         * auto-profile.c (afdo_indirect_call): Skip generating histogram
>         value if we can't find cgraph_node for then indirected callee.  Save
>         profile_id of the cgraph_node in histogram value's first counter.
>         * value-prof.c (gimple_value_profile_transformations): Don't skip
>         for flag_auto_profile.
OK
jeff


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-13  3:51 [PATCH AutoFDO]Restoring indirect call value profile transformation bin.cheng
  2018-12-13 18:48 ` Jeff Law
@ 2018-12-16  1:11 ` Andi Kleen
  2018-12-18 11:16   ` Bin.Cheng
  1 sibling, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2018-12-16  1:11 UTC (permalink / raw)
  To: bin.cheng; +Cc: GCC Patches

"bin.cheng" <bin.cheng@linux.alibaba.com> writes:

> Hi,
>
> Due to ICE and mal-functional bugs, indirect call value profile transformation
> is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> the first histogram value's counter, rather than pointer to callee's name string
> as it is now.
> With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> others are unstable.  I think the instability is caused by poor perf data collected
> during regrets run, and can confirm these tests pass if good perf data could be
> collected in manual experiments.

Would be good to make the tests stable, otherwise we'll just have
regressions in the future again.

The problem is that the tests don't run long enough and don't get enough samples?

Could add some loop?
Or possibly increase the sampling frequency in perf (-F or -c)?
Or run them multiple times and use gcov_merge to merge the files?


> FYI, an update about AutoFDO status: 
> All AutoFDO ICEs in regtest are fixed, while several tests still failing fall in below
> three categories:

Great!

Of course it still ICEs with LTO?

Right now there is no test case for this I think. Probably one should be added.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-16  1:11 ` Andi Kleen
@ 2018-12-18 11:16   ` Bin.Cheng
  2018-12-18 21:27     ` Andi Kleen
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Bin.Cheng @ 2018-12-18 11:16 UTC (permalink / raw)
  To: ak; +Cc: bin.cheng, gcc-patches List

On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> "bin.cheng" <bin.cheng@linux.alibaba.com> writes:
>
> > Hi,
> >
> > Due to ICE and mal-functional bugs, indirect call value profile transformation
> > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> > main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> > the first histogram value's counter, rather than pointer to callee's name string
> > as it is now.
> > With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> > others are unstable.  I think the instability is caused by poor perf data collected
> > during regrets run, and can confirm these tests pass if good perf data could be
> > collected in manual experiments.
>
> Would be good to make the tests stable, otherwise we'll just have
> regressions in the future again.
>
> The problem is that the tests don't run long enough and don't get enough samples?
Yes, take g++.dg/tree-prof/morefunc.C as an example:
-  int i;
-  for (i = 0; i < 1000; i++)
+  int i, j;
+  for (i = 0; i < 1000000; i++)
+    for (j = 0; j < 50; j++)
      g += tc->foo();
    if (g<100) g++;
 }
@@ -27,8 +28,9 @@ void test1 (A *tc)
 static __attribute__((always_inline))
 void test2 (B *tc)
 {
-  int i;
+  int i, j;
   for (i = 0; i < 1000000; i++)
+    for (j = 0; j < 50; j++)

I have to increase loop count like this to get stable pass on my
machine.  The original count (1000) is too small to be sampled.

>
> Could add some loop?
> Or possibly increase the sampling frequency in perf (-F or -c)?
Maybe, I will have a try.
> Or run them multiple times and use gcov_merge to merge the files?
Without changing loop count or sampling frequency, this is not likely
to be helpful, since perf doesn't hit the small loop in most cases.

Thanks,
bin
>
>
> > FYI, an update about AutoFDO status:
> > All AutoFDO ICEs in regtest are fixed, while several tests still failing fall in below
> > three categories:
>
> Great!
>
> Of course it still ICEs with LTO?
>
> Right now there is no test case for this I think. Probably one should be added.
>
> -Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-18 11:16   ` Bin.Cheng
@ 2018-12-18 21:27     ` Andi Kleen
  2018-12-19  1:27       ` Bin.Cheng
  2018-12-21 22:13       ` Hans-Peter Nilsson
  2018-12-19  2:01     ` Bin.Cheng
  2019-01-14  8:07     ` Andi Kleen
  2 siblings, 2 replies; 20+ messages in thread
From: Andi Kleen @ 2018-12-18 21:27 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: bin.cheng, gcc-patches List

> Yes, take g++.dg/tree-prof/morefunc.C as an example:
> -  int i;
> -  for (i = 0; i < 1000; i++)
> +  int i, j;
> +  for (i = 0; i < 1000000; i++)
> +    for (j = 0; j < 50; j++)
>       g += tc->foo();
>     if (g<100) g++;
>  }
> @@ -27,8 +28,9 @@ void test1 (A *tc)
>  static __attribute__((always_inline))
>  void test2 (B *tc)
>  {
> -  int i;
> +  int i, j;
>    for (i = 0; i < 1000000; i++)
> +    for (j = 0; j < 50; j++)
> 
> I have to increase loop count like this to get stable pass on my
> machine.  The original count (1000) is too small to be sampled.

IIRC It was originally higher, but people running on slow simulators complained,
so it was reduced.  Perhaps we need some way to detect in the test suite
that the test runs on a real CPU.

> 
> > > FYI, an update about AutoFDO status:
> > > All AutoFDO ICEs in regtest are fixed, while several tests still failing fall in below
> > > three categories:
> >
> > Great!
> >
> > Of course it still ICEs with LTO?
> >
> > Right now there is no test case for this I think. Probably one should be added.


Any comments on this?

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-18 21:27     ` Andi Kleen
@ 2018-12-19  1:27       ` Bin.Cheng
  2018-12-19  3:58         ` Andi Kleen
  2018-12-21 22:13       ` Hans-Peter Nilsson
  1 sibling, 1 reply; 20+ messages in thread
From: Bin.Cheng @ 2018-12-19  1:27 UTC (permalink / raw)
  To: ak; +Cc: bin.cheng, gcc-patches List

On Wed, Dec 19, 2018 at 5:27 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > -  int i;
> > -  for (i = 0; i < 1000; i++)
> > +  int i, j;
> > +  for (i = 0; i < 1000000; i++)
> > +    for (j = 0; j < 50; j++)
> >       g += tc->foo();
> >     if (g<100) g++;
> >  }
> > @@ -27,8 +28,9 @@ void test1 (A *tc)
> >  static __attribute__((always_inline))
> >  void test2 (B *tc)
> >  {
> > -  int i;
> > +  int i, j;
> >    for (i = 0; i < 1000000; i++)
> > +    for (j = 0; j < 50; j++)
> >
> > I have to increase loop count like this to get stable pass on my
> > machine.  The original count (1000) is too small to be sampled.
>
> IIRC It was originally higher, but people running on slow simulators complained,
> so it was reduced.  Perhaps we need some way to detect in the test suite
> that the test runs on a real CPU.
Is there concise way to do this, given gcc may be run on all kinds of
virtual scenarios?

>
> >
> > > > FYI, an update about AutoFDO status:
> > > > All AutoFDO ICEs in regtest are fixed, while several tests still failing fall in below
> > > > three categories:
> > >
> > > Great!
> > >
> > > Of course it still ICEs with LTO?
> > >
> > > Right now there is no test case for this I think. Probably one should be added.
>
>
> Any comments on this?
We'd like to further investigate AutoFDO+LTO, may I ask what the
status is (or was)?  Any background elaboration about this would be
appreciated.

Thanks,
bin
>
> -Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-18 11:16   ` Bin.Cheng
  2018-12-18 21:27     ` Andi Kleen
@ 2018-12-19  2:01     ` Bin.Cheng
  2018-12-19  4:00       ` Andi Kleen
  2019-01-14  8:07     ` Andi Kleen
  2 siblings, 1 reply; 20+ messages in thread
From: Bin.Cheng @ 2018-12-19  2:01 UTC (permalink / raw)
  To: ak; +Cc: bin.cheng, gcc-patches List

On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
>
> On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > "bin.cheng" <bin.cheng@linux.alibaba.com> writes:
> >
> > > Hi,
> > >
> > > Due to ICE and mal-functional bugs, indirect call value profile transformation
> > > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> > > main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> > > the first histogram value's counter, rather than pointer to callee's name string
> > > as it is now.
> > > With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> > > others are unstable.  I think the instability is caused by poor perf data collected
> > > during regrets run, and can confirm these tests pass if good perf data could be
> > > collected in manual experiments.
> >
> > Would be good to make the tests stable, otherwise we'll just have
> > regressions in the future again.
> >
> > The problem is that the tests don't run long enough and don't get enough samples?
> Yes, take g++.dg/tree-prof/morefunc.C as an example:
> -  int i;
> -  for (i = 0; i < 1000; i++)
> +  int i, j;
> +  for (i = 0; i < 1000000; i++)
> +    for (j = 0; j < 50; j++)
>       g += tc->foo();
>     if (g<100) g++;
>  }
> @@ -27,8 +28,9 @@ void test1 (A *tc)
>  static __attribute__((always_inline))
>  void test2 (B *tc)
>  {
> -  int i;
> +  int i, j;
>    for (i = 0; i < 1000000; i++)
> +    for (j = 0; j < 50; j++)
>
> I have to increase loop count like this to get stable pass on my
> machine.  The original count (1000) is too small to be sampled.
>
> >
> > Could add some loop?
> > Or possibly increase the sampling frequency in perf (-F or -c)?
> Maybe, I will have a try.
Turned out all "Indirect call" test can be resolved by adding -c 100
to perf command line:
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
...
-exec perf record -e $E -b "$@"
+exec perf record -e $E -c 100 -b "$@"

Is 100 too small here?  Or is it fine for all scenarios?

Thanks,
bin

> > Or run them multiple times and use gcov_merge to merge the files?
> Without changing loop count or sampling frequency, this is not likely
> to be helpful, since perf doesn't hit the small loop in most cases.
>
> Thanks,
> bin
> >
> >
> > > FYI, an update about AutoFDO status:
> > > All AutoFDO ICEs in regtest are fixed, while several tests still failing fall in below
> > > three categories:
> >
> > Great!
> >
> > Of course it still ICEs with LTO?
> >
> > Right now there is no test case for this I think. Probably one should be added.
> >
> > -Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-19  1:27       ` Bin.Cheng
@ 2018-12-19  3:58         ` Andi Kleen
  0 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2018-12-19  3:58 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: bin.cheng, gcc-patches List

On Wed, Dec 19, 2018 at 09:26:51AM +0800, Bin.Cheng wrote:
> On Wed, Dec 19, 2018 at 5:27 AM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > -  int i;
> > > -  for (i = 0; i < 1000; i++)
> > > +  int i, j;
> > > +  for (i = 0; i < 1000000; i++)
> > > +    for (j = 0; j < 50; j++)
> > >       g += tc->foo();
> > >     if (g<100) g++;
> > >  }
> > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > >  static __attribute__((always_inline))
> > >  void test2 (B *tc)
> > >  {
> > > -  int i;
> > > +  int i, j;
> > >    for (i = 0; i < 1000000; i++)
> > > +    for (j = 0; j < 50; j++)
> > >
> > > I have to increase loop count like this to get stable pass on my
> > > machine.  The original count (1000) is too small to be sampled.
> >
> > IIRC It was originally higher, but people running on slow simulators complained,
> > so it was reduced.  Perhaps we need some way to detect in the test suite
> > that the test runs on a real CPU.
> Is there concise way to do this, given gcc may be run on all kinds of
> virtual scenarios?

Virtual should be fine too, just simulators are too slow.

I hope there is, because we certainly need a solution for production
ready autofdo.

Or perhaps could just check if perf is working and only
run the tests if that is true. The TCL code already
checks that. Just would need to pass that information
somehow as a define.

Overall I suspect far more test coverage is needed
to make it solid. The existing tests are not that great.


> 
> >
> > >
> > > > > FYI, an update about AutoFDO status:
> > > > > All AutoFDO ICEs in regtest are fixed, while several tests still failing fall in below
> > > > > three categories:
> > > >
> > > > Great!
> > > >
> > > > Of course it still ICEs with LTO?
> > > >
> > > > Right now there is no test case for this I think. Probably one should be added.
> >
> >
> > Any comments on this?
> We'd like to further investigate AutoFDO+LTO, may I ask what the
> status is (or was)?  Any background elaboration about this would be
> appreciated.

It just never worked and ICEs very quickly if you try it.  

There's an open PR (PR71672)

There are some other open issues with autofdo BTW, e.g. the
old 4.9 google branch still has more features than mainline.
For example it supported discriminators, so can distinguish more
than one basic block per source line.

The last time I tested the gains with mainline autofdo
were also significantly less than 4.9-google, so there might
be other tunings missing.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-19  2:01     ` Bin.Cheng
@ 2018-12-19  4:00       ` Andi Kleen
  2018-12-19  4:08         ` Bin.Cheng
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2018-12-19  4:00 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: bin.cheng, gcc-patches List

On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
> >
> > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > > "bin.cheng" <bin.cheng@linux.alibaba.com> writes:
> > >
> > > > Hi,
> > > >
> > > > Due to ICE and mal-functional bugs, indirect call value profile transformation
> > > > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> > > > main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> > > > the first histogram value's counter, rather than pointer to callee's name string
> > > > as it is now.
> > > > With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> > > > others are unstable.  I think the instability is caused by poor perf data collected
> > > > during regrets run, and can confirm these tests pass if good perf data could be
> > > > collected in manual experiments.
> > >
> > > Would be good to make the tests stable, otherwise we'll just have
> > > regressions in the future again.
> > >
> > > The problem is that the tests don't run long enough and don't get enough samples?
> > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > -  int i;
> > -  for (i = 0; i < 1000; i++)
> > +  int i, j;
> > +  for (i = 0; i < 1000000; i++)
> > +    for (j = 0; j < 50; j++)
> >       g += tc->foo();
> >     if (g<100) g++;
> >  }
> > @@ -27,8 +28,9 @@ void test1 (A *tc)
> >  static __attribute__((always_inline))
> >  void test2 (B *tc)
> >  {
> > -  int i;
> > +  int i, j;
> >    for (i = 0; i < 1000000; i++)
> > +    for (j = 0; j < 50; j++)
> >
> > I have to increase loop count like this to get stable pass on my
> > machine.  The original count (1000) is too small to be sampled.
> >
> > >
> > > Could add some loop?
> > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > Maybe, I will have a try.
> Turned out all "Indirect call" test can be resolved by adding -c 100
> to perf command line:
> diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
> ...
> -exec perf record -e $E -b "$@"
> +exec perf record -e $E -c 100 -b "$@"
> 
> Is 100 too small here?  Or is it fine for all scenarios?

-c 100 is risky because it can cause perf throttling, which
makes it lose data.

perf has a limiter that if the PMU handler uses too much CPU
time it stops measuring for some time. A PMI is 10k+ cycles,
so doing one every 100 branches is a lot of CPU time.

I wouldn't go down that low. It is better to increase the
iteration count.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-19  4:00       ` Andi Kleen
@ 2018-12-19  4:08         ` Bin.Cheng
  2018-12-19  9:36           ` Richard Biener
  2018-12-19 17:11           ` Andi Kleen
  0 siblings, 2 replies; 20+ messages in thread
From: Bin.Cheng @ 2018-12-19  4:08 UTC (permalink / raw)
  To: ak; +Cc: bin.cheng, gcc-patches List

On Wed, Dec 19, 2018 at 12:00 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> > On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
> > >
> > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <ak@linux.intel.com> wrote:
> > > >
> > > > "bin.cheng" <bin.cheng@linux.alibaba.com> writes:
> > > >
> > > > > Hi,
> > > > >
> > > > > Due to ICE and mal-functional bugs, indirect call value profile transformation
> > > > > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> > > > > main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> > > > > the first histogram value's counter, rather than pointer to callee's name string
> > > > > as it is now.
> > > > > With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> > > > > others are unstable.  I think the instability is caused by poor perf data collected
> > > > > during regrets run, and can confirm these tests pass if good perf data could be
> > > > > collected in manual experiments.
> > > >
> > > > Would be good to make the tests stable, otherwise we'll just have
> > > > regressions in the future again.
> > > >
> > > > The problem is that the tests don't run long enough and don't get enough samples?
> > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > -  int i;
> > > -  for (i = 0; i < 1000; i++)
> > > +  int i, j;
> > > +  for (i = 0; i < 1000000; i++)
> > > +    for (j = 0; j < 50; j++)
> > >       g += tc->foo();
> > >     if (g<100) g++;
> > >  }
> > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > >  static __attribute__((always_inline))
> > >  void test2 (B *tc)
> > >  {
> > > -  int i;
> > > +  int i, j;
> > >    for (i = 0; i < 1000000; i++)
> > > +    for (j = 0; j < 50; j++)
> > >
> > > I have to increase loop count like this to get stable pass on my
> > > machine.  The original count (1000) is too small to be sampled.
> > >
> > > >
> > > > Could add some loop?
> > > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > > Maybe, I will have a try.
> > Turned out all "Indirect call" test can be resolved by adding -c 100
> > to perf command line:
> > diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
> > ...
> > -exec perf record -e $E -b "$@"
> > +exec perf record -e $E -c 100 -b "$@"
> >
> > Is 100 too small here?  Or is it fine for all scenarios?
>
> -c 100 is risky because it can cause perf throttling, which
> makes it lose data.
Right, it looks suspicious to me too.

>
> perf has a limiter that if the PMU handler uses too much CPU
> time it stops measuring for some time. A PMI is 10k+ cycles,
> so doing one every 100 branches is a lot of CPU time.
>
> I wouldn't go down that low. It is better to increase the
> iteration count.
We can combine the two together, increasing iteration count and
decreasing perf count at the same time.  What count would you suggest
from your experience?

Thanks,
bin
>
> -Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-19  4:08         ` Bin.Cheng
@ 2018-12-19  9:36           ` Richard Biener
  2018-12-19 15:41             ` Andi Kleen
  2018-12-19 17:11           ` Andi Kleen
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Biener @ 2018-12-19  9:36 UTC (permalink / raw)
  To: Amker.Cheng; +Cc: Andi Kleen, bin.cheng, GCC Patches

On Wed, Dec 19, 2018 at 5:08 AM Bin.Cheng <amker.cheng@gmail.com> wrote:
>
> On Wed, Dec 19, 2018 at 12:00 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> > > On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
> > > >
> > > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <ak@linux.intel.com> wrote:
> > > > >
> > > > > "bin.cheng" <bin.cheng@linux.alibaba.com> writes:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Due to ICE and mal-functional bugs, indirect call value profile transformation
> > > > > > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> > > > > > main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> > > > > > the first histogram value's counter, rather than pointer to callee's name string
> > > > > > as it is now.
> > > > > > With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> > > > > > others are unstable.  I think the instability is caused by poor perf data collected
> > > > > > during regrets run, and can confirm these tests pass if good perf data could be
> > > > > > collected in manual experiments.
> > > > >
> > > > > Would be good to make the tests stable, otherwise we'll just have
> > > > > regressions in the future again.
> > > > >
> > > > > The problem is that the tests don't run long enough and don't get enough samples?
> > > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > > -  int i;
> > > > -  for (i = 0; i < 1000; i++)
> > > > +  int i, j;
> > > > +  for (i = 0; i < 1000000; i++)
> > > > +    for (j = 0; j < 50; j++)
> > > >       g += tc->foo();
> > > >     if (g<100) g++;
> > > >  }
> > > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > > >  static __attribute__((always_inline))
> > > >  void test2 (B *tc)
> > > >  {
> > > > -  int i;
> > > > +  int i, j;
> > > >    for (i = 0; i < 1000000; i++)
> > > > +    for (j = 0; j < 50; j++)
> > > >
> > > > I have to increase loop count like this to get stable pass on my
> > > > machine.  The original count (1000) is too small to be sampled.
> > > >
> > > > >
> > > > > Could add some loop?
> > > > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > > > Maybe, I will have a try.
> > > Turned out all "Indirect call" test can be resolved by adding -c 100
> > > to perf command line:
> > > diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
> > > ...
> > > -exec perf record -e $E -b "$@"
> > > +exec perf record -e $E -c 100 -b "$@"
> > >
> > > Is 100 too small here?  Or is it fine for all scenarios?
> >
> > -c 100 is risky because it can cause perf throttling, which
> > makes it lose data.
> Right, it looks suspicious to me too.
>
> >
> > perf has a limiter that if the PMU handler uses too much CPU
> > time it stops measuring for some time. A PMI is 10k+ cycles,
> > so doing one every 100 branches is a lot of CPU time.
> >
> > I wouldn't go down that low. It is better to increase the
> > iteration count.
> We can combine the two together, increasing iteration count and
> decreasing perf count at the same time.  What count would you suggest
> from your experience?

Can we instead for the tests where we want to test profile use/merge
elide the profiling step and supply the "raw" data in an testsuite alternate
file instead?

Richard.

> Thanks,
> bin
> >
> > -Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-19  9:36           ` Richard Biener
@ 2018-12-19 15:41             ` Andi Kleen
  2018-12-19 17:28               ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2018-12-19 15:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: Amker.Cheng, bin.cheng, GCC Patches

> > We can combine the two together, increasing iteration count and
> > decreasing perf count at the same time.  What count would you suggest
> > from your experience?
> 
> Can we instead for the tests where we want to test profile use/merge
> elide the profiling step and supply the "raw" data in an testsuite alternate
> file instead?

That would be possible, but a drawback is that we wouldn't have an
"end2end" test anymore that also tests the interaction with perf
and autofdo. Would be good to test these cases too, there were regressions
in this before.

But perhaps splitting that into two separate tests is reasonable,
with the majority of tests running with fake data.

This would have the advantage that gcc developers who don't
have an autofdo setup (e.g. missing tools or running in virtualization
with PMU disabled) would still do most of the regression tests.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-19  4:08         ` Bin.Cheng
  2018-12-19  9:36           ` Richard Biener
@ 2018-12-19 17:11           ` Andi Kleen
  1 sibling, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2018-12-19 17:11 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: bin.cheng, gcc-patches List

On Wed, Dec 19, 2018 at 12:08:35PM +0800, Bin.Cheng wrote:
> On Wed, Dec 19, 2018 at 12:00 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> > > On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.cheng@gmail.com> wrote:
> > > >
> > > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <ak@linux.intel.com> wrote:
> > > > >
> > > > > "bin.cheng" <bin.cheng@linux.alibaba.com> writes:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Due to ICE and mal-functional bugs, indirect call value profile transformation
> > > > > > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> > > > > > main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> > > > > > the first histogram value's counter, rather than pointer to callee's name string
> > > > > > as it is now.
> > > > > > With the patch, some "Indirect call -> direct call" tests pass with autofdo, while
> > > > > > others are unstable.  I think the instability is caused by poor perf data collected
> > > > > > during regrets run, and can confirm these tests pass if good perf data could be
> > > > > > collected in manual experiments.
> > > > >
> > > > > Would be good to make the tests stable, otherwise we'll just have
> > > > > regressions in the future again.
> > > > >
> > > > > The problem is that the tests don't run long enough and don't get enough samples?
> > > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > > -  int i;
> > > > -  for (i = 0; i < 1000; i++)
> > > > +  int i, j;
> > > > +  for (i = 0; i < 1000000; i++)
> > > > +    for (j = 0; j < 50; j++)
> > > >       g += tc->foo();
> > > >     if (g<100) g++;
> > > >  }
> > > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > > >  static __attribute__((always_inline))
> > > >  void test2 (B *tc)
> > > >  {
> > > > -  int i;
> > > > +  int i, j;
> > > >    for (i = 0; i < 1000000; i++)
> > > > +    for (j = 0; j < 50; j++)
> > > >
> > > > I have to increase loop count like this to get stable pass on my
> > > > machine.  The original count (1000) is too small to be sampled.
> > > >
> > > > >
> > > > > Could add some loop?
> > > > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > > > Maybe, I will have a try.
> > > Turned out all "Indirect call" test can be resolved by adding -c 100
> > > to perf command line:
> > > diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
> > > ...
> > > -exec perf record -e $E -b "$@"
> > > +exec perf record -e $E -c 100 -b "$@"
> > >
> > > Is 100 too small here?  Or is it fine for all scenarios?
> >
> > -c 100 is risky because it can cause perf throttling, which
> > makes it lose data.
> Right, it looks suspicious to me too.
> 
> >
> > perf has a limiter that if the PMU handler uses too much CPU
> > time it stops measuring for some time. A PMI is 10k+ cycles,
> > so doing one every 100 branches is a lot of CPU time.
> >
> > I wouldn't go down that low. It is better to increase the
> > iteration count.
> We can combine the two together, increasing iteration count and
> decreasing perf count at the same time.  What count would you suggest
> from your experience?

Normally nothing less than 50k for a common event like branches.

But for such a limited test 10k might still work, as long
as the runtime is fair controlled.

We would probably need to ensure at least 10+ samples, so
that would be 100k iterations.

iirc that was what we used originally, until people
complained about the simulator run times.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-19 15:41             ` Andi Kleen
@ 2018-12-19 17:28               ` Richard Biener
  2018-12-19 17:43                 ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2018-12-19 17:28 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Amker.Cheng, bin.cheng, GCC Patches

On Wed, Dec 19, 2018 at 4:41 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> > > We can combine the two together, increasing iteration count and
> > > decreasing perf count at the same time.  What count would you suggest
> > > from your experience?
> >
> > Can we instead for the tests where we want to test profile use/merge
> > elide the profiling step and supply the "raw" data in an testsuite alternate
> > file instead?
>
> That would be possible, but a drawback is that we wouldn't have an
> "end2end" test anymore that also tests the interaction with perf
> and autofdo. Would be good to test these cases too, there were regressions
> in this before.

Sure.

> But perhaps splitting that into two separate tests is reasonable,
> with the majority of tests running with fake data.
>
> This would have the advantage that gcc developers who don't
> have an autofdo setup (e.g. missing tools or running in virtualization
> with PMU disabled) would still do most of the regression tests.

Yes, I think the pros outweight the cons here.  Well, at least if
generating such data that works on multiple archs is even possible?

Richard.

> -Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-19 17:28               ` Richard Biener
@ 2018-12-19 17:43                 ` Andi Kleen
  0 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2018-12-19 17:43 UTC (permalink / raw)
  To: Richard Biener; +Cc: Amker.Cheng, bin.cheng, GCC Patches

On Wed, Dec 19, 2018 at 06:28:29PM +0100, Richard Biener wrote:
> On Wed, Dec 19, 2018 at 4:41 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > > > We can combine the two together, increasing iteration count and
> > > > decreasing perf count at the same time.  What count would you suggest
> > > > from your experience?
> > >
> > > Can we instead for the tests where we want to test profile use/merge
> > > elide the profiling step and supply the "raw" data in an testsuite alternate
> > > file instead?
> >
> > That would be possible, but a drawback is that we wouldn't have an
> > "end2end" test anymore that also tests the interaction with perf
> > and autofdo. Would be good to test these cases too, there were regressions
> > in this before.
> 
> Sure.
> 
> > But perhaps splitting that into two separate tests is reasonable,
> > with the majority of tests running with fake data.
> >
> > This would have the advantage that gcc developers who don't
> > have an autofdo setup (e.g. missing tools or running in virtualization
> > with PMU disabled) would still do most of the regression tests.
> 
> Yes, I think the pros outweight the cons here.  Well, at least if
> generating such data that works on multiple archs is even possible?

The gcov data that comes out of autofdo is architecture independent
as far as I know. It's mainly counts per line.

In fact even the perf input data should be fairly architecture
independent (except perhaps for endian)

I think it would need a way to write gcov data using text input
(unless you want to put a lot of binaries into the repository)

Also it would need to be adjusted every time a line number
changes in the test cases. I guess best would be if dejagnu
could somehow generate it from test case comments, but I don't
know how complicated that would be.

Doing such updates would be likely difficult with binaries.

In the future if we ever re-add discriminator support
again it would also need some way to specify the correct
discriminator.

I guess for simple test cases it could be ensured it is
always 0.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-18 21:27     ` Andi Kleen
  2018-12-19  1:27       ` Bin.Cheng
@ 2018-12-21 22:13       ` Hans-Peter Nilsson
  1 sibling, 0 replies; 20+ messages in thread
From: Hans-Peter Nilsson @ 2018-12-21 22:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Bin.Cheng, bin.cheng, gcc-patches List

On Tue, 18 Dec 2018, Andi Kleen wrote:

> > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > -  int i;
> > -  for (i = 0; i < 1000; i++)
> > +  int i, j;
> > +  for (i = 0; i < 1000000; i++)
> > +    for (j = 0; j < 50; j++)
> >       g += tc->foo();
> >     if (g<100) g++;
> >  }
> > @@ -27,8 +28,9 @@ void test1 (A *tc)
> >  static __attribute__((always_inline))
> >  void test2 (B *tc)
> >  {
> > -  int i;
> > +  int i, j;
> >    for (i = 0; i < 1000000; i++)
> > +    for (j = 0; j < 50; j++)
> >
> > I have to increase loop count like this to get stable pass on my
> > machine.  The original count (1000) is too small to be sampled.
>
> IIRC It was originally higher, but people running on slow simulators complained,
> so it was reduced.  Perhaps we need some way to detect in the test suite
> that the test runs on a real CPU.

Doesn't check_effective_target_simulator work here?
See e.g. libstdc++-v3/testsuite/25_algorithms/heap/moveable2.cc
for an example.

brgds, H-P

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2018-12-18 11:16   ` Bin.Cheng
  2018-12-18 21:27     ` Andi Kleen
  2018-12-19  2:01     ` Bin.Cheng
@ 2019-01-14  8:07     ` Andi Kleen
  2019-01-14  8:15       ` Bin.Cheng
  2 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2019-01-14  8:07 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: bin.cheng, gcc-patches List

Bin Cheng,

I did some testing on this now. The attached patch automatically increases the iterations
for autofdo profiles.

But even with even more iterations I still have stable failures in 

FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ \ta-zA-Z0-0]+foo[._]+cold
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-ipa-dump afdo "Indirect call -> direct call.* a1 transformation on insn"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"

Did these really ever work for you? 

-Andi


diff --git a/gcc/testsuite/g++.dg/tree-prof/morefunc.C b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
index a9bdc167f45..02b01c073e9 100644
--- a/gcc/testsuite/g++.dg/tree-prof/morefunc.C
+++ b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
@@ -2,6 +2,10 @@
 #include "reorder_class1.h"
 #include "reorder_class2.h"
 
+#ifndef ITER
+#define ITER 1000
+#endif
+
 int g;
 
 #ifdef _PROFILE_USE
@@ -19,7 +23,7 @@ static __attribute__((always_inline))
 void test1 (A *tc)
 {
   int i;
-  for (i = 0; i < 1000; i++)
+  for (i = 0; i < ITER; i++)
      g += tc->foo(); 
    if (g<100) g++;
 }
@@ -28,7 +32,7 @@ static __attribute__((always_inline))
 void test2 (B *tc)
 {
   int i;
-  for (i = 0; i < 1000000; i++)
+  for (i = 0; i < ITER; i++)
      g += tc->foo();
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 450308d6407..099069da6a7 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -9,6 +9,10 @@ const char *sarr[SIZE];
 const char *buf_hot;
 const char *buf_cold;
 
+#ifndef ITER
+#define ITER 1000000
+#endif
+
 __attribute__((noinline))
 void 
 foo (int path)
@@ -32,7 +36,7 @@ main (int argc, char *argv[])
   int i;
   buf_hot =  "hello";
   buf_cold = "world";
-  for (i = 0; i < 1000000; i++)
+  for (i = 0; i < ITER; i++)
     foo (argc);
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
index 58109d54dc7..32d22c69c6c 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
@@ -2,6 +2,10 @@
 /* { dg-additional-sources "crossmodule-indircall-1a.c" } */
 /* { dg-options "-O3 -flto -DDOJOB=1" } */
 
+#ifndef ITER
+#define ITER 1000
+#endif
+
 int a;
 extern void (*p[2])(int n);
 void abort (void);
@@ -10,12 +14,12 @@ main()
 { int i;
 
   /* This call shall be converted.  */
-  for (i = 0;i<1000;i++)
+  for (i = 0;i<ITER;i++)
     p[0](1);
   /* This call shall not be converted.  */
-  for (i = 0;i<1000;i++)
+  for (i = 0;i<ITER;i++)
     p[i%2](2);
-  if (a != 1000)
+  if (a != ITER)
     abort ();
 
   return 0;
diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
index 53063c3e7fa..8b9dfbb78c7 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
@@ -1,5 +1,9 @@
 /* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-profile -fdump-ipa-afdo" } */
 
+#ifndef ITER
+#define ITER 100000
+#endif
+
 static int a1 (void)
 {
     return 10;
@@ -28,7 +32,7 @@ main (void)
   int (*p) (void);
   int  i;
 
-  for (i = 0; i < 10000000; i ++)
+  for (i = 0; i < ITER*100; i++)
     {
 	setp (&p, i);
 	p ();
diff --git a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
index 7245b68c1ee..b6ed178e1ad 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
@@ -1,13 +1,17 @@
 /* { dg-options "-O3 -fdump-tree-cunroll-details -fno-unroll-loops -fpeel-loops" } */
 void abort();
 
-int a[1000];
+#ifndef ITER
+#define ITER 1000
+#endif
+
+int a[ITER];
 int
 __attribute__ ((noinline))
 t()
 {
   int i;
-  for (i=0;i<1000;i++)
+  for (i=0;i<ITER;i++)
     if (!a[i])
       return 1;
   abort ();
@@ -16,7 +20,7 @@ int
 main()
 {
   int i;
-  for (i=0;i<1000;i++)
+  for (i=0;i<ITER;i++)
     t();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr52027.c b/gcc/testsuite/gcc.dg/tree-prof/pr52027.c
index c46a14b2e86..bf2a83a336d 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/pr52027.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/pr52027.c
@@ -2,6 +2,10 @@
 /* { dg-require-effective-target freorder } */
 /* { dg-options "-O2 -freorder-blocks-and-partition -fno-reorder-functions" } */
 
+#ifndef ITER
+#define ITER 1000
+#endif
+
 void
 foo (int len)
 {
@@ -13,7 +17,7 @@ int
 main ()
 {
   int i;
-  for (i = 0; i < 1000; i++)
+  for (i = 0; i < ITER; i++)
     foo (8);
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-prof/tracer-1.c b/gcc/testsuite/gcc.dg/tree-prof/tracer-1.c
index 1e64f284ac0..65570a5e96d 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/tracer-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/tracer-1.c
@@ -1,9 +1,14 @@
 /* { dg-options "-O2 -ftracer -fdump-tree-tracer" } */
+
+#ifndef ITER
+#define ITER 1000
+#endif
+
 volatile int a, b, c;
 int main ()
 {
   int i;
-  for (i = 0; i < 1000; i++)
+  for (i = 0; i < ITER; i++)
     {
       if (i % 17)
 	a++;
diff --git a/gcc/testsuite/gcc.dg/tree-prof/unroll-1.c b/gcc/testsuite/gcc.dg/tree-prof/unroll-1.c
index 3ad0cf019b3..3027e75a241 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/unroll-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/unroll-1.c
@@ -1,13 +1,17 @@
 /* { dg-options "-O3 -fdump-rtl-loop2_unroll-details -funroll-loops -fno-peel-loops" } */
 void abort ();
 
-int a[1000];
+#ifndef ITER
+#define ITER 1000
+#endif
+
+int a[ITER];
 int
 __attribute__ ((noinline))
 t()
 {
   int i;
-  for (i=0;i<1000;i++)
+  for (i=0;i<ITER;i++)
     if (!a[i])
       return 1;
   abort ();
@@ -16,7 +20,7 @@ int
 main()
 {
   int i;
-  for (i=0;i<1000;i++)
+  for (i=0;i<ITER;i++)
     t();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c b/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
index c286816cdf8..de2d03ebaee 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
@@ -1,5 +1,9 @@
-
 /* { dg-options "-O2 -fdump-tree-optimized-blocks" } */
+
+#ifndef ITER
+#define ITER 1000
+#endif
+
 int a[8];
 __attribute__ ((noinline))
 int t()
@@ -14,7 +18,7 @@ int
 main ()
 {
   int i;
-  for (i = 0; i < 1000; i++)
+  for (i = 0; i < ITER; i++)
     t ();
   return 0;
 }
diff --git a/gcc/testsuite/lib/profopt.exp b/gcc/testsuite/lib/profopt.exp
index 65494cfd4f6..13e7828bf32 100644
--- a/gcc/testsuite/lib/profopt.exp
+++ b/gcc/testsuite/lib/profopt.exp
@@ -289,8 +289,8 @@ proc auto-profopt-execute { src } {
         return
     }
     set profile_wrapper [profopt-perf-wrapper]
-    set profile_option "-g"
-    set feedback_option "-fauto-profile"
+    set profile_option "-g -DITER=1000000"
+    set feedback_option "-fauto-profile -DITER=1000000"
     set run_autofdo 1
     profopt-execute $src
     unset profile_wrapper

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2019-01-14  8:07     ` Andi Kleen
@ 2019-01-14  8:15       ` Bin.Cheng
  2019-01-14 17:10         ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Bin.Cheng @ 2019-01-14  8:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: bin.cheng, gcc-patches List

On Mon, Jan 14, 2019 at 4:07 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> Bin Cheng,
>
> I did some testing on this now. The attached patch automatically increases the iterations
> for autofdo profiles.
Hi Andi, thanks very much for tuning these.
>
> But even with even more iterations I still have stable failures in
>
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ \ta-zA-Z0-0]+foo[._]+cold
I think these two are supposed to fail with current code base.
> FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-ipa-dump afdo "Indirect call -> direct call.* a1 transformation on insn"
I also got unstable pass/fail for indirect call optimization when
tuning iterations, and haven't got an iteration number which passes
all the time.  I guess we need to combine decreasing of sampling count
here.
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
This one should fail too.

Thanks,
bin
>
> Did these really ever work for you?
>
> -Andi
>
>
> diff --git a/gcc/testsuite/g++.dg/tree-prof/morefunc.C b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> index a9bdc167f45..02b01c073e9 100644
> --- a/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> +++ b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> @@ -2,6 +2,10 @@
>  #include "reorder_class1.h"
>  #include "reorder_class2.h"
>
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  int g;
>
>  #ifdef _PROFILE_USE
> @@ -19,7 +23,7 @@ static __attribute__((always_inline))
>  void test1 (A *tc)
>  {
>    int i;
> -  for (i = 0; i < 1000; i++)
> +  for (i = 0; i < ITER; i++)
>       g += tc->foo();
>     if (g<100) g++;
>  }
> @@ -28,7 +32,7 @@ static __attribute__((always_inline))
>  void test2 (B *tc)
>  {
>    int i;
> -  for (i = 0; i < 1000000; i++)
> +  for (i = 0; i < ITER; i++)
>       g += tc->foo();
>  }
>
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> index 450308d6407..099069da6a7 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> @@ -9,6 +9,10 @@ const char *sarr[SIZE];
>  const char *buf_hot;
>  const char *buf_cold;
>
> +#ifndef ITER
> +#define ITER 1000000
> +#endif
> +
>  __attribute__((noinline))
>  void
>  foo (int path)
> @@ -32,7 +36,7 @@ main (int argc, char *argv[])
>    int i;
>    buf_hot =  "hello";
>    buf_cold = "world";
> -  for (i = 0; i < 1000000; i++)
> +  for (i = 0; i < ITER; i++)
>      foo (argc);
>    return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> index 58109d54dc7..32d22c69c6c 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> @@ -2,6 +2,10 @@
>  /* { dg-additional-sources "crossmodule-indircall-1a.c" } */
>  /* { dg-options "-O3 -flto -DDOJOB=1" } */
>
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  int a;
>  extern void (*p[2])(int n);
>  void abort (void);
> @@ -10,12 +14,12 @@ main()
>  { int i;
>
>    /* This call shall be converted.  */
> -  for (i = 0;i<1000;i++)
> +  for (i = 0;i<ITER;i++)
>      p[0](1);
>    /* This call shall not be converted.  */
> -  for (i = 0;i<1000;i++)
> +  for (i = 0;i<ITER;i++)
>      p[i%2](2);
> -  if (a != 1000)
> +  if (a != ITER)
>      abort ();
>
>    return 0;
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> index 53063c3e7fa..8b9dfbb78c7 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> @@ -1,5 +1,9 @@
>  /* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-profile -fdump-ipa-afdo" } */
>
> +#ifndef ITER
> +#define ITER 100000
> +#endif
> +
>  static int a1 (void)
>  {
>      return 10;
> @@ -28,7 +32,7 @@ main (void)
>    int (*p) (void);
>    int  i;
>
> -  for (i = 0; i < 10000000; i ++)
> +  for (i = 0; i < ITER*100; i++)
>      {
>         setp (&p, i);
>         p ();
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
> index 7245b68c1ee..b6ed178e1ad 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
> @@ -1,13 +1,17 @@
>  /* { dg-options "-O3 -fdump-tree-cunroll-details -fno-unroll-loops -fpeel-loops" } */
>  void abort();
>
> -int a[1000];
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
> +int a[ITER];
>  int
>  __attribute__ ((noinline))
>  t()
>  {
>    int i;
> -  for (i=0;i<1000;i++)
> +  for (i=0;i<ITER;i++)
>      if (!a[i])
>        return 1;
>    abort ();
> @@ -16,7 +20,7 @@ int
>  main()
>  {
>    int i;
> -  for (i=0;i<1000;i++)
> +  for (i=0;i<ITER;i++)
>      t();
>    return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr52027.c b/gcc/testsuite/gcc.dg/tree-prof/pr52027.c
> index c46a14b2e86..bf2a83a336d 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/pr52027.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/pr52027.c
> @@ -2,6 +2,10 @@
>  /* { dg-require-effective-target freorder } */
>  /* { dg-options "-O2 -freorder-blocks-and-partition -fno-reorder-functions" } */
>
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  void
>  foo (int len)
>  {
> @@ -13,7 +17,7 @@ int
>  main ()
>  {
>    int i;
> -  for (i = 0; i < 1000; i++)
> +  for (i = 0; i < ITER; i++)
>      foo (8);
>    return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/tracer-1.c b/gcc/testsuite/gcc.dg/tree-prof/tracer-1.c
> index 1e64f284ac0..65570a5e96d 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/tracer-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/tracer-1.c
> @@ -1,9 +1,14 @@
>  /* { dg-options "-O2 -ftracer -fdump-tree-tracer" } */
> +
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  volatile int a, b, c;
>  int main ()
>  {
>    int i;
> -  for (i = 0; i < 1000; i++)
> +  for (i = 0; i < ITER; i++)
>      {
>        if (i % 17)
>         a++;
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/unroll-1.c b/gcc/testsuite/gcc.dg/tree-prof/unroll-1.c
> index 3ad0cf019b3..3027e75a241 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/unroll-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/unroll-1.c
> @@ -1,13 +1,17 @@
>  /* { dg-options "-O3 -fdump-rtl-loop2_unroll-details -funroll-loops -fno-peel-loops" } */
>  void abort ();
>
> -int a[1000];
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
> +int a[ITER];
>  int
>  __attribute__ ((noinline))
>  t()
>  {
>    int i;
> -  for (i=0;i<1000;i++)
> +  for (i=0;i<ITER;i++)
>      if (!a[i])
>        return 1;
>    abort ();
> @@ -16,7 +20,7 @@ int
>  main()
>  {
>    int i;
> -  for (i=0;i<1000;i++)
> +  for (i=0;i<ITER;i++)
>      t();
>    return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c b/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
> index c286816cdf8..de2d03ebaee 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c
> @@ -1,5 +1,9 @@
> -
>  /* { dg-options "-O2 -fdump-tree-optimized-blocks" } */
> +
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  int a[8];
>  __attribute__ ((noinline))
>  int t()
> @@ -14,7 +18,7 @@ int
>  main ()
>  {
>    int i;
> -  for (i = 0; i < 1000; i++)
> +  for (i = 0; i < ITER; i++)
>      t ();
>    return 0;
>  }
> diff --git a/gcc/testsuite/lib/profopt.exp b/gcc/testsuite/lib/profopt.exp
> index 65494cfd4f6..13e7828bf32 100644
> --- a/gcc/testsuite/lib/profopt.exp
> +++ b/gcc/testsuite/lib/profopt.exp
> @@ -289,8 +289,8 @@ proc auto-profopt-execute { src } {
>          return
>      }
>      set profile_wrapper [profopt-perf-wrapper]
> -    set profile_option "-g"
> -    set feedback_option "-fauto-profile"
> +    set profile_option "-g -DITER=1000000"
> +    set feedback_option "-fauto-profile -DITER=1000000"
>      set run_autofdo 1
>      profopt-execute $src
>      unset profile_wrapper

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2019-01-14  8:15       ` Bin.Cheng
@ 2019-01-14 17:10         ` Andi Kleen
  2019-02-01  4:12           ` Bin.Cheng
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2019-01-14 17:10 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: bin.cheng, gcc-patches List

On Mon, Jan 14, 2019 at 04:15:20PM +0800, Bin.Cheng wrote:
> On Mon, Jan 14, 2019 at 4:07 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > Bin Cheng,
> >
> > I did some testing on this now. The attached patch automatically increases the iterations
> > for autofdo profiles.
> Hi Andi, thanks very much for tuning these.
> >
> > But even with even more iterations I still have stable failures in
> >
> > FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
> > FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ \ta-zA-Z0-0]+foo[._]+cold
> I think these two are supposed to fail with current code base.


We should mark it as XFAIL then I guess.

Is it understood why it doesn't work?

> > FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-ipa-dump afdo "Indirect call -> direct call.* a1 transformation on insn"
> I also got unstable pass/fail for indirect call optimization when
> tuning iterations, and haven't got an iteration number which passes
> all the time.  I guess we need to combine decreasing of sampling count
> here.

Okay I will look into that.

Could also try if prime sample after values help, this sometimes fixes
problems with systematically missing some code in sampling.

> > FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
> This one should fail too.

Same.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH AutoFDO]Restoring indirect call value profile transformation
  2019-01-14 17:10         ` Andi Kleen
@ 2019-02-01  4:12           ` Bin.Cheng
  0 siblings, 0 replies; 20+ messages in thread
From: Bin.Cheng @ 2019-02-01  4:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: bin.cheng, gcc-patches List

On Tue, Jan 15, 2019 at 1:10 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Mon, Jan 14, 2019 at 04:15:20PM +0800, Bin.Cheng wrote:
> > On Mon, Jan 14, 2019 at 4:07 PM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > > Bin Cheng,
> > >
> > > I did some testing on this now. The attached patch automatically increases the iterations
> > > for autofdo profiles.
> > Hi Andi, thanks very much for tuning these.
> > >
> > > But even with even more iterations I still have stable failures in
> > >
> > > FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
> > > FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ \ta-zA-Z0-0]+foo[._]+cold
> > I think these two are supposed to fail with current code base.
>
>
Sorry for being late.
> We should mark it as XFAIL then I guess.
Yeah
>
> Is it understood why it doesn't work?
I didn't dig into it, but I think the reason is autofdo::zero count is
not considered as cold as profile count.  It's kind of reasonable,
otherwise too much code would be categorized as cold.
>
> > > FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-ipa-dump afdo "Indirect call -> direct call.* a1 transformation on insn"
> > I also got unstable pass/fail for indirect call optimization when
> > tuning iterations, and haven't got an iteration number which passes
> > all the time.  I guess we need to combine decreasing of sampling count
> > here.
>
> Okay I will look into that.
>
> Could also try if prime sample after values help, this sometimes fixes
> problems with systematically missing some code in sampling.
>
> > > FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"
> > This one should fail too.
>
> Same.
Maybe I shouldn't use words "should fail", but the reason is we don't
consider autofdo profile count as reliable in niters analysis.  Maybe
this can be improved somehow.

Thanks,
bin
>
> -Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-02-01  4:12 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-13  3:51 [PATCH AutoFDO]Restoring indirect call value profile transformation bin.cheng
2018-12-13 18:48 ` Jeff Law
2018-12-16  1:11 ` Andi Kleen
2018-12-18 11:16   ` Bin.Cheng
2018-12-18 21:27     ` Andi Kleen
2018-12-19  1:27       ` Bin.Cheng
2018-12-19  3:58         ` Andi Kleen
2018-12-21 22:13       ` Hans-Peter Nilsson
2018-12-19  2:01     ` Bin.Cheng
2018-12-19  4:00       ` Andi Kleen
2018-12-19  4:08         ` Bin.Cheng
2018-12-19  9:36           ` Richard Biener
2018-12-19 15:41             ` Andi Kleen
2018-12-19 17:28               ` Richard Biener
2018-12-19 17:43                 ` Andi Kleen
2018-12-19 17:11           ` Andi Kleen
2019-01-14  8:07     ` Andi Kleen
2019-01-14  8:15       ` Bin.Cheng
2019-01-14 17:10         ` Andi Kleen
2019-02-01  4:12           ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).