public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* ipa-cp heuristics fixes
@ 2015-12-10  7:30 Jan Hubicka
  2015-12-10 10:47 ` Martin Jambor
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Jan Hubicka @ 2015-12-10  7:30 UTC (permalink / raw)
  To: gcc-patches, mjambor, dje.gcc

Martin,
while looking into the ipa-cp dumps for bzip and Firefox I noticed few issues.
First of all, ipcp_cloning_candidate_p calls
 optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))
which can not be used at WPA time, becuase we have no DECL_STRUCT_FUNCTION
around.  I replaced it by node->optimize_for_size_p ().

Second we perform incredible number of clones because we do obtain some sort of
polymorphic call context for them.  In wast majority of cases this is useless
effort, because the functions in question do not contain virtual calls and do
not pass the parameter further.  For firefox about 40k out of 50k clones
created are created just because we found some context.

I changed the code to only clone if this immediately leads to devirtualization.
This do not cause any noticeable drop in number of devirtualized calls on
Firefox. I suppose we will miss the case where cloning a caller may allow
devirtualization in a clone of callee, but I do not think the heuristics for
context independent values can handle this as implemented right now and it
simply have way to many false positives.

What we can do is to devirtualize w/o cloning for local functions and
speculatively devirtualize in case we would otherwise clone.

Third problem I noticed is that
will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can
ignore the function size when deciding about the code size impact.
This function is doing some analysis for inliner where it, for example, analyses
if a comdat which is going to be inlined consistently in the whole program
will be removed.

In the cloning case I do not see this to apply: we have no evidence that the
other units will pass the same constants to the function.  I think you
basically want to assume that the  function will be removed if it has no
address taken and it is not externally visibible. This is what local flag
is for.

I gathered some stats:

number of clones for all contexts: 49948->11102
number of clones: 4376->4383

good_cloning_opportunity_p is called about 70k times, I wonder if the
thresholds are not simply set too high.  For example, inliner does about 300k
inlines at Firefox.

number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I do not have data on unpatched tree for this)
number of devirts: 956->933
number of devirts happening at inline: 781->868
number of indirect calls promoted: 512->512

Inliner stats from: Unit growth for small function inlining: 7965701->9130051 (14%)
to: Unit growth for small function inlining: 7965010->9138577

So it seems that except for large drop in number of clones there is no significant difference.

I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?

Honza

	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
	(good_cloning_opportunity_p): Likewise.
	(gather_context_independent_values): Do not return true when
	polymorphic call context is known or when we have known aggregate
	value of unused parameter.
	(estimate_local_effects): Try to create clone for all context
	when either some params are substituted or devirtualization is possible
	or some params can be removed; use local flag instead of
	node->will_be_removed_from_program_if_no_direct_calls_p.
	(identify_dead_nodes): Likewise.
Index: ipa-cp.c
===================================================================
--- ipa-cp.c	(revision 231477)
+++ ipa-cp.c	(working copy)
@@ -613,7 +613,7 @@ ipcp_cloning_candidate_p (struct cgraph_
       return false;
     }
 
-  if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
+  if (node->optimize_for_size_p ())
     {
       if (dump_file)
         fprintf (dump_file, "Not considering %s for cloning; "
@@ -2267,7 +2267,7 @@ good_cloning_opportunity_p (struct cgrap
 {
   if (time_benefit == 0
       || !opt_for_fn (node->decl, flag_ipa_cp_clone)
-      || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
+      || node->optimize_for_size_p ())
     return false;
 
   gcc_assert (size_cost > 0);
@@ -2387,12 +2387,14 @@ gather_context_independent_values (struc
 	*removable_params_cost
 	  += ipa_get_param_move_cost (info, i);
 
+      if (!ipa_is_param_used (info, i))
+	continue;
+
       ipcp_lattice<ipa_polymorphic_call_context> *ctxlat = &plats->ctxlat;
+      /* Do not account known context as reason for cloning.  We can see
+	 if it permits devirtualization.  */
       if (ctxlat->is_single_const ())
-	{
-	  (*known_contexts)[i] = ctxlat->values->value;
-	  ret = true;
-	}
+	(*known_contexts)[i] = ctxlat->values->value;
 
       if (known_aggs)
 	{
@@ -2494,7 +2496,9 @@ estimate_local_effects (struct cgraph_no
 						    &known_contexts, &known_aggs,
 						    &removable_params_cost);
   known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs);
-  if (always_const)
+  int devirt_bonus = devirtualization_time_bonus (node, known_csts,
+					   known_contexts, known_aggs_ptrs);
+  if (always_const || devirt_bonus || removable_params_cost)
     {
       struct caller_statistics stats;
       inline_hints hints;
@@ -2505,8 +2509,7 @@ estimate_local_effects (struct cgraph_no
 					      false);
       estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
 					 known_aggs_ptrs, &size, &time, &hints);
-      time -= devirtualization_time_bonus (node, known_csts, known_contexts,
-					   known_aggs_ptrs);
+      time -= devirt_bonus;
       time -= hint_time_bonus (hints);
       time -= removable_params_cost;
       size -= stats.n_calls * removable_params_cost;
@@ -2515,8 +2518,7 @@ estimate_local_effects (struct cgraph_no
 	fprintf (dump_file, " - context independent values, size: %i, "
 		 "time_benefit: %i\n", size, base_time - time);
 
-      if (size <= 0
-	  || node->will_be_removed_from_program_if_no_direct_calls_p ())
+      if (size <= 0 || node->local.local)
 	{
 	  info->do_clone_for_all_contexts = true;
 	  base_time = time;
@@ -2544,6 +2546,10 @@ estimate_local_effects (struct cgraph_no
 		     "max_new_size would be reached with %li.\n",
 		     size + overall_size);
 	}
+      else if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "   Not cloning for all contexts because "
+		 "!good_cloning_opportunity_p.\n");
+	
     }
 
   for (i = 0; i < count ; i++)
@@ -4419,7 +4425,7 @@ identify_dead_nodes (struct cgraph_node
 {
   struct cgraph_node *v;
   for (v = node; v ; v = ((struct ipa_dfs_info *) v->aux)->next_cycle)
-    if (v->will_be_removed_from_program_if_no_direct_calls_p ()
+    if (v->local.local
 	&& !v->call_for_symbol_thunks_and_aliases
 	     (has_undead_caller_from_outside_scc_p, NULL, true))
       IPA_NODE_REF (v)->node_dead = 1;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-10  7:30 ipa-cp heuristics fixes Jan Hubicka
@ 2015-12-10 10:47 ` Martin Jambor
  2015-12-10 16:56   ` Jan Hubicka
  2015-12-11 20:11 ` Jakub Jelinek
  2015-12-16  9:16 ` Dominik Vogt
  2 siblings, 1 reply; 22+ messages in thread
From: Martin Jambor @ 2015-12-10 10:47 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, dje.gcc

Hi,

thanks for looking into this, I only have one question:

On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> Martin,
> while looking into the ipa-cp dumps for bzip and Firefox I noticed few issues.
> First of all, ipcp_cloning_candidate_p calls
>  optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl))
> which can not be used at WPA time, becuase we have no DECL_STRUCT_FUNCTION
> around.  I replaced it by node->optimize_for_size_p ().
> 
> Second we perform incredible number of clones because we do obtain some sort of
> polymorphic call context for them.  In wast majority of cases this is useless
> effort, because the functions in question do not contain virtual calls and do
> not pass the parameter further.  For firefox about 40k out of 50k clones
> created are created just because we found some context.
> 
> I changed the code to only clone if this immediately leads to devirtualization.
> This do not cause any noticeable drop in number of devirtualized calls on
> Firefox. I suppose we will miss the case where cloning a caller may allow
> devirtualization in a clone of callee, but I do not think the heuristics for
> context independent values can handle this as implemented right now and it
> simply have way to many false positives.
> 
> What we can do is to devirtualize w/o cloning for local functions and
> speculatively devirtualize in case we would otherwise clone.
> 
> Third problem I noticed is that
> will_be_removed_from_program_if_no_direct_calls_p is used to decide if we can
> ignore the function size when deciding about the code size impact.
> This function is doing some analysis for inliner where it, for example, analyses
> if a comdat which is going to be inlined consistently in the whole program
> will be removed.
> 
> In the cloning case I do not see this to apply: we have no evidence that the
> other units will pass the same constants to the function.  I think you
> basically want to assume that the  function will be removed if it has no
> address taken and it is not externally visibible. This is what local flag
> is for.
> 
> I gathered some stats:
> 
> number of clones for all contexts: 49948->11102
> number of clones: 4376->4383
> 
> good_cloning_opportunity_p is called about 70k times, I wonder if the
> thresholds are not simply set too high.  For example, inliner does about 300k
> inlines at Firefox.
> 
> number of param replacements: 13041-> 13056 + 5383 aggregate replacements (I do not have data on unpatched tree for this)
> number of devirts: 956->933
> number of devirts happening at inline: 781->868
> number of indirect calls promoted: 512->512
> 
> Inliner stats from: Unit growth for small function inlining: 7965701->9130051 (14%)
> to: Unit growth for small function inlining: 7965010->9138577
> 
> So it seems that except for large drop in number of clones there is no significant difference.
> 
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
> 
> Honza
> 
> 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> 	(good_cloning_opportunity_p): Likewise.
> 	(gather_context_independent_values): Do not return true when
> 	polymorphic call context is known or when we have known aggregate
> 	value of unused parameter.
> 	(estimate_local_effects): Try to create clone for all context
> 	when either some params are substituted or devirtualization is possible
> 	or some params can be removed; use local flag instead of
> 	node->will_be_removed_from_program_if_no_direct_calls_p.
> 	(identify_dead_nodes): Likewise.
> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c	(revision 231477)
> +++ ipa-cp.c	(working copy)
> @@ -613,7 +613,7 @@ ipcp_cloning_candidate_p (struct cgraph_
>        return false;
>      }
>  
> -  if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +  if (node->optimize_for_size_p ())
>      {
>        if (dump_file)
>          fprintf (dump_file, "Not considering %s for cloning; "
> @@ -2267,7 +2267,7 @@ good_cloning_opportunity_p (struct cgrap
>  {
>    if (time_benefit == 0
>        || !opt_for_fn (node->decl, flag_ipa_cp_clone)
> -      || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> +      || node->optimize_for_size_p ())
>      return false;
>  
>    gcc_assert (size_cost > 0);
> @@ -2387,12 +2387,14 @@ gather_context_independent_values (struc
>  	*removable_params_cost
>  	  += ipa_get_param_move_cost (info, i);
>  
> +      if (!ipa_is_param_used (info, i))
> +	continue;
> +

Is this really necessary, is it not enough to remove the assignment to
ret below?  If the parameter is not used, devirtualization time bonus,
which you then rely on estimate_local_effects, should be zero for it.

It is a very minor point, I suppose, but if the function gets cloned
for a different reason, it might still be beneficial to have as much
context-independent information for it as possible too, because that
can then be used in a callee (see the second call of
gather_context_independent_values).

Other than that, all the changes seem like a clear improvement.

Thanks,

Martin

>        ipcp_lattice<ipa_polymorphic_call_context> *ctxlat = &plats->ctxlat;
> +      /* Do not account known context as reason for cloning.  We can see
> +	 if it permits devirtualization.  */
>        if (ctxlat->is_single_const ())
> -	{
> -	  (*known_contexts)[i] = ctxlat->values->value;
> -	  ret = true;
> -	}
> +	(*known_contexts)[i] = ctxlat->values->value;
>  
>        if (known_aggs)
>  	{
> @@ -2494,7 +2496,9 @@ estimate_local_effects (struct cgraph_no
>  						    &known_contexts, &known_aggs,
>  						    &removable_params_cost);
>    known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs);
> -  if (always_const)
> +  int devirt_bonus = devirtualization_time_bonus (node, known_csts,
> +					   known_contexts, known_aggs_ptrs);
> +  if (always_const || devirt_bonus || removable_params_cost)
>      {
>        struct caller_statistics stats;
>        inline_hints hints;
> @@ -2505,8 +2509,7 @@ estimate_local_effects (struct cgraph_no
>  					      false);
>        estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
>  					 known_aggs_ptrs, &size, &time, &hints);
> -      time -= devirtualization_time_bonus (node, known_csts, known_contexts,
> -					   known_aggs_ptrs);
> +      time -= devirt_bonus;
>        time -= hint_time_bonus (hints);
>        time -= removable_params_cost;
>        size -= stats.n_calls * removable_params_cost;
> @@ -2515,8 +2518,7 @@ estimate_local_effects (struct cgraph_no
>  	fprintf (dump_file, " - context independent values, size: %i, "
>  		 "time_benefit: %i\n", size, base_time - time);
>  
> -      if (size <= 0
> -	  || node->will_be_removed_from_program_if_no_direct_calls_p ())
> +      if (size <= 0 || node->local.local)
>  	{
>  	  info->do_clone_for_all_contexts = true;
>  	  base_time = time;
> @@ -2544,6 +2546,10 @@ estimate_local_effects (struct cgraph_no
>  		     "max_new_size would be reached with %li.\n",
>  		     size + overall_size);
>  	}
> +      else if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file, "   Not cloning for all contexts because "
> +		 "!good_cloning_opportunity_p.\n");
> +	
>      }
>  
>    for (i = 0; i < count ; i++)
> @@ -4419,7 +4425,7 @@ identify_dead_nodes (struct cgraph_node
>  {
>    struct cgraph_node *v;
>    for (v = node; v ; v = ((struct ipa_dfs_info *) v->aux)->next_cycle)
> -    if (v->will_be_removed_from_program_if_no_direct_calls_p ()
> +    if (v->local.local
>  	&& !v->call_for_symbol_thunks_and_aliases
>  	     (has_undead_caller_from_outside_scc_p, NULL, true))
>        IPA_NODE_REF (v)->node_dead = 1;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-10 10:47 ` Martin Jambor
@ 2015-12-10 16:56   ` Jan Hubicka
  2015-12-11 12:03     ` Martin Jambor
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2015-12-10 16:56 UTC (permalink / raw)
  To: Jan Hubicka, gcc-patches, dje.gcc

> Is this really necessary, is it not enough to remove the assignment to
> ret below?  If the parameter is not used, devirtualization time bonus,
> which you then rely on estimate_local_effects, should be zero for it.
> 
> It is a very minor point, I suppose, but if the function gets cloned
> for a different reason, it might still be beneficial to have as much
> context-independent information for it as possible too, because that
> can then be used in a callee (see the second call of
> gather_context_independent_values).
> 
> Other than that, all the changes seem like a clear improvement.

The cutoff is there mainly for the rest of the function:
      if (known_aggs)
        {
          vec<ipa_agg_jf_item, va_gc> *agg_items;
          struct ipa_agg_jump_function *ajf;

          agg_items = context_independent_aggregate_values (plats);
          ajf = &(*known_aggs)[i];
          ajf->items = agg_items;
          ajf->by_ref = plats->aggs_by_ref;
          ret |= agg_items != NULL;
        }
I did not want ret to become true if we manage to propagate into an unused
aggregate parameter.

Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-10 16:56   ` Jan Hubicka
@ 2015-12-11 12:03     ` Martin Jambor
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Jambor @ 2015-12-11 12:03 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, dje.gcc

On Thu, Dec 10, 2015 at 05:56:26PM +0100, Jan Hubicka wrote:
> > Is this really necessary, is it not enough to remove the assignment to
> > ret below?  If the parameter is not used, devirtualization time bonus,
> > which you then rely on estimate_local_effects, should be zero for it.
> > 
> > It is a very minor point, I suppose, but if the function gets cloned
> > for a different reason, it might still be beneficial to have as much
> > context-independent information for it as possible too, because that
> > can then be used in a callee (see the second call of
> > gather_context_independent_values).
> > 
> > Other than that, all the changes seem like a clear improvement.
> 
> The cutoff is there mainly for the rest of the function:
>       if (known_aggs)
>         {
>           vec<ipa_agg_jf_item, va_gc> *agg_items;
>           struct ipa_agg_jump_function *ajf;
> 
>           agg_items = context_independent_aggregate_values (plats);
>           ajf = &(*known_aggs)[i];
>           ajf->items = agg_items;
>           ajf->by_ref = plats->aggs_by_ref;
>           ret |= agg_items != NULL;
>         }
> I did not want ret to become true if we manage to propagate into an unused
> aggregate parameter.

I see, it makes sense.

Thanks,

Martin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-10  7:30 ipa-cp heuristics fixes Jan Hubicka
  2015-12-10 10:47 ` Martin Jambor
@ 2015-12-11 20:11 ` Jakub Jelinek
  2015-12-11 20:40   ` Jan Hubicka
  2015-12-16  9:16 ` Dominik Vogt
  2 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2015-12-11 20:11 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, mjambor, dje.gcc

On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?

I think this patch (just a guess, but certainly ipa-cp related during last
24 hours) significantly regressed guality/pr36728-*.c on x86_64.
Previously we have not turned foo into foo.constprop*, now we do, and pass
just arg7 instead of arg1..arg7.  That is fine, but we really should be
emitting the debug info stuff for that case, that was added to fix PR47858,
but for whatever reason it doesn't happen in this case.  Does it take some
other path in ipa-prop.c, or bypass ipa-prop, something different?

> 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> 	(good_cloning_opportunity_p): Likewise.
> 	(gather_context_independent_values): Do not return true when
> 	polymorphic call context is known or when we have known aggregate
> 	value of unused parameter.
> 	(estimate_local_effects): Try to create clone for all context
> 	when either some params are substituted or devirtualization is possible
> 	or some params can be removed; use local flag instead of
> 	node->will_be_removed_from_program_if_no_direct_calls_p.
> 	(identify_dead_nodes): Likewise.

	Jakub

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-11 20:11 ` Jakub Jelinek
@ 2015-12-11 20:40   ` Jan Hubicka
  2015-12-11 21:20     ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2015-12-11 20:40 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jan Hubicka, gcc-patches, mjambor, dje.gcc

> On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> > I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
> 
> I think this patch (just a guess, but certainly ipa-cp related during last
> 24 hours) significantly regressed guality/pr36728-*.c on x86_64.
> Previously we have not turned foo into foo.constprop*, now we do, and pass
> just arg7 instead of arg1..arg7.  That is fine, but we really should be

Yes, I changed the heuristics to consider it a win to drop the arugment even
if no constant propagation is done.

> emitting the debug info stuff for that case, that was added to fix PR47858,
> but for whatever reason it doesn't happen in this case.  Does it take some
> other path in ipa-prop.c, or bypass ipa-prop, something different?

No, it is the same clonning as any other. Just the decision heuristics changed.
We would get same issue qith guality/pr36728- if one of arg1...arg7 was
constant.  I will try to take a look at your patch for PR47858 and see if I can
work out what goes wrong.

Honza
> 
> > 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> > 	(good_cloning_opportunity_p): Likewise.
> > 	(gather_context_independent_values): Do not return true when
> > 	polymorphic call context is known or when we have known aggregate
> > 	value of unused parameter.
> > 	(estimate_local_effects): Try to create clone for all context
> > 	when either some params are substituted or devirtualization is possible
> > 	or some params can be removed; use local flag instead of
> > 	node->will_be_removed_from_program_if_no_direct_calls_p.
> > 	(identify_dead_nodes): Likewise.
> 
> 	Jakub

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-11 20:40   ` Jan Hubicka
@ 2015-12-11 21:20     ` Jan Hubicka
  2015-12-14 14:29       ` Martin Jambor
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2015-12-11 21:20 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jakub Jelinek, gcc-patches, mjambor, dje.gcc

Actually I added
      if (!ipa_is_param_used (info, i))                                         
        continue;                                                               
shortcut to gather_context_independent_values which prevents
us from recording context_independent_aggregate_values for unused
aggregate parameters. Perhaps that is causing the isssue?
We can simply record them and just avoid returning true if
all propagations happen to those.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-11 21:20     ` Jan Hubicka
@ 2015-12-14 14:29       ` Martin Jambor
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Jambor @ 2015-12-14 14:29 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jakub Jelinek, gcc-patches, dje.gcc

Hi,

On Fri, Dec 11, 2015 at 10:20:20PM +0100, Jan Hubicka wrote:
> Actually I added
>       if (!ipa_is_param_used (info, i))                                         
>         continue;                                                               
> shortcut to gather_context_independent_values which prevents
> us from recording context_independent_aggregate_values for unused
> aggregate parameters. Perhaps that is causing the isssue?
> We can simply record them and just avoid returning true if
> all propagations happen to those.

No, it's a different thing changed by the patch.  The following patch
makes the testcase pass but of course it is not a good fix.  We were
not performing any IPA-CP on the testcase before, now we are, so I
suppose this has has uncovered a debug info deficiency.

Martin


diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 8087f66..8c44b5a 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -2507,7 +2507,7 @@ estimate_local_effects (struct cgraph_node *node)
   known_aggs_ptrs = agg_jmp_p_vec_for_t_vec (known_aggs);
   int devirt_bonus = devirtualization_time_bonus (node, known_csts,
 					   known_contexts, known_aggs_ptrs);
-  if (always_const || devirt_bonus || removable_params_cost)
+  if (always_const || devirt_bonus)/* || removable_params_cost)*/
     {
       struct caller_statistics stats;
       inline_hints hints;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-10  7:30 ipa-cp heuristics fixes Jan Hubicka
  2015-12-10 10:47 ` Martin Jambor
  2015-12-11 20:11 ` Jakub Jelinek
@ 2015-12-16  9:16 ` Dominik Vogt
  2015-12-16  9:59   ` Richard Biener
  2015-12-16 16:24   ` Jan Hubicka
  2 siblings, 2 replies; 22+ messages in thread
From: Dominik Vogt @ 2015-12-16  9:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jan Hubicka, Andreas Krebbel

On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> 	(good_cloning_opportunity_p): Likewise.
> 	(gather_context_independent_values): Do not return true when
> 	polymorphic call context is known or when we have known aggregate
> 	value of unused parameter.
> 	(estimate_local_effects): Try to create clone for all context
> 	when either some params are substituted or devirtualization is possible
> 	or some params can be removed; use local flag instead of
> 	node->will_be_removed_from_program_if_no_direct_calls_p.
> 	(identify_dead_nodes): Likewise.

This commit breaks several guality tests on S/390x:

+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 y == 2
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 y == 2
 FAIL: gcc.dg/guality/pr36728-2.c   -O2  line 18 *x == (char) 25
 FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 18 *x == (char) 25
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
 FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 *x == (char) 25
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 y == 2
 FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 *x == (char) 25
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 y == 2
 FAIL: gcc.dg/guality/pr36728-2.c   -Os  line 18 *x == (char) 25
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg7 == 30
...
 FAIL: gcc.dg/guality/vla-1.c   -O1  line 17 sizeof (a) == 6
 FAIL: gcc.dg/guality/vla-1.c   -O2  line 17 sizeof (a) == 6
 FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 17 sizeof (a) == 6
+FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 i == 5
 FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 sizeof (a) == 6
+FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 i == 5
 FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 sizeof (a) == 6
 FAIL: gcc.dg/guality/vla-1.c   -Os  line 17 sizeof (a) == 6

What can I do to help fixing this?

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16  9:16 ` Dominik Vogt
@ 2015-12-16  9:59   ` Richard Biener
  2015-12-16 10:08     ` Jakub Jelinek
  2015-12-16 10:21     ` Dominik Vogt
  2015-12-16 16:24   ` Jan Hubicka
  1 sibling, 2 replies; 22+ messages in thread
From: Richard Biener @ 2015-12-16  9:59 UTC (permalink / raw)
  To: vogt, GCC Patches, Jan Hubicka, Andreas Krebbel

On Wed, Dec 16, 2015 at 10:15 AM, Dominik Vogt <vogt@linux.vnet.ibm.com> wrote:
> On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
>>       * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
>>       (good_cloning_opportunity_p): Likewise.
>>       (gather_context_independent_values): Do not return true when
>>       polymorphic call context is known or when we have known aggregate
>>       value of unused parameter.
>>       (estimate_local_effects): Try to create clone for all context
>>       when either some params are substituted or devirtualization is possible
>>       or some params can be removed; use local flag instead of
>>       node->will_be_removed_from_program_if_no_direct_calls_p.
>>       (identify_dead_nodes): Likewise.
>
> This commit breaks several guality tests on S/390x:
>
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2  line 18 *x == (char) 25
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -Os  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg7 == 30
> ...
>  FAIL: gcc.dg/guality/vla-1.c   -O1  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -O2  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 i == 5
>  FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 i == 5
>  FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -Os  line 17 sizeof (a) == 6
>
> What can I do to help fixing this?

Same on x86_64 btw.  If there isn't a bugreport already please open
one to track this issue.

Thanks,
Richard.

> Ciao
>
> Dominik ^_^  ^_^
>
> --
>
> Dominik Vogt
> IBM Germany
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16  9:59   ` Richard Biener
@ 2015-12-16 10:08     ` Jakub Jelinek
  2015-12-16 10:21     ` Dominik Vogt
  1 sibling, 0 replies; 22+ messages in thread
From: Jakub Jelinek @ 2015-12-16 10:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: vogt, GCC Patches, Jan Hubicka, Andreas Krebbel

On Wed, Dec 16, 2015 at 10:59:10AM +0100, Richard Biener wrote:
> > What can I do to help fixing this?
> 
> Same on x86_64 btw.  If there isn't a bugreport already please open
> one to track this issue.

PR68860 covers this already and it has been discussed on this ml too
already.

	Jakub

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16  9:59   ` Richard Biener
  2015-12-16 10:08     ` Jakub Jelinek
@ 2015-12-16 10:21     ` Dominik Vogt
  1 sibling, 0 replies; 22+ messages in thread
From: Dominik Vogt @ 2015-12-16 10:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andreas Krebbel

On Wed, Dec 16, 2015 at 10:59:10AM +0100, Richard Biener wrote:
> Same on x86_64 btw.  If there isn't a bugreport already please open
> one to track this issue.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68935

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16  9:16 ` Dominik Vogt
  2015-12-16  9:59   ` Richard Biener
@ 2015-12-16 16:24   ` Jan Hubicka
  2015-12-16 16:29     ` Jakub Jelinek
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2015-12-16 16:24 UTC (permalink / raw)
  To: gcc-patches, Jan Hubicka, Andreas Krebbel

> On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> > 	* ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> > 	(good_cloning_opportunity_p): Likewise.
> > 	(gather_context_independent_values): Do not return true when
> > 	polymorphic call context is known or when we have known aggregate
> > 	value of unused parameter.
> > 	(estimate_local_effects): Try to create clone for all context
> > 	when either some params are substituted or devirtualization is possible
> > 	or some params can be removed; use local flag instead of
> > 	node->will_be_removed_from_program_if_no_direct_calls_p.
> > 	(identify_dead_nodes): Likewise.
> 
> This commit breaks several guality tests on S/390x:

The patch changes ipa-cp heuristics in a way that it will clone in order to remove
unused parameter.  Previously it did know how to remove unused parameter but would
do so only if some of function's parameter (perhaps the unused one) was also constant.
This is inconsistent.  As a result we do more ipa-cp cloning and I suppose we are
more likely to hit them on guality for simplified testcases that often contain unused
code:
int __attribute__((noinline))
foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
{
  char *x = __builtin_alloca (arg7);
  int __attribute__ ((aligned(32))) y;

  y = 2;
  asm (NOP : "=m" (y), "=m" (b) : "m" (y));
  x[0] = 25;
  asm (NOP : "=m" (x[0]), "=m" (a) : "m" (x[0]), "m" (b));
  return y;
}

int
main ()
{
  int l = 0;
  asm ("" : "=r" (l) : "0" (l));
  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
  asm volatile ("" :: "r" (l));
  return 0;
}

I am trying to understand Jakub's debug code and perhaps it can be improved. But in
the case of optimized out unused parameters I think it is perfectly resonable to
say that the variable was optimized out.

As you can see, the testcase explicitely prevents ipa-cp constant propagation by the
asm statement.  We can just update the testcases to use the parameters and test
the original issue again.
> 
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 16 y == 2
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-1.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2  line 18 *x == (char) 25
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 18 y == 2
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 16 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-2.c   -O3 -g  line 18 y == 2
>  FAIL: gcc.dg/guality/pr36728-2.c   -Os  line 18 *x == (char) 25
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-3.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 14 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 16 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 14 arg7 == 30
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg3 == 3
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg4 == 4
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg5 == 5
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg6 == 6
> +FAIL: gcc.dg/guality/pr36728-4.c   -O3 -g  line 16 arg7 == 30
> ...
>  FAIL: gcc.dg/guality/vla-1.c   -O1  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -O2  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 i == 5
>  FAIL: gcc.dg/guality/vla-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  line 17 sizeof (a) == 6
> +FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 i == 5
>  FAIL: gcc.dg/guality/vla-1.c   -O3 -g  line 17 sizeof (a) == 6
>  FAIL: gcc.dg/guality/vla-1.c   -Os  line 17 sizeof (a) == 6
> 
> What can I do to help fixing this?
> 
> Ciao
> 
> Dominik ^_^  ^_^
> 
> -- 
> 
> Dominik Vogt
> IBM Germany

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16 16:24   ` Jan Hubicka
@ 2015-12-16 16:29     ` Jakub Jelinek
  2015-12-16 17:15       ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2015-12-16 16:29 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, Andreas Krebbel

On Wed, Dec 16, 2015 at 05:24:25PM +0100, Jan Hubicka wrote:
> I am trying to understand Jakub's debug code and perhaps it can be improved. But in
> the case of optimized out unused parameters I think it is perfectly resonable to
> say that the variable was optimized out.

As long as the values that would be passed to the unused parameters are
constant or live in memory or registers that isn't clobbered by the call in
the caller, we have the ability to express that in the debug info now, and
we should.

> As you can see, the testcase explicitely prevents ipa-cp constant propagation by the
> asm statement.  We can just update the testcases to use the parameters and test
> the original issue again.

No, the testcase intentionally wants to test unused arguments, they happen
in quite a lot of code and "optimized out" is not really desirable answer if
we can provide the values some other way.

	Jakub

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16 16:29     ` Jakub Jelinek
@ 2015-12-16 17:15       ` Jan Hubicka
  2015-12-16 17:37         ` Jakub Jelinek
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2015-12-16 17:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jan Hubicka, gcc-patches, Andreas Krebbel

> On Wed, Dec 16, 2015 at 05:24:25PM +0100, Jan Hubicka wrote:
> > I am trying to understand Jakub's debug code and perhaps it can be improved. But in
> > the case of optimized out unused parameters I think it is perfectly resonable to
> > say that the variable was optimized out.
> 
> As long as the values that would be passed to the unused parameters are
> constant or live in memory or registers that isn't clobbered by the call in
> the caller, we have the ability to express that in the debug info now, and
> we should.
int
main ()
{
  int l = 0;
  asm ("" : "=r" (l) : "0" (l));
  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
  asm volatile ("" :: "r" (l));
  return 0;
}

the unused parameters are not constant because of the asm block and we simply do not
compute them if cloning happens.  The following patch to the testcase
Index: testsuite/gcc.dg/guality/pr36728-1.c
===================================================================
--- testsuite/gcc.dg/guality/pr36728-1.c        (revision 231022)
+++ testsuite/gcc.dg/guality/pr36728-1.c        (working copy)
@@ -7,7 +7,7 @@
 int a, b;

 int __attribute__((noinline))
-foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
+foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int cst)
 { 
   char *x = __builtin_alloca (arg7);
   int __attribute__ ((aligned(32))) y;
@@ -48,7 +48,7 @@ main ()
 { 
   int l = 0;
   asm ("" : "=r" (l) : "0" (l));
-  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
+  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
   asm volatile ("" :: "r" (l));
   return 0;
 }

makes it fail on GCC 5 tree, too. The extra unused constant argument makes GCC 5
to clone foo and  optimize out l, too.

Honza
> 
> > As you can see, the testcase explicitely prevents ipa-cp constant propagation by the
> > asm statement.  We can just update the testcases to use the parameters and test
> > the original issue again.
> 
> No, the testcase intentionally wants to test unused arguments, they happen
> in quite a lot of code and "optimized out" is not really desirable answer if
> we can provide the values some other way.
> 
> 	Jakub

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16 17:15       ` Jan Hubicka
@ 2015-12-16 17:37         ` Jakub Jelinek
  2015-12-16 19:15           ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2015-12-16 17:37 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, Andreas Krebbel

On Wed, Dec 16, 2015 at 06:15:33PM +0100, Jan Hubicka wrote:
> > On Wed, Dec 16, 2015 at 05:24:25PM +0100, Jan Hubicka wrote:
> > > I am trying to understand Jakub's debug code and perhaps it can be improved. But in
> > > the case of optimized out unused parameters I think it is perfectly resonable to
> > > say that the variable was optimized out.
> > 
> > As long as the values that would be passed to the unused parameters are
> > constant or live in memory or registers that isn't clobbered by the call in
> > the caller, we have the ability to express that in the debug info now, and
> > we should.
> int
> main ()
> {
>   int l = 0;
>   asm ("" : "=r" (l) : "0" (l));
>   a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
>   asm volatile ("" :: "r" (l));
>   return 0;
> }
> 
> the unused parameters are not constant because of the asm block and we simply do not
> compute them if cloning happens.  The following patch to the testcase

So, we had a debuginfo QoI bug before, but with the change we hit that now
way more often than in the past, which makes it a debuginfo quality regression.

But, we still do emit the right debuginfo stuff for
volatile int vv;

static __attribute__ ((noinline))
int f1 (int x, int y, int z)
{
  int a = x * 2;
  int b = y * 2;
  int c = z * 2;
  vv++;
  return x + z;
}

int
u1 (int x)
{
  return f1 (x, 2, 3);
}

and I really can't see what is the difference between that and say
the pr36728-1.c testcase or the one with your modification to also pass
in a constant.  This one also has one used argument, some unused ones, and
some where a constant is passed.  BTW, if I adjust your modified
pr36728-1.c so that cst is used somewhere, then that parameter is passed,
rather than optimized away as I'd expect (and as happens on the above
testcase).
So, clearly there are multiple ways to perform the same parameter
optimizations, one in isra (though I don't see anything to scalarize above),
another one in ipa-cp or where, and only some of them hit the debug info
creation for the optimized away arguments.
Thus the question is why we have multiple spots to do the same thing,
what is so different between the testcases, and what do we need to change
to emit the debug info we should, and what we should change to emit
the cst below if it is used, if we already clone it anyway.

> Index: testsuite/gcc.dg/guality/pr36728-1.c
> ===================================================================
> --- testsuite/gcc.dg/guality/pr36728-1.c        (revision 231022)
> +++ testsuite/gcc.dg/guality/pr36728-1.c        (working copy)
> @@ -7,7 +7,7 @@
>  int a, b;
> 
>  int __attribute__((noinline))
> -foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
> +foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int cst)
>  { 
>    char *x = __builtin_alloca (arg7);
>    int __attribute__ ((aligned(32))) y;
> @@ -48,7 +48,7 @@ main ()
>  { 
>    int l = 0;
>    asm ("" : "=r" (l) : "0" (l));
> -  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30);
> +  a = foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
>    asm volatile ("" :: "r" (l));
>    return 0;
>  }
> 
> makes it fail on GCC 5 tree, too. The extra unused constant argument makes GCC 5
> to clone foo and  optimize out l, too.

	Jakub

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16 17:37         ` Jakub Jelinek
@ 2015-12-16 19:15           ` Jan Hubicka
  2015-12-17 18:12             ` Jakub Jelinek
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2015-12-16 19:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jan Hubicka, gcc-patches, Andreas Krebbel

Hi,
just to summarize a discussion on IRC. The problem is that we produce debug
statements for eliminated arguments only in ipa-sra and ipa-split, while we
don't do anything for cgraph clones. This is a problem on release branches,
too.

It seems we have all the necessary logic, but the callee modification code from
ipa-split should be moved to tree_function_versioning (which is used by both
ipa-split and cgraph clone mechanizm) and caller modifcation copied to
cgraph_edge::redirect_call_stmt_to_callee.

I am trying to do that. It seems bit difficult as the caller and callee
modifications are tied together and I do not know how chaining of
transfomraitons is going to work. 

Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-16 19:15           ` Jan Hubicka
@ 2015-12-17 18:12             ` Jakub Jelinek
  2015-12-17 21:00               ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2015-12-17 18:12 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 12502 bytes --]

On Wed, Dec 16, 2015 at 08:15:12PM +0100, Jan Hubicka wrote:
> just to summarize a discussion on IRC. The problem is that we produce debug
> statements for eliminated arguments only in ipa-sra and ipa-split, while we
> don't do anything for cgraph clones. This is a problem on release branches,
> too.
> 
> It seems we have all the necessary logic, but the callee modification code from
> ipa-split should be moved to tree_function_versioning (which is used by both
> ipa-split and cgraph clone mechanizm) and caller modifcation copied to
> cgraph_edge::redirect_call_stmt_to_callee.
> 
> I am trying to do that. It seems bit difficult as the caller and callee
> modifications are tied together and I do not know how chaining of
> transfomraitons is going to work. 

Ok, so here is a WIP patch changing the functions you wanted, untested so
far.

I've been looking at 3 testcases (attached), -1.c and -3.c with -g -O2,
and -2.c with -g -O3.
The -3.c one is a copy of the test we have for the ipa-split debug info
stuff, before/after the patch we generate the same stuff.
-2.c testcase is for the (new or now much more often taken patch) of
ipa-cp, the patch arranges for proper debug info in that case
(but, I'm really surprised why when the function is already cloned, nothing
figures out that the clone is always called with the same constant
passed to the arg8 and the argument isn't removed and replaced by constant.
-1.c is a testcase for the IPA-SRA path, where we unfortunately end up with
-a slight regression (on the IL size, in the end we generate the same
assembly):
+  # DEBUG D#8 s=> arg8
+  # DEBUG arg8 => D#8
   # DEBUG arg8 => 7
with the patch.  On that testcase, arg8 is used, but it is always passed
value 7 (similarly to -2.c testcase) and in that case we really don't
need/want the decl_debug_args stuff, it is unnecessary, it is enough to say
in the callee that arg8 is 7.  Nothing on the caller side sets the magic
corresponding D# debug expr decl anyway.
Either tree_versioning is too low-level for the debug info addition, or
we need to figure out how to tell it if a constant will be always passed
to some argument and what that constant will be, so that we'd emit
always the # DEBUG arg8 => constant in that case instead of the source bind
stuff (but then figure out what has added that and avoid duplication too).

And then there is another thing, but best to be handled somewhere in
dwarf2out.c or in the debugger.  The arguments are printed in pretty random
order:
#0  foo (arg7=arg7@entry=30, arg8=arg8@entry=7, arg6=6, arg5=5, arg4=4, arg3=3, arg2=2, arg1=1) at pr68860-2.c:15
So, either the debugger for functions with abstract origins should look at
the order of arguments in the abstract origin and ignore order in the
particular instantiation, or dwarf2out.c should sort the
DW_TAG_formal_parameter such that it if at all possible matches the order
specified in the source.

--- gcc/ipa-split.c.jj	2015-12-10 11:14:00.000000000 +0100
+++ gcc/ipa-split.c	2015-12-17 18:21:39.402036180 +0100
@@ -1209,7 +1209,6 @@ split_function (basic_block return_bb, s
   gimple *last_stmt = NULL;
   unsigned int i;
   tree arg, ddef;
-  vec<tree, va_gc> **debug_args = NULL;
 
   if (dump_file)
     {
@@ -1432,73 +1431,38 @@ split_function (basic_block return_bb, s
      vector to say for debug info that if parameter parm had been passed,
      it would have value parm_Y(D).  */
   if (args_to_skip)
-    for (parm = DECL_ARGUMENTS (current_function_decl), num = 0;
-	 parm; parm = DECL_CHAIN (parm), num++)
-      if (bitmap_bit_p (args_to_skip, num)
-	  && is_gimple_reg (parm))
-	{
-	  tree ddecl;
-	  gimple *def_temp;
-
-	  /* This needs to be done even without MAY_HAVE_DEBUG_STMTS,
-	     otherwise if it didn't exist before, we'd end up with
-	     different SSA_NAME_VERSIONs between -g and -g0.  */
-	  arg = get_or_create_ssa_default_def (cfun, parm);
-	  if (!MAY_HAVE_DEBUG_STMTS)
-	    continue;
-
-	  if (debug_args == NULL)
-	    debug_args = decl_debug_args_insert (node->decl);
-	  ddecl = make_node (DEBUG_EXPR_DECL);
-	  DECL_ARTIFICIAL (ddecl) = 1;
-	  TREE_TYPE (ddecl) = TREE_TYPE (parm);
-	  DECL_MODE (ddecl) = DECL_MODE (parm);
-	  vec_safe_push (*debug_args, DECL_ORIGIN (parm));
-	  vec_safe_push (*debug_args, ddecl);
-	  def_temp = gimple_build_debug_bind (ddecl, unshare_expr (arg),
-					      call);
-	  gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
-	}
-  /* And on the callee side, add
-     DEBUG D#Y s=> parm
-     DEBUG var => D#Y
-     stmts to the first bb where var is a VAR_DECL created for the
-     optimized away parameter in DECL_INITIAL block.  This hints
-     in the debug info that var (whole DECL_ORIGIN is the parm PARM_DECL)
-     is optimized away, but could be looked up at the call site
-     as value of D#X there.  */
-  if (debug_args != NULL)
     {
-      unsigned int i;
-      tree var, vexpr;
-      gimple_stmt_iterator cgsi;
-      gimple *def_temp;
-
-      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
-      var = BLOCK_VARS (DECL_INITIAL (node->decl));
-      i = vec_safe_length (*debug_args);
-      cgsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
-      do
+      vec<tree, va_gc> **debug_args = NULL;
+      unsigned i = 0, len = 0;
+      if (MAY_HAVE_DEBUG_STMTS)
 	{
-	  i -= 2;
-	  while (var != NULL_TREE
-		 && DECL_ABSTRACT_ORIGIN (var) != (**debug_args)[i])
-	    var = TREE_CHAIN (var);
-	  if (var == NULL_TREE)
-	    break;
-	  vexpr = make_node (DEBUG_EXPR_DECL);
-	  parm = (**debug_args)[i];
-	  DECL_ARTIFICIAL (vexpr) = 1;
-	  TREE_TYPE (vexpr) = TREE_TYPE (parm);
-	  DECL_MODE (vexpr) = DECL_MODE (parm);
-	  def_temp = gimple_build_debug_source_bind (vexpr, parm,
-						     NULL);
-	  gsi_insert_before (&cgsi, def_temp, GSI_SAME_STMT);
-	  def_temp = gimple_build_debug_bind (var, vexpr, NULL);
-	  gsi_insert_before (&cgsi, def_temp, GSI_SAME_STMT);
+	  debug_args = decl_debug_args_lookup (node->decl);
+	  if (debug_args)
+	    len = vec_safe_length (*debug_args);
 	}
-      while (i);
-      pop_cfun ();
+      for (parm = DECL_ARGUMENTS (current_function_decl), num = 0;
+	   parm; parm = DECL_CHAIN (parm), num++)
+	if (bitmap_bit_p (args_to_skip, num) && is_gimple_reg (parm))
+	  {
+	    tree ddecl;
+	    gimple *def_temp;
+
+	    /* This needs to be done even without MAY_HAVE_DEBUG_STMTS,
+	       otherwise if it didn't exist before, we'd end up with
+	       different SSA_NAME_VERSIONs between -g and -g0.  */
+	    arg = get_or_create_ssa_default_def (cfun, parm);
+	    if (!MAY_HAVE_DEBUG_STMTS || debug_args == NULL)
+	      continue;
+
+	    while (i < len && (**debug_args)[i] != DECL_ORIGIN (parm))
+	      i += 2;
+	    if (i >= len)
+	      continue;
+	    ddecl = (**debug_args)[i + 1];
+	    def_temp
+	      = gimple_build_debug_bind (ddecl, unshare_expr (arg), call);
+	    gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
+	  }
     }
 
   /* We avoid address being taken on any variable used by split part,
--- gcc/cgraph.c.jj	2015-12-16 09:02:11.000000000 +0100
+++ gcc/cgraph.c	2015-12-17 18:34:50.609062057 +0100
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.
 #include "params.h"
 #include "tree-chkp.h"
 #include "context.h"
+#include "gimplify.h"
 
 /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
 #include "tree-pass.h"
@@ -1420,6 +1421,51 @@ cgraph_edge::redirect_call_stmt_to_calle
 	SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
 
       gsi = gsi_for_stmt (e->call_stmt);
+
+      /* For optimized away parameters, add on the caller side
+	 before the call
+	 DEBUG D#X => parm_Y(D)
+	 stmts and associate D#X with parm in decl_debug_args_lookup
+	 vector to say for debug info that if parameter parm had been passed,
+	 it would have value parm_Y(D).  */
+      if (e->callee->clone.combined_args_to_skip && MAY_HAVE_DEBUG_STMTS)
+	{
+	  vec<tree, va_gc> **debug_args
+	    = decl_debug_args_lookup (e->callee->decl);
+	  if (debug_args)
+	    {
+	      tree parm;
+	      unsigned i = 0, num;
+	      unsigned len = vec_safe_length (*debug_args);
+	      for (parm = DECL_ARGUMENTS (decl), num = 0;
+		   parm; parm = DECL_CHAIN (parm), num++)
+		if (bitmap_bit_p (e->callee->clone.combined_args_to_skip, num)
+		    && is_gimple_reg (parm))
+		  {
+		    gimple *def_temp;
+		    unsigned last = i;
+
+		    while (i < len && (**debug_args)[i] != DECL_ORIGIN (parm))
+		      i += 2;
+		    if (i >= len)
+		      {
+			i = 0;
+			while (i < last && (**debug_args)[i]
+			       != DECL_ORIGIN (parm))
+			  i += 2;
+			if (i >= last)
+			  continue;
+		      }
+		    tree ddecl = (**debug_args)[i + 1];
+		    tree arg = gimple_call_arg (e->call_stmt, num);
+		    def_temp
+		      = gimple_build_debug_bind (ddecl, unshare_expr (arg),
+						 e->call_stmt);
+		    gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
+		  }
+	    }
+	}
+
       gsi_replace (&gsi, new_stmt, false);
       /* We need to defer cleaning EH info on the new statement to
          fixup-cfg.  We may not have dominator information at this point
--- gcc/tree-inline.c.jj	2015-12-10 16:56:26.000000000 +0100
+++ gcc/tree-inline.c	2015-12-17 18:56:18.667166746 +0100
@@ -5740,9 +5740,8 @@ tree_function_versioning (tree old_decl,
   /* Copy the function's static chain.  */
   p = DECL_STRUCT_FUNCTION (old_decl)->static_chain_decl;
   if (p)
-    DECL_STRUCT_FUNCTION (new_decl)->static_chain_decl =
-      copy_static_chain (DECL_STRUCT_FUNCTION (old_decl)->static_chain_decl,
-			 &id);
+    DECL_STRUCT_FUNCTION (new_decl)->static_chain_decl
+      = copy_static_chain (p, &id);
 
   /* If there's a tree_map, prepare for substitution.  */
   if (tree_map)
@@ -5797,9 +5796,9 @@ tree_function_versioning (tree old_decl,
       }
   /* Copy the function's arguments.  */
   if (DECL_ARGUMENTS (old_decl) != NULL_TREE)
-    DECL_ARGUMENTS (new_decl) =
-      copy_arguments_for_versioning (DECL_ARGUMENTS (old_decl), &id,
-      				     args_to_skip, &vars);
+    DECL_ARGUMENTS (new_decl)
+      = copy_arguments_for_versioning (DECL_ARGUMENTS (old_decl), &id,
+				       args_to_skip, &vars);
 
   DECL_INITIAL (new_decl) = remap_blocks (DECL_INITIAL (id.src_fn), &id);
   BLOCK_SUPERCONTEXT (DECL_INITIAL (new_decl)) = new_decl;
@@ -5914,6 +5913,67 @@ tree_function_versioning (tree old_decl,
 	}
     }
 
+  if (args_to_skip && MAY_HAVE_DEBUG_STMTS)
+    {
+      tree parm;
+      vec<tree, va_gc> **debug_args = NULL;
+      unsigned int len = 0;
+      for (parm = DECL_ARGUMENTS (old_decl), i = 0;
+	   parm; parm = DECL_CHAIN (parm), i++)
+	if (bitmap_bit_p (args_to_skip, i) && is_gimple_reg (parm))
+	  {
+	    tree ddecl;
+
+	    if (debug_args == NULL)
+	      {
+		debug_args = decl_debug_args_insert (new_decl);
+		len = vec_safe_length (*debug_args);
+	      }
+	    ddecl = make_node (DEBUG_EXPR_DECL);
+	    DECL_ARTIFICIAL (ddecl) = 1;
+	    TREE_TYPE (ddecl) = TREE_TYPE (parm);
+	    DECL_MODE (ddecl) = DECL_MODE (parm);
+	    vec_safe_push (*debug_args, DECL_ORIGIN (parm));
+	    vec_safe_push (*debug_args, ddecl);
+	  }
+      if (debug_args != NULL)
+	{
+	  /* On the callee side, add
+	     DEBUG D#Y s=> parm
+	     DEBUG var => D#Y
+	     stmts to the first bb where var is a VAR_DECL created for the
+	     optimized away parameter in DECL_INITIAL block.  This hints
+	     in the debug info that var (whole DECL_ORIGIN is the parm
+	     PARM_DECL) is optimized away, but could be looked up at the
+	     call site as value of D#X there.  */
+	  tree var = vars, vexpr;
+	  gimple_stmt_iterator cgsi
+	    = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	  gimple *def_temp;
+	  var = vars;
+	  i = vec_safe_length (*debug_args);
+	  do
+	    {
+	      i -= 2;
+	      while (var != NULL_TREE
+		     && DECL_ABSTRACT_ORIGIN (var) != (**debug_args)[i])
+		var = TREE_CHAIN (var);
+	      if (var == NULL_TREE)
+		break;
+	      vexpr = make_node (DEBUG_EXPR_DECL);
+	      parm = (**debug_args)[i];
+	      DECL_ARTIFICIAL (vexpr) = 1;
+	      TREE_TYPE (vexpr) = TREE_TYPE (parm);
+	      DECL_MODE (vexpr) = DECL_MODE (parm);
+	      def_temp = gimple_build_debug_bind (var, vexpr, NULL);
+	      gsi_insert_before (&cgsi, def_temp, GSI_NEW_STMT);
+	      def_temp = gimple_build_debug_source_bind (vexpr, parm, NULL);
+	      gsi_insert_before (&cgsi, def_temp, GSI_NEW_STMT);
+	    }
+	  while (i > len);
+	}
+    }
+
   free_dominance_info (CDI_DOMINATORS);
   free_dominance_info (CDI_POST_DOMINATORS);
 


	Jakub

[-- Attachment #2: pr68860-1.c --]
[-- Type: text/plain, Size: 1789 bytes --]

/* PR debug/36728 */
/* { dg-do run } */
/* { dg-options "-g" } */

#define NOP "nop"

static int __attribute__((noinline))
foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
{
  char *x = __builtin_alloca (arg7);
  int __attribute__ ((aligned(32))) y;

  y = 2;
  asm (NOP : "=m" (y) : "m" (y));
  x[0] = 25 + arg8;
  asm volatile (NOP : "=m" (x[0]) : "m" (x[0]));
  return y;
}

/* On s390(x) r2 and r3 are (depending on the optimization level) used
   when adjusting the addresses in order to meet the alignment
   requirements above.  They usually hold the function arguments arg1
   and arg2.  So it is expected that these values are unavailable in
   some of these tests.  */

/* { dg-final { gdb-test 14 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
/* { dg-final { gdb-test 14 "arg2" "2" { target { ! "s390*-*-*" } } } }*/
/* { dg-final { gdb-test 14 "arg3" "3" } } */
/* { dg-final { gdb-test 14 "arg4" "4" } } */
/* { dg-final { gdb-test 14 "arg5" "5" } } */
/* { dg-final { gdb-test 14 "arg6" "6" } } */
/* { dg-final { gdb-test 14 "arg7" "30" } } */
/* { dg-final { gdb-test 14 "y" "2" } } */
/* { dg-final { gdb-test 16 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
/* { dg-final { gdb-test 16 "arg2" "2" { target { ! "s390*-*-*" } } } }*/
/* { dg-final { gdb-test 16 "arg3" "3" } } */
/* { dg-final { gdb-test 16 "arg4" "4" } } */
/* { dg-final { gdb-test 16 "arg5" "5" } } */
/* { dg-final { gdb-test 16 "arg6" "6" } } */
/* { dg-final { gdb-test 16 "arg7" "30" } } */
/* { dg-final { gdb-test 16 "*x" "(char) 25" } } */
/* { dg-final { gdb-test 16 "y" "2" } } */

int q;

int
main ()
{
  int l = 0;
  asm ("" : "=r" (l) : "0" (l));
  foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
  asm volatile ("" :: "r" (l));
  return 0;
}

[-- Attachment #3: pr68860-2.c --]
[-- Type: text/plain, Size: 1782 bytes --]

/* PR debug/36728 */
/* { dg-do run } */
/* { dg-options "-g" } */

#define NOP "nop"

int __attribute__((noinline))
foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
{
  char *x = __builtin_alloca (arg7);
  int __attribute__ ((aligned(32))) y;

  y = 2;
  asm (NOP : "=m" (y) : "m" (y));
  x[0] = 25 + arg8;
  asm volatile (NOP : "=m" (x[0]) : "m" (x[0]));
  return y;
}

/* On s390(x) r2 and r3 are (depending on the optimization level) used
   when adjusting the addresses in order to meet the alignment
   requirements above.  They usually hold the function arguments arg1
   and arg2.  So it is expected that these values are unavailable in
   some of these tests.  */

/* { dg-final { gdb-test 14 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
/* { dg-final { gdb-test 14 "arg2" "2" { target { ! "s390*-*-*" } } } }*/
/* { dg-final { gdb-test 14 "arg3" "3" } } */
/* { dg-final { gdb-test 14 "arg4" "4" } } */
/* { dg-final { gdb-test 14 "arg5" "5" } } */
/* { dg-final { gdb-test 14 "arg6" "6" } } */
/* { dg-final { gdb-test 14 "arg7" "30" } } */
/* { dg-final { gdb-test 14 "y" "2" } } */
/* { dg-final { gdb-test 16 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
/* { dg-final { gdb-test 16 "arg2" "2" { target { ! "s390*-*-*" } } } }*/
/* { dg-final { gdb-test 16 "arg3" "3" } } */
/* { dg-final { gdb-test 16 "arg4" "4" } } */
/* { dg-final { gdb-test 16 "arg5" "5" } } */
/* { dg-final { gdb-test 16 "arg6" "6" } } */
/* { dg-final { gdb-test 16 "arg7" "30" } } */
/* { dg-final { gdb-test 16 "*x" "(char) 25" } } */
/* { dg-final { gdb-test 16 "y" "2" } } */

int q;

int
main ()
{
  int l = 0;
  asm ("" : "=r" (l) : "0" (l));
  foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
  asm volatile ("" :: "r" (l));
  return 0;
}

[-- Attachment #4: pr68860-3.c --]
[-- Type: text/plain, Size: 902 bytes --]

/* PR debug/54519 */
/* { dg-do run } */
/* { dg-options "-g" } */

__attribute__((noinline, noclone)) void
fn1 (int x)
{
  __asm volatile ("" : "+r" (x) : : "memory");
}

static int
fn2 (int x, int y, int z)
{
  int a = 8;
  if (x != z)
    {
      fn1 (x);
      fn1 (x);		/* { dg-final { gdb-test 20 "x" "36" } } */
      if (x == 36)	/* { dg-final { gdb-test 20 "y" "25" } } */
	fn1 (x);	/* { dg-final { gdb-test 20 "z" "6" } } */
      fn1 (x);		/* { dg-final { gdb-test 23 "x" "98" } } */
      if (x == 98)	/* { dg-final { gdb-test 23 "y" "117" } } */
	fn1 (x);	/* { dg-final { gdb-test 23 "z" "8" } } */
      fn1 (x);
      fn1 (x + a);
    }
  return y;
}

__attribute__((noinline, noclone)) int
fn3 (int x, int y)
{
  return fn2 (x, y, 6);
}

__attribute__((noinline, noclone)) int
fn4 (int x, int y)
{
  return fn2 (x, y, 8);
}

int
main ()
{
  fn3 (36, 25);
  fn4 (98, 117);
  return 0;
}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-17 18:12             ` Jakub Jelinek
@ 2015-12-17 21:00               ` Jan Hubicka
  2015-12-17 21:10                 ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2015-12-17 21:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jan Hubicka, gcc-patches, Andreas Krebbel

Jakub,
thanks a lot for looking into this! I am now bit on tight schedule moving back
to Prague and I knew little about the implementation of debug info for
optimized out arguments.
> 
> Ok, so here is a WIP patch changing the functions you wanted, untested so
> far.
> 
> I've been looking at 3 testcases (attached), -1.c and -3.c with -g -O2,
> and -2.c with -g -O3.
> The -3.c one is a copy of the test we have for the ipa-split debug info
> stuff, before/after the patch we generate the same stuff.
> -2.c testcase is for the (new or now much more often taken patch) of

The path is not really new or much more taken :) We used to do 50k clones on
Firefox, now we do 11k clones.  It is just taken on different testcases
than before... So good news is that solving this should improve debug info
in quite few cases.

> ipa-cp, the patch arranges for proper debug info in that case
> (but, I'm really surprised why when the function is already cloned, nothing
> figures out that the clone is always called with the same constant
> passed to the arg8 and the argument isn't removed and replaced by constant.

I will take a look. Perhaps Martin will know.

> -1.c is a testcase for the IPA-SRA path, where we unfortunately end up with
> -a slight regression (on the IL size, in the end we generate the same
> assembly):
> +  # DEBUG D#8 s=> arg8
> +  # DEBUG arg8 => D#8
>    # DEBUG arg8 => 7
> with the patch.  On that testcase, arg8 is used, but it is always passed
> value 7 (similarly to -2.c testcase) and in that case we really don't
> need/want the decl_debug_args stuff, it is unnecessary, it is enough to say
> in the callee that arg8 is 7.  Nothing on the caller side sets the magic
> corresponding D# debug expr decl anyway.
> Either tree_versioning is too low-level for the debug info addition, or
> we need to figure out how to tell it if a constant will be always passed
> to some argument and what that constant will be, so that we'd emit

tree_versioning has the info for that.  It obtains args_to_skip for arguments
that should be skipped and also tree_map which tells for those argument
that are skipped (removed) what they should be replaced for.  For those we
substituted by a constant, we ge the constant there.  

Note that there is also code to replace aggrgates by constants that bypasses
tree_versioning but I donot think we remove the aggregate arguments after 
propagating the constant in (we should).

I wonder if you tried to trigger a cascaded clonning.  For example:
struct a {int a;int b;};

static int reta (struct a a, int unused)
{
  return a.a;
}
main()
{
  struct a a={1,1};
  int v = reta(a,1);
  struct a a2={1,1};
  v += reta(a2,2);
  return v;
}

will first produce isra clone and then constprop clone and finally turn it to inline clone.

The changed to cgraph and tree-inline makes sense to me.

Honza
> +      /* For optimized away parameters, add on the caller side
> +	 before the call
> +	 DEBUG D#X => parm_Y(D)
> +	 stmts and associate D#X with parm in decl_debug_args_lookup
> +	 vector to say for debug info that if parameter parm had been passed,
> +	 it would have value parm_Y(D).  */
> +      if (e->callee->clone.combined_args_to_skip && MAY_HAVE_DEBUG_STMTS)
> +	{
> +	  vec<tree, va_gc> **debug_args
> +	    = decl_debug_args_lookup (e->callee->decl);
> +	  if (debug_args)
> +	    {
> +	      tree parm;
> +	      unsigned i = 0, num;
> +	      unsigned len = vec_safe_length (*debug_args);
> +	      for (parm = DECL_ARGUMENTS (decl), num = 0;
> +		   parm; parm = DECL_CHAIN (parm), num++)
> +		if (bitmap_bit_p (e->callee->clone.combined_args_to_skip, num)
> +		    && is_gimple_reg (parm))
> +		  {
> +		    gimple *def_temp;
> +		    unsigned last = i;
> +
> +		    while (i < len && (**debug_args)[i] != DECL_ORIGIN (parm))
> +		      i += 2;
> +		    if (i >= len)
> +		      {
> +			i = 0;
> +			while (i < last && (**debug_args)[i]
> +			       != DECL_ORIGIN (parm))
> +			  i += 2;
> +			if (i >= last)
> +			  continue;
> +		      }
> +		    tree ddecl = (**debug_args)[i + 1];
> +		    tree arg = gimple_call_arg (e->call_stmt, num);
> +		    def_temp
> +		      = gimple_build_debug_bind (ddecl, unshare_expr (arg),
> +						 e->call_stmt);
> +		    gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
> +		  }
> +	    }
> +	}
> +
>        gsi_replace (&gsi, new_stmt, false);
>        /* We need to defer cleaning EH info on the new statement to
>           fixup-cfg.  We may not have dominator information at this point
> --- gcc/tree-inline.c.jj	2015-12-10 16:56:26.000000000 +0100
> +++ gcc/tree-inline.c	2015-12-17 18:56:18.667166746 +0100
> @@ -5740,9 +5740,8 @@ tree_function_versioning (tree old_decl,
>    /* Copy the function's static chain.  */
>    p = DECL_STRUCT_FUNCTION (old_decl)->static_chain_decl;
>    if (p)
> -    DECL_STRUCT_FUNCTION (new_decl)->static_chain_decl =
> -      copy_static_chain (DECL_STRUCT_FUNCTION (old_decl)->static_chain_decl,
> -			 &id);
> +    DECL_STRUCT_FUNCTION (new_decl)->static_chain_decl
> +      = copy_static_chain (p, &id);
>  
>    /* If there's a tree_map, prepare for substitution.  */
>    if (tree_map)
> @@ -5797,9 +5796,9 @@ tree_function_versioning (tree old_decl,
>        }
>    /* Copy the function's arguments.  */
>    if (DECL_ARGUMENTS (old_decl) != NULL_TREE)
> -    DECL_ARGUMENTS (new_decl) =
> -      copy_arguments_for_versioning (DECL_ARGUMENTS (old_decl), &id,
> -      				     args_to_skip, &vars);
> +    DECL_ARGUMENTS (new_decl)
> +      = copy_arguments_for_versioning (DECL_ARGUMENTS (old_decl), &id,
> +				       args_to_skip, &vars);
>  
>    DECL_INITIAL (new_decl) = remap_blocks (DECL_INITIAL (id.src_fn), &id);
>    BLOCK_SUPERCONTEXT (DECL_INITIAL (new_decl)) = new_decl;
> @@ -5914,6 +5913,67 @@ tree_function_versioning (tree old_decl,
>  	}
>      }
>  
> +  if (args_to_skip && MAY_HAVE_DEBUG_STMTS)
> +    {
> +      tree parm;
> +      vec<tree, va_gc> **debug_args = NULL;
> +      unsigned int len = 0;
> +      for (parm = DECL_ARGUMENTS (old_decl), i = 0;
> +	   parm; parm = DECL_CHAIN (parm), i++)
> +	if (bitmap_bit_p (args_to_skip, i) && is_gimple_reg (parm))
> +	  {
> +	    tree ddecl;
> +
> +	    if (debug_args == NULL)
> +	      {
> +		debug_args = decl_debug_args_insert (new_decl);
> +		len = vec_safe_length (*debug_args);
> +	      }
> +	    ddecl = make_node (DEBUG_EXPR_DECL);
> +	    DECL_ARTIFICIAL (ddecl) = 1;
> +	    TREE_TYPE (ddecl) = TREE_TYPE (parm);
> +	    DECL_MODE (ddecl) = DECL_MODE (parm);
> +	    vec_safe_push (*debug_args, DECL_ORIGIN (parm));
> +	    vec_safe_push (*debug_args, ddecl);
> +	  }
> +      if (debug_args != NULL)
> +	{
> +	  /* On the callee side, add
> +	     DEBUG D#Y s=> parm
> +	     DEBUG var => D#Y
> +	     stmts to the first bb where var is a VAR_DECL created for the
> +	     optimized away parameter in DECL_INITIAL block.  This hints
> +	     in the debug info that var (whole DECL_ORIGIN is the parm
> +	     PARM_DECL) is optimized away, but could be looked up at the
> +	     call site as value of D#X there.  */
> +	  tree var = vars, vexpr;
> +	  gimple_stmt_iterator cgsi
> +	    = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> +	  gimple *def_temp;
> +	  var = vars;
> +	  i = vec_safe_length (*debug_args);
> +	  do
> +	    {
> +	      i -= 2;
> +	      while (var != NULL_TREE
> +		     && DECL_ABSTRACT_ORIGIN (var) != (**debug_args)[i])
> +		var = TREE_CHAIN (var);
> +	      if (var == NULL_TREE)
> +		break;
> +	      vexpr = make_node (DEBUG_EXPR_DECL);
> +	      parm = (**debug_args)[i];
> +	      DECL_ARTIFICIAL (vexpr) = 1;
> +	      TREE_TYPE (vexpr) = TREE_TYPE (parm);
> +	      DECL_MODE (vexpr) = DECL_MODE (parm);
> +	      def_temp = gimple_build_debug_bind (var, vexpr, NULL);
> +	      gsi_insert_before (&cgsi, def_temp, GSI_NEW_STMT);
> +	      def_temp = gimple_build_debug_source_bind (vexpr, parm, NULL);
> +	      gsi_insert_before (&cgsi, def_temp, GSI_NEW_STMT);
> +	    }
> +	  while (i > len);
> +	}
> +    }
> +
>    free_dominance_info (CDI_DOMINATORS);
>    free_dominance_info (CDI_POST_DOMINATORS);
>  
> 
> 
> 	Jakub

> /* PR debug/36728 */
> /* { dg-do run } */
> /* { dg-options "-g" } */
> 
> #define NOP "nop"
> 
> static int __attribute__((noinline))
> foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
> {
>   char *x = __builtin_alloca (arg7);
>   int __attribute__ ((aligned(32))) y;
> 
>   y = 2;
>   asm (NOP : "=m" (y) : "m" (y));
>   x[0] = 25 + arg8;
>   asm volatile (NOP : "=m" (x[0]) : "m" (x[0]));
>   return y;
> }
> 
> /* On s390(x) r2 and r3 are (depending on the optimization level) used
>    when adjusting the addresses in order to meet the alignment
>    requirements above.  They usually hold the function arguments arg1
>    and arg2.  So it is expected that these values are unavailable in
>    some of these tests.  */
> 
> /* { dg-final { gdb-test 14 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
> /* { dg-final { gdb-test 14 "arg2" "2" { target { ! "s390*-*-*" } } } }*/
> /* { dg-final { gdb-test 14 "arg3" "3" } } */
> /* { dg-final { gdb-test 14 "arg4" "4" } } */
> /* { dg-final { gdb-test 14 "arg5" "5" } } */
> /* { dg-final { gdb-test 14 "arg6" "6" } } */
> /* { dg-final { gdb-test 14 "arg7" "30" } } */
> /* { dg-final { gdb-test 14 "y" "2" } } */
> /* { dg-final { gdb-test 16 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
> /* { dg-final { gdb-test 16 "arg2" "2" { target { ! "s390*-*-*" } } } }*/
> /* { dg-final { gdb-test 16 "arg3" "3" } } */
> /* { dg-final { gdb-test 16 "arg4" "4" } } */
> /* { dg-final { gdb-test 16 "arg5" "5" } } */
> /* { dg-final { gdb-test 16 "arg6" "6" } } */
> /* { dg-final { gdb-test 16 "arg7" "30" } } */
> /* { dg-final { gdb-test 16 "*x" "(char) 25" } } */
> /* { dg-final { gdb-test 16 "y" "2" } } */
> 
> int q;
> 
> int
> main ()
> {
>   int l = 0;
>   asm ("" : "=r" (l) : "0" (l));
>   foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
>   asm volatile ("" :: "r" (l));
>   return 0;
> }

> /* PR debug/36728 */
> /* { dg-do run } */
> /* { dg-options "-g" } */
> 
> #define NOP "nop"
> 
> int __attribute__((noinline))
> foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
> {
>   char *x = __builtin_alloca (arg7);
>   int __attribute__ ((aligned(32))) y;
> 
>   y = 2;
>   asm (NOP : "=m" (y) : "m" (y));
>   x[0] = 25 + arg8;
>   asm volatile (NOP : "=m" (x[0]) : "m" (x[0]));
>   return y;
> }
> 
> /* On s390(x) r2 and r3 are (depending on the optimization level) used
>    when adjusting the addresses in order to meet the alignment
>    requirements above.  They usually hold the function arguments arg1
>    and arg2.  So it is expected that these values are unavailable in
>    some of these tests.  */
> 
> /* { dg-final { gdb-test 14 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
> /* { dg-final { gdb-test 14 "arg2" "2" { target { ! "s390*-*-*" } } } }*/
> /* { dg-final { gdb-test 14 "arg3" "3" } } */
> /* { dg-final { gdb-test 14 "arg4" "4" } } */
> /* { dg-final { gdb-test 14 "arg5" "5" } } */
> /* { dg-final { gdb-test 14 "arg6" "6" } } */
> /* { dg-final { gdb-test 14 "arg7" "30" } } */
> /* { dg-final { gdb-test 14 "y" "2" } } */
> /* { dg-final { gdb-test 16 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
> /* { dg-final { gdb-test 16 "arg2" "2" { target { ! "s390*-*-*" } } } }*/
> /* { dg-final { gdb-test 16 "arg3" "3" } } */
> /* { dg-final { gdb-test 16 "arg4" "4" } } */
> /* { dg-final { gdb-test 16 "arg5" "5" } } */
> /* { dg-final { gdb-test 16 "arg6" "6" } } */
> /* { dg-final { gdb-test 16 "arg7" "30" } } */
> /* { dg-final { gdb-test 16 "*x" "(char) 25" } } */
> /* { dg-final { gdb-test 16 "y" "2" } } */
> 
> int q;
> 
> int
> main ()
> {
>   int l = 0;
>   asm ("" : "=r" (l) : "0" (l));
>   foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
>   asm volatile ("" :: "r" (l));
>   return 0;
> }

> /* PR debug/54519 */
> /* { dg-do run } */
> /* { dg-options "-g" } */
> 
> __attribute__((noinline, noclone)) void
> fn1 (int x)
> {
>   __asm volatile ("" : "+r" (x) : : "memory");
> }
> 
> static int
> fn2 (int x, int y, int z)
> {
>   int a = 8;
>   if (x != z)
>     {
>       fn1 (x);
>       fn1 (x);		/* { dg-final { gdb-test 20 "x" "36" } } */
>       if (x == 36)	/* { dg-final { gdb-test 20 "y" "25" } } */
> 	fn1 (x);	/* { dg-final { gdb-test 20 "z" "6" } } */
>       fn1 (x);		/* { dg-final { gdb-test 23 "x" "98" } } */
>       if (x == 98)	/* { dg-final { gdb-test 23 "y" "117" } } */
> 	fn1 (x);	/* { dg-final { gdb-test 23 "z" "8" } } */
>       fn1 (x);
>       fn1 (x + a);
>     }
>   return y;
> }
> 
> __attribute__((noinline, noclone)) int
> fn3 (int x, int y)
> {
>   return fn2 (x, y, 6);
> }
> 
> __attribute__((noinline, noclone)) int
> fn4 (int x, int y)
> {
>   return fn2 (x, y, 8);
> }
> 
> int
> main ()
> {
>   fn3 (36, 25);
>   fn4 (98, 117);
>   return 0;
> }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-17 21:00               ` Jan Hubicka
@ 2015-12-17 21:10                 ` Jan Hubicka
  2015-12-18 20:19                   ` Jakub Jelinek
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hubicka @ 2015-12-17 21:10 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jakub Jelinek, gcc-patches, Andreas Krebbel

> Jakub,
> thanks a lot for looking into this! I am now bit on tight schedule moving back
> to Prague and I knew little about the implementation of debug info for
> optimized out arguments.
Hi,
here is better testcase that also trigger splitting
struct a {int a;int b;};

inline
static int reta (struct a a, int unused, int c)
{
  if (__builtin_expect (c,1) != 0)
   {
     return c;
   }
  test();
  test();
  test();
  test();
  test();
  test();
  test();
  test();
  test();
  return a.a;
}
main()
{
  struct a a={1,1};
  int v = reta(a,1,1);
  struct a a2={1,1};
  v += reta(a2,2,1);
  return v;
}

Compile with -fno-early-inlining

Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-17 21:10                 ` Jan Hubicka
@ 2015-12-18 20:19                   ` Jakub Jelinek
  2015-12-18 21:42                     ` Jan Hubicka
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2015-12-18 20:19 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, Andreas Krebbel

On Thu, Dec 17, 2015 at 10:10:48PM +0100, Jan Hubicka wrote:
> here is better testcase that also trigger splitting
> struct a {int a;int b;};

The debug info on your testcase looks as expected (though,
we really don't emit the DW_OP_GNU_parameter_ref for aggregates anyway
- not sure if we could do something about those at least in the easy cases
where the argument used to be stored into the struct shortly before the
call.

Anyway, here is an updated patch where I'm not adding anything for
skipped parameters that have tree_map replacement, and fixes some issues
found during bootstrap/regtest (x86_64-linux and i686-linux).

Ok for trunk?

2015-12-18  Jakub Jelinek  <jakub@redhat.com>

	PR debug/68860
	* ipa-split.c (split_function): Only perform caller side
	modifications for decl_debug_args here.
	* cgraph.c: Include gimplify.h.
	(cgraph_edge::redirect_call_stmt_to_callee): Add caller side
	debug stmts for decl_debug_args.  Spelling fix in a comment.
	* tree-inline.c (tree_function_versioning): Populate decl_debug_args
	for args_to_skip arguments and add callee side debug stmts.
	Formatting fixes.  Avoid shadowing i variable.

	* gcc.dg/guality/pr68860-1.c: New test.
	* gcc.dg/guality/pr68860-2.c: New test.

--- gcc/ipa-split.c.jj	2015-12-18 15:34:28.484342293 +0100
+++ gcc/ipa-split.c	2015-12-18 15:45:37.080969238 +0100
@@ -1209,7 +1209,6 @@ split_function (basic_block return_bb, s
   gimple *last_stmt = NULL;
   unsigned int i;
   tree arg, ddef;
-  vec<tree, va_gc> **debug_args = NULL;
 
   if (dump_file)
     {
@@ -1432,73 +1431,38 @@ split_function (basic_block return_bb, s
      vector to say for debug info that if parameter parm had been passed,
      it would have value parm_Y(D).  */
   if (args_to_skip)
-    for (parm = DECL_ARGUMENTS (current_function_decl), num = 0;
-	 parm; parm = DECL_CHAIN (parm), num++)
-      if (bitmap_bit_p (args_to_skip, num)
-	  && is_gimple_reg (parm))
-	{
-	  tree ddecl;
-	  gimple *def_temp;
-
-	  /* This needs to be done even without MAY_HAVE_DEBUG_STMTS,
-	     otherwise if it didn't exist before, we'd end up with
-	     different SSA_NAME_VERSIONs between -g and -g0.  */
-	  arg = get_or_create_ssa_default_def (cfun, parm);
-	  if (!MAY_HAVE_DEBUG_STMTS)
-	    continue;
-
-	  if (debug_args == NULL)
-	    debug_args = decl_debug_args_insert (node->decl);
-	  ddecl = make_node (DEBUG_EXPR_DECL);
-	  DECL_ARTIFICIAL (ddecl) = 1;
-	  TREE_TYPE (ddecl) = TREE_TYPE (parm);
-	  DECL_MODE (ddecl) = DECL_MODE (parm);
-	  vec_safe_push (*debug_args, DECL_ORIGIN (parm));
-	  vec_safe_push (*debug_args, ddecl);
-	  def_temp = gimple_build_debug_bind (ddecl, unshare_expr (arg),
-					      call);
-	  gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
-	}
-  /* And on the callee side, add
-     DEBUG D#Y s=> parm
-     DEBUG var => D#Y
-     stmts to the first bb where var is a VAR_DECL created for the
-     optimized away parameter in DECL_INITIAL block.  This hints
-     in the debug info that var (whole DECL_ORIGIN is the parm PARM_DECL)
-     is optimized away, but could be looked up at the call site
-     as value of D#X there.  */
-  if (debug_args != NULL)
     {
-      unsigned int i;
-      tree var, vexpr;
-      gimple_stmt_iterator cgsi;
-      gimple *def_temp;
-
-      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
-      var = BLOCK_VARS (DECL_INITIAL (node->decl));
-      i = vec_safe_length (*debug_args);
-      cgsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
-      do
+      vec<tree, va_gc> **debug_args = NULL;
+      unsigned i = 0, len = 0;
+      if (MAY_HAVE_DEBUG_STMTS)
 	{
-	  i -= 2;
-	  while (var != NULL_TREE
-		 && DECL_ABSTRACT_ORIGIN (var) != (**debug_args)[i])
-	    var = TREE_CHAIN (var);
-	  if (var == NULL_TREE)
-	    break;
-	  vexpr = make_node (DEBUG_EXPR_DECL);
-	  parm = (**debug_args)[i];
-	  DECL_ARTIFICIAL (vexpr) = 1;
-	  TREE_TYPE (vexpr) = TREE_TYPE (parm);
-	  DECL_MODE (vexpr) = DECL_MODE (parm);
-	  def_temp = gimple_build_debug_source_bind (vexpr, parm,
-						     NULL);
-	  gsi_insert_before (&cgsi, def_temp, GSI_SAME_STMT);
-	  def_temp = gimple_build_debug_bind (var, vexpr, NULL);
-	  gsi_insert_before (&cgsi, def_temp, GSI_SAME_STMT);
+	  debug_args = decl_debug_args_lookup (node->decl);
+	  if (debug_args)
+	    len = vec_safe_length (*debug_args);
 	}
-      while (i);
-      pop_cfun ();
+      for (parm = DECL_ARGUMENTS (current_function_decl), num = 0;
+	   parm; parm = DECL_CHAIN (parm), num++)
+	if (bitmap_bit_p (args_to_skip, num) && is_gimple_reg (parm))
+	  {
+	    tree ddecl;
+	    gimple *def_temp;
+
+	    /* This needs to be done even without MAY_HAVE_DEBUG_STMTS,
+	       otherwise if it didn't exist before, we'd end up with
+	       different SSA_NAME_VERSIONs between -g and -g0.  */
+	    arg = get_or_create_ssa_default_def (cfun, parm);
+	    if (!MAY_HAVE_DEBUG_STMTS || debug_args == NULL)
+	      continue;
+
+	    while (i < len && (**debug_args)[i] != DECL_ORIGIN (parm))
+	      i += 2;
+	    if (i >= len)
+	      continue;
+	    ddecl = (**debug_args)[i + 1];
+	    def_temp
+	      = gimple_build_debug_bind (ddecl, unshare_expr (arg), call);
+	    gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
+	  }
     }
 
   /* We avoid address being taken on any variable used by split part,
--- gcc/cgraph.c.jj	2015-12-18 15:34:28.324344539 +0100
+++ gcc/cgraph.c	2015-12-18 16:26:43.656578724 +0100
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.
 #include "params.h"
 #include "tree-chkp.h"
 #include "context.h"
+#include "gimplify.h"
 
 /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
 #include "tree-pass.h"
@@ -1275,7 +1276,7 @@ cgraph_edge::redirect_call_stmt_to_calle
       if (decl)
 	e = e->resolve_speculation (decl);
       /* If types do not match, speculation was likely wrong. 
-         The direct edge was posisbly redirected to the clone with a different
+         The direct edge was possibly redirected to the clone with a different
 	 signature.  We did not update the call statement yet, so compare it 
 	 with the reference that still points to the proper type.  */
       else if (!gimple_check_call_matching_types (e->call_stmt,
@@ -1420,6 +1421,70 @@ cgraph_edge::redirect_call_stmt_to_calle
 	SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
 
       gsi = gsi_for_stmt (e->call_stmt);
+
+      /* For optimized away parameters, add on the caller side
+	 before the call
+	 DEBUG D#X => parm_Y(D)
+	 stmts and associate D#X with parm in decl_debug_args_lookup
+	 vector to say for debug info that if parameter parm had been passed,
+	 it would have value parm_Y(D).  */
+      if (e->callee->clone.combined_args_to_skip && MAY_HAVE_DEBUG_STMTS)
+	{
+	  vec<tree, va_gc> **debug_args
+	    = decl_debug_args_lookup (e->callee->decl);
+	  tree old_decl = gimple_call_fndecl (e->call_stmt);
+	  if (debug_args && old_decl)
+	    {
+	      tree parm;
+	      unsigned i = 0, num;
+	      unsigned len = vec_safe_length (*debug_args);
+	      unsigned nargs = gimple_call_num_args (e->call_stmt);
+	      for (parm = DECL_ARGUMENTS (old_decl), num = 0;
+		   parm && num < nargs;
+		   parm = DECL_CHAIN (parm), num++)
+		if (bitmap_bit_p (e->callee->clone.combined_args_to_skip, num)
+		    && is_gimple_reg (parm))
+		  {
+		    unsigned last = i;
+
+		    while (i < len && (**debug_args)[i] != DECL_ORIGIN (parm))
+		      i += 2;
+		    if (i >= len)
+		      {
+			i = 0;
+			while (i < last
+			       && (**debug_args)[i] != DECL_ORIGIN (parm))
+			  i += 2;
+			if (i >= last)
+			  continue;
+		      }
+		    tree ddecl = (**debug_args)[i + 1];
+		    tree arg = gimple_call_arg (e->call_stmt, num);
+		    if (!useless_type_conversion_p (TREE_TYPE (ddecl),
+						    TREE_TYPE (arg)))
+		      {
+			tree rhs1;
+			if (!fold_convertible_p (TREE_TYPE (ddecl), arg))
+			  continue;
+			if (TREE_CODE (arg) == SSA_NAME
+			    && gimple_assign_cast_p (SSA_NAME_DEF_STMT (arg))
+			    && (rhs1
+				= gimple_assign_rhs1 (SSA_NAME_DEF_STMT (arg)))
+			    && useless_type_conversion_p (TREE_TYPE (ddecl),
+							  TREE_TYPE (rhs1)))
+			  arg = rhs1;
+			else
+			  arg = fold_convert (TREE_TYPE (ddecl), arg);
+		      }
+
+		    gimple *def_temp
+		      = gimple_build_debug_bind (ddecl, unshare_expr (arg),
+						 e->call_stmt);
+		    gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
+		  }
+	    }
+	}
+
       gsi_replace (&gsi, new_stmt, false);
       /* We need to defer cleaning EH info on the new statement to
          fixup-cfg.  We may not have dominator information at this point
--- gcc/tree-inline.c.jj	2015-12-18 15:34:28.308344764 +0100
+++ gcc/tree-inline.c	2015-12-18 15:45:37.083969196 +0100
@@ -5668,6 +5668,7 @@ tree_function_versioning (tree old_decl,
   basic_block old_entry_block, bb;
   auto_vec<gimple *, 10> init_stmts;
   tree vars = NULL_TREE;
+  bitmap debug_args_to_skip = args_to_skip;
 
   gcc_assert (TREE_CODE (old_decl) == FUNCTION_DECL
 	      && TREE_CODE (new_decl) == FUNCTION_DECL);
@@ -5740,9 +5741,8 @@ tree_function_versioning (tree old_decl,
   /* Copy the function's static chain.  */
   p = DECL_STRUCT_FUNCTION (old_decl)->static_chain_decl;
   if (p)
-    DECL_STRUCT_FUNCTION (new_decl)->static_chain_decl =
-      copy_static_chain (DECL_STRUCT_FUNCTION (old_decl)->static_chain_decl,
-			 &id);
+    DECL_STRUCT_FUNCTION (new_decl)->static_chain_decl
+      = copy_static_chain (p, &id);
 
   /* If there's a tree_map, prepare for substitution.  */
   if (tree_map)
@@ -5752,29 +5752,39 @@ tree_function_versioning (tree old_decl,
 	replace_info = (*tree_map)[i];
 	if (replace_info->replace_p)
 	  {
+	    int parm_num = -1;
 	    if (!replace_info->old_tree)
 	      {
-		int i = replace_info->parm_num;
+		int p = replace_info->parm_num;
 		tree parm;
-		tree req_type;
+		tree req_type, new_type;
 
-		for (parm = DECL_ARGUMENTS (old_decl); i; parm = DECL_CHAIN (parm))
-		  i --;
+		for (parm = DECL_ARGUMENTS (old_decl); p;
+		     parm = DECL_CHAIN (parm))
+		  p--;
 		replace_info->old_tree = parm;
+		parm_num = replace_info->parm_num;
 		req_type = TREE_TYPE (parm);
-		if (!useless_type_conversion_p (req_type, TREE_TYPE (replace_info->new_tree)))
+		new_type = TREE_TYPE (replace_info->new_tree);
+		if (!useless_type_conversion_p (req_type, new_type))
 		  {
 		    if (fold_convertible_p (req_type, replace_info->new_tree))
-		      replace_info->new_tree = fold_build1 (NOP_EXPR, req_type, replace_info->new_tree);
-		    else if (TYPE_SIZE (req_type) == TYPE_SIZE (TREE_TYPE (replace_info->new_tree)))
-		      replace_info->new_tree = fold_build1 (VIEW_CONVERT_EXPR, req_type, replace_info->new_tree);
+		      replace_info->new_tree
+			= fold_build1 (NOP_EXPR, req_type,
+				       replace_info->new_tree);
+		    else if (TYPE_SIZE (req_type) == TYPE_SIZE (new_type))
+		      replace_info->new_tree
+			= fold_build1 (VIEW_CONVERT_EXPR, req_type,
+				       replace_info->new_tree);
 		    else
 		      {
 			if (dump_file)
 			  {
 			    fprintf (dump_file, "    const ");
-			    print_generic_expr (dump_file, replace_info->new_tree, 0);
-			    fprintf (dump_file, "  can't be converted to param ");
+			    print_generic_expr (dump_file,
+						replace_info->new_tree, 0);
+			    fprintf (dump_file,
+				     "  can't be converted to param ");
 			    print_generic_expr (dump_file, parm, 0);
 			    fprintf (dump_file, "\n");
 			  }
@@ -5792,14 +5802,38 @@ tree_function_versioning (tree old_decl,
 					    &vars);
 		if (init)
 		  init_stmts.safe_push (init);
+		if (MAY_HAVE_DEBUG_STMTS && args_to_skip)
+		  {
+		    if (parm_num == -1)
+		      {
+			tree parm;
+			int p;
+			for (parm = DECL_ARGUMENTS (old_decl), p = 0; parm;
+			     parm = DECL_CHAIN (parm), p++)
+			  if (parm == replace_info->old_tree)
+			    {
+			      parm_num = p;
+			      break;
+			    }
+		      }
+		    if (parm_num != -1)
+		      {
+			if (debug_args_to_skip == args_to_skip)
+			  {
+			    debug_args_to_skip = BITMAP_ALLOC (NULL);
+			    bitmap_copy (debug_args_to_skip, args_to_skip);
+			  }
+			bitmap_clear_bit (debug_args_to_skip, parm_num);
+		      }
+		  }
 	      }
 	  }
       }
   /* Copy the function's arguments.  */
   if (DECL_ARGUMENTS (old_decl) != NULL_TREE)
-    DECL_ARGUMENTS (new_decl) =
-      copy_arguments_for_versioning (DECL_ARGUMENTS (old_decl), &id,
-      				     args_to_skip, &vars);
+    DECL_ARGUMENTS (new_decl)
+      = copy_arguments_for_versioning (DECL_ARGUMENTS (old_decl), &id,
+				       args_to_skip, &vars);
 
   DECL_INITIAL (new_decl) = remap_blocks (DECL_INITIAL (id.src_fn), &id);
   BLOCK_SUPERCONTEXT (DECL_INITIAL (new_decl)) = new_decl;
@@ -5914,6 +5948,69 @@ tree_function_versioning (tree old_decl,
 	}
     }
 
+  if (debug_args_to_skip && MAY_HAVE_DEBUG_STMTS)
+    {
+      tree parm;
+      vec<tree, va_gc> **debug_args = NULL;
+      unsigned int len = 0;
+      for (parm = DECL_ARGUMENTS (old_decl), i = 0;
+	   parm; parm = DECL_CHAIN (parm), i++)
+	if (bitmap_bit_p (debug_args_to_skip, i) && is_gimple_reg (parm))
+	  {
+	    tree ddecl;
+
+	    if (debug_args == NULL)
+	      {
+		debug_args = decl_debug_args_insert (new_decl);
+		len = vec_safe_length (*debug_args);
+	      }
+	    ddecl = make_node (DEBUG_EXPR_DECL);
+	    DECL_ARTIFICIAL (ddecl) = 1;
+	    TREE_TYPE (ddecl) = TREE_TYPE (parm);
+	    DECL_MODE (ddecl) = DECL_MODE (parm);
+	    vec_safe_push (*debug_args, DECL_ORIGIN (parm));
+	    vec_safe_push (*debug_args, ddecl);
+	  }
+      if (debug_args != NULL)
+	{
+	  /* On the callee side, add
+	     DEBUG D#Y s=> parm
+	     DEBUG var => D#Y
+	     stmts to the first bb where var is a VAR_DECL created for the
+	     optimized away parameter in DECL_INITIAL block.  This hints
+	     in the debug info that var (whole DECL_ORIGIN is the parm
+	     PARM_DECL) is optimized away, but could be looked up at the
+	     call site as value of D#X there.  */
+	  tree var = vars, vexpr;
+	  gimple_stmt_iterator cgsi
+	    = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	  gimple *def_temp;
+	  var = vars;
+	  i = vec_safe_length (*debug_args);
+	  do
+	    {
+	      i -= 2;
+	      while (var != NULL_TREE
+		     && DECL_ABSTRACT_ORIGIN (var) != (**debug_args)[i])
+		var = TREE_CHAIN (var);
+	      if (var == NULL_TREE)
+		break;
+	      vexpr = make_node (DEBUG_EXPR_DECL);
+	      parm = (**debug_args)[i];
+	      DECL_ARTIFICIAL (vexpr) = 1;
+	      TREE_TYPE (vexpr) = TREE_TYPE (parm);
+	      DECL_MODE (vexpr) = DECL_MODE (parm);
+	      def_temp = gimple_build_debug_bind (var, vexpr, NULL);
+	      gsi_insert_before (&cgsi, def_temp, GSI_NEW_STMT);
+	      def_temp = gimple_build_debug_source_bind (vexpr, parm, NULL);
+	      gsi_insert_before (&cgsi, def_temp, GSI_NEW_STMT);
+	    }
+	  while (i > len);
+	}
+    }
+
+  if (debug_args_to_skip && debug_args_to_skip != args_to_skip)
+    BITMAP_FREE (debug_args_to_skip);
   free_dominance_info (CDI_DOMINATORS);
   free_dominance_info (CDI_POST_DOMINATORS);
 
--- gcc/testsuite/gcc.dg/guality/pr68860-1.c.jj	2015-12-18 15:45:37.083969196 +0100
+++ gcc/testsuite/gcc.dg/guality/pr68860-1.c	2015-12-18 15:45:37.083969196 +0100
@@ -0,0 +1,54 @@
+/* PR debug/68860 */
+/* { dg-do run } */
+/* { dg-options "-g" } */
+
+#include "../nop.h"
+
+static int __attribute__((noinline))
+foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
+{
+  char *x = __builtin_alloca (arg7);
+  int __attribute__ ((aligned(32))) y;
+
+  y = 2;
+  asm (NOP : "=m" (y) : "m" (y));
+  x[0] = 25 + arg8;
+  asm volatile (NOP : "=m" (x[0]) : "m" (x[0]));
+  return y;
+}
+
+/* On s390(x) r2 and r3 are (depending on the optimization level) used
+   when adjusting the addresses in order to meet the alignment
+   requirements above.  They usually hold the function arguments arg1
+   and arg2.  So it is expected that these values are unavailable in
+   some of these tests.  */
+
+/* { dg-final { gdb-test 14 "arg1" "1" { target { ! "s390*-*-*" } } } } */
+/* { dg-final { gdb-test 14 "arg2" "2" { target { ! "s390*-*-*" } } } } */
+/* { dg-final { gdb-test 14 "arg3" "3" } } */
+/* { dg-final { gdb-test 14 "arg4" "4" } } */
+/* { dg-final { gdb-test 14 "arg5" "5" } } */
+/* { dg-final { gdb-test 14 "arg6" "6" } } */
+/* { dg-final { gdb-test 14 "arg7" "30" } } */
+/* { dg-final { gdb-test 14 "arg8" "7" } } */
+/* { dg-final { gdb-test 14 "y" "2" } } */
+/* { dg-final { gdb-test 16 "arg1" "1" { target { ! "s390*-*-*" } } } } */
+/* { dg-final { gdb-test 16 "arg2" "2" { target { ! "s390*-*-*" } } } } */
+/* { dg-final { gdb-test 16 "arg3" "3" } } */
+/* { dg-final { gdb-test 16 "arg4" "4" } } */
+/* { dg-final { gdb-test 16 "arg5" "5" } } */
+/* { dg-final { gdb-test 16 "arg6" "6" } } */
+/* { dg-final { gdb-test 16 "arg7" "30" } } */
+/* { dg-final { gdb-test 16 "arg8" "7" } } */
+/* { dg-final { gdb-test 16 "*x" "(char) 32" } } */
+/* { dg-final { gdb-test 16 "y" "2" } } */
+
+int
+main ()
+{
+  int l = 0;
+  asm volatile ("" : "=r" (l) : "0" (l));
+  foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
+  asm volatile ("" :: "r" (l));
+  return 0;
+}
--- gcc/testsuite/gcc.dg/guality/pr68860-2.c.jj	2015-12-18 15:45:37.084969182 +0100
+++ gcc/testsuite/gcc.dg/guality/pr68860-2.c	2015-12-18 15:45:37.084969182 +0100
@@ -0,0 +1,54 @@
+/* PR debug/68860 */
+/* { dg-do run } */
+/* { dg-options "-g" } */
+
+#include "../nop.h"
+
+int __attribute__((noinline))
+foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
+{
+  char *x = __builtin_alloca (arg7);
+  int __attribute__ ((aligned(32))) y;
+
+  y = 2;
+  asm (NOP : "=m" (y) : "m" (y));
+  x[0] = 25 + arg8;
+  asm volatile (NOP : "=m" (x[0]) : "m" (x[0]));
+  return y;
+}
+
+/* On s390(x) r2 and r3 are (depending on the optimization level) used
+   when adjusting the addresses in order to meet the alignment
+   requirements above.  They usually hold the function arguments arg1
+   and arg2.  So it is expected that these values are unavailable in
+   some of these tests.  */
+
+/* { dg-final { gdb-test 14 "arg1" "1" { target { ! "s390*-*-*" } } } } */
+/* { dg-final { gdb-test 14 "arg2" "2" { target { ! "s390*-*-*" } } } } */
+/* { dg-final { gdb-test 14 "arg3" "3" } } */
+/* { dg-final { gdb-test 14 "arg4" "4" } } */
+/* { dg-final { gdb-test 14 "arg5" "5" } } */
+/* { dg-final { gdb-test 14 "arg6" "6" } } */
+/* { dg-final { gdb-test 14 "arg7" "30" } } */
+/* { dg-final { gdb-test 14 "arg8" "7" } } */
+/* { dg-final { gdb-test 14 "y" "2" } } */
+/* { dg-final { gdb-test 16 "arg1" "1" { target { ! "s390*-*-*" } } } } */
+/* { dg-final { gdb-test 16 "arg2" "2" { target { ! "s390*-*-*" } } } } */
+/* { dg-final { gdb-test 16 "arg3" "3" } } */
+/* { dg-final { gdb-test 16 "arg4" "4" } } */
+/* { dg-final { gdb-test 16 "arg5" "5" } } */
+/* { dg-final { gdb-test 16 "arg6" "6" } } */
+/* { dg-final { gdb-test 16 "arg7" "30" } } */
+/* { dg-final { gdb-test 16 "arg8" "7" } } */
+/* { dg-final { gdb-test 16 "*x" "(char) 32" } } */
+/* { dg-final { gdb-test 16 "y" "2" } } */
+
+int
+main ()
+{
+  int l = 0;
+  asm volatile ("" : "=r" (l) : "0" (l));
+  foo (l + 1, l + 2, l + 3, l + 4, l + 5, l + 6, l + 30, 7);
+  asm volatile ("" :: "r" (l));
+  return 0;
+}


	Jakub

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipa-cp heuristics fixes
  2015-12-18 20:19                   ` Jakub Jelinek
@ 2015-12-18 21:42                     ` Jan Hubicka
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Hubicka @ 2015-12-18 21:42 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jan Hubicka, gcc-patches, Andreas Krebbel

> 
> The debug info on your testcase looks as expected (though,
> we really don't emit the DW_OP_GNU_parameter_ref for aggregates anyway
> - not sure if we could do something about those at least in the easy cases
> where the argument used to be stored into the struct shortly before the
> call.

Cool, I was basically curious if we can chain the changes w/o suprises.
> 
> Anyway, here is an updated patch where I'm not adding anything for
> skipped parameters that have tree_map replacement, and fixes some issues
> found during bootstrap/regtest (x86_64-linux and i686-linux).
> 
> Ok for trunk?
> 
> 2015-12-18  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR debug/68860
> 	* ipa-split.c (split_function): Only perform caller side
> 	modifications for decl_debug_args here.
> 	* cgraph.c: Include gimplify.h.
> 	(cgraph_edge::redirect_call_stmt_to_callee): Add caller side
> 	debug stmts for decl_debug_args.  Spelling fix in a comment.
> 	* tree-inline.c (tree_function_versioning): Populate decl_debug_args
> 	for args_to_skip arguments and add callee side debug stmts.
> 	Formatting fixes.  Avoid shadowing i variable.
> 
> 	* gcc.dg/guality/pr68860-1.c: New test.
> 	* gcc.dg/guality/pr68860-2.c: New test.

Looks good to me.
Thanks again for working on this!

Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-12-18 21:42 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-10  7:30 ipa-cp heuristics fixes Jan Hubicka
2015-12-10 10:47 ` Martin Jambor
2015-12-10 16:56   ` Jan Hubicka
2015-12-11 12:03     ` Martin Jambor
2015-12-11 20:11 ` Jakub Jelinek
2015-12-11 20:40   ` Jan Hubicka
2015-12-11 21:20     ` Jan Hubicka
2015-12-14 14:29       ` Martin Jambor
2015-12-16  9:16 ` Dominik Vogt
2015-12-16  9:59   ` Richard Biener
2015-12-16 10:08     ` Jakub Jelinek
2015-12-16 10:21     ` Dominik Vogt
2015-12-16 16:24   ` Jan Hubicka
2015-12-16 16:29     ` Jakub Jelinek
2015-12-16 17:15       ` Jan Hubicka
2015-12-16 17:37         ` Jakub Jelinek
2015-12-16 19:15           ` Jan Hubicka
2015-12-17 18:12             ` Jakub Jelinek
2015-12-17 21:00               ` Jan Hubicka
2015-12-17 21:10                 ` Jan Hubicka
2015-12-18 20:19                   ` Jakub Jelinek
2015-12-18 21:42                     ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).