Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Bernd Schmidt <bschmidt@redhat.com>
To: Kelvin Nilsen <kdnilsen@linux.vnet.ibm.com>, gcc-patches@gcc.gnu.org
Subject: Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)
Date: Mon, 09 Nov 2015 17:35:00 -0000	[thread overview]
Message-ID: <5640D958.7010300@redhat.com> (raw)
In-Reply-To: <563E0E67.1090101@linux.vnet.ibm.com>

On 11/07/2015 03:44 PM, Kelvin Nilsen wrote:
> This is a draft patch to partially address the concerns described in
> bugzilla problem report
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212). The patch is
> incomplete in the sense that there are some known shortcomings with
> nested loops which I am still working on.  I am sending this out for
> comments at this time because we would like these patches to be
> integrated into the GCC 6 release and want to begin responding to
> community feedback as soon as possible in order to make the integration
> possible.

I'll mainly comment on style points right now. Your code generally looks 
mostly good but doesn't quite follow our guidelines. In terms of logic 
I'm sure there will be followup questions after the first round of 
points is addressed. Others might be better qualified to review the 
frequencies manipulation; for this first round I'm just assuming that 
the general idea is sound (but would appreciate input).

> 1. Before a loop body is unpeeled into a pre-header location, we
> temporarily adjust the loop body frequencies to represent the values
> appropriate for the context into which the loop body is to be
> copied.
>
> 2. After unrolling the loop body (by replicating the loop body (N-1)
> times within the loop), we recompute all frequencies associated with
> blocks contained within the loop.

If these are independent from each other it might be better to split up 
the patch.

>          (check_loop_frequency_integrity): new helper routine
>          (set_zero_probability): added another parameter
>          (duplicate_loop_to_header_edge): Add code to recompute loop
> body frequencies after blocks are replicated (unrolled) into the loop
> body. Introduce certain help routines because existing infrastructure
> routines are not reliable during typical executions of
> duplicate_loop_to_header_edge().

Please review our guidelines how to write ChangeLogs - capitalize, 
punctuate, and document only the what, not the why. Also, linewrap manually.

>     opt_info_start_duplication (opt_info);
> +
>     ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),

Please make sure you don't change whitespace unnecessarily. There are a 
few other occurrences in the patch, and also cases where you seem to be 
adding spaces to otherwise blank lines.

> @@ -1015,14 +1041,44 @@ unroll_loop_runtime_iterations (struct loop *loop)
>     bitmap_clear_bit (wont_exit, may_exit_copy);
>     opt_info_start_duplication (opt_info);
>
> +  {
> +    /* Recompute the loop body frequencies. */
> +    zero_loop_frequencies (loop);

No reason to start a braced block here.

> +    /* Scale the incoming frequencies according to the heuristic that
> +     * the loop frequency is the incoming edge frequency divided by
> +     * 0.09.  This heuristic applies only to loops that iterate over a
> +     * run-time value that is not known at compile time.  Note that
> +     * 1/.09 equals 11.1111.  We'll use integer arithmetic on ten
> +     * thousandths, and then divide by 10,000 after we've "rounded".
> +     */

Please examine the comment style in gcc - no asterisks to start new 
lines, and comment terminators don't go on their own line.

> +    sum_incoming_frequencies *= 111111;  /* multiply by 11.1111 */
> +    sum_incoming_frequencies += 5000;    /* round by adding 0.5 */
> +    sum_incoming_frequencies /= 10000;	 /* convert ten thousandths
> +					    to ones
> +					 */

These comments could also be improved, but really they should just be 
removed since they're pretty obvious and redundant with the one before.

> +/* Define ENABLE_CHECKING to enforce the following run-time checks.

"With checking enabled, the following run-time checks are performed:"

> + * This may report false-positive errors due to round-off errors.

That doesn't sound good as it could lead to bootstrap failures when 
checking is enabled.

> @@ -44,6 +55,543 @@ static void fix_loop_placements (struct loop *, bo
>   static bool fix_bb_placement (basic_block);
>   static void fix_bb_placements (basic_block, bool *, bitmap);
>
> +/*
> + * Return true iff block is considered to reside within the loop
> + * represented by loop_ptr.
> + */

Arguments are capitalized in function comments.

> +bool
> +in_loop_p (basic_block block, struct loop *loop_ptr)
> +{
> +  basic_block *bbs = get_loop_body (loop_ptr);
> +  bool result = false;
> +
> +  for (unsigned int i = 0; i < loop_ptr->num_nodes; i++)
> +    {
> +      if (bbs[i] == block)
> +	result = true;
> +    }

I think something that starts with bb->loop_father and iterates outwards 
would be more efficient.

> +/* A list of block_ladder_rung structs is used to keep track of all the
> + * blocks visited in a depth-first recursive traversal of a control-flow
> + * graph.  This list is used to detect and prevent attempts to revisit
> + * a block that is already being visited in the recursive traversal.
> + */
> +typedef struct block_ladder_rung {
> +  basic_block block;
> +  struct block_ladder_rung *lower_rung;
> +} *ladder_rung_p;

I was going to suggest a vec rather than manually dealing with lists, 
but it looks like you're using these in a rather unusual way in a 
recursive walk. I'm not actually sure that what you're doing entirely 
prevents revisiting a block, it only does so for a single recursive 
call-chain, but two separate recursive calls from the same level could 
still visit the same blocks. I'm not sure this is intended (the usual 
approach would be to use a (s)bitmap of visited blocks). More comments 
on the whole recursion scheme below.

> +/* Return true iff an_edge represents the same source and destination
> + * blocks as another_edge.
> + */
> +static bool
> +same_edge_p (edge an_edge, edge another_edge)
> +{
> +  return ((an_edge->src == another_edge->src)
> +	  && (an_edge->dest == another_edge->dest));
> +}

Formatting aside (extra parentheses), I wonder why you can't just test 
edges for pointer equality? Does anything make duplicate edges?

> +static void
> +recursively_zero_frequency (struct loop *loop_ptr, vec<edge> exit_edges,
> +			    ladder_rung_p ladder_rung,
> +			    edge incoming_edge)
> +{
> +  if (incoming_edge->dest == loop_ptr->header)
> +    return;
> +  else if (in_edge_set_p (incoming_edge, exit_edges))
> +    return;
> +  else if (in_call_chain_p (incoming_edge, ladder_rung))
> +    return;
> +  else
> +    {
> +      struct block_ladder_rung a_rung;
> +      basic_block block = incoming_edge->dest;
> +
> +      a_rung.block = block;
> +      a_rung.lower_rung = ladder_rung;
> +      block->frequency = 0;
> +
> +      edge_iterator ei;
> +      edge successor;
> +      FOR_EACH_EDGE (successor, ei, block->succs)
> +	{
> +	  recursively_zero_frequency (loop_ptr, exit_edges,
> +				      &a_rung, successor);
> +	}
> +    }
> +}

Does this really have to be recursive, and this complicated? I think 
something with a worklist and maybe a bitmap to indicate visited blocks 
would be considerably easier to understand, and the recursion could be a 
problem with very large testcases.

You might want to check whether something like dfs_enumerate_from or 
other functions in cfganal would be helpful.

> +static bool
> +recursion_detected_p (basic_block candidate, ladder_rung_p lower_steps) {

Don't place the open brace on the same line as the declaration.

> +/* Return true iff candidate is contained within the loop represented
> + * by loop_header and loop_latch.
> + *
> + * We consider the block to be within the loop if there exists a path
> + * within the control flow graph from this node to the loop's latch
> + * which does not pass through the loop's header.  (If all paths to
> + * the latch pass through the loop header, then the node is contained
> + * within an outer-nested loop but not within this loop.)
> + *
> + * Note that if a candidate's successor is in the loop and the successor
> + * is not the loop header, then the candidate itself is also in the loop.
> + * If none of the successors of a candidate are in the loop, then the
> + * candidate itself is not in the loop.
> + */
> +static bool
> +in_loop_of_header_p (basic_block candidate, basic_block loop_header,
> +		      basic_block loop_latch, bool start_of_recursion,
> +		      ladder_rung_p lower_steps)

This is another function that makes me wonder why it's so complicated. 
If you want to know whether a basic block is inside a loop, why is 
looking for it in the set returned by get_loop_body insufficient, or 
iterating backwards from the block's loop_father? And even so I think 
there are probably better approaches to writing this using worklists or 
existing helper functions, as mentioned above.

> +      return false;		  /* None of the successors was in loop	 */

Move comment before statement.

> +/* Return true iff block is an element of the block_set vector.
> + */
> +static bool
> +in_block_set_p (basic_block block, vec<basic_block> block_set)
> +{
> +  basic_block bb;
> +  unsigned int u;
> +  FOR_EACH_VEC_ELT (block_set, u, bb)
> +    {
> +      if (bb == block)
> +	return true;
> +    }
> +  return false;
> +}

You probably want to use (s)bitmaps so you don't have to do this 
potentially expensive iteration.

> +
> +/* Return a vector containing all of the edges that exit the loop
> + * represented by the loop_blocks vector.
> + */
> +static vec<edge>
> +get_exit_edges_from_loop_blocks (vec<basic_block> loop_blocks) {

How does this one differ from the existing get_loop_exit_edges? Ah, I 
see this:

> +  /* When zero_partial_loop_frequencies is invoked, the *loop_ptr
> +   * object is not entirely coherent, so the library service
> +   * get_loop_exit_edges (loop_p) cannot be called from this context.

Just how incoherent is the information? Is that fixable? I assume this 
is the same reason why you have get_loop_blocks and are using it instead 
of get_loop_body? That duplication is somewhat unfortunate and it would 
be nice to avoid it. I think this is the core point that needs to be 
addressed first. If this is unavoidable (and I really hope not), then it 
needs to be clearly documented which set of functions should be called 
under which circumstances.

> +	  if (!in_block_set_p (edge_dest, loop_blocks))
> +	    {
> +	      results.safe_push (successor);
> +	    }

No braces around single statements. Please fix throughout.

> + * Zero all frequencies for all blocks contained within the loop
> + * represented by loop_ptr which are reachable from block without
> + * passing through the block header.  If block is not within the loop,
> + * this has no effect.	The behavior is as outlined by the following

Lose the tab character in here.

> +static void
> +recursively_increment_frequency (struct loop *loop_ptr, vec<edge> exit_edges,
> +				 ladder_rung_p ladder_rung,
> +				 edge incoming_edge,
> +				 int frequency_increment)

Some of these would probably fit on one line which would make this more 
compact.

> +#ifdef ENABLE_CHECKING

We're getting rid of this #ifdef. Define unconditionally, and use 
flag_checking to determine whether to do the verification (if it is 
safe, see comment above - otherwise just make this a DEBUG_FUNCTION).

> +/**
> + * Issue a fatal error message and abort program execution.
> + */
> +static void
> +internal (const char *msg)
> +{
> +  fprintf (stderr, "Fatal internal error: %s\n", msg);
> +  exit (-1);
> +}

Unnecessary, we have fatal_error.

> + * The integrity check is problematic due to round-off errors.	Though

Another tab character.

> @@ -1101,7 +1649,7 @@ can_duplicate_loop_p (const struct loop *loop)
>      is redistributed evenly to the remaining edges coming from E->src.  */
>
>   static void
> -set_zero_probability (edge e)
> +set_zero_probability (struct loop *loop_ptr, edge e)

Arguments should be documented in the function comment.

> +   * KELVIN_TODO:

Just "TODO" if at all.

Bernd

next prev parent reply	other threads:[~2015-11-09 17:35 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-07 14:45 Kelvin Nilsen
2015-11-09 17:35 ` Bernd Schmidt [this message]
2015-11-10  9:56   ` Bernhard Reutner-Fischer
2015-11-10 10:16     ` Bernd Schmidt
2015-11-10 10:54   ` Richard Biener
2015-11-10 14:19     ` Michael Matz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5640D958.7010300@redhat.com \
    --to=bschmidt@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kdnilsen@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).