public inbox for gcc-patches@gcc.gnu.org
From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, wschmidt@linux.vnet.ibm.com
Subject: Re: [PATCH][v4] GIMPLE store merging pass
Date: Thu, 29 Sep 2016 15:37:00 -0000	[thread overview]
Message-ID: <57ED30AB.402@foss.arm.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1609291133040.26629@t29.fhfr.qr>

Hi Richard,

Thanks for the detailed comments, I'll be working on addressing them.
Below are answers to some of your questions:

On 29/09/16 11:45, Richard Biener wrote:
>
> +
> +  /* If we're inserting a non-bytesized width or not at a byte boundary
> +     use an intermediate wide_int to perform the bit-insertion correctly.  */
> +  if (sub_byte_op_p
> +      || (TREE_CODE (expr) == INTEGER_CST
> +	  && mode_for_size (bitlen, MODE_INT, 0) == BLKmode))
> I wonder when we have BLKmode here ...
>
>> +    {
>> +      unsigned int byte_size = last_byte - first_byte + 1;
>> +
>> +      /* The functions native_encode_expr/native_interpret_expr uses the
>> +	 TYPE_MODE of the type to determine the number of bytes to write/read
>> +	 so if we want to process a number of bytes that does not have a
>> +	 TYPE_MODE of equal size we need to use a type that has a valid mode
>> +	 for it.  */
>> +
>> +      machine_mode mode
>> +	= smallest_mode_for_size (byte_size * BITS_PER_UNIT, MODE_INT);
>> +      tree dest_int_type
>> +	= build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), UNSIGNED);
>> +      byte_size = GET_MODE_SIZE (mode);
> ... how we ever get non-BLKmode here.

smallest_mode_for_size is guaranteed to never return BLKmode.
It returns a mode that contains at least the requested number of bits.
mode_for_size returns BLKmode if no mode fits the exact number of bits.
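To make the difference concrete outside of GCC, here is a toy model (hypothetical standalone helpers, not the real machmode.c logic) where integer modes exist only at 8/16/32/64 bits and 0 plays the role of BLKmode:

```c
#include <assert.h>

/* mode_for_size analogue: succeeds only on an exact match.  */
static int
exact_mode_bits (int bits)
{
  return (bits == 8 || bits == 16 || bits == 32 || bits == 64) ? bits : 0;
}

/* smallest_mode_for_size analogue: round up to the narrowest mode that
   can contain the requested number of bits, so it cannot fail for any
   request up to the widest mode.  */
static int
smallest_mode_bits (int bits)
{
  static const int modes[] = { 8, 16, 32, 64 };
  for (unsigned int i = 0; i < sizeof (modes) / sizeof (modes[0]); i++)
    if (bits <= modes[i])
      return modes[i];
  return 0;
}
```

So for, say, a 3-byte (24-bit) region the mode_for_size analogue gives BLKmode while the smallest_mode_for_size analogue gives the 32-bit mode, which is why the BLKmode check only guards the mode_for_size path.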

>
>> +      /* The region from the byte array that we're inserting into.  */
>> +      tree ptr_wide_int
>> +	= native_interpret_expr (dest_int_type, ptr + first_byte,
>> +				 total_bytes);
>> +
>> +      gcc_assert (ptr_wide_int);
>> +      wide_int dest_wide_int
>> +	= wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
>> +      wide_int expr_wide_int
>> +	= wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
>> +      if (BYTES_BIG_ENDIAN)
>> +	{
>> +	  unsigned int insert_pos
>> +	    = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
>> +	  dest_wide_int
>> +	    = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
>> +	}
>> +      else
>> +	dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
>> +				    bitpos % BITS_PER_UNIT, bitlen);
>> +
>> +      tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
>> +      native_encode_expr (res, ptr + first_byte, total_bytes, 0);
>> +
> OTOH this whole dance looks as complicated and way more expensive than
> using native_encode_expr into a temporary buffern and then a
> manually implemented "bit-merging" of it at ptr + first_byte + bitpos.
> AFAICS that operation is even endianess agnostic.

I did try implementing that, but it kept blowing up on big-endian targets
and I found it very fiddly. I can try again, but we'll see how it goes...
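For reference, the byte-wise merge boils down to something like the sketch below (a hypothetical standalone helper, with bits numbered from the LSB of each byte; the fiddly part is choosing the right bit numbering for big-endian targets, which this sketch deliberately ignores):

```c
#include <assert.h>
#include <stddef.h>

/* Copy the low BITLEN bits of the SRC byte stream into DST starting at
   bit offset BITPOS, one bit at a time.  Bits are numbered from the LSB
   within each byte; real code would have to pick the numbering that
   matches the target's native_encode_expr layout.  */
static void
merge_bits (unsigned char *dst, const unsigned char *src,
            size_t bitpos, size_t bitlen)
{
  for (size_t i = 0; i < bitlen; i++)
    {
      size_t sbyte = i / 8, sbit = i % 8;
      size_t dpos = bitpos + i;
      size_t dbyte = dpos / 8, dbit = dpos % 8;
      unsigned int bit = (src[sbyte] >> sbit) & 1;
      dst[dbyte] = (unsigned char) ((dst[dbyte] & ~(1u << dbit))
				    | (bit << dbit));
    }
}
```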

<snip>

>> +
>> +bool
>> +pass_store_merging::terminate_and_process_all_chains (basic_block bb)
>> +{
>> +  hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator iter
>> +    = m_stores.begin ();
>> +  bool ret = false;
>> +  for (; iter != m_stores.end (); ++iter)
>> +    ret |= terminate_and_release_chain ((*iter).first);
>> +
>> +  if (ret)
>> +    gimple_purge_dead_eh_edges (bb);
> Why do you need this?  I don't see how merging stores should effect EH
> edges at all -- unless you are removing EH side-effects (ISTR you
> are not merging cross-BB).

I was seeing a testsuite failure in the C++ testsuite,
an ICE about EH edges. However, when I reran the testsuite today
without the gimple_purge_dead_eh_edges call I didn't see any fallout...
I have been rebasing the patch against trunk regularly, so maybe something
changed in the underlying trunk since then...

<snip>

> +
> +bool
> +imm_store_chain_info::coalesce_immediate_stores ()
> +{
> +  /* Anything less can't be processed.  */
> +  if (m_store_info.length () < 2)
> +    return false;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "Attempting to coalesce %u stores in chain.\n",
> +	     m_store_info.length ());
> +
> +  store_immediate_info *info;
> +  unsigned int i;
> +
> +  /* Order the stores by the bitposition they write to.  */
> +  m_store_info.qsort (sort_by_bitpos);
> +
> +  info = m_store_info[0];
> +  merged_store_group *merged_store = new merged_store_group (info);
> +  m_merged_store_groups.safe_push (merged_store);
> +
> +  FOR_EACH_VEC_ELT (m_store_info, i, info)
> +    {
> +      if (dump_file)
> +	{
> +	  fprintf (dump_file, "Store %u:\nbitsize:" HOST_WIDE_INT_PRINT_DEC
> +			      " bitpos:" HOST_WIDE_INT_PRINT_DEC " val:\n",
> +		   i, info->bitsize, info->bitpos);
> +	  print_generic_expr (dump_file, info->val, 0);
> +	  fprintf (dump_file, "\n------------\n");
> +	}
> +
> +      if (i == 0)
> +	continue;
> +
> +      /* |---store 1---|
> +	       |---store 2---|
> +       Overlapping stores.  */
> +      unsigned HOST_WIDE_INT start = info->bitpos;
> +      if (IN_RANGE (start, merged_store->start,
> +		    merged_store->start + merged_store->width - 1))
> +	{
> +	  merged_store->merge_overlapping (info);
> +	  continue;
> +	}
> +
> +      /* |---store 1---| <gap> |---store 2---|.
> +	 Gap between stores.  Start a new group.  */
> +      if (start != merged_store->start + merged_store->width)
> +	{
> +	  merged_store = new merged_store_group (info);
> +	  m_merged_store_groups.safe_push (merged_store);
> So this will create merged_store_group even for single-store groups
> (all stores have gaps between them)?  Can't we optimize this maybe
> common case to not perform the memory allocation?  Like handle
> this case first and lazily allocate a new group.  Looks like
> apply_stores also doesn't bail out early for single-element groups.

Yeah, we can fast-path single-store groups.
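As a standalone sketch of the lazy scheme (hypothetical plain-C types, not the real merged_store_group), a group object would only be allocated once a second store overlaps or abuts the running region:

```c
#include <assert.h>

/* Hypothetical stand-in for store_immediate_info: a store of WIDTH bits
   at bit position START.  Input is assumed sorted by START.  */
struct store { unsigned int start, width; };

/* Count how many group objects a lazy scheme would allocate: one per
   region that actually contains two or more stores.  Single-store
   regions (all-gaps input) allocate nothing.  */
static unsigned int
count_lazy_groups (const struct store *s, unsigned int n)
{
  unsigned int groups = 0, i = 0;
  while (i < n)
    {
      unsigned int end = s[i].start + s[i].width;
      unsigned int members = 1;
      while (i + members < n && s[i + members].start <= end)
	{
	  unsigned int e2 = s[i + members].start + s[i + members].width;
	  if (e2 > end)
	    end = e2;
	  members++;
	}
      if (members > 1)
	groups++;		/* allocate only for multi-store regions */
      i += members;
    }
  return groups;
}
```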

>> +
>> +      num_stmts++;
>> +
>> +      try_pos += try_size / BITS_PER_UNIT;
>> +
>> +      wi_offset += try_size;
>> +      size -= try_size;
>> +      align = try_size;
>> +      while (size < try_size)
>> +	try_size /= 2;
>> +
>> +      if (num_stmts >= orig_num_stmts)
>> +	{
> Hmm.  I think that if you'd end up with an unchanged stmt simply leave it
> there.  I don't see how you'd ever end up with more stmts than in the
> original source?

We never end up with more statements than in the original source,
so this could be if (num_stmts == orig_num_stmts) to catch the case
where we can't improve on the original statement count.
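For what it's worth, the splitting loop quoted above reduces to the following standalone model (hypothetical helper, assuming SIZE is a multiple of BITS_PER_UNIT and TRY_SIZE starts out as a power of two):

```c
#include <assert.h>

/* Cover a region of SIZE bits with stores, starting at width TRY_SIZE
   and halving the width whenever the remaining size is too small for it.
   Returns the number of stores emitted; comparing this against the
   original statement count decides whether merging was a win.  */
static int
count_split_stores (int size, int try_size)
{
  int num_stmts = 0;
  while (size > 0)
    {
      while (size < try_size)
	try_size /= 2;
      size -= try_size;
      num_stmts++;
    }
  return num_stmts;
}
```

E.g. a 48-bit region starting at a 32-bit try size emits a 32-bit store and then a 16-bit store, two statements in total.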

>
>> +    return false;
>> +
>> +  return code != VOID_CST;
> Huh, do you really hit this?

Not really. I just looked through tree.def for the tree codes that
are tcc_constant and removed those handled by native_encode_expr.
I can remove this check.

>> +			     || !rhs_valid_for_store_merging_p (rhs)
>> +			     || TREE_THIS_VOLATILE (base_addr);
> TREE_THIS_VOLATILE is handled above already.

I found that it isn't, and that caused various issues.
While the stmt didn't trigger gimple_has_volatile_ops, the result of
get_inner_reference could still be TREE_THIS_VOLATILE. Not doing this check
caused havoc later on (duplicate entries in the hash table for the same
tree, because apparently volatile trees don't hash to the same location).

Thanks,
Kyrill
