public inbox for gcc-patches@gcc.gnu.org
From: Richard Biener <rguenther@suse.de>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, wschmidt@linux.vnet.ibm.com
Subject: Re: [PATCH][v4] GIMPLE store merging pass
Date: Fri, 30 Sep 2016 07:09:00 -0000	[thread overview]
Message-ID: <alpine.LSU.2.11.1609300852060.26629@t29.fhfr.qr> (raw)
In-Reply-To: <57ED30AB.402@foss.arm.com>

On Thu, 29 Sep 2016, Kyrill Tkachov wrote:

> Hi Richard,
> 
> Thanks for the detailed comments, I'll be working on addressing them.
> Below are answers to some of your questions:
> 
> On 29/09/16 11:45, Richard Biener wrote:
> > 
> > +
> > +  /* If we're inserting a non-bytesized width or not at a byte boundary
> > +     use an intermediate wide_int to perform the bit-insertion correctly.  */
> > +  if (sub_byte_op_p
> > +      || (TREE_CODE (expr) == INTEGER_CST
> > +	  && mode_for_size (bitlen, MODE_INT, 0) == BLKmode))
> > I wonder when we have BLKmode here ...
> > 
> > > +    {
> > > +      unsigned int byte_size = last_byte - first_byte + 1;
> > > +
> > > +      /* The functions native_encode_expr/native_interpret_expr use the
> > > +	 TYPE_MODE of the type to determine the number of bytes to write/read
> > > +	 so if we want to process a number of bytes that does not have a
> > > +	 TYPE_MODE of equal size we need to use a type that has a valid mode
> > > +	 for it.  */
> > > +
> > > +      machine_mode mode
> > > +	= smallest_mode_for_size (byte_size * BITS_PER_UNIT, MODE_INT);
> > > +      tree dest_int_type
> > > +	= build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), UNSIGNED);
> > > +      byte_size = GET_MODE_SIZE (mode);
> > ... how we ever get non-BLKmode here.
> 
> smallest_mode_for_size is guaranteed to never return BLKmode.
> It returns a mode that contains at least the requested number of bits.
> mode_for_size returns BLKmode if no mode fits the exact number of bits.
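
(For readers following along, the difference between the two can be sketched with standalone stand-ins; these are illustrative helpers only, not the GCC functions, which operate on machine modes rather than plain bit counts:)

```cpp
#include <cassert>

// mode_for_size-like: return the exact matching integer width,
// or 0 (standing in for BLKmode) if no power-of-two-byte integer
// has exactly `bits` bits.
unsigned exact_int_bits (unsigned bits)
{
  for (unsigned size = 8; size <= 64; size *= 2)
    if (size == bits)
      return size;
  return 0; // no exact mode: "BLKmode"
}

// smallest_mode_for_size-like: round up to the smallest integer
// width holding at least `bits` bits; never fails.
unsigned smallest_int_bits (unsigned bits)
{
  unsigned size = 8;
  while (size < bits)
    size *= 2;
  return size;
}
```

So a 24-bit request gets BLKmode from the exact-match variant but a 32-bit mode from the round-up variant.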

But then it's really about GET_MODE_SIZE (TYPE_MODE (expr)) vs.
TYPE_PRECISION (expr).  I see no need to invoke [smallest_]mode_for_size.

> > 
> > > +      /* The region from the byte array that we're inserting into.  */
> > > +      tree ptr_wide_int
> > > +	= native_interpret_expr (dest_int_type, ptr + first_byte,
> > > +				 total_bytes);
> > > +
> > > +      gcc_assert (ptr_wide_int);
> > > +      wide_int dest_wide_int
> > > +	= wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
> > > +      wide_int expr_wide_int
> > > +	= wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
> > > +      if (BYTES_BIG_ENDIAN)
> > > +	{
> > > +	  unsigned int insert_pos
> > > +	    = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
> > > +	  dest_wide_int
> > > +	    = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
> > > +	}
> > > +      else
> > > +	dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
> > > +				    bitpos % BITS_PER_UNIT, bitlen);
> > > +
> > > +      tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
> > > +      native_encode_expr (res, ptr + first_byte, total_bytes, 0);
> > > +
> > OTOH this whole dance looks as complicated and way more expensive than
> > using native_encode_expr into a temporary buffer and then a
> > manually implemented "bit-merging" of it at ptr + first_byte + bitpos.
> > AFAICS that operation is even endianness-agnostic.
> 
> I did try implementing that, but it kept blowing up in big-endian.

I'm really curious how ;)  And I'm willing to help debugging in
case you have something basic working.  I suggest working at byte
granularity here (bytes have no endianness!).
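
A rough standalone sketch of the merge I have in mind (hypothetical helper, not from the patch; it assumes bit 0 is the LSB of each byte, so only whole bytes are ever addressed and target byte order never enters the picture):

```cpp
#include <cstdint>

// Merge `bitlen` bits, taken starting at bit 0 of `src`, into `dest`
// starting at bit offset `bitpos` (which may exceed 7).  Bit 0 is the
// LSB of each byte, so the loop indexes plain bytes and the operation
// is endianness-agnostic.
void
merge_bits (uint8_t *dest, const uint8_t *src,
	    unsigned bitpos, unsigned bitlen)
{
  for (unsigned i = 0; i < bitlen; ++i)
    {
      unsigned bit = (src[i / 8] >> (i % 8)) & 1;
      unsigned d = bitpos + i;
      // Clear the destination bit, then OR in the source bit.
      dest[d / 8] = (dest[d / 8] & ~(1u << (d % 8))) | (bit << (d % 8));
    }
}
```

A production version would of course work a word at a time with masks rather than bit by bit, but the indexing scheme stays byte-oriented either way.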

> I found it to be very fiddly. I can try again, but we'll see how it goes...
> 
> <snip>
> 
> > > +
> > > +bool
> > > +pass_store_merging::terminate_and_process_all_chains (basic_block bb)
> > > +{
> > > +  hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator iter
> > > +    = m_stores.begin ();
> > > +  bool ret = false;
> > > +  for (; iter != m_stores.end (); ++iter)
> > > +    ret |= terminate_and_release_chain ((*iter).first);
> > > +
> > > +  if (ret)
> > > +    gimple_purge_dead_eh_edges (bb);
> > Why do you need this?  I don't see how merging stores should affect EH
> > edges at all -- unless you are removing EH side-effects (ISTR you
> > are not merging cross-BB).
> 
> I was seeing a testsuite failure in the C++ testsuite,
> an ICE about EH edges. However, when I rerun the testsuite today
> without the gimple_purge_dead_eh_edges call I don't see any fallout...
> I have been rebasing the patch against trunk regularly, so maybe something
> changed in the underlying trunk since then...

I'd doubt that.  I don't see any check for whether a store may throw
internally so that missing check certainly would explain things.
You'd need -fnon-call-exceptions, but you'd also need a setup, tricky
to achieve, where one store of a group is marked as not throwing and
another as throwing.

I'd say add some stmt_can_throw_internal checks, mark such stores as
invalid, and remove the edge purging.  That should do the trick.

> <snip>
> 
> > +
> > +bool
> > +imm_store_chain_info::coalesce_immediate_stores ()
> > +{
> > +  /* Anything less can't be processed.  */
> > +  if (m_store_info.length () < 2)
> > +    return false;
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    fprintf (dump_file, "Attempting to coalesce %u stores in chain.\n",
> > +	     m_store_info.length ());
> > +
> > +  store_immediate_info *info;
> > +  unsigned int i;
> > +
> > +  /* Order the stores by the bitposition they write to.  */
> > +  m_store_info.qsort (sort_by_bitpos);
> > +
> > +  info = m_store_info[0];
> > +  merged_store_group *merged_store = new merged_store_group (info);
> > +  m_merged_store_groups.safe_push (merged_store);
> > +
> > +  FOR_EACH_VEC_ELT (m_store_info, i, info)
> > +    {
> > +      if (dump_file)
> > +	{
> > +	  fprintf (dump_file, "Store %u:\nbitsize:" HOST_WIDE_INT_PRINT_DEC
> > +			      " bitpos:" HOST_WIDE_INT_PRINT_DEC " val:\n",
> > +		   i, info->bitsize, info->bitpos);
> > +	  print_generic_expr (dump_file, info->val, 0);
> > +	  fprintf (dump_file, "\n------------\n");
> > +	}
> > +
> > +      if (i == 0)
> > +	continue;
> > +
> > +      /* |---store 1---|
> > +	       |---store 2---|
> > +       Overlapping stores.  */
> > +      unsigned HOST_WIDE_INT start = info->bitpos;
> > +      if (IN_RANGE (start, merged_store->start,
> > +		    merged_store->start + merged_store->width - 1))
> > +	{
> > +	  merged_store->merge_overlapping (info);
> > +	  continue;
> > +	}
> > +
> > +      /* |---store 1---| <gap> |---store 2---|.
> > +	 Gap between stores.  Start a new group.  */
> > +      if (start != merged_store->start + merged_store->width)
> > +	{
> > +	  merged_store = new merged_store_group (info);
> > +	  m_merged_store_groups.safe_push (merged_store);
> > So this will create merged_store_group even for single-store groups
> > (all stores have gaps between them)?  Can't we optimize this maybe
> > common case to not perform the memory allocation?  Like handle
> > this case first and lazily allocate a new group.  Looks like
> > apply_stores also doesn't bail out early for single-element groups.
> 
> Yeah, we can fast-path single-store groups.
> 
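
FWIW the coalescing loop with such a fast path could look roughly like this standalone sketch (made-up types, not the patch's classes; stores assumed sorted by bitpos):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for store_immediate_info / merged_store_group.
struct store { uint64_t bitpos, bitsize; };
struct group { uint64_t start, width; unsigned count; };

// Coalesce adjacent or overlapping stores.  A new group is only pushed
// when a gap is found; runs of mergeable stores extend the last group
// in place.  Groups left with count == 1 can then be skipped cheaply
// before any expensive processing (apply_stores etc.).
std::vector<group>
coalesce (const std::vector<store> &stores)
{
  std::vector<group> groups;
  for (const store &s : stores)
    {
      if (!groups.empty ()
	  && s.bitpos <= groups.back ().start + groups.back ().width)
	{
	  // Overlap or exact adjacency: extend the current group.
	  group &g = groups.back ();
	  g.width = std::max (g.start + g.width,
			      s.bitpos + s.bitsize) - g.start;
	  g.count++;
	}
      else
	groups.push_back (group { s.bitpos, s.bitsize, 1 });
    }
  return groups;
}
```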
> > > +
> > > +      num_stmts++;
> > > +
> > > +      try_pos += try_size / BITS_PER_UNIT;
> > > +
> > > +      wi_offset += try_size;
> > > +      size -= try_size;
> > > +      align = try_size;
> > > +      while (size < try_size)
> > > +	try_size /= 2;
> > > +
> > > +      if (num_stmts >= orig_num_stmts)
> > > +	{
> > Hmm.  I think that if you'd end up with an unchanged stmt simply leave it
> > there.  I don't see how you'd ever end up with more stmts than in the
> > original source?
> 
> We don't ever end up with more statements; this could be
> if (num_stmts == orig_num_stmts) to catch the case where we
> can't improve on the original statement count.
>
> > 
> > > +    return false;
> > > +
> > > +  return code != VOID_CST;
> > Huh, do you really hit this?
> 
> Not really; I just looked through tree.def for the tree codes that
> are tcc_constant and removed those handled by native_encode_expr.
> I can remove this.
> 
> > > +			     || !rhs_valid_for_store_merging_p (rhs)
> > > +			     || TREE_THIS_VOLATILE (base_addr);
> > TREE_THIS_VOLATILE is handled above already.
> 
> I found that it isn't, and that caused various issues.
> While the stmt didn't trigger gimple_has_volatile_ops, the result of
> get_inner_reference could be TREE_THIS_VOLATILE.  Not doing this check
> caused havoc later on (duplicate entries in the hash table for the same
> tree because apparently volatile trees don't hash to the same location).

Yes, you can have a MEM[&volatile-decl] that is actually not a volatile
access but get_inner_reference will return volatile-decl as base
(now marked TREE_THIS_VOLATILE).  The hashing will not consider those
equal.  I suppose this case doesn't matter enough to be worth
"fixing" in some way but please add a comment above the TREE_THIS_VOLATILE
check about its reason.

Thanks,
Richard.

Thread overview: 18+ messages
2016-09-28 16:00 Kyrill Tkachov
2016-09-28 16:07 ` Bill Schmidt
2016-09-28 16:09   ` Kyrill Tkachov
2016-09-29 16:01   ` Pat Haugen
2016-09-29 16:10     ` Kyrill Tkachov
2016-09-28 17:32 ` Pat Haugen
2016-09-29  8:02   ` Richard Biener
2016-09-29  8:24     ` Kyrill Tkachov
2016-09-29 10:54 ` Richard Biener
2016-09-29 15:37   ` Kyrill Tkachov
2016-09-30  7:09     ` Richard Biener [this message]
2016-09-30 15:25   ` Kyrill Tkachov
2016-09-30 15:34     ` Kyrill Tkachov
2016-09-30 17:02       ` Richard Biener
2016-09-30 16:58   ` Kyrill Tkachov
2016-10-04  8:18     ` Richard Biener
2016-10-03 13:02   ` Kyrill Tkachov
2016-10-03 16:43     ` Richard Biener
