From: Richard Biener <rguenther@suse.de>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, wschmidt@linux.vnet.ibm.com
Subject: Re: [PATCH][v4] GIMPLE store merging pass
Date: Fri, 30 Sep 2016 07:09:00 -0000 [thread overview]
Message-ID: <alpine.LSU.2.11.1609300852060.26629@t29.fhfr.qr> (raw)
In-Reply-To: <57ED30AB.402@foss.arm.com>
On Thu, 29 Sep 2016, Kyrill Tkachov wrote:
> Hi Richard,
>
> Thanks for the detailed comments, I'll be working on addressing them.
> Below are answers to some of your questions:
>
> On 29/09/16 11:45, Richard Biener wrote:
> >
> > +
> > + /* If we're inserting a non-bytesized width or not at a byte boundary
> > + use an intermediate wide_int to perform the bit-insertion correctly.
> > */
> > + if (sub_byte_op_p
> > + || (TREE_CODE (expr) == INTEGER_CST
> > + && mode_for_size (bitlen, MODE_INT, 0) == BLKmode))
> > I wonder when we have BLKmode here ...
> >
> > > + {
> > > + unsigned int byte_size = last_byte - first_byte + 1;
> > > +
> > > + /* The functions native_encode_expr/native_interpret_expr uses the
> > > + TYPE_MODE of the type to determine the number of bytes to write/read
> > > + so if we want to process a number of bytes that does not have a
> > > + TYPE_MODE of equal size we need to use a type that has a valid mode
> > > + for it. */
> > > +
> > > + machine_mode mode
> > > + = smallest_mode_for_size (byte_size * BITS_PER_UNIT, MODE_INT);
> > > + tree dest_int_type
> > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), UNSIGNED);
> > > + byte_size = GET_MODE_SIZE (mode);
> > ... how we ever get non-BLKmode here.
>
> smallest_mode_for_size is guaranteed to never return BLKmode.
> It returns a mode that contains at least the requested number of bits.
> mode_for_size returns BLKmode if no mode fits the exact number of bits.
But then it's really about GET_MODE_SIZE (TYPE_MODE (expr)) vs.
TYPE_PRECISION (expr). I see no need to invoke [smallest_]mode_for_size.
> >
> > > + /* The region from the byte array that we're inserting into. */
> > > + tree ptr_wide_int
> > > + = native_interpret_expr (dest_int_type, ptr + first_byte,
> > > + total_bytes);
> > > +
> > > + gcc_assert (ptr_wide_int);
> > > + wide_int dest_wide_int
> > > + = wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
> > > + wide_int expr_wide_int
> > > + = wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
> > > + if (BYTES_BIG_ENDIAN)
> > > + {
> > > + unsigned int insert_pos
> > > + = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
> > > + dest_wide_int
> > > + = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
> > > + }
> > > + else
> > > + dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
> > > + bitpos % BITS_PER_UNIT, bitlen);
> > > +
> > > + tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
> > > + native_encode_expr (res, ptr + first_byte, total_bytes, 0);
> > > +
> > OTOH this whole dance looks as complicated and way more expensive than
> > using native_encode_expr into a temporary buffern and then a
> > manually implemented "bit-merging" of it at ptr + first_byte + bitpos.
> > AFAICS that operation is even endianess agnostic.
>
> I did try implementing that, but it kept blowing up in big-endian.
I'm really curious how ;) And I'm willing to help debugging in
case you have sth basic working. I suggest to work on a byte
granularity here (bytes have no endianess!).
> I found it to be very fiddly. I can try again, but we'll see how it goes...
>
> <snip>
>
> > > +
> > > +bool
> > > +pass_store_merging::terminate_and_process_all_chains (basic_block bb)
> > > +{
> > > + hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator
> > > iter
> > > + = m_stores.begin ();
> > > + bool ret = false;
> > > + for (; iter != m_stores.end (); ++iter)
> > > + ret |= terminate_and_release_chain ((*iter).first);
> > > +
> > > + if (ret)
> > > + gimple_purge_dead_eh_edges (bb);
> > Why do you need this? I don't see how merging stores should effect EH
> > edges at all -- unless you are removing EH side-effects (ISTR you
> > are not merging cross-BB).
>
> I was seeing a testsuite failure in the C++ testsuite,
> an ICE about EH edges. However, when I rerun the testsuite today
> without the gimple_purge_dead_eh_edges call I don't see any fallout...
> I have been rebasing the patch against trunk regularly so maybe something
> change in the underlying trunk since then...
I'd doubt that. I don't see any check for whether a store may throw
internally so that missing check certainly would explain things.
You'd need -fnon-call-exceptions but you also need to have an
interesting to achieve setup where you have one store of a group marked as
not throwing and one as throwing.
I'd say throw some stmt_can_throw_internal in and mark such stores as
invalid and remove the edge purging. That should do the trick.
> <snip>
>
> > +
> > +bool
> > +imm_store_chain_info::coalesce_immediate_stores ()
> > +{
> > + /* Anything less can't be processed. */
> > + if (m_store_info.length () < 2)
> > + return false;
> > +
> > + if (dump_file && (dump_flags & TDF_DETAILS))
> > + fprintf (dump_file, "Attempting to coalesce %u stores in chain.\n",
> > + m_store_info.length ());
> > +
> > + store_immediate_info *info;
> > + unsigned int i;
> > +
> > + /* Order the stores by the bitposition they write to. */
> > + m_store_info.qsort (sort_by_bitpos);
> > +
> > + info = m_store_info[0];
> > + merged_store_group *merged_store = new merged_store_group (info);
> > + m_merged_store_groups.safe_push (merged_store);
> > +
> > + FOR_EACH_VEC_ELT (m_store_info, i, info)
> > + {
> > + if (dump_file)
> > + {
> > + fprintf (dump_file, "Store %u:\nbitsize:" HOST_WIDE_INT_PRINT_DEC
> > + " bitpos:" HOST_WIDE_INT_PRINT_DEC " val:\n",
> > + i, info->bitsize, info->bitpos);
> > + print_generic_expr (dump_file, info->val, 0);
> > + fprintf (dump_file, "\n------------\n");
> > + }
> > +
> > + if (i == 0)
> > + continue;
> > +
> > + /* |---store 1---|
> > + |---store 2---|
> > + Overlapping stores. */
> > + unsigned HOST_WIDE_INT start = info->bitpos;
> > + if (IN_RANGE (start, merged_store->start,
> > + merged_store->start + merged_store->width - 1))
> > + {
> > + merged_store->merge_overlapping (info);
> > + continue;
> > + }
> > +
> > + /* |---store 1---| <gap> |---store 2---|.
> > + Gap between stores. Start a new group. */
> > + if (start != merged_store->start + merged_store->width)
> > + {
> > + merged_store = new merged_store_group (info);
> > + m_merged_store_groups.safe_push (merged_store);
> > So this will create merged_store_group even for single-store groups
> > (all stores have gaps between them)? Can't we optimize this maybe
> > common case to not perform the memory allocation? Like handle
> > this case first and lazily allocate a new group. Looks like
> > apply_stores also doesn't bail out early for single-element groups.
>
> Yeah, we can fast-path single-store groups.
>
> > > +
> > > + num_stmts++;
> > > +
> > > + try_pos += try_size / BITS_PER_UNIT;
> > > +
> > > + wi_offset += try_size;
> > > + size -= try_size;
> > > + align = try_size;
> > > + while (size < try_size)
> > > + try_size /= 2;
> > > +
> > > + if (num_stmts >= orig_num_stmts)
> > > + {
> > Hmm. I think that if you'd end up with an unchanged stmt simply leave it
> > there. I don't see how you'd ever end up with more stmts than in the
> > original source?
>
> We don't ever end up with more statements, this could
> be if (num_stmts == orig_num_stmts) to catch the case
> where we can't improve on the original statement count.
>
> >
> > > + return false;
> > > +
> > > + return code != VOID_CST;
> > Huh, do you really hit this?
>
> Not really, I just looked through tree.def on what tree codes
> were tcc_constant and removed those that were handled by native_encode_expr.
> I can remove this.
>
> > > + || !rhs_valid_for_store_merging_p (rhs)
> > > + || TREE_THIS_VOLATILE (base_addr);
> > TREE_THIS_VOLATILE is handled above already.
>
> I found that it doesn't and caused various issues.
> While the stmt didn't trigger gimple_has_volatile_ops the result of
> get_inner reference could be TREE_THIS_VOLATILE. Not doing this check
> caused havoc later on (duplicate entries in the hash table for the same
> tree because apparently volatile trees don't hash to the same location).
Yes, you can have a MEM[&volatile-decl] that is actually not a volatile
access but get_inner_reference will return volatile-decl as base
(now marked TREE_THIS_VOLATILE). The hashing will not consider those
equal. I suppose this case doesn't matter enough to be worth
"fixing" in some way but please add a comment above the TREE_THIS_VOLATILE
check about its reason.
Thanks,
Richard.
next prev parent reply other threads:[~2016-09-30 7:00 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-28 16:00 Kyrill Tkachov
2016-09-28 16:07 ` Bill Schmidt
2016-09-28 16:09 ` Kyrill Tkachov
2016-09-29 16:01 ` Pat Haugen
2016-09-29 16:10 ` Kyrill Tkachov
2016-09-28 17:32 ` Pat Haugen
2016-09-29 8:02 ` Richard Biener
2016-09-29 8:24 ` Kyrill Tkachov
2016-09-29 10:54 ` Richard Biener
2016-09-29 15:37 ` Kyrill Tkachov
2016-09-30 7:09 ` Richard Biener [this message]
2016-09-30 15:25 ` Kyrill Tkachov
2016-09-30 15:34 ` Kyrill Tkachov
2016-09-30 17:02 ` Richard Biener
2016-09-30 16:58 ` Kyrill Tkachov
2016-10-04 8:18 ` Richard Biener
2016-10-03 13:02 ` Kyrill Tkachov
2016-10-03 16:43 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.11.1609300852060.26629@t29.fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=kyrylo.tkachov@foss.arm.com \
--cc=wschmidt@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).