From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-240373-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 24049 invoked by alias); 10 May 2009 11:48:24 -0000
Received: (qmail 23832 invoked by uid 22791); 10 May 2009 11:48:16 -0000
X-SWARE-Spam-Status: No, hits=-2.9 required=5.0 	tests=AWL,BAYES_00,J_CHICKENPOX_35,J_CHICKENPOX_51
X-Spam-Check-By: sourceware.org
Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 10 May 2009 11:48:05 +0000
Received: from relay2.suse.de (mail2.suse.de [195.135.221.8]) 	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) 	(No client certificate requested) 	by mx2.suse.de (Postfix) with ESMTP id 9B1BA79727; 	Sun, 10 May 2009 13:48:01 +0200 (CEST)
Date: Sun, 10 May 2009 11:48:00 -0000
From: Richard Guenther <rguenther@suse.de>
To: Martin Jambor <mjambor@suse.cz>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, 	Jan Hubicka <hubicka@ucw.cz>
Subject: Re: [PATCH 3/5] New intraprocedural Scalar Reduction of  Aggregates.
In-Reply-To: <20090510103233.GA28857@alvy.suse.cz>
Message-ID: <alpine.LNX.2.00.0905101302200.25789@zhemvz.fhfr.qr>
References: <20090428100429.051912011@virgil.suse.cz> <20090428100449.910685488@virgil.suse.cz> <20090428101423.GA23578@virgil.suse.cz> <alpine.LNX.2.00.0904291228350.25789@zhemvz.fhfr.qr> <20090510103233.GA28857@alvy.suse.cz>
User-Agent: Alpine 2.00 (LNX 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2009-05/txt/msg00508.txt.bz2

On Sun, 10 May 2009, Martin Jambor wrote:

> > >       expr = TREE_OPERAND (expr, 0);
> > >       bit_ref = true;
> > >     }
> > >   else
> > >     bit_ref = false;
> > > 
> > >   while (TREE_CODE (expr) == NOP_EXPR
> > 
> > CONVERT_EXPR_P (expr)
> 
> OK... but  at another place  in the email  you said it might  not even
> appear in a valid gimple statement?  Should I remove it altogether?

Indeed.  If you not build trees from tuple stmts then a
NOP_EXPR cannot appear as a rhs1 or rhs2 of an assignment (instead
it is always the subcode in gimple stmt and the rhs1 is simply sth
valid for is_gimple_val).

> > > 	 || TREE_CODE (expr) == VIEW_CONVERT_EXPR
> > > 	 || TREE_CODE (expr) == REALPART_EXPR
> > > 	 || TREE_CODE (expr) == IMAGPART_EXPR)
> > >     expr = TREE_OPERAND (expr, 0);
> > 
> > Why do this here btw, and not just lump ...
> > 
> > >   switch (TREE_CODE (expr))
> > >     {
> > >     case ADDR_EXPR:
> > >     case SSA_NAME:
> > >     case INDIRECT_REF:
> > >       break;
> > > 
> > >     case VAR_DECL:
> > >     case PARM_DECL:
> > >     case RESULT_DECL:
> > >     case COMPONENT_REF:
> > >     case ARRAY_REF:
> > >       ret = create_access (expr, write);
> > >       break;
> > 
> > ... this ...
> > 
> > >     case REALPART_EXPR:
> > >     case IMAGPART_EXPR:
> > >       expr = TREE_OPERAND (expr, 0);
> > >       ret = create_access (expr, write);
> > 
> > ... and this together?  Won't you create bogus accesses if you
> > strip for example IMAGPART_EXPR (which has non-zero offset)?
> 
> That would  break the complex  number into its components.   I thought
> that they are  meant to stay together for  some reason, otherwise they
> would not be represented explicitly  in gimple... do you think it does
> not matter?  What about vectors then?
> 
> The access is not bogus because modification functions take care of
> these statements in a special way.  However, if it is indeed OK to
> split complex numbers into their components, I will gladly simplify
> this as you suggested.

Yes, it is valid to split them (and complex lowering indeed does that).
It _might_ be useful to keep a complex together in a single SSA_NAME
for optimization purposes, but I guess you detect that anyway if there
is a read of the whole complex element into a register and keep it
that way.

I would favor simplifying SRA in this case and just split them if
that is valid.

> > >       break;
> > > 
> > >     case ARRAY_RANGE_REF:
> > 
> > it should just be handled fine I think.
> 
> OK, I will try that at some later stage.
> 
> > >     default:
> > >       walk_tree (&safe_expr, disqualify_all, NULL, NULL);
> > 
> > and if not, this should just disqualify the base of the access, like
> > get_base_address (safe_expr) (save_expr you mean?) and then if that
> > is a DECL, disqualify that decl.
> 
> I'll test just doing nothing, things handled by get_base_address are
> either fine or already accounted for.

Fine with me.

> > >     return SRA_SA_NONE;
> > >
> > >   lhs_ptr = gimple_assign_lhs_ptr (stmt);
> > >   rhs_ptr = gimple_assign_rhs1_ptr (stmt);
> > 
> > you probably don't need to pass pointers to trees everywhere as you
> > are not changing them.
> 
> Well, this  function is a  callback called by scan_function  which can
> also  call  sra_modify_expr  in  the  last  stage  of  the  pass  when
> statements  are modified.   I have  considered splitting  the function
> into two but  in the end I  thought they would be too  similar and the
> overhead is hopefully manageable.

Yeah, I noticed this later.  It is somewhat confusing at first sight,
so maybe just amending the comment before this function could
clarify things.

> > >   if (disqualify_ops_if_throwing_stmt (stmt, lhs_ptr, rhs_ptr))
> > >     return SRA_SA_NONE;
> > > 
> > >   racc = build_access_from_expr_1 (rhs_ptr, gsi, false);
> > >   lacc = build_access_from_expr_1 (lhs_ptr, gsi, true);
> > 
> > just avoid calling into build_access_from_expr_1 for SSA_NAMEs
> > or is_gimple_min_invariant lhs/rhs, that should make that
> > function more regular.
> 
> In what sense?  build_access_from_expr_1 looks at TREE_CODE anyway and
> can discard the two cases,  without for example looking into ADR_EXPRs
> like is_gimple_min_invariant().
> 
> But if you really think it is indeed beneficial, I can do that, sure -
> to me it just looks ugly).

Ok, just keep it as is.

> > >   if (lacc && racc
> > >       && !lacc->grp_unscalarizable_region
> > >       && !racc->grp_unscalarizable_region
> > >       && AGGREGATE_TYPE_P (TREE_TYPE (*lhs_ptr))
> > >       && lacc->size <= racc->size
> > >       && useless_type_conversion_p (lacc->type, racc->type))
> > 
> > useless_type_conversion_p should be always true here.
> 
> I don't think so, build_access_from_expr_1 can look through V_C_Es and
> the types of accesses are the type of the operand in such cases..

Ok, but what is the point of looking through V_C_Es there if it makes
this test fail?  Hmm, IIRC this was only to track struct copies, right?
I guess it's ok then.

> > That would just be useless information.  I guess you copied this
> > from old SRA?
> 
> Yes.  All this fancy naming stuff  is quite useless but I find it very
> handy when debugging SRA issues.

Yeah, sort of.  Still using no name in that case will do exactly
the same thing ;)

> > > static tree
> > > create_access_replacement (struct access *access)
> > > {
> > >   tree repl;
> > > 
> > >   repl = make_rename_temp (access->type, "SR");
> > >   get_var_ann (repl);
> > >   add_referenced_var (repl);
> > > 
> > >   DECL_SOURCE_LOCATION (repl) = DECL_SOURCE_LOCATION (access->base);
> > >   DECL_ARTIFICIAL (repl) = 1;
> > > 
> > >   if (DECL_NAME (access->base) && !DECL_IGNORED_P (access->base))
> > 
> > at least && !DECL_ARTIFICIAL (access->base) I think.
> 
> This part is also largely copied from the old SRA.  So far it seems to
> work nicely, replacements of artificial declarations get SRsome_number
> fancy names and that makes  them easy to distinguish.  Nevertheless, I
> can change the condition if it is somehow wrong.  Or do you expect any
> other problems beside not-so-fancy fancy names?

No, it merely uses up memory.  Not that other passes do not do this ...

Thus, I probably do not care too much.

> > >   if (access->grp_bfr_lhs)
> > >     DECL_GIMPLE_REG_P (repl) = 0;
> > 
> > But you never set it (see update_address_taken for more cases,
> > most notably VIEW_CONVERT_EXPR on the lhs which need to be taken
> > care of).  You should set it for COMPLEX_TYPE and VECTOR_TYPE 
> > replacements.
> 
> This function  is the  only place where  I still  use make_rename_temp
> which sets it  exactly in these two cases.  I did  not really know why
> it is  required in these two  cases and only  in these two cases  so I
> left it there, at least for  now.  I guess I understand that now after
> seeing update_address_taken.
> 
> I can  replace this  with calling create_tmp_var()  and doing  all the
> rest  that make_rename_temp does  - I  believe that  you intend  to to
> remove it - I have just not found out why it is so bad.

The bad thing about it is that it supports using the SSA renamer
to write a single variable into SSA.  That is usually more costly
than just manually allocating SSA_NAMEs and updating SSA form,
which is usally very easy.

It's not used much, in which case the easiest thing might be to
fix all remaining uses to manually update SSA form.

But yes, I now see why that zeroing is necessary.

> > 
> > CONVERT_EXPR_P (expr)
> > 
> > >       || TREE_CODE (expr) == VIEW_CONVERT_EXPR)
> > 
> > VIEW_CONVERT_EXPR is also a handled_component_p.
> > 
> > Note that NOP_EXPR should never occur here - that would be invalid
> > gimple.  So I think you can (and should) just delete the above.
> 
> I haven't  seen a NOP_EXPR for a  while, do they still  exist in lower
> gimple?  Thus I have removed their handling.
> 
> Removing diving through V_C_E breaks ADA, though.  The reason is that
> we get a different size (and max_size) when calling
> get_ref_base_and_extent on the V_C_E and on its argument.  However, I
> believe both should be represented by a single access representative.

Yeah, I remember this :/  It is technically invalid GIMPLE that the
Ada FE generates though.  The size of the V_C_E result has to match
that of the operand.

Please add a FIXME before this stripping refering to the Ada problem.

> > >   tree repl = get_access_replacement (access);
> > >   if (!TREE_ADDRESSABLE (type))
> > >     {
> > >       tree tmp = create_tmp_var (type, "SRvce");
> > > 
> > >       add_referenced_var (tmp);
> > >       if (is_gimple_reg_type (type))
> > > 	tmp = make_ssa_name (tmp, NULL);
> > 
> > Should be always is_gimple_reg_type () if it is a type suitable for
> > a SRA scalar replacement. 
> 
> No, it is the type suitable for  the statement, it can be a union type
> or a record with only one field. But see the more thorough explanation
> below...

I think it should be always a register type, but see below... ;)

> > But you should set DECL_GIMPLE_REG_P for
> > VECTOR and COMPLEX types here.
> > 
> > >       if (write)
> > > 	{
> > > 	  gimple stmt;
> > > 	  tree conv = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (repl), tmp);
> > 
> > This needs to either always fold to plain 'tmp' or tmp has to be a
> > non-register.  Otherwise you will create invalid gimple.
> > 
> > > 	  *expr = tmp;
> > > 	  if (is_gimple_reg_type (type))
> > > 	    SSA_NAME_DEF_STMT (tmp) = gsi_stmt (*gsi);
> > 
> > See above.
> > 
> > > 	  stmt = gimple_build_assign (repl, conv);
> > > 	  gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> > > 	  update_stmt (stmt);
> > > 	}
> > >       else
> > > 	{
> > > 	  gimple stmt;
> > > 	  tree conv = fold_build1 (VIEW_CONVERT_EXPR, type, repl);
> > > 
> > > 	  stmt = gimple_build_assign (tmp, conv);
> > > 	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> > > 	  if (is_gimple_reg_type (type))
> > > 	    SSA_NAME_DEF_STMT (tmp) = stmt;
> > 
> > See above.  (I wonder if the patch still passes bootstrap & regtest
> > after the typecking patch)
> > 
> > > 	  *expr = tmp;
> > > 	  update_stmt (stmt);
> > > 	}
> > >     }
> > >   else
> > >     {
> > >       if (write)
> > > 	{
> > > 	  gimple stmt;
> > > 
> > > 	  stmt = gimple_build_assign (repl, unshare_expr (access->expr));
> > > 	  gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> > > 	  update_stmt (stmt);
> > > 	}
> > >       else
> > > 	{
> > > 	  gimple stmt;
> > > 
> > > 	  stmt = gimple_build_assign (unshare_expr (access->expr), repl);
> > > 	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> > > 	  update_stmt (stmt);
> > > 	}
> > 
> > I don't understand this path.  Are the types here always compatible?
> 
> And I don't really understand the comments.  The function is called by
> sra_modify_expr (the function doing the replacements in all non-assign
> statements) when it  needs to replace a reference by  a scalar but the
> types don't  match.  This can happen  when replacing a  V_C_E, a union
> access  when we  picked a  different  type that  the one  used in  the
> statement or (and this case can be remarkably irritating) an access to
> a records with only one (scalar) field.
> 
> My original idea  was to simply put a V_C_E in  the place.  However, I
> believe there are places where this  is not possible - or at least one
> case, a  LHS of  a call statement  because V_C_Es  of gimple_registers
> (ssa_names) are not allowed on  LHSs.  My initial idea to handle these
> cases  were to  create a  new temporary  with a  matching and  a V_C_E
> assign statement  (with the V_C_E always  on the RHS -  I believe that
> works even  with gimple  registers) that would  do the  conversion and
> load/store  it   to  the  replacement  variable  (this   is  what  the
> !TREE_ADDRESSABLE branch does).
> 
> The problem  with this idea  are TREE_ADDRESSABLE types.   These types
> need to be  constructed and thus we cannot  create temporary variables
> of these types.   On the other hand they absolutely  need to be SRAed,
> not doing  so slows down tramp3d by  a factor of two  (and the current
> SRA also breaks them up).  And  quite a few C++ classes are such types
> that   are  "non-addressable"   and  have   only  one   scalar  field.
> Identifying  such records  is possible,  I  soon realized  that I  can
> simply leave the statement as it  is and produce a new statement to do
> load/store  from the original  field (that's  what the  outermost else
> branch does).
> 
> Does this make sense or is there some fundamental flaw in my reasoning
> about gimple again?  Does this explain what the function does?

Ok, so the case in question is

  struct X { int i; } x;
  x = foo ();

where you want to scalarize x.  Indeed the obvious scalarization would
be

  x = foo ();
  SR_1 = x.i;

For all other LHS cases (not calls) you can move the V_C_E to the RHS
and should be fine.

I don't understand the TREE_ADDRESSABLE thingy yet.  Especially why
the type should be a non-register type.  My understanding was that
in reads the replacement LHS will be a register, for mismatched types
you simply put a V_C_E around the RHS.  For writes the replacement
RHS will be a register, possibly surrounded by a V_C_E.  The special
case is calls where you cannot put a V_C_E around the "RHS".

So, can you explain with an example, how it comes that the scalar
replacement type is a non-register type?  (Whatever our conclusion
will be, the function should have a big comment enumerating the
cases we have to deal with)

> It certainly passes bootstrap and testing, I use --enable-checking=yes.

That's good.

> > >     {
> > >       /* I have never seen this code path trigger but if it can happen the
> > > 	 following should handle it gracefully.  */
> > 
> > It can trigger for vector constants.
> 
> OK, I'll remove the comment.  Apparently there are none in the
> testsuite, I believe I tested with a gcc_unreachable here.

Err, for vector constants we have VECTOR_CST, so it triggers for
non-constant vector constructors like

 vector int x = { a, b, c, d };

> > >   update_stmt (aux_stmt);
> > >   new_stmt = gimple_build_assign (get_access_replacement (access),
> > > 				  fold_build2 (COMPLEX_EXPR, access->type,
> > > 					       rp, ip));
> > >   gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
> > >   update_stmt (new_stmt);
> > 
> > Hm.  So you do what complex lowering does here.  Note that this may
> > create loads from uninitialized memory with all its problems.
> 
> Yes,  but I  have not  had any  such problems  with complex  types (as
> opposed to  simple loads from half-initialized  records, for example).
> OTOH, I  have also contemplated  setting DECL_GIMPLE_REG_P to  zero of
> complex replacement which appear in IMAG_PART or REAL_PART on a LHS of
> a statement.

Yes, that's necessary.  I still think SRA should not bother about
this at all ;)

> > WRT the complex stuff.  If you would do scalarization and analysis
> > just on the components (not special case REAL/IMAGPART_EXPR everywhere)
> > it should work better, correct?  You still could handle group
> > scalarization for the case of for example passing a complex argument
> > to a function.
> 
> Well, my reasoning was that if complex types were first-class citizens
> in gimple  (as opposed to a record),  there was a reason  to keep them
> together  and  so  I  attempted   that.   But  again,  if  that  is  a
> misconception of mine and there  is no point in keeping them together,
> I will gladly remove this.

It's not clear.  Complex lowering decomposes all complex variables
to components if possible.  Again, simplifying SRA is probably better.

> > void bar(_Complex float);
> > void foo(float x, float y)
> > {
> >   _Complex float z = x;
> >   __imag z = y;
> >   bar(z);
> > }
> > 
> > The same applies for vectors - the REAL/IMAGPART_EXPRs equivalent
> > there is BIT_FIELD_REF.
> 
> These are handled  by setting DECL_GIMPLE_REG_P to zero  if a B_F_R is
> on a LHS.  I believe the current SRA does the same.  It works fine and
> there's a lot less fuss about them.
>  
> > >   return SRA_SA_PROCESSED;
> > > }
> > > 
> > > /* Return true iff T has a VIEW_CONVERT_EXPR among its handled components.  */
> > > 
> > > static bool
> > > contains_view_convert_expr_p (tree t)
> > > {
> > >   while (1)
> > >     {
> > >       if (TREE_CODE (t) == VIEW_CONVERT_EXPR)
> > > 	return true;
> > >       if (!handled_component_p (t))
> > > 	return false;
> > >       t = TREE_OPERAND (t, 0);
> > >     }
> > > }
> > 
> > Place this in tree-flow-inline.h next to ref_contains_array_ref, also
> > structure the loop in the same way.
> 
> OK,  but I'd like  the function  to work  if passed  declarations too.
> Thus I cannot really use a  do-while loop.  I'll send it in a separate
> patch.
> 
> > > /* Change STMT to assign compatible types by means of adding component or array
> > >    references or VIEW_CONVERT_EXPRs.  All parameters have the same meaning as
> > >    variable with the same names in sra_modify_assign.  This is done in a
> > >    such a complicated way in order to make
> > >    testsuite/g++.dg/tree-ssa/ssa-sra-2.C happy and so it helps in at least some
> > >    cases.  */
> > > 
> > > static void
> > > fix_modified_assign_compatibility (gimple_stmt_iterator *gsi, gimple *stmt,
> > > 				   struct access *lacc, struct access *racc,
> > > 				   tree lhs, tree *rhs, tree ltype, tree rtype)
> > > {
> > >   if (racc && racc->grp_to_be_replaced && AGGREGATE_TYPE_P (ltype)
> > >       && !access_has_children_p (lacc))
> > >     {
> > >       tree expr = unshare_expr (lhs);
> > >       bool found = build_ref_for_offset (&expr, ltype, racc->offset, rtype,
> > > 					 false);
> > >       if (found)
> > > 	{
> > > 	  gimple_assign_set_lhs (*stmt, expr);
> > > 	  return;
> > > 	}
> > >     }
> > > 
> > >   if (lacc && lacc->grp_to_be_replaced && AGGREGATE_TYPE_P (rtype)
> > >       && !access_has_children_p (racc))
> > >     {
> > >       tree expr = unshare_expr (*rhs);
> > >       bool found = build_ref_for_offset (&expr, rtype, lacc->offset, ltype,
> > > 					 false);
> > >       if (found)
> > > 	{
> > > 	  gimple_assign_set_rhs1 (*stmt, expr);
> > > 	  return;
> > > 	}
> > >     }
> > > 
> > >   *rhs = fold_build1 (VIEW_CONVERT_EXPR, ltype, *rhs);
> > >   gimple_assign_set_rhs_from_tree (gsi, *rhs);
> > >   *stmt = gsi_stmt (*gsi);
> > 
> > Reading this I have a deja-vu - isn't there another function in this
> > file doing the same thing?  You are doing much unsharing even though
> > you re-build the access tree from scratch?
> 
> This function has a similar purpose as fix_incompatible_types_for_expr
> but this time  only for assign statements.  That  is easier because we
> can always put the  V_C_E on the RHS and be safe  and so no additional
> statements need to be generated.
> 
> However,  the V_C_Es  rather than  COMPONENT_REFs and  ARRAY_REFs feel
> unnatural for  accessing fields from  single field records  and unions
> and single  element arrays.  According to  the comment I  used to have
> problems of some sort with that  in the ssa-sra-2.C testcase but I can
> no longer reproduce them (and don't remember them).
> 
> I  call  unshare_expr  in  this  context  only when  one  side  of  an
> assignment statement is  a scalar replacement and the  other one is an
> aggregate (but not necessarily a declaration) which can happen only in
> the cases listed  above.  That is not very many  calls and chances are
> good that build_ref_for_offset succeeds.
> 
> Does that explain what is going on here?

Yes.  I guess merging the two functions would make sense to me
(or putting them next to each other at least).  The function signatures
also look seemingly weird (that you store into *rhs but still set
the stmt rhs, etc.).  I have another look here with the new version.

> > > }
> > > 
> > > /* Callback of scan_function to process assign statements.  It examines both
> > >    sides of the statement, replaces them with a scalare replacement if there is
> > >    one and generating copying of replacements if scalarized aggregates have been
> > >    used in the assignment.  STMT is a pointer to the assign statement, GSI is
> > >    used to hold generated statements for type conversions and subtree
> > >    copying.  */
> > > 
> > > static enum scan_assign_result
> > > sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi,
> > > 		   void *data ATTRIBUTE_UNUSED)
> > > {
> > >   struct access *lacc, *racc;
> > >   tree ltype, rtype;
> > >   tree lhs, rhs;
> > >   bool modify_this_stmt;
> > > 
> > >   if (gimple_assign_rhs2 (*stmt))
> > 
> > !gimple_assign_single_p (*stmt)
> > 
> > (the only gimple assign that may access memory)
> 
> OK
> 
> > 
> > >     return SRA_SA_NONE;
> > >   lhs = gimple_assign_lhs (*stmt);
> > >   rhs = gimple_assign_rhs1 (*stmt);
> > > 
> > >   if (TREE_CODE (rhs) == CONSTRUCTOR)
> > >     return sra_modify_constructor_assign (stmt, gsi);
> > > 
> > >   if (TREE_CODE (lhs) == REALPART_EXPR || TREE_CODE (lhs) == IMAGPART_EXPR)
> > >     return sra_modify_partially_complex_lhs (*stmt, gsi);
> > > 
> > >   if (TREE_CODE (rhs) == REALPART_EXPR || TREE_CODE (rhs) == IMAGPART_EXPR
> > >       || TREE_CODE (rhs) == BIT_FIELD_REF || TREE_CODE (lhs) == BIT_FIELD_REF)
> > >     {
> > >       modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (*stmt),
> > > 					  gsi, false, data);
> > >       modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (*stmt),
> > > 					   gsi, true, data);
> > >       return modify_this_stmt ? SRA_SA_PROCESSED : SRA_SA_NONE;
> > >     }
> > > 
> > >   lacc = get_access_for_expr (lhs);
> > >   racc = get_access_for_expr (rhs);
> > >   if (!lacc && !racc)
> > >     return SRA_SA_NONE;
> > > 
> > >   modify_this_stmt = ((lacc && lacc->grp_to_be_replaced)
> > > 		      || (racc && racc->grp_to_be_replaced));
> > > 
> > >   if (lacc && lacc->grp_to_be_replaced)
> > >     {
> > >       lhs = get_access_replacement (lacc);
> > >       gimple_assign_set_lhs (*stmt, lhs);
> > >       ltype = lacc->type;
> > >     }
> > >   else
> > >     ltype = TREE_TYPE (lhs);
> > > 
> > >   if (racc && racc->grp_to_be_replaced)
> > >     {
> > >       rhs = get_access_replacement (racc);
> > >       gimple_assign_set_rhs1 (*stmt, rhs);
> > >       rtype = racc->type;
> > >     }
> > >   else
> > >     rtype = TREE_TYPE (rhs);
> > > 
> > >   /* The possibility that gimple_assign_set_rhs_from_tree() might reallocate
> > >      the statement makes the position of this pop_stmt_changes() a bit awkward
> > >      but hopefully make some sense.  */
> > 
> > I don't see pop_stmt_changes().
> 
> Yeah, the comment is outdated. I've removed it.
>  
> > >   if (modify_this_stmt)
> > >     {
> > >       if (!useless_type_conversion_p (ltype, rtype))
> > > 	fix_modified_assign_compatibility (gsi, stmt, lacc, racc,
> > > 					   lhs, &rhs, ltype, rtype);
> > >     }
> > > 
> > >   if (contains_view_convert_expr_p (rhs) || contains_view_convert_expr_p (lhs)
> > >       || (access_has_children_p (racc)
> > > 	  && !ref_expr_for_all_replacements_p (racc, lhs, racc->offset))
> > >       || (access_has_children_p (lacc)
> > > 	  && !ref_expr_for_all_replacements_p (lacc, rhs, lacc->offset)))
> > 
> > ?  A comment is missing what this case is about ...
> > 
> > (this smells like fixup that could be avoided by doing things correct
> > in the first place)
> 
> From this  point on,  the function deals  with assignments  in between
> aggregates  when at least  one has  scalar reductions  of some  of its
> components.  There are three possible  scenarios: Both the LHS and RHS
> have to-be-scalarized components,  2) only the RHS has  or 3) only the
> LHS has.
> 
> In the first  case, we would like to load the  LHS components from RHS
> components whenever possible.  If that  is not possible, we would like
> to read it  directly from the RHS (after updating it  by storing in it
> its own components).  If there are some necessary unscalarized data in
> the  LHS, those will  be loaded  by the  original assignment  too.  If
> neither of these cases happen,  the original statement can be removed.
> Most of this is done by load_assign_lhs_subreplacements.
> 
> In  the  second  case, we  would  like  to  store all  RHS  scalarized
> components  directly  into  LHS   and  if  they  cover  the  aggregate
> completely, remove the statement too.   In the third case, we want the
> LHS components to be loaded directly from the RHS (DSE will remove the
> original statement if it becomes redundant).
> 
> This is a bit complex but  manageable when types match and when unions
> do not cause confusion in a way that we cannot really load a component
> of LHS from the RHS or  vice versa (the access representing this level
> can  have subaccesses  that are  accessible only  through  a different
> union field  at a higher  level - different  from the one used  in the
> examined expression).  Unions are fun.
> 
> Therefore, I specially handle a fourth case, happening when there is a
> specific  type  cast  or  it  is impossible  to  locate  a  scalarized
> subaccess on  the other  side of the  expression.  If that  happens, I
> simply "refresh"  the RHS  by storing in  it is  scalarized components
> leave the original statement there to do the copying and then load the
> scalar replacements of the LHS.  This is what the first branch does.
> 
> Is it  clearer now?  Perhaps I  should put these five  paragraphs as a
> comment into the function?

Yes - that would be nice.

Thanks,
Richard.