From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-329120-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 15387 invoked by alias); 15 Oct 2012 16:50:06 -0000
Received: (qmail 15281 invoked by uid 22791); 15 Oct 2012 16:50:02 -0000
X-SWARE-Spam-Status: No, hits=-4.3 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,KHOP_RCVD_TRUST,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,TW_CP
X-Spam-Check-By: sourceware.org
Received: from mail-wi0-f179.google.com (HELO mail-wi0-f179.google.com) (209.85.212.179)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 15 Oct 2012 16:49:54 +0000
Received: by mail-wi0-f179.google.com with SMTP id hq7so1878532wib.8        for <gcc-patches@gcc.gnu.org>; Mon, 15 Oct 2012 09:49:51 -0700 (PDT)
Received: by 10.216.131.218 with SMTP id m68mr7532288wei.195.1350319791664;        Mon, 15 Oct 2012 09:49:51 -0700 (PDT)
Received: from sandifor-thinkpad.stglab.manchester.uk.ibm.com (gbibp9ph1--blueice2n1.emea.ibm.com. [195.212.29.75])        by mx.google.com with ESMTPS id ct3sm14739114wib.5.2012.10.15.09.49.45        (version=TLSv1/SSLv3 cipher=OTHER);        Mon, 15 Oct 2012 09:49:47 -0700 (PDT)
From: Richard Sandiford <rdsandiford@googlemail.com>
To: Vladimir Makarov <vmakarov@redhat.com>
Mail-Followup-To: Vladimir Makarov <vmakarov@redhat.com>,GCC Patches <gcc-patches@gcc.gnu.org>, rdsandiford@googlemail.com
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: RFC: LRA for x86/x86-64 [7/9] -- continuation
References: <5064DA43.90803@redhat.com>
Date: Mon, 15 Oct 2012 17:06:00 -0000
In-Reply-To: <5064DA43.90803@redhat.com> (Vladimir Makarov's message of "Thu,	27 Sep 2012 18:59:15 -0400")
Message-ID: <87sj9fd6jf.fsf@sandifor-thinkpad.stglab.manchester.uk.ibm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2012-10/txt/msg01447.txt.bz2

Hi Vlad,

Some comments about the rest of LRA.  Nothing major here...

Vladimir Makarov <vmakarov@redhat.com> writes:
> +/* Info about register in an insn.  */
> +struct lra_insn_reg
> +{
> +  /* The biggest mode through which the insn refers to the register
> +     (remember the register can be accessed through a subreg in the
> +     insn).  */
> +  ENUM_BITFIELD(machine_mode) biggest_mode : 16;

AFAICT, this is actually always the mode of a specific reference,
and if there are references to the same register in different modes,
those references get their own lra_insn_regs.  "mode" might be better
than "biggest_mode" if so.

> +/* Static part (common info for insns with the same ICODE) of LRA
> +   internal insn info.	It exists in at most one exemplar for each
> +   non-negative ICODE.	Warning: if the structure definition is
> +   changed, the initializer for debug_insn_static_data in lra.c should
> +   be changed too.  */

Probably worth saying (before the warning) that there is also
one structure for each asm.

> +/* LRA internal info about an insn (LRA internal insn
> +   representation).  */
> +struct lra_insn_recog_data
> +{
> +  int icode; /* The insn code.	*/
> +  rtx insn; /* The insn itself.	 */
> +  /* Common data for insns with the same ICODE.	 */
> +  struct lra_static_insn_data *insn_static_data;

Maybe worth mentioning asms here too.

> +  /* Two arrays of size correspondingly equal to the operand and the
> +     duplication numbers: */
> +  rtx **operand_loc; /* The operand locations, NULL if no operands.  */
> +  rtx **dup_loc; /* The dup locations, NULL if no dups.	 */
> +  /* Number of hard registers implicitly used in given call insn.  The
> +     value can be NULL or points to array of the hard register numbers
> +     ending with a negative value.  */
> +  int *arg_hard_regs;
> +#ifdef HAVE_ATTR_enabled
> +  /* Alternative enabled for the insn.	NULL for debug insns.  */
> +  bool *alternative_enabled_p;
> +#endif
> +  /* The alternative should be used for the insn, -1 if invalid, or we
> +     should try to use any alternative, or the insn is a debug
> +     insn.  */
> +  int used_insn_alternative;
> +  struct lra_insn_reg *regs;  /* Always NULL for a debug insn.	*/

Comments consistently above the field.

> +extern void lra_expand_reg_info (void);

This doesn't exist any more.

> +extern int lra_constraint_new_insn_uid_start;

Just saying in case: this seems to be write-only, with lra-constraints.c
instead using a static variable to track the uid start.

I realise you might want to keep it anyway for consistency with
lra_constraint_new_regno_start, or for debugging.

> +extern rtx lra_secondary_memory[NUM_MACHINE_MODES];

This doesn't exist any more.

> +/* lra-saves.c: */
> +
> +extern bool lra_save_restore (void);

Same for this file & function.

> +/* The function returns TRUE if at least one hard register from ones
> +   starting with HARD_REGNO and containing value of MODE are in set
> +   HARD_REGSET.	 */
> +static inline bool
> +lra_hard_reg_set_intersection_p (int hard_regno, enum machine_mode mode,
> +				 HARD_REG_SET hard_regset)
> +{
> +  int i;
> +
> +  lra_assert (hard_regno >= 0);
> +  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
> +    if (TEST_HARD_REG_BIT (hard_regset, hard_regno + i))
> +      return true;
> +  return false;
> +}

This is the same as overlaps_hard_reg_set_p.

> +/* Return hard regno and offset of (sub-)register X through arguments
> +   HARD_REGNO and OFFSET.  If it is not (sub-)register or the hard
> +   register is unknown, then return -1 and 0 correspondingly.  */

The function seems to return -1 for both.

> +/* Add hard registers starting with HARD_REGNO and holding value of
> +   MODE to the set S.  */
> +static inline void
> +lra_add_hard_reg_set (int hard_regno, enum machine_mode mode, HARD_REG_SET *s)
> +{
> +  int i;
> +
> +  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
> +    SET_HARD_REG_BIT (*s, hard_regno + i);
> +}

This is add_to_hard_reg_set.

> +   Here is block diagram of LRA passes:
> +
> +	  ---------------------				    
> +	 | Undo inheritance    |      ---------------	     --------------- 
> +	 | for spilled pseudos)|     | Memory-memory |	    | New (and old) |
> +	 | and splits (for     |<----| move coalesce |<-----|	 pseudos    |
> +	 | pseudos got the     |      ---------------	    |	assignment  |
> +  Start	 |  same  hard regs)   |			     --------------- 
> +    |	  ---------------------					    ^
> +    V		  |		 ----------------		    |
> + -----------	  V		| Update virtual |		    |
> +|  Remove   |----> ------------>|    register	 |		    |
> +| scratches |	  ^		|  displacements |		    |
> + -----------	  |		 ----------------		    |
> +		  |			 |			    |
> +		  |			 V	   New		    |
> +	 ----------------    No	   ------------	 pseudos   -------------------
> +	| Spilled pseudo | change |Constraints:| or insns | Inheritance/split |
> +	|    to memory	 |<-------|    RTL     |--------->|  transformations  |
> +	|  substitution	 |	  | transfor-  |	  |    in EBB scope   |
> +	 ----------------	  |  mations   |	   -------------------
> +		|		    ------------ 
> +		V
> +    -------------------------
> +   | Hard regs substitution, |
> +   |  devirtalization, and   |------> Finish
> +   | restoring scratches got |
> +   |	     memory	     |
> +    -------------------------

This is a great diagram, thanks.

> +/* Create and return a new reg from register FROM corresponding to
> +   machine description operand of mode MD_MODE.	 Initialize its
> +   register class to RCLASS.  Print message about assigning class
> +   RCLASS containing new register name TITLE unless it is NULL.	 The
> +   created register will have unique held value.  */
> +rtx
> +lra_create_new_reg_with_unique_value (enum machine_mode md_mode, rtx original,
> +				      enum reg_class rclass, const char *title)

Comment says FROM, but parameter is called ORIGINAL.  The code copes
with both null and non-register ORIGINALs, which aren't mentinoed in
the comment.

> +/* Target checks operands through operand predicates to recognize an
> +   insn.  We should have a special precaution to generate add insns
> +   which are frequent results of elimination.
> +
> +   Emit insns for x = y + z.  X can be used to store intermediate
> +   values and should be not in Y and Z when we use x to store an
> +   intermediate value.	*/ 

I think this should say what Y and Z are allowed to be, since it's
more than just registers and constants.

> +/* Map INSN_UID -> the operand alternative data (NULL if unknown).  We
> +   assume that this data is valid until register info is changed
> +   because classes in the data can be changed.	*/
> +struct operand_alternative *op_alt_data[LAST_INSN_CODE];

In that case I think it should be in a target_globals structure,
a bit like target_ira.

> +	    for (curr = list; curr != NULL; curr = curr->next)
> +	      if (curr->regno == regno)
> +		break;
> +	    if (curr == NULL || curr->subreg_p != subreg_p
> +		|| curr->biggest_mode != mode)
> +	      {
> +		/* This is a new hard regno or the info can not be
> +		   integrated into the found structure.	 */
> +#ifdef STACK_REGS
> +		early_clobber
> +		  = (early_clobber
> +		     /* This clobber is to inform popping floating
> +			point stack only.  */
> +		     && ! (FIRST_STACK_REG <= regno
> +			   && regno <= LAST_STACK_REG));
> +#endif
> +		list = new_insn_reg (regno, type, mode, subreg_p,
> +				     early_clobber, list);
> +	      }
> +	    else
> +	      {
> +		if (curr->type != type)
> +		  curr->type = OP_INOUT;
> +		if (curr->early_clobber != early_clobber)
> +		  curr->early_clobber = true;
> +	      }

OK, so this is probably only a technicality, but I think this should be:

	    for (curr = list; curr != NULL; curr = curr->next)
	      if (curr->regno == regno
		  && curr->subreg_p == subreg_p
		  && curr->biggest_mode == mode)
		{
		  ..reuse..;
		  return list;
		}
	     ..new entry..;
	     return list;

> +      icode = INSN_CODE (insn);
> +      if (icode < 0)
> +	/* It might be a new simple insn which is not recognized yet.  */
> +	INSN_CODE (insn) = icode = recog (PATTERN (insn), insn, 0);

Any reason not to use recog_memoized here?

> +      n = insn_static_data->n_operands;
> +      if (n == 0)
> +	locs = NULL;
> +      else
> +	{
> +	  
> +	  locs = (rtx **) xmalloc (n * sizeof (rtx *));
> +	  memcpy (locs, recog_data.operand_loc, n * sizeof (rtx *));
> +	}

Excess blank line after "else" (sorry!)

> +  /* Some output operand can be recognized only from the context not
> +     from the constraints which are empty in this case.	 Call insn may
> +     contain a hard register in set destination with empty constraint
> +     and extract_insn treats them as an input.	*/
> +  for (i = 0; i < insn_static_data->n_operands; i++)
> +    {
> +      int j;
> +      rtx pat, set;
> +      struct lra_operand_data *operand = &insn_static_data->operand[i];
> +
> +      /* ??? Should we treat 'X' the same way.	It looks to me that
> +	 'X' means anything and empty constraint means we do not
> +	 care.	*/

FWIW, I think any X output operand has to be "=X" or "+X"; just "X"
would be as wrong as "r".  genrecog is supposed to complain about that
for insns, and parse_output_constraint for asms.

So I agree the code is correct in just handling empty constraints.

> +/* Update all the insn info about INSN.	 It is usually called when
> +   something in the insn was changed.  Return the udpated info.	 */

Typo: updated.

> +  for (i = 0; i < reg_info_size; i++)
> +    {
> +      bitmap_initialize (&lra_reg_info[i].insn_bitmap, &reg_obstack);
> +#ifdef STACK_REGS
> +      lra_reg_info[i].no_stack_p = false;
> +#endif
> +      CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
> +      lra_reg_info[i].preferred_hard_regno1 = -1;
> +      lra_reg_info[i].preferred_hard_regno2 = -1;
> +      lra_reg_info[i].preferred_hard_regno_profit1 = 0;
> +      lra_reg_info[i].preferred_hard_regno_profit2 = 0;
> +      lra_reg_info[i].live_ranges = NULL;
> +      lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
> +      lra_reg_info[i].last_reload = 0;
> +      lra_reg_info[i].restore_regno = -1;
> +      lra_reg_info[i].val = get_new_reg_value ();
> +      lra_reg_info[i].copies = NULL;
> +    }

The same loop (with a different start index) appears in expand_reg_info.
It'd be nice to factor it out, so that there's only one place to update
if the structure is changed.

> +	  for (curr = data->regs; curr != NULL; curr = curr->next)
> +	    if (curr->regno == regno)
> +	      break;
> +	  if (curr->subreg_p != subreg_p || curr->biggest_mode != mode)
> +	    /* The info can not be integrated into the found
> +	       structure.  */
> +	    data->regs = new_insn_reg (regno, type, mode, subreg_p,
> +				       early_clobber, data->regs);
> +	  else
> +	    {
> +	      if (curr->type != type)
> +		curr->type = OP_INOUT;
> +	      if (curr->early_clobber != early_clobber)
> +		curr->early_clobber = true;
> +	    }
> +	  lra_assert (curr != NULL);
> +	}

Same loop comment as for collect_non_operand_hard_regs.  Maybe another
factoring opportunity.

> +		/* Some ports don't recognize the following addresses
> +		   as legitimate.  Although they are legitimate if
> +		   they satisfies the constraints and will be checked
> +		   by insn constraints which we ignore here.  */
> +		&& GET_CODE (XEXP (op, 0)) != UNSPEC
> +		&& GET_CODE (XEXP (op, 0)) != PRE_DEC
> +		&& GET_CODE (XEXP (op, 0)) != PRE_INC
> +		&& GET_CODE (XEXP (op, 0)) != POST_DEC
> +		&& GET_CODE (XEXP (op, 0)) != POST_INC
> +		&& GET_CODE (XEXP (op, 0)) != PRE_MODIFY
> +		&& GET_CODE (XEXP (op, 0)) != POST_MODIFY)

GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) == RTX_AUTOINC

> +/* Determine if the current function has an exception receiver block
> +   that reaches the exit block via non-exceptional edges  */
> +static bool
> +has_nonexceptional_receiver (void)
> +{
> +  edge e;
> +  edge_iterator ei;
> +  basic_block *tos, *worklist, bb;
> +
> +  /* If we're not optimizing, then just err on the safe side.  */
> +  if (!optimize)
> +    return true;
> +  
> +  /* First determine which blocks can reach exit via normal paths.  */
> +  tos = worklist = XNEWVEC (basic_block, n_basic_blocks + 1);
> +
> +  FOR_EACH_BB (bb)
> +    bb->flags &= ~BB_REACHABLE;
> +
> +  /* Place the exit block on our worklist.  */
> +  EXIT_BLOCK_PTR->flags |= BB_REACHABLE;
> +  *tos++ = EXIT_BLOCK_PTR;
> +  
> +  /* Iterate: find everything reachable from what we've already seen.  */
> +  while (tos != worklist)
> +    {
> +      bb = *--tos;
> +
> +      FOR_EACH_EDGE (e, ei, bb->preds)
> +	if (!(e->flags & EDGE_ABNORMAL))
> +	  {
> +	    basic_block src = e->src;
> +
> +	    if (!(src->flags & BB_REACHABLE))
> +	      {
> +		src->flags |= BB_REACHABLE;
> +		*tos++ = src;
> +	      }
> +	  }
> +    }
> +  free (worklist);
> +
> +  /* Now see if there's a reachable block with an exceptional incoming
> +     edge.  */
> +  FOR_EACH_BB (bb)
> +    if (bb->flags & BB_REACHABLE)
> +      FOR_EACH_EDGE (e, ei, bb->preds)
> +	if (e->flags & EDGE_ABNORMAL)
> +	  return true;
> +
> +  /* No exceptional block reached exit unexceptionally.	 */
> +  return false;
> +}

Looks like we could just early out on the first loop and get rid
of the second.

> +/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC.
> +   We change pseudos by hard registers without notification of DF and
> +   that can make the notes obsolete.  DF-infrastructure does not deal
> +   with REG_INC notes -- so we should regenerate them here.  */

These days passes are supposed to regenerate REG_DEAD and REG_UNUSED
notes if they need them, so that part might not be necessary.
The REG_INC bit is still needed though...

> +/* Initialize LRA data once per function.  */
> +void
> +lra_init (void)
> +{
> +  init_op_alt_data ();
> +}

I think it's more like:

/* Initialize LRA whenever register-related information is changed.  */


In summary, LRA looks really good to me FWIW.  Thanks for all your hard work.

Getting rid of reload always seemed like a pipe dream, and if the only
known drawback of this replacement is that it takes a while on extreme
testcases, that's an amazing achievement.  (Not to say compile time
isn't important, just that there were so many other hurdles to overcome.)

It looks like opinion has crystalised in favour of merging LRA for 4.8.
I hope that's what happens.  I don't see that anything would be gained
by delaying it to 4.9.  The code's not going to get any more testing on the
branch that it already has; whenever we merge, the stress test is always
going to be trunk.

Richard