public inbox for gcc-patches@gcc.gnu.org
* peephole2 vs cond-exec vs df
@ 2010-04-22 16:36 Bernd Schmidt
  2010-06-07 14:46 ` Resubmit/ping: " Bernd Schmidt
  2010-06-14 12:28 ` Paolo Bonzini
  0 siblings, 2 replies; 29+ messages in thread
From: Bernd Schmidt @ 2010-04-22 16:36 UTC
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2546 bytes --]

On targets with conditional execution, the peephole2 pass fails to retry
insns it generates against other peepholes.  This code is responsible:
    if (targetm.have_conditional_execution ())
      {
        for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
          peep2_insn_data[i].insn = NULL_RTX;
        peep2_insn_data[peep2_current].insn = PEEP2_EOB;
        peep2_current_count = 0;
      }
    else

This appears to have been added a long time ago, when we had code that
tried to track register lifetimes accurately through cond_exec insns.
However, all that code was apparently lost when df was introduced -
df_simulate_one_insn_backwards has no trace of it - so we needlessly
restrict cond_exec ports here.

My first reaction was to simply delete the special case from peephole2;
this seems easier than re-adding the lost code to df, and for my ARM
peepholes patch it would be good to be able to apply peepholes on insns
generated by another peephole2.

Thinking about it some more, I came up with two ideas which I think are
better.

First, I noticed that sometimes small peepholes are applied when a
larger one would have matched.  I've modified the code so that it now
tries to fill the buffer up to the maximum number of insns it can hold,
and only then does it start to apply peepholes.  If nothing matches, it
discards the first insn from the buffer; the empty space is then filled
in the next iteration.  This eliminates the need in my ARM patch to
apply peepholes multiple times, since we should always see as many insns
as the largest peephole can handle.
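
For illustration only (this is not code from the patch), the
fill-then-match idea can be sketched on a toy input where characters
stand in for insns.  The names MAX_WIN, try_match and rewrite and the two
patterns are hypothetical.  Unlike the real pass, the sketch does not
feed replacements back into the window; it only shows why looking at a
full window lets the longer pattern win and how an unmatched head item is
discarded.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WIN 3   /* hypothetical: length of the longest pattern */

/* Hypothetical matcher.  Returns the number of input items consumed and
   writes the replacement to OUT, or returns 0 if nothing matches.  The
   longer pattern is tried first, which only helps when the window
   already holds enough items.  */
static int
try_match (const char *win, int len, char *out)
{
  if (len >= 3 && strncmp (win, "abc", 3) == 0)
    {
      strcpy (out, "X");
      return 3;
    }
  if (len >= 2 && strncmp (win, "ab", 2) == 0)
    {
      strcpy (out, "Y");
      return 2;
    }
  return 0;
}

/* Rewrite IN into OUT.  OUT must be at least as large as IN plus one.  */
static void
rewrite (const char *in, char *out)
{
  size_t n = strlen (in), pos = 0;

  *out = '\0';
  while (pos < n)
    {
      /* Fill: look at up to MAX_WIN items before trying any pattern.  */
      int len = (n - pos < (size_t) MAX_WIN) ? (int) (n - pos) : MAX_WIN;
      char repl[8];
      int used = try_match (in + pos, len, repl);

      if (used)
        {
          strcat (out, repl);
          pos += used;
        }
      else
        {
          /* No match: discard the first item and slide the window.  */
          strncat (out, in + pos, 1);
          pos++;
        }
    }
}
```

If the window were matched as soon as it held two items, "abc" would be
rewritten through the shorter "ab" pattern and the longer one would never
get a chance to apply.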

Second, I've changed it to use a forward scan.  As far as I am aware, in
the presence of conditional execution, a forward scan does not need to
keep track of extra state - it only relies on correct REG_DEAD notes.  I
don't actually know whether df produces correct death notes in the
presence of conditional execution (I suspect it does not - can anyone
say for sure?), but in any case using a forward scan here shifts the
problem out of recog.c entirely.  Only when performing a substitution do
we process the new insns in a backward scan, since they won't have
REG_DEAD notes.  While this may leave us overestimating
liveness if a substitution produces cond_exec insns, I think on the
whole it's the most accurate algorithm we can easily implement for
peephole2.
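
As a rough illustration of the difference between the two scan
directions (a simplified model, not df's implementation), consider
registers as bits in a mask.  The type toy_insn and the functions
step_forward and step_backward are hypothetical names.  The assumption
made here is that a backward step may only kill a register at an
unconditional definition, so it has to know whether the insn is
conditional, while a forward step only adds definitions and removes
whatever the death notes name.

```c
#include <stdint.h>

typedef uint32_t regmask;

/* Hypothetical, simplified insn description.  */
struct toy_insn
{
  regmask uses;      /* registers read */
  regmask defs;      /* registers written */
  regmask dead;      /* registers carrying a death note on this insn */
  int conditional;   /* nonzero if the write may not take place */
};

/* Backward step: derive liveness before INSN from liveness after it.
   A conditional definition cannot kill the old value, so the step has
   to look at the conditional flag.  */
static regmask
step_backward (const struct toy_insn *insn, regmask live_after)
{
  regmask live = live_after;

  if (!insn->conditional)
    live &= ~insn->defs;
  return live | insn->uses;
}

/* Forward step: derive liveness after INSN from liveness before it.
   Definitions become live, registers named by death notes stop being
   live.  No conditional flag is consulted; the result is only as good
   as the notes.  */
static regmask
step_forward (const struct toy_insn *insn, regmask live_before)
{
  return (live_before | insn->defs) & ~insn->dead;
}
```

In this model the forward step is identical for conditional and
unconditional insns, which is the sense in which the problem moves out of
the scanning code and into whatever produces the notes.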

Comments?  (Approvals?  Testing now, i686-linux and arm-linux-gnueabi.)


Bernd

[-- Attachment #2: peep2-forward-v3.diff --]
[-- Type: text/plain, Size: 20356 bytes --]

	* recog.c (peep2_do_rebuild_jump_labels, peep2_do_cleanup_cfg): New
	static variables.
	(peep2_buf_position): New static function.
	(peep2_regno_dead_p, peep2_reg_dead_p, peep2_find_free_register,
	peephole2_optimize): Use it.
	(peep2_attempt, peep2_update_life): New static functions, broken out
	of peephole2_optimize.
	(peep2_fill_buffer): New static function.
	(peephole2_optimize): Change the main loop to try to fill the buffer
	with the maximum number of insns before matching them against
	peepholes.  Use a forward scan.  Remove special case for targets with
	conditional execution.

Index: recog.c
===================================================================
*** recog.c	(revision 158639)
--- recog.c	(working copy)
*************** struct peep2_insn_data
*** 2911,2916 ****
--- 2911,2920 ----
  
  static struct peep2_insn_data peep2_insn_data[MAX_INSNS_PER_PEEP2 + 1];
  static int peep2_current;
+ 
+ static bool peep2_do_rebuild_jump_labels;
+ static bool peep2_do_cleanup_cfg;
+ 
  /* The number of instructions available to match a peep2.  */
  int peep2_current_count;
  
*************** int peep2_current_count;
*** 2919,2924 ****
--- 2923,2938 ----
     DF_LIVE_OUT for the block.  */
  #define PEEP2_EOB	pc_rtx
  
+ /* Wrap N to fit into the peep2_insn_data buffer.  */
+ 
+ static int
+ peep2_buf_position (int n)
+ {
+   if (n >= MAX_INSNS_PER_PEEP2 + 1)
+     n -= MAX_INSNS_PER_PEEP2 + 1;
+   return n;
+ }
+ 
  /* Return the Nth non-note insn after `current', or return NULL_RTX if it
     does not exist.  Used by the recognizer to find the next insn to match
     in a multi-insn pattern.  */
*************** peep2_next_insn (int n)
*** 2928,2936 ****
  {
    gcc_assert (n <= peep2_current_count);
  
!   n += peep2_current;
!   if (n >= MAX_INSNS_PER_PEEP2 + 1)
!     n -= MAX_INSNS_PER_PEEP2 + 1;
  
    return peep2_insn_data[n].insn;
  }
--- 2942,2948 ----
  {
    gcc_assert (n <= peep2_current_count);
  
!   n = peep2_buf_position (peep2_current + n);
  
    return peep2_insn_data[n].insn;
  }
*************** peep2_regno_dead_p (int ofs, int regno)
*** 2943,2951 ****
  {
    gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
  
!   ofs += peep2_current;
!   if (ofs >= MAX_INSNS_PER_PEEP2 + 1)
!     ofs -= MAX_INSNS_PER_PEEP2 + 1;
  
    gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
  
--- 2955,2961 ----
  {
    gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
  
!   ofs = peep2_buf_position (peep2_current + ofs);
  
    gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
  
*************** peep2_reg_dead_p (int ofs, rtx reg)
*** 2961,2969 ****
  
    gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
  
!   ofs += peep2_current;
!   if (ofs >= MAX_INSNS_PER_PEEP2 + 1)
!     ofs -= MAX_INSNS_PER_PEEP2 + 1;
  
    gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
  
--- 2971,2977 ----
  
    gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
  
!   ofs = peep2_buf_position (peep2_current + ofs);
  
    gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
  
*************** peep2_find_free_register (int from, int 
*** 2998,3009 ****
    gcc_assert (from < MAX_INSNS_PER_PEEP2 + 1);
    gcc_assert (to < MAX_INSNS_PER_PEEP2 + 1);
  
!   from += peep2_current;
!   if (from >= MAX_INSNS_PER_PEEP2 + 1)
!     from -= MAX_INSNS_PER_PEEP2 + 1;
!   to += peep2_current;
!   if (to >= MAX_INSNS_PER_PEEP2 + 1)
!     to -= MAX_INSNS_PER_PEEP2 + 1;
  
    gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
    REG_SET_TO_HARD_REG_SET (live, peep2_insn_data[from].live_before);
--- 3006,3013 ----
    gcc_assert (from < MAX_INSNS_PER_PEEP2 + 1);
    gcc_assert (to < MAX_INSNS_PER_PEEP2 + 1);
  
!   from = peep2_buf_position (peep2_current + from);
!   to = peep2_buf_position (peep2_current + to);
  
    gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
    REG_SET_TO_HARD_REG_SET (live, peep2_insn_data[from].live_before);
*************** peep2_find_free_register (int from, int 
*** 3012,3019 ****
      {
        HARD_REG_SET this_live;
  
!       if (++from >= MAX_INSNS_PER_PEEP2 + 1)
! 	from = 0;
        gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
        REG_SET_TO_HARD_REG_SET (this_live, peep2_insn_data[from].live_before);
        IOR_HARD_REG_SET (live, this_live);
--- 3016,3022 ----
      {
        HARD_REG_SET this_live;
  
!       from = peep2_buf_position (from + 1);
        gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
        REG_SET_TO_HARD_REG_SET (this_live, peep2_insn_data[from].live_before);
        IOR_HARD_REG_SET (live, this_live);
*************** peep2_reinit_state (regset live)
*** 3106,3341 ****
    COPY_REG_SET (peep2_insn_data[MAX_INSNS_PER_PEEP2].live_before, live);
  }
  
  /* Perform the peephole2 optimization pass.  */
  
  static void
  peephole2_optimize (void)
  {
!   rtx insn, prev;
!   bitmap live;
    int i;
    basic_block bb;
!   bool do_cleanup_cfg = false;
!   bool do_rebuild_jump_labels = false;
  
    df_set_flags (DF_LR_RUN_DCE);
    df_analyze ();
  
    /* Initialize the regsets we're going to use.  */
    for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
      peep2_insn_data[i].live_before = BITMAP_ALLOC (&reg_obstack);
    live = BITMAP_ALLOC (&reg_obstack);
  
    FOR_EACH_BB_REVERSE (bb)
      {
        rtl_profile_for_bb (bb);
  
        /* Start up propagation.  */
!       bitmap_copy (live, DF_LR_OUT (bb));
!       df_simulate_initialize_backwards (bb, live);
        peep2_reinit_state (live);
  
!       for (insn = BB_END (bb); ; insn = prev)
  	{
! 	  prev = PREV_INSN (insn);
! 	  if (NONDEBUG_INSN_P (insn))
! 	    {
! 	      rtx attempt, before_try, x;
! 	      int match_len;
! 	      rtx note;
! 	      bool was_call = false;
! 
! 	      /* Record this insn.  */
! 	      if (--peep2_current < 0)
! 		peep2_current = MAX_INSNS_PER_PEEP2;
! 	      if (peep2_current_count < MAX_INSNS_PER_PEEP2
! 		  && peep2_insn_data[peep2_current].insn == NULL_RTX)
! 		peep2_current_count++;
! 	      peep2_insn_data[peep2_current].insn = insn;
! 	      df_simulate_one_insn_backwards (bb, insn, live);
! 	      COPY_REG_SET (peep2_insn_data[peep2_current].live_before, live);
! 
! 	      if (RTX_FRAME_RELATED_P (insn))
! 		{
! 		  /* If an insn has RTX_FRAME_RELATED_P set, peephole
! 		     substitution would lose the
! 		     REG_FRAME_RELATED_EXPR that is attached.  */
! 		  peep2_reinit_state (live);
! 		  attempt = NULL;
! 		}
! 	      else
! 		/* Match the peephole.  */
! 		attempt = peephole2_insns (PATTERN (insn), insn, &match_len);
  
! 	      if (attempt != NULL)
  		{
! 		  /* If we are splitting a CALL_INSN, look for the CALL_INSN
! 		     in SEQ and copy our CALL_INSN_FUNCTION_USAGE and other
! 		     cfg-related call notes.  */
! 		  for (i = 0; i <= match_len; ++i)
! 		    {
! 		      int j;
! 		      rtx old_insn, new_insn, note;
! 
! 		      j = i + peep2_current;
! 		      if (j >= MAX_INSNS_PER_PEEP2 + 1)
! 			j -= MAX_INSNS_PER_PEEP2 + 1;
! 		      old_insn = peep2_insn_data[j].insn;
! 		      if (!CALL_P (old_insn))
! 			continue;
! 		      was_call = true;
! 
! 		      new_insn = attempt;
! 		      while (new_insn != NULL_RTX)
! 			{
! 			  if (CALL_P (new_insn))
! 			    break;
! 			  new_insn = NEXT_INSN (new_insn);
! 			}
! 
! 		      gcc_assert (new_insn != NULL_RTX);
! 
! 		      CALL_INSN_FUNCTION_USAGE (new_insn)
! 			= CALL_INSN_FUNCTION_USAGE (old_insn);
! 
! 		      for (note = REG_NOTES (old_insn);
! 			   note;
! 			   note = XEXP (note, 1))
! 			switch (REG_NOTE_KIND (note))
! 			  {
! 			  case REG_NORETURN:
! 			  case REG_SETJMP:
! 			    add_reg_note (new_insn, REG_NOTE_KIND (note),
! 					  XEXP (note, 0));
! 			    break;
! 			  default:
! 			    /* Discard all other reg notes.  */
! 			    break;
! 			  }
! 
! 		      /* Croak if there is another call in the sequence.  */
! 		      while (++i <= match_len)
! 			{
! 			  j = i + peep2_current;
! 			  if (j >= MAX_INSNS_PER_PEEP2 + 1)
! 			    j -= MAX_INSNS_PER_PEEP2 + 1;
! 			  old_insn = peep2_insn_data[j].insn;
! 			  gcc_assert (!CALL_P (old_insn));
! 			}
! 		      break;
! 		    }
! 
! 		  i = match_len + peep2_current;
! 		  if (i >= MAX_INSNS_PER_PEEP2 + 1)
! 		    i -= MAX_INSNS_PER_PEEP2 + 1;
! 
! 		  note = find_reg_note (peep2_insn_data[i].insn,
! 					REG_EH_REGION, NULL_RTX);
! 
! 		  /* Replace the old sequence with the new.  */
! 		  attempt = emit_insn_after_setloc (attempt,
! 						    peep2_insn_data[i].insn,
! 				       INSN_LOCATOR (peep2_insn_data[i].insn));
! 		  before_try = PREV_INSN (insn);
! 		  delete_insn_chain (insn, peep2_insn_data[i].insn, false);
! 
! 		  /* Re-insert the EH_REGION notes.  */
! 		  if (note || (was_call && nonlocal_goto_handler_labels))
! 		    {
! 		      edge eh_edge;
! 		      edge_iterator ei;
! 
! 		      FOR_EACH_EDGE (eh_edge, ei, bb->succs)
! 			if (eh_edge->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
! 			  break;
! 
! 		      if (note)
! 			copy_reg_eh_region_note_backward (note, attempt,
! 							  before_try);
! 
! 		      if (eh_edge)
! 			for (x = attempt ; x != before_try ; x = PREV_INSN (x))
! 			  if (x != BB_END (bb)
! 			      && (can_throw_internal (x)
! 				  || can_nonlocal_goto (x)))
! 			    {
! 			      edge nfte, nehe;
! 			      int flags;
! 
! 			      nfte = split_block (bb, x);
! 			      flags = (eh_edge->flags
! 				       & (EDGE_EH | EDGE_ABNORMAL));
! 			      if (CALL_P (x))
! 				flags |= EDGE_ABNORMAL_CALL;
! 			      nehe = make_edge (nfte->src, eh_edge->dest,
! 						flags);
! 
! 			      nehe->probability = eh_edge->probability;
! 			      nfte->probability
! 				= REG_BR_PROB_BASE - nehe->probability;
! 
! 			      do_cleanup_cfg |= purge_dead_edges (nfte->dest);
! 			      bb = nfte->src;
! 			      eh_edge = nehe;
! 			    }
! 
! 		      /* Converting possibly trapping insn to non-trapping is
! 			 possible.  Zap dummy outgoing edges.  */
! 		      do_cleanup_cfg |= purge_dead_edges (bb);
! 		    }
! 
! 		  if (targetm.have_conditional_execution ())
! 		    {
! 		      for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
! 			peep2_insn_data[i].insn = NULL_RTX;
! 		      peep2_insn_data[peep2_current].insn = PEEP2_EOB;
! 		      peep2_current_count = 0;
! 		    }
! 		  else
! 		    {
! 		      /* Back up lifetime information past the end of the
! 			 newly created sequence.  */
! 		      if (++i >= MAX_INSNS_PER_PEEP2 + 1)
! 			i = 0;
! 		      bitmap_copy (live, peep2_insn_data[i].live_before);
! 
! 		      /* Update life information for the new sequence.  */
! 		      x = attempt;
! 		      do
! 			{
! 			  if (INSN_P (x))
! 			    {
! 			      if (--i < 0)
! 				i = MAX_INSNS_PER_PEEP2;
! 			      if (peep2_current_count < MAX_INSNS_PER_PEEP2
! 				  && peep2_insn_data[i].insn == NULL_RTX)
! 				peep2_current_count++;
! 			      peep2_insn_data[i].insn = x;
! 			      df_insn_rescan (x);
! 			      df_simulate_one_insn_backwards (bb, x, live);
! 			      bitmap_copy (peep2_insn_data[i].live_before,
! 					   live);
! 			    }
! 			  x = PREV_INSN (x);
! 			}
! 		      while (x != prev);
! 
! 		      peep2_current = i;
! 		    }
! 
! 		  /* If we generated a jump instruction, it won't have
! 		     JUMP_LABEL set.  Recompute after we're done.  */
! 		  for (x = attempt; x != before_try; x = PREV_INSN (x))
! 		    if (JUMP_P (x))
! 		      {
! 		        do_rebuild_jump_labels = true;
! 			break;
! 		      }
  		}
  	    }
  
! 	  if (insn == BB_HEAD (bb))
  	    break;
  	}
      }
  
--- 3109,3405 ----
    COPY_REG_SET (peep2_insn_data[MAX_INSNS_PER_PEEP2].live_before, live);
  }
  
+ /* While scanning basic block BB, we found a match of length MATCH_LEN,
+    starting at INSN.  Perform the replacement, removing the old insns and
+    replacing them with ATTEMPT.  Returns the last insn emitted.  */
+ 
+ static rtx
+ peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
+ {
+   int i;
+   rtx last, note, before_try, x;
+   bool was_call = false;
+ 
+   /* If we are splitting a CALL_INSN, look for the CALL_INSN
+      in SEQ and copy our CALL_INSN_FUNCTION_USAGE and other
+      cfg-related call notes.  */
+   for (i = 0; i <= match_len; ++i)
+     {
+       int j;
+       rtx old_insn, new_insn, note;
+ 
+       j = peep2_buf_position (peep2_current + i);
+       old_insn = peep2_insn_data[j].insn;
+       if (!CALL_P (old_insn))
+ 	continue;
+       was_call = true;
+ 
+       new_insn = attempt;
+       while (new_insn != NULL_RTX)
+ 	{
+ 	  if (CALL_P (new_insn))
+ 	    break;
+ 	  new_insn = NEXT_INSN (new_insn);
+ 	}
+ 
+       gcc_assert (new_insn != NULL_RTX);
+ 
+       CALL_INSN_FUNCTION_USAGE (new_insn)
+ 	= CALL_INSN_FUNCTION_USAGE (old_insn);
+ 
+       for (note = REG_NOTES (old_insn);
+ 	   note;
+ 	   note = XEXP (note, 1))
+ 	switch (REG_NOTE_KIND (note))
+ 	  {
+ 	  case REG_NORETURN:
+ 	  case REG_SETJMP:
+ 	    add_reg_note (new_insn, REG_NOTE_KIND (note),
+ 			  XEXP (note, 0));
+ 	    break;
+ 	  default:
+ 	    /* Discard all other reg notes.  */
+ 	    break;
+ 	  }
+ 
+       /* Croak if there is another call in the sequence.  */
+       while (++i <= match_len)
+ 	{
+ 	  j = peep2_buf_position (peep2_current + i);
+ 	  old_insn = peep2_insn_data[j].insn;
+ 	  gcc_assert (!CALL_P (old_insn));
+ 	}
+       break;
+     }
+ 
+   i = peep2_buf_position (peep2_current + match_len);
+ 
+   note = find_reg_note (peep2_insn_data[i].insn, REG_EH_REGION, NULL_RTX);
+ 
+   /* Replace the old sequence with the new.  */
+   last = emit_insn_after_setloc (attempt,
+ 				 peep2_insn_data[i].insn,
+ 				 INSN_LOCATOR (peep2_insn_data[i].insn));
+   before_try = PREV_INSN (insn);
+   delete_insn_chain (insn, peep2_insn_data[i].insn, false);
+ 
+   /* Re-insert the EH_REGION notes.  */
+   if (note || (was_call && nonlocal_goto_handler_labels))
+     {
+       edge eh_edge;
+       edge_iterator ei;
+ 
+       FOR_EACH_EDGE (eh_edge, ei, bb->succs)
+ 	if (eh_edge->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
+ 	  break;
+ 
+       if (note)
+ 	copy_reg_eh_region_note_backward (note, last, before_try);
+ 
+       if (eh_edge)
+ 	for (x = last; x != before_try; x = PREV_INSN (x))
+ 	  if (x != BB_END (bb)
+ 	      && (can_throw_internal (x)
+ 		  || can_nonlocal_goto (x)))
+ 	    {
+ 	      edge nfte, nehe;
+ 	      int flags;
+ 
+ 	      nfte = split_block (bb, x);
+ 	      flags = (eh_edge->flags
+ 		       & (EDGE_EH | EDGE_ABNORMAL));
+ 	      if (CALL_P (x))
+ 		flags |= EDGE_ABNORMAL_CALL;
+ 	      nehe = make_edge (nfte->src, eh_edge->dest,
+ 				flags);
+ 
+ 	      nehe->probability = eh_edge->probability;
+ 	      nfte->probability
+ 		= REG_BR_PROB_BASE - nehe->probability;
+ 
+ 	      peep2_do_cleanup_cfg |= purge_dead_edges (nfte->dest);
+ 	      bb = nfte->src;
+ 	      eh_edge = nehe;
+ 	    }
+ 
+       /* Converting possibly trapping insn to non-trapping is
+ 	 possible.  Zap dummy outgoing edges.  */
+       peep2_do_cleanup_cfg |= purge_dead_edges (bb);
+     }
+ 
+   /* If we generated a jump instruction, it won't have
+      JUMP_LABEL set.  Recompute after we're done.  */
+   for (x = last; x != before_try; x = PREV_INSN (x))
+     if (JUMP_P (x))
+       {
+ 	peep2_do_rebuild_jump_labels = true;
+ 	break;
+       }
+ 
+   return last;
+ }
+ 
+ /* After performing a replacement in basic block BB, fix up the life
+    information in our buffer.  LAST is the last of the insns that we
+    emitted as a replacement.  PREV is the insn before the start of
+    the replacement.  MATCH_LEN is the number of instructions that were
+    matched, and which now need to be replaced in the buffer.  */
+ 
+ static void
+ peep2_update_life (basic_block bb, int match_len, rtx last, rtx prev)
+ {
+   int i = peep2_buf_position (peep2_current + match_len + 1);
+   rtx x;
+   regset_head live;
+ 
+   INIT_REG_SET (&live);
+   COPY_REG_SET (&live, peep2_insn_data[i].live_before);
+ 
+   gcc_assert (peep2_current_count >= match_len + 1);
+   peep2_current_count -= match_len + 1;
+ 
+   x = last;
+   do
+     {
+       if (INSN_P (x))
+ 	{
+ 	  df_insn_rescan (x);
+ 	  if (peep2_current_count < MAX_INSNS_PER_PEEP2)
+ 	    {
+ 	      peep2_current_count++;
+ 	      if (--i < 0)
+ 		i = MAX_INSNS_PER_PEEP2;
+ 	      peep2_insn_data[i].insn = x;
+ 	      df_simulate_one_insn_backwards (bb, x, &live);
+ 	      COPY_REG_SET (peep2_insn_data[i].live_before, &live);
+ 	    }
+ 	}
+       x = PREV_INSN (x);
+     }
+   while (x != prev);
+   CLEAR_REG_SET (&live);
+ 
+   peep2_current = i;
+ }
+ 
+ /* Add INSN, which is in BB, at the end of the peep2 insn buffer if possible.
+    Return true if we added it, false otherwise.  */
+ 
+ static bool
+ peep2_fill_buffer (basic_block bb, rtx insn, regset live)
+ {
+   int pos;
+ 
+   if (peep2_current_count == MAX_INSNS_PER_PEEP2)
+     return false;
+ 
+   /* If an insn has RTX_FRAME_RELATED_P set, peephole substitution would lose
+      the REG_FRAME_RELATED_EXPR that is attached.  */
+   if (RTX_FRAME_RELATED_P (insn))
+     {
+       /* Let the buffer drain first.  */
+       if (peep2_current_count > 0)
+ 	return false;
+       df_simulate_one_insn_forwards (bb, insn, live);
+       return true;
+     }
+ 
+   pos = peep2_buf_position (peep2_current + peep2_current_count);
+   peep2_insn_data[pos].insn = insn;
+   COPY_REG_SET (peep2_insn_data[pos].live_before, live);
+   peep2_current_count++;
+ 
+   df_simulate_one_insn_forwards (bb, insn, live);
+   return true;
+ }
+ 
  /* Perform the peephole2 optimization pass.  */
  
  static void
  peephole2_optimize (void)
  {
!   rtx insn;
!   bitmap live, saved_live;
!   rtx saved_live_insn;
    int i;
    basic_block bb;
! 
!   peep2_do_cleanup_cfg = false;
!   peep2_do_rebuild_jump_labels = false;
  
    df_set_flags (DF_LR_RUN_DCE);
+   df_note_add_problem ();
    df_analyze ();
  
    /* Initialize the regsets we're going to use.  */
    for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
      peep2_insn_data[i].live_before = BITMAP_ALLOC (&reg_obstack);
    live = BITMAP_ALLOC (&reg_obstack);
+   saved_live = BITMAP_ALLOC (&reg_obstack);
  
    FOR_EACH_BB_REVERSE (bb)
      {
+       bool past_end = false;
+       int pos;
+ 
        rtl_profile_for_bb (bb);
  
        /* Start up propagation.  */
!       bitmap_copy (live, DF_LR_IN (bb));
!       df_simulate_initialize_forwards (bb, live);
        peep2_reinit_state (live);
  
!       saved_live_insn = NULL_RTX;
!       
!       insn = BB_HEAD (bb);
!       for (;;)
  	{
! 	  rtx attempt, head;
! 	  int match_len;
  
! 	  if (!past_end && !NONDEBUG_INSN_P (insn))
! 	    {
! 	    next_insn:
! 	      insn = NEXT_INSN (insn);
! 	      if (insn == saved_live_insn)
  		{
! 		  COPY_REG_SET (live, saved_live);
! 		  saved_live_insn = NULL_RTX;
  		}
+ 	      if (insn == NEXT_INSN (BB_END (bb)))
+ 		past_end = true;
+ 	      continue;
  	    }
+ 	  if (!past_end && peep2_fill_buffer (bb, insn, live))
+ 	    goto next_insn;
  
! 	  /* If we did not fill an empty buffer, it signals the end of the
! 	     block.  */
! 	  if (peep2_current_count == 0)
  	    break;
+ 
+ 	  /* The buffer filled to the current maximum, so try to match.  */
+ 
+ 	  pos = peep2_buf_position (peep2_current + peep2_current_count);
+ 	  peep2_insn_data[pos].insn = PEEP2_EOB;
+ 	  COPY_REG_SET (peep2_insn_data[pos].live_before, live);
+ 
+ 	  /* Match the peephole.  */
+ 	  head = peep2_insn_data[peep2_current].insn;
+ 	  attempt = peephole2_insns (PATTERN (head), head, &match_len);
+ 	  if (attempt != NULL)
+ 	    {
+ 	      rtx before_head = PREV_INSN (head);
+ 	      rtx last;
+ 	      last = peep2_attempt (bb, head, match_len, attempt);
+ 	      peep2_update_life (bb, match_len, last, PREV_INSN (attempt));
+ 	    }
+ 	  else
+ 	    {
+ 	      /* If no match, advance the buffer by one insn.  */
+ 	      peep2_current = peep2_buf_position (peep2_current + 1);
+ 	      peep2_current_count--;
+ 	    }
  	}
      }
  
*************** peephole2_optimize (void)
*** 3343,3349 ****
    for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
      BITMAP_FREE (peep2_insn_data[i].live_before);
    BITMAP_FREE (live);
!   if (do_rebuild_jump_labels)
      rebuild_jump_labels (get_insns ());
  }
  #endif /* HAVE_peephole2 */
--- 3407,3413 ----
    for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
      BITMAP_FREE (peep2_insn_data[i].live_before);
    BITMAP_FREE (live);
!   if (peep2_do_rebuild_jump_labels)
      rebuild_jump_labels (get_insns ());
  }
  #endif /* HAVE_peephole2 */


* Resubmit/ping: peephole2 vs cond-exec vs df
  2010-04-22 16:36 peephole2 vs cond-exec vs df Bernd Schmidt
@ 2010-06-07 14:46 ` Bernd Schmidt
  2010-06-14 10:17   ` Bernd Schmidt
  2010-06-29  4:46   ` Resubmit/ping: " Richard Henderson
  2010-06-14 12:28 ` Paolo Bonzini
  1 sibling, 2 replies; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-07 14:46 UTC
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3006 bytes --]

On 04/22/2010 06:14 PM, Bernd Schmidt wrote:
> On targets with conditional execution, the peephole2 pass fails to retry
> insns it generates against other peepholes.  This code is responsible:
>                  if (targetm.have_conditional_execution ())
>                     {
>                       for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
>                         peep2_insn_data[i].insn = NULL_RTX;
>                       peep2_insn_data[peep2_current].insn = PEEP2_EOB;
>                       peep2_current_count = 0;
>                     }
>                   else
> 
> This appears to have been added a long time ago, when we had code that
> tried to track register lifetimes accurately through cond_exec insns.
> However, all that code was apparently lost when df was introduced -
> df_simulate_one_insn_backwards has no trace of it - so we needlessly
> restrict cond_exec ports here.
> 
> My first reaction was to simply delete the special case from peephole2;
> this seems easier than re-adding the lost code to df, and for my ARM
> peepholes patch it would be good to be able to apply peepholes on insns
> generated by another peephole2.
> 
> Thinking about it some more, I came up with two ideas which I think are
> better.
> 
> First, I noticed that sometimes small peepholes are applied when a
> larger one would have matched.  I've modified the code so that it now
> tries to fill the buffer up to the maximum number of insns it can hold,
> and only then does it start to apply peepholes.  If nothing matched, it
> discards the first insn from the buffer, the empty space is then filled
> in the next iteration.  This eliminates the need in my ARM patch to
> apply peepholes multiple times, since we should always see as many insns
> as the largest peephole can handle.
> 
> Second, I've changed it to use a forward scan.  As far as I am aware, in
> the presence of conditional execution, a forward scan does not need to
> keep track of extra state - it only relies on correct REG_DEAD notes.  I
> don't actually know whether df produces correct death notes in the
> presence of conditional execution (I suspect it does not - can anyone
> say for sure?), but in any case using a forward scan here shifts the
> problem out of recog.c entirely.  Only when performing a substitution do
> we process the new insns in a backward scan, since they won't have
> REG_DEAD notes.  While this leaves us with potentially overestimating
> liveness if a substitution produces cond_exec insns, I think on the
> whole it's the most accurate algorithm we can easily implement for
> peephole2.

Note that I'm convinced now that df does _not_ produce correct REG_DEAD
notes in the presence of conditional execution.  However, I believe this
patch will enable peephole2 to make use of any future enhancements to df
in this regard without further changes.

Here's a new version with a small tweak for i386.md to deal with the new
order.  Bootstrapped and regression tested on i686-linux.  Ok?


Bernd

[-- Attachment #2: peep2-forward-v4.diff --]
[-- Type: text/plain, Size: 21127 bytes --]

	* recog.c (peep2_do_rebuild_jump_labels, peep2_do_cleanup_cfg): New
	static variables.
	(peep2_buf_position): New static function.
	(peep2_regno_dead_p, peep2_reg_dead_p, peep2_find_free_register,
	peephole2_optimize): Use it.
	(peep2_attempt, peep2_update_life): New static functions, broken out
	of peephole2_optimize.
	(peep2_fill_buffer): New static function.
	(peephole2_optimize): Change the main loop to try to fill the buffer
	with the maximum number of insns before matching them against
	peepholes.  Use a forward scan.  Remove special case for targets with
	conditional execution.
	* config/i386/i386.md (peephole2 for arithmetic ops with memory):
	Rewrite so as not to expect the second insn to have had a peephole
	applied yet.

Index: recog.c
===================================================================
--- recog.c	(revision 160261)
+++ recog.c	(working copy)
@@ -2906,6 +2906,10 @@ struct peep2_insn_data
 
 static struct peep2_insn_data peep2_insn_data[MAX_INSNS_PER_PEEP2 + 1];
 static int peep2_current;
+
+static bool peep2_do_rebuild_jump_labels;
+static bool peep2_do_cleanup_cfg;
+
 /* The number of instructions available to match a peep2.  */
 int peep2_current_count;
 
@@ -2914,6 +2918,16 @@ int peep2_current_count;
    DF_LIVE_OUT for the block.  */
 #define PEEP2_EOB	pc_rtx
 
+/* Wrap N to fit into the peep2_insn_data buffer.  */
+
+static int
+peep2_buf_position (int n)
+{
+  if (n >= MAX_INSNS_PER_PEEP2 + 1)
+    n -= MAX_INSNS_PER_PEEP2 + 1;
+  return n;
+}
+
 /* Return the Nth non-note insn after `current', or return NULL_RTX if it
    does not exist.  Used by the recognizer to find the next insn to match
    in a multi-insn pattern.  */
@@ -2923,9 +2937,7 @@ peep2_next_insn (int n)
 {
   gcc_assert (n <= peep2_current_count);
 
-  n += peep2_current;
-  if (n >= MAX_INSNS_PER_PEEP2 + 1)
-    n -= MAX_INSNS_PER_PEEP2 + 1;
+  n = peep2_buf_position (peep2_current + n);
 
   return peep2_insn_data[n].insn;
 }
@@ -2938,9 +2950,7 @@ peep2_regno_dead_p (int ofs, int regno)
 {
   gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
 
-  ofs += peep2_current;
-  if (ofs >= MAX_INSNS_PER_PEEP2 + 1)
-    ofs -= MAX_INSNS_PER_PEEP2 + 1;
+  ofs = peep2_buf_position (peep2_current + ofs);
 
   gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
 
@@ -2956,9 +2966,7 @@ peep2_reg_dead_p (int ofs, rtx reg)
 
   gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
 
-  ofs += peep2_current;
-  if (ofs >= MAX_INSNS_PER_PEEP2 + 1)
-    ofs -= MAX_INSNS_PER_PEEP2 + 1;
+  ofs = peep2_buf_position (peep2_current + ofs);
 
   gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
 
@@ -2993,12 +3001,8 @@ peep2_find_free_register (int from, int 
   gcc_assert (from < MAX_INSNS_PER_PEEP2 + 1);
   gcc_assert (to < MAX_INSNS_PER_PEEP2 + 1);
 
-  from += peep2_current;
-  if (from >= MAX_INSNS_PER_PEEP2 + 1)
-    from -= MAX_INSNS_PER_PEEP2 + 1;
-  to += peep2_current;
-  if (to >= MAX_INSNS_PER_PEEP2 + 1)
-    to -= MAX_INSNS_PER_PEEP2 + 1;
+  from = peep2_buf_position (peep2_current + from);
+  to = peep2_buf_position (peep2_current + to);
 
   gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
   REG_SET_TO_HARD_REG_SET (live, peep2_insn_data[from].live_before);
@@ -3007,8 +3011,7 @@ peep2_find_free_register (int from, int 
     {
       HARD_REG_SET this_live;
 
-      if (++from >= MAX_INSNS_PER_PEEP2 + 1)
-	from = 0;
+      from = peep2_buf_position (from + 1);
       gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
       REG_SET_TO_HARD_REG_SET (this_live, peep2_insn_data[from].live_before);
       IOR_HARD_REG_SET (live, this_live);
@@ -3101,236 +3104,297 @@ peep2_reinit_state (regset live)
   COPY_REG_SET (peep2_insn_data[MAX_INSNS_PER_PEEP2].live_before, live);
 }
 
-/* Perform the peephole2 optimization pass.  */
+/* While scanning basic block BB, we found a match of length MATCH_LEN,
+   starting at INSN.  Perform the replacement, removing the old insns and
+   replacing them with ATTEMPT.  Returns the last insn emitted.  */
 
-static void
-peephole2_optimize (void)
+static rtx
+peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
 {
-  rtx insn, prev;
-  bitmap live;
   int i;
-  basic_block bb;
-  bool do_cleanup_cfg = false;
-  bool do_rebuild_jump_labels = false;
+  rtx last, note, before_try, x;
+  bool was_call = false;
 
-  df_set_flags (DF_LR_RUN_DCE);
-  df_analyze ();
+  /* If we are splitting a CALL_INSN, look for the CALL_INSN
+     in SEQ and copy our CALL_INSN_FUNCTION_USAGE and other
+     cfg-related call notes.  */
+  for (i = 0; i <= match_len; ++i)
+    {
+      int j;
+      rtx old_insn, new_insn, note;
 
-  /* Initialize the regsets we're going to use.  */
-  for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
-    peep2_insn_data[i].live_before = BITMAP_ALLOC (&reg_obstack);
-  live = BITMAP_ALLOC (&reg_obstack);
+      j = peep2_buf_position (peep2_current + i);
+      old_insn = peep2_insn_data[j].insn;
+      if (!CALL_P (old_insn))
+	continue;
+      was_call = true;
 
-  FOR_EACH_BB_REVERSE (bb)
-    {
-      rtl_profile_for_bb (bb);
+      new_insn = attempt;
+      while (new_insn != NULL_RTX)
+	{
+	  if (CALL_P (new_insn))
+	    break;
+	  new_insn = NEXT_INSN (new_insn);
+	}
 
-      /* Start up propagation.  */
-      bitmap_copy (live, DF_LR_OUT (bb));
-      df_simulate_initialize_backwards (bb, live);
-      peep2_reinit_state (live);
+      gcc_assert (new_insn != NULL_RTX);
 
-      for (insn = BB_END (bb); ; insn = prev)
+      CALL_INSN_FUNCTION_USAGE (new_insn)
+	= CALL_INSN_FUNCTION_USAGE (old_insn);
+
+      for (note = REG_NOTES (old_insn);
+	   note;
+	   note = XEXP (note, 1))
+	switch (REG_NOTE_KIND (note))
+	  {
+	  case REG_NORETURN:
+	  case REG_SETJMP:
+	    add_reg_note (new_insn, REG_NOTE_KIND (note),
+			  XEXP (note, 0));
+	    break;
+	  default:
+	    /* Discard all other reg notes.  */
+	    break;
+	  }
+
+      /* Croak if there is another call in the sequence.  */
+      while (++i <= match_len)
 	{
-	  prev = PREV_INSN (insn);
-	  if (NONDEBUG_INSN_P (insn))
+	  j = peep2_buf_position (peep2_current + i);
+	  old_insn = peep2_insn_data[j].insn;
+	  gcc_assert (!CALL_P (old_insn));
+	}
+      break;
+    }
+
+  i = peep2_buf_position (peep2_current + match_len);
+
+  note = find_reg_note (peep2_insn_data[i].insn, REG_EH_REGION, NULL_RTX);
+
+  /* Replace the old sequence with the new.  */
+  last = emit_insn_after_setloc (attempt,
+				 peep2_insn_data[i].insn,
+				 INSN_LOCATOR (peep2_insn_data[i].insn));
+  before_try = PREV_INSN (insn);
+  delete_insn_chain (insn, peep2_insn_data[i].insn, false);
+
+  /* Re-insert the EH_REGION notes.  */
+  if (note || (was_call && nonlocal_goto_handler_labels))
+    {
+      edge eh_edge;
+      edge_iterator ei;
+
+      FOR_EACH_EDGE (eh_edge, ei, bb->succs)
+	if (eh_edge->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
+	  break;
+
+      if (note)
+	copy_reg_eh_region_note_backward (note, last, before_try);
+
+      if (eh_edge)
+	for (x = last; x != before_try; x = PREV_INSN (x))
+	  if (x != BB_END (bb)
+	      && (can_throw_internal (x)
+		  || can_nonlocal_goto (x)))
 	    {
-	      rtx attempt, before_try, x;
-	      int match_len;
-	      rtx note;
-	      bool was_call = false;
+	      edge nfte, nehe;
+	      int flags;
 
-	      /* Record this insn.  */
-	      if (--peep2_current < 0)
-		peep2_current = MAX_INSNS_PER_PEEP2;
-	      if (peep2_current_count < MAX_INSNS_PER_PEEP2
-		  && peep2_insn_data[peep2_current].insn == NULL_RTX)
-		peep2_current_count++;
-	      peep2_insn_data[peep2_current].insn = insn;
-	      df_simulate_one_insn_backwards (bb, insn, live);
-	      COPY_REG_SET (peep2_insn_data[peep2_current].live_before, live);
+	      nfte = split_block (bb, x);
+	      flags = (eh_edge->flags
+		       & (EDGE_EH | EDGE_ABNORMAL));
+	      if (CALL_P (x))
+		flags |= EDGE_ABNORMAL_CALL;
+	      nehe = make_edge (nfte->src, eh_edge->dest,
+				flags);
 
-	      if (RTX_FRAME_RELATED_P (insn))
-		{
-		  /* If an insn has RTX_FRAME_RELATED_P set, peephole
-		     substitution would lose the
-		     REG_FRAME_RELATED_EXPR that is attached.  */
-		  peep2_reinit_state (live);
-		  attempt = NULL;
-		}
-	      else
-		/* Match the peephole.  */
-		attempt = peephole2_insns (PATTERN (insn), insn, &match_len);
+	      nehe->probability = eh_edge->probability;
+	      nfte->probability
+		= REG_BR_PROB_BASE - nehe->probability;
 
-	      if (attempt != NULL)
-		{
-		  /* If we are splitting a CALL_INSN, look for the CALL_INSN
-		     in SEQ and copy our CALL_INSN_FUNCTION_USAGE and other
-		     cfg-related call notes.  */
-		  for (i = 0; i <= match_len; ++i)
-		    {
-		      int j;
-		      rtx old_insn, new_insn, note;
+	      peep2_do_cleanup_cfg |= purge_dead_edges (nfte->dest);
+	      bb = nfte->src;
+	      eh_edge = nehe;
+	    }
 
-		      j = i + peep2_current;
-		      if (j >= MAX_INSNS_PER_PEEP2 + 1)
-			j -= MAX_INSNS_PER_PEEP2 + 1;
-		      old_insn = peep2_insn_data[j].insn;
-		      if (!CALL_P (old_insn))
-			continue;
-		      was_call = true;
+      /* Converting possibly trapping insn to non-trapping is
+	 possible.  Zap dummy outgoing edges.  */
+      peep2_do_cleanup_cfg |= purge_dead_edges (bb);
+    }
 
-		      new_insn = attempt;
-		      while (new_insn != NULL_RTX)
-			{
-			  if (CALL_P (new_insn))
-			    break;
-			  new_insn = NEXT_INSN (new_insn);
-			}
+  /* If we generated a jump instruction, it won't have
+     JUMP_LABEL set.  Recompute after we're done.  */
+  for (x = last; x != before_try; x = PREV_INSN (x))
+    if (JUMP_P (x))
+      {
+	peep2_do_rebuild_jump_labels = true;
+	break;
+      }
 
-		      gcc_assert (new_insn != NULL_RTX);
+  return last;
+}
 
-		      CALL_INSN_FUNCTION_USAGE (new_insn)
-			= CALL_INSN_FUNCTION_USAGE (old_insn);
+/* After performing a replacement in basic block BB, fix up the life
+   information in our buffer.  LAST is the last of the insns that we
+   emitted as a replacement.  PREV is the insn before the start of
+   the replacement.  MATCH_LEN is the number of instructions that were
+   matched, and which now need to be replaced in the buffer.  */
 
-		      for (note = REG_NOTES (old_insn);
-			   note;
-			   note = XEXP (note, 1))
-			switch (REG_NOTE_KIND (note))
-			  {
-			  case REG_NORETURN:
-			  case REG_SETJMP:
-			    add_reg_note (new_insn, REG_NOTE_KIND (note),
-					  XEXP (note, 0));
-			    break;
-			  default:
-			    /* Discard all other reg notes.  */
-			    break;
-			  }
+static void
+peep2_update_life (basic_block bb, int match_len, rtx last, rtx prev)
+{
+  int i = peep2_buf_position (peep2_current + match_len + 1);
+  rtx x;
+  regset_head live;
 
-		      /* Croak if there is another call in the sequence.  */
-		      while (++i <= match_len)
-			{
-			  j = i + peep2_current;
-			  if (j >= MAX_INSNS_PER_PEEP2 + 1)
-			    j -= MAX_INSNS_PER_PEEP2 + 1;
-			  old_insn = peep2_insn_data[j].insn;
-			  gcc_assert (!CALL_P (old_insn));
-			}
-		      break;
-		    }
+  INIT_REG_SET (&live);
+  COPY_REG_SET (&live, peep2_insn_data[i].live_before);
 
-		  i = match_len + peep2_current;
-		  if (i >= MAX_INSNS_PER_PEEP2 + 1)
-		    i -= MAX_INSNS_PER_PEEP2 + 1;
+  gcc_assert (peep2_current_count >= match_len + 1);
+  peep2_current_count -= match_len + 1;
 
-		  note = find_reg_note (peep2_insn_data[i].insn,
-					REG_EH_REGION, NULL_RTX);
+  x = last;
+  do
+    {
+      if (INSN_P (x))
+	{
+	  df_insn_rescan (x);
+	  if (peep2_current_count < MAX_INSNS_PER_PEEP2)
+	    {
+	      peep2_current_count++;
+	      if (--i < 0)
+		i = MAX_INSNS_PER_PEEP2;
+	      peep2_insn_data[i].insn = x;
+	      df_simulate_one_insn_backwards (bb, x, &live);
+	      COPY_REG_SET (peep2_insn_data[i].live_before, &live);
+	    }
+	}
+      x = PREV_INSN (x);
+    }
+  while (x != prev);
+  CLEAR_REG_SET (&live);
 
-		  /* Replace the old sequence with the new.  */
-		  attempt = emit_insn_after_setloc (attempt,
-						    peep2_insn_data[i].insn,
-				       INSN_LOCATOR (peep2_insn_data[i].insn));
-		  before_try = PREV_INSN (insn);
-		  delete_insn_chain (insn, peep2_insn_data[i].insn, false);
+  peep2_current = i;
+}
 
-		  /* Re-insert the EH_REGION notes.  */
-		  if (note || (was_call && nonlocal_goto_handler_labels))
-		    {
-		      edge eh_edge;
-		      edge_iterator ei;
+/* Add INSN, which is in BB, at the end of the peep2 insn buffer if possible.
+   Return true if we added it, false otherwise.  */
 
-		      FOR_EACH_EDGE (eh_edge, ei, bb->succs)
-			if (eh_edge->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
-			  break;
+static bool
+peep2_fill_buffer (basic_block bb, rtx insn, regset live)
+{
+  int pos;
 
-		      if (note)
-			copy_reg_eh_region_note_backward (note, attempt,
-							  before_try);
+  if (peep2_current_count == MAX_INSNS_PER_PEEP2)
+    return false;
 
-		      if (eh_edge)
-			for (x = attempt ; x != before_try ; x = PREV_INSN (x))
-			  if (x != BB_END (bb)
-			      && (can_throw_internal (x)
-				  || can_nonlocal_goto (x)))
-			    {
-			      edge nfte, nehe;
-			      int flags;
+  /* If an insn has RTX_FRAME_RELATED_P set, peephole substitution would lose
+     the REG_FRAME_RELATED_EXPR that is attached.  */
+  if (RTX_FRAME_RELATED_P (insn))
+    {
+      /* Let the buffer drain first.  */
+      if (peep2_current_count > 0)
+	return false;
+      df_simulate_one_insn_forwards (bb, insn, live);
+      return true;
+    }
 
-			      nfte = split_block (bb, x);
-			      flags = (eh_edge->flags
-				       & (EDGE_EH | EDGE_ABNORMAL));
-			      if (CALL_P (x))
-				flags |= EDGE_ABNORMAL_CALL;
-			      nehe = make_edge (nfte->src, eh_edge->dest,
-						flags);
+  pos = peep2_buf_position (peep2_current + peep2_current_count);
+  peep2_insn_data[pos].insn = insn;
+  COPY_REG_SET (peep2_insn_data[pos].live_before, live);
+  peep2_current_count++;
 
-			      nehe->probability = eh_edge->probability;
-			      nfte->probability
-				= REG_BR_PROB_BASE - nehe->probability;
+  df_simulate_one_insn_forwards (bb, insn, live);
+  return true;
+}
 
-			      do_cleanup_cfg |= purge_dead_edges (nfte->dest);
-			      bb = nfte->src;
-			      eh_edge = nehe;
-			    }
+/* Perform the peephole2 optimization pass.  */
 
-		      /* Converting possibly trapping insn to non-trapping is
-			 possible.  Zap dummy outgoing edges.  */
-		      do_cleanup_cfg |= purge_dead_edges (bb);
-		    }
+static void
+peephole2_optimize (void)
+{
+  rtx insn;
+  bitmap live, saved_live;
+  rtx saved_live_insn;
+  int i;
+  basic_block bb;
 
-		  if (targetm.have_conditional_execution ())
-		    {
-		      for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
-			peep2_insn_data[i].insn = NULL_RTX;
-		      peep2_insn_data[peep2_current].insn = PEEP2_EOB;
-		      peep2_current_count = 0;
-		    }
-		  else
-		    {
-		      /* Back up lifetime information past the end of the
-			 newly created sequence.  */
-		      if (++i >= MAX_INSNS_PER_PEEP2 + 1)
-			i = 0;
-		      bitmap_copy (live, peep2_insn_data[i].live_before);
+  peep2_do_cleanup_cfg = false;
+  peep2_do_rebuild_jump_labels = false;
 
-		      /* Update life information for the new sequence.  */
-		      x = attempt;
-		      do
-			{
-			  if (INSN_P (x))
-			    {
-			      if (--i < 0)
-				i = MAX_INSNS_PER_PEEP2;
-			      if (peep2_current_count < MAX_INSNS_PER_PEEP2
-				  && peep2_insn_data[i].insn == NULL_RTX)
-				peep2_current_count++;
-			      peep2_insn_data[i].insn = x;
-			      df_insn_rescan (x);
-			      df_simulate_one_insn_backwards (bb, x, live);
-			      bitmap_copy (peep2_insn_data[i].live_before,
-					   live);
-			    }
-			  x = PREV_INSN (x);
-			}
-		      while (x != prev);
+  df_set_flags (DF_LR_RUN_DCE);
+  df_note_add_problem ();
+  df_analyze ();
 
-		      peep2_current = i;
-		    }
+  /* Initialize the regsets we're going to use.  */
+  for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
+    peep2_insn_data[i].live_before = BITMAP_ALLOC (&reg_obstack);
+  live = BITMAP_ALLOC (&reg_obstack);
+  saved_live = BITMAP_ALLOC (&reg_obstack);
 
-		  /* If we generated a jump instruction, it won't have
-		     JUMP_LABEL set.  Recompute after we're done.  */
-		  for (x = attempt; x != before_try; x = PREV_INSN (x))
-		    if (JUMP_P (x))
-		      {
-		        do_rebuild_jump_labels = true;
-			break;
-		      }
+  FOR_EACH_BB_REVERSE (bb)
+    {
+      bool past_end = false;
+      int pos;
+
+      rtl_profile_for_bb (bb);
+
+      /* Start up propagation.  */
+      bitmap_copy (live, DF_LR_IN (bb));
+      df_simulate_initialize_forwards (bb, live);
+      peep2_reinit_state (live);
+
+      saved_live_insn = NULL_RTX;
+      
+      insn = BB_HEAD (bb);
+      for (;;)
+	{
+	  rtx attempt, head;
+	  int match_len;
+
+	  if (!past_end && !NONDEBUG_INSN_P (insn))
+	    {
+	    next_insn:
+	      insn = NEXT_INSN (insn);
+	      if (insn == saved_live_insn)
+		{
+		  COPY_REG_SET (live, saved_live);
+		  saved_live_insn = NULL_RTX;
 		}
+	      if (insn == NEXT_INSN (BB_END (bb)))
+		past_end = true;
+	      continue;
 	    }
+	  if (!past_end && peep2_fill_buffer (bb, insn, live))
+	    goto next_insn;
 
-	  if (insn == BB_HEAD (bb))
+	  /* If we did not fill an empty buffer, it signals the end of the
+	     block.  */
+	  if (peep2_current_count == 0)
 	    break;
+
+	  /* The buffer filled to the current maximum, so try to match.  */
+
+	  pos = peep2_buf_position (peep2_current + peep2_current_count);
+	  peep2_insn_data[pos].insn = PEEP2_EOB;
+	  COPY_REG_SET (peep2_insn_data[pos].live_before, live);
+
+	  /* Match the peephole.  */
+	  head = peep2_insn_data[peep2_current].insn;
+	  attempt = peephole2_insns (PATTERN (head), head, &match_len);
+	  if (attempt != NULL)
+	    {
+	      rtx before_head = PREV_INSN (head);
+	      rtx last;
+	      last = peep2_attempt (bb, head, match_len, attempt);
+	      peep2_update_life (bb, match_len, last, PREV_INSN (attempt));
+	    }
+	  else
+	    {
+	      /* If no match, advance the buffer by one insn.  */
+	      peep2_current = peep2_buf_position (peep2_current + 1);
+	      peep2_current_count--;
+	    }
 	}
     }
 
@@ -3338,7 +3402,7 @@ peephole2_optimize (void)
   for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
     BITMAP_FREE (peep2_insn_data[i].live_before);
   BITMAP_FREE (live);
-  if (do_rebuild_jump_labels)
+  if (peep2_do_rebuild_jump_labels)
     rebuild_jump_labels (get_insns ());
 }
 #endif /* HAVE_peephole2 */
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 160261)
+++ config/i386/i386.md	(working copy)
@@ -18097,15 +18097,14 @@ (define_peephole2
 ;;  leal    (%edx,%eax,4), %eax
 
 (define_peephole2
-  [(parallel [(set (match_operand 0 "register_operand" "")
+  [(match_scratch:SI 5 "r")
+   (parallel [(set (match_operand 0 "register_operand" "")
 		   (ashift (match_operand 1 "register_operand" "")
 			   (match_operand 2 "const_int_operand" "")))
 	       (clobber (reg:CC FLAGS_REG))])
-   (set (match_operand 3 "register_operand")
-        (match_operand 4 "x86_64_general_operand" ""))
-   (parallel [(set (match_operand 5 "register_operand" "")
-		   (plus (match_operand 6 "register_operand" "")
-			 (match_operand 7 "register_operand" "")))
+   (parallel [(set (match_operand 3 "register_operand" "")
+		   (plus (match_dup 0)
+			 (match_operand 4 "x86_64_general_operand" "")))
 		   (clobber (reg:CC FLAGS_REG))])]
   "INTVAL (operands[2]) >= 0 && INTVAL (operands[2]) <= 3
    /* Validate MODE for lea.  */
@@ -18115,30 +18114,21 @@ (define_peephole2
        || GET_MODE (operands[0]) == SImode
        || (TARGET_64BIT && GET_MODE (operands[0]) == DImode))
    /* We reorder load and the shift.  */
-   && !rtx_equal_p (operands[1], operands[3])
-   && !reg_overlap_mentioned_p (operands[0], operands[4])
-   /* Last PLUS must consist of operand 0 and 3.  */
-   && !rtx_equal_p (operands[0], operands[3])
-   && (rtx_equal_p (operands[3], operands[6])
-       || rtx_equal_p (operands[3], operands[7]))
-   && (rtx_equal_p (operands[0], operands[6])
-       || rtx_equal_p (operands[0], operands[7]))
-   /* The intermediate operand 0 must die or be same as output.  */
-   && (rtx_equal_p (operands[0], operands[5])
-       || peep2_reg_dead_p (3, operands[0]))"
-  [(set (match_dup 3) (match_dup 4))
+   && !reg_overlap_mentioned_p (operands[0], operands[4])"
+  [(set (match_dup 5) (match_dup 4))
    (set (match_dup 0) (match_dup 1))]
 {
-  enum machine_mode mode = GET_MODE (operands[5]) == DImode ? DImode : SImode;
+  enum machine_mode mode = GET_MODE (operands[1]) == DImode ? DImode : SImode;
   int scale = 1 << INTVAL (operands[2]);
   rtx index = gen_lowpart (Pmode, operands[1]);
-  rtx base = gen_lowpart (Pmode, operands[3]);
-  rtx dest = gen_lowpart (mode, operands[5]);
+  rtx base = gen_lowpart (Pmode, operands[5]);
+  rtx dest = gen_lowpart (mode, operands[3]);
 
   operands[1] = gen_rtx_PLUS (Pmode, base,
   			      gen_rtx_MULT (Pmode, index, GEN_INT (scale)));
   if (mode != Pmode)
     operands[1] = gen_rtx_SUBREG (mode, operands[1], 0);
+  operands[5] = base;
   operands[0] = dest;
 })
 \f

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-07 14:46 ` Resubmit/ping: " Bernd Schmidt
@ 2010-06-14 10:17   ` Bernd Schmidt
  2010-06-21 14:14     ` Ping^5: " Bernd Schmidt
  2010-06-29  4:46   ` Resubmit/ping: " Richard Henderson
  1 sibling, 1 reply; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-14 10:17 UTC (permalink / raw)
  To: GCC Patches

Another ping:  Change peephole2 to do a forward scan, and to fill its
buffer before trying to match things:
  http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00536.html


Bernd


* Re: peephole2 vs cond-exec vs df
  2010-04-22 16:36 peephole2 vs cond-exec vs df Bernd Schmidt
  2010-06-07 14:46 ` Resubmit/ping: " Bernd Schmidt
@ 2010-06-14 12:28 ` Paolo Bonzini
  1 sibling, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2010-06-14 12:28 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches


> Second, I've changed it to use a forward scan.  As far as I am aware, in
> the presence of conditional execution, a forward scan does not need to
> keep track of extra state - it only relies on correct REG_DEAD notes.  I
> don't actually know whether df produces correct death notes in the
> presence of conditional execution (I suspect it does not - can anyone
> say for sure?), but in any case using a forward scan here shifts the
> problem out of recog.c entirely.  Only when performing a substitution do
> we process the new insns in a backward scan, since they won't have
> REG_DEAD notes.

Yes, I thought a bit about it and it seems accurate.  At least I 
couldn't find a counterexample. :)
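
To make the quoted argument concrete, here is a minimal sketch of a
forward liveness step that relies only on death notes.  It is a toy
model, not the df implementation: `struct fwd_insn`, its `def` and
`dead` bitmasks, and `fwd_simulate` are all hypothetical names invented
for this illustration.  The point it tries to show is that a forward
step never has to know whether a set was conditional; it only adds what
the insn sets and removes what the notes say has died.

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn for this sketch only: registers are bits in a mask.
   DEF is the set of registers the insn writes (possibly under a
   condition); DEAD models its REG_DEAD/REG_UNUSED notes.  */
struct fwd_insn
{
  unsigned def;
  unsigned dead;
};

/* Step LIVE forwards across INSN and return the registers live
   after it.  Everything written becomes live, then everything the
   notes declare dead is removed.  No per-condition state is kept.  */
static unsigned
fwd_simulate (const struct fwd_insn *insn, unsigned live)
{
  assert (insn != NULL);
  live |= insn->def;
  live &= ~insn->dead;
  return live;
}
```

Whether the notes themselves are accurate for conditionally executed
insns is exactly the open question raised in the quoted text; the
sketch simply assumes they are.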

Paolo


* Ping^5: peephole2 vs cond-exec vs df
  2010-06-14 10:17   ` Bernd Schmidt
@ 2010-06-21 14:14     ` Bernd Schmidt
  2010-06-28 12:37       ` Ping^6: " Bernd Schmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-21 14:14 UTC (permalink / raw)
  To: GCC Patches

Change peephole2 to do a forward scan, and to fill its buffer before
trying to match things:
   http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00536.html


Bernd


* Ping^6: peephole2 vs cond-exec vs df
  2010-06-21 14:14     ` Ping^5: " Bernd Schmidt
@ 2010-06-28 12:37       ` Bernd Schmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-28 12:37 UTC (permalink / raw)
  To: GCC Patches

Change peephole2 to do a forward scan, and to fill its buffer before
trying to match things:
   http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00536.html


Bernd


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-07 14:46 ` Resubmit/ping: " Bernd Schmidt
  2010-06-14 10:17   ` Bernd Schmidt
@ 2010-06-29  4:46   ` Richard Henderson
  2010-06-29  8:34     ` Bernd Schmidt
  1 sibling, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2010-06-29  4:46 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

> +  /* If an insn has RTX_FRAME_RELATED_P set, peephole substitution would lose
> +     the REG_FRAME_RELATED_EXPR that is attached.  */
> +  if (RTX_FRAME_RELATED_P (insn))
> +    {
> +      /* Let the buffer drain first.  */
> +      if (peep2_current_count > 0)
> +	return false;
> +      df_simulate_one_insn_forwards (bb, insn, live);
> +      return true;

You've changed the logic so that frame-related insns are allowed to be
peepholed, so long as they begin a match.  Previously we skipped these
insns entirely, which I think is more correct.

> +	  /* If we did not fill an empty buffer, it signals the end of the
> +	     block.  */
> +	  if (peep2_current_count == 0)
>  	    break;
> +
> +	  /* The buffer filled to the current maximum, so try to match.  */

Why do you wait until you've completely filled the buffer before trying
a match?  Doesn't this lead to useless work?  Or does this on average
pay off because of fewer calls into peep2_recog (although more calls into
df scanning insns)?


r~


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-29  4:46   ` Resubmit/ping: " Richard Henderson
@ 2010-06-29  8:34     ` Bernd Schmidt
  2010-06-29  8:58       ` Richard Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-29  8:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

On 06/29/2010 01:58 AM, Richard Henderson wrote:
>> +  /* If an insn has RTX_FRAME_RELATED_P set, peephole substitution would lose
>> +     the REG_FRAME_RELATED_EXPR that is attached.  */
>> +  if (RTX_FRAME_RELATED_P (insn))
>> +    {
>> +      /* Let the buffer drain first.  */
>> +      if (peep2_current_count > 0)
>> +	return false;
>> +      df_simulate_one_insn_forwards (bb, insn, live);
>> +      return true;
> 
> You've changed the logic so that frame-related insns are allowed to be
> peepholed, so long as they begin a match.  Previously we skipped these
> insns entirely, which I think is more correct.

I'm not sure this is true.  It never adds an RTX_FRAME_RELATED_P insn to
the buffer; it just returns true so that we process the next insn.

I admit the buffer logic is a bit convoluted, and I'd welcome
suggestions on how to make it clearer.  An explicit state machine?
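
For readers following the return-value convention, the sketch below
models it in isolation.  It is an assumption-laden simplification, not
the code from the patch: `struct toy_buffer`, `TOY_MAX_INSNS` and
`toy_fill_buffer` are hypothetical, the buffer is a plain array rather
than a ring, and life information is left out.  "true" means the caller
may advance to the next insn; "false" means the buffer must be matched
or drained first.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_INSNS 4		/* stands in for MAX_INSNS_PER_PEEP2 */

struct toy_buffer
{
  int insns[TOY_MAX_INSNS];	/* insn ids, oldest first */
  int count;			/* number of valid entries */
};

/* Offer INSN to BUF.  Return true if the caller may move on to the
   next insn, false if the buffer has to be processed first.  */
static bool
toy_fill_buffer (struct toy_buffer *buf, int insn, bool frame_related)
{
  assert (buf->count >= 0 && buf->count <= TOY_MAX_INSNS);

  if (buf->count == TOY_MAX_INSNS)
    return false;

  if (frame_related)
    {
      /* Let the buffer drain first.  */
      if (buf->count > 0)
	return false;
      /* Consumed but never stored, so it cannot take part in a match.  */
      return true;
    }

  buf->insns[buf->count++] = insn;
  return true;
}
```

With an empty buffer a frame-related insn is skipped and the count stays
at zero; with a non-empty buffer the same insn is refused until the
pending insns have been handled.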

>> +	  /* If we did not fill an empty buffer, it signals the end of the
>> +	     block.  */
>> +	  if (peep2_current_count == 0)
>>  	    break;
>> +
>> +	  /* The buffer filled to the current maximum, so try to match.  */
> 
> Why do you wait until you've completely filled the buffer before trying
> a match?  Doesn't this lead to useless work?  Or does this on average
> pay off because of fewer calls into peep2_recog (although more calls into
> df scanning insns)?

To ensure that we always match the longest possible peephole if there
are similar ones.  A target can order them in order of descending
length, but that won't always work if we start to match against
partially filled buffers.  It's 2am so I may be blind but I don't see
how it causes more calls into df_simulate or other useless work?  There
should be exactly one such call per insn as it is added to the buffer
(plus any updates necessary after we match something and have to
regenerate life information).  Life information is stored in the buffer
so we don't have to do anything when advancing the buffer start.
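
A small sketch of the ordering problem, under stated assumptions: insns
are single characters, the pattern table `toy_patterns` is hypothetical
and sorted by descending length, and `toy_match` only sees the first
COUNT insns of the window.  None of this is GCC code; it just shows why
matching against a partially filled buffer can pick the shorter
pattern even when the table is ordered correctly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical peephole table, longest pattern first, as a target
   might order its define_peephole2 patterns.  */
static const char *const toy_patterns[] = { "abc", "ab" };

/* Return the length of the first pattern that matches the first
   COUNT insns of BUF, or 0 if none matches.  */
static int
toy_match (const char *buf, int count)
{
  size_t i;

  assert (buf != NULL && count >= 0);
  for (i = 0; i < sizeof toy_patterns / sizeof toy_patterns[0]; i++)
    {
      int len = (int) strlen (toy_patterns[i]);
      if (len <= count && strncmp (buf, toy_patterns[i], (size_t) len) == 0)
	return len;
    }
  return 0;
}
```

With the insn stream "abc", a window holding only two insns matches the
two-insn pattern, while a window holding all three matches the
three-insn one.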

It turns out that this doesn't completely solve the problem I was hoping
to solve (ARM peepholes for ldm/stm generation - there are several for
sequences of different lengths, and I was hoping to avoid having
patterns that match previous combinations), so I guess I could drop this
part.  Conceptually, I still think it's better to do it this way.

Looking at the patch again now, I think the saved_live thing may be a
remnant from an earlier attempt to also regenerate life information
forwards after a substitution.  That probably needs to go, I'll have a
look again in the morning.

Thanks for reviewing this.


Bernd


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-29  8:34     ` Bernd Schmidt
@ 2010-06-29  8:58       ` Richard Henderson
  2010-06-29 14:31         ` Bernd Schmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2010-06-29  8:58 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 06/28/2010 05:26 PM, Bernd Schmidt wrote:
> On 06/29/2010 01:58 AM, Richard Henderson wrote:
>>> +  /* If an insn has RTX_FRAME_RELATED_P set, peephole substitution would lose
>>> +     the REG_FRAME_RELATED_EXPR that is attached.  */
>>> +  if (RTX_FRAME_RELATED_P (insn))
>>> +    {
>>> +      /* Let the buffer drain first.  */
>>> +      if (peep2_current_count > 0)
>>> +	return false;
>>> +      df_simulate_one_insn_forwards (bb, insn, live);
>>> +      return true;
>>
>> You've changed the logic so that frame-related insns are allowed to be
>> peepholed, so long as they begin a match.  Previously we skipped these
>> insns entirely, which I think is more correct.
> 
> I'm not sure this is true.  It never adds an RTX_FRAME_RELATED_P insn to
> the buffer; it just returns true so that we process the next insn.
> 
> I admit the buffer logic is a bit convoluted, and I'd welcome
> suggestions on how to make it clearer.  An explicit state machine?

Ah, I missed that we didn't add the insn to the buffer.  I think
all that would be needed is a comment here to that effect.

>> Why do you wait until you've completely filled the buffer before trying
>> a match?  Doesn't this lead to useless work?  Or does this on average
>> pay off because of fewer calls into peep2_recog (although more calls into
>> df scanning insns)?
> 
> To ensure that we always match the longest possible peephole if there
> are similar ones.

A worthy goal; I hadn't thought of that.  Another comment to add?

> length, but that won't always work if we start to match against
> partially filled buffers.  It's 2am so I may be blind but I don't see
> how it causes more calls into df_simulate or other useless work?  There
> should be exactly one such call per insn as it is added to the buffer
> (plus any updates necessary after we match something and have to
> regenerate life information).  Life information is stored in the buffer
> so we don't have to do anything when advancing the buffer start.

The case in which we have matches is what I had in mind.  Say the
buffer has length 10.  If we fill the buffer then match the first
insn, that invalidates the life information for the entire buffer
does it not?  In which case we've wasted effort on 9 insns.

Unless we deem the peep2 life changes to be strictly local to the
region matched, in which case we can simply re-use the life info
computed the first time around.  I haven't gone back to the patch 
to see if that's what's going on...


r~


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-29  8:58       ` Richard Henderson
@ 2010-06-29 14:31         ` Bernd Schmidt
  2010-06-29 18:28           ` Richard Henderson
                             ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-29 14:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]

On 06/29/2010 04:32 AM, Richard Henderson wrote:
>> length, but that won't always work if we start to match against
>> partially filled buffers.  It's 2am so I may be blind but I don't see
>> how it causes more calls into df_simulate or other useless work?  There
>> should be exactly one such call per insn as it is added to the buffer
>> (plus any updates necessary after we match something and have to
>> regenerate life information).  Life information is stored in the buffer
>> so we don't have to do anything when advancing the buffer start.
> 
> The case in which we have matches is what I had in mind.  Say the
> buffer has length 10.  If we fill the buffer then match the first
> insn, that invalidates the life information for the entire buffer
> does it not?  In which case we've wasted effort on 9 insns.

No, those parts of the buffer that weren't part of the match remain
unaffected; we keep both the insns and their life information.  We only
rebuild life for the new insns produced by the match.
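
The following sketch models that behaviour on a toy ring buffer.  It is
only loosely patterned on the patch: `rb_insn`, `rb_slot`, `rb_load`
and `rb_replace` are hypothetical names, registers are bits in a mask,
and `rb_load` computes the initial life information backwards purely
for brevity (the patch gathers it during a forward scan).  What it is
meant to show is that a replacement walks backwards from the first
unmatched slot, so slots past the match are never rewritten.

```c
#include <assert.h>
#include <stddef.h>

#define RB_MAX 4			/* stands in for MAX_INSNS_PER_PEEP2 */
#define RB_SLOTS (RB_MAX + 1)		/* one extra slot for the end marker */

struct rb_insn { unsigned def, use; };	/* register bitmasks */

struct rb_slot
{
  const struct rb_insn *insn;		/* NULL marks the end of the window */
  unsigned live_before;
};

static struct rb_slot rb_buf[RB_SLOTS];
static int rb_current, rb_count;

/* Wrap N into the ring, in the style of peep2_buf_position.  */
static int
rb_position (int n)
{
  return n >= RB_SLOTS ? n - RB_SLOTS : n;
}

/* Load N insns plus the registers live after the last one.  */
static void
rb_load (const struct rb_insn *insns, int n, unsigned live_out)
{
  int k;

  assert (n >= 0 && n <= RB_MAX);
  rb_current = 0;
  rb_count = n;
  rb_buf[n].insn = NULL;
  rb_buf[n].live_before = live_out;
  for (k = n - 1; k >= 0; k--)
    {
      rb_buf[k].insn = &insns[k];
      rb_buf[k].live_before
	= (rb_buf[k + 1].live_before & ~insns[k].def) | insns[k].use;
    }
}

/* Replace the first MATCHED insns of the window with NEW_INSNS.  Life
   is rebuilt backwards for the new insns only, starting from the
   live_before set of the first slot that was not matched.  */
static void
rb_replace (int matched, const struct rb_insn *new_insns, int n_new)
{
  int i = rb_position (rb_current + matched);
  unsigned live = rb_buf[i].live_before;
  int k;

  assert (matched >= 1 && rb_count >= matched);
  rb_count -= matched;

  for (k = n_new - 1; k >= 0 && rb_count < RB_MAX; k--)
    {
      if (--i < 0)
	i = RB_SLOTS - 1;
      live = (live & ~new_insns[k].def) | new_insns[k].use;
      rb_buf[i].insn = &new_insns[k];
      rb_buf[i].live_before = live;
      rb_count++;
    }
  rb_current = i;
}
```

After replacing two of three buffered insns with one new insn, the
third slot still holds its original insn and its original life set;
only the slot for the new insn is recomputed.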

Here's a new version with a few more comments and a few remnants of old
code removed.  I've also removed some dead code found in genrecog.c (got
sidetracked today into debugging the current peephole2 code again...);
this was left in after your r34208 patch.

Bootstrapping now.


Bernd

[-- Attachment #2: peep2-forward-v6.diff --]
[-- Type: text/plain, Size: 25057 bytes --]

	* recog.c (peep2_do_rebuild_jump_labels, peep2_do_cleanup_cfg): New
	static variables.
	(peep2_buf_position): New static function.
	(peep2_regno_dead_p, peep2_reg_dead_p, peep2_find_free_register,
	peephole2_optimize): Use it.
	(peep2_attempt, peep2_update_life): New static functions, broken out
	of peephole2_optimize.
	(peep2_fill_buffer): New static function.
	(peephole2_optimize): Change the main loop to try to fill the buffer
	with the maximum number of insns before matching them against
	peepholes.  Use a forward scan.  Remove special case for targets with
	conditional execution.
	* genrecog.c (change_state): Delete dead code.
	* config/i386/i386.md (peephole2 for arithmetic ops with memory):
	Rewrite so as not to expect the second insn to have had a peephole
	applied yet.

Index: recog.c
===================================================================
*** recog.c	(revision 161371)
--- recog.c	(working copy)
***************
*** 2958,2963 ****
--- 2958,2967 ----
  
  static struct peep2_insn_data peep2_insn_data[MAX_INSNS_PER_PEEP2 + 1];
  static int peep2_current;
+ 
+ static bool peep2_do_rebuild_jump_labels;
+ static bool peep2_do_cleanup_cfg;
+ 
  /* The number of instructions available to match a peep2.  */
  int peep2_current_count;
  
***************
*** 2966,2971 ****
--- 2970,2985 ----
     DF_LIVE_OUT for the block.  */
  #define PEEP2_EOB	pc_rtx
  
+ /* Wrap N to fit into the peep2_insn_data buffer.  */
+ 
+ static int
+ peep2_buf_position (int n)
+ {
+   if (n >= MAX_INSNS_PER_PEEP2 + 1)
+     n -= MAX_INSNS_PER_PEEP2 + 1;
+   return n;
+ }
+ 
  /* Return the Nth non-note insn after `current', or return NULL_RTX if it
     does not exist.  Used by the recognizer to find the next insn to match
     in a multi-insn pattern.  */
***************
*** 2975,2983 ****
  {
    gcc_assert (n <= peep2_current_count);
  
!   n += peep2_current;
!   if (n >= MAX_INSNS_PER_PEEP2 + 1)
!     n -= MAX_INSNS_PER_PEEP2 + 1;
  
    return peep2_insn_data[n].insn;
  }
--- 2989,2995 ----
  {
    gcc_assert (n <= peep2_current_count);
  
!   n = peep2_buf_position (peep2_current + n);
  
    return peep2_insn_data[n].insn;
  }
***************
*** 2990,2998 ****
  {
    gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
  
!   ofs += peep2_current;
!   if (ofs >= MAX_INSNS_PER_PEEP2 + 1)
!     ofs -= MAX_INSNS_PER_PEEP2 + 1;
  
    gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
  
--- 3002,3008 ----
  {
    gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
  
!   ofs = peep2_buf_position (peep2_current + ofs);
  
    gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
  
***************
*** 3008,3016 ****
  
    gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
  
!   ofs += peep2_current;
!   if (ofs >= MAX_INSNS_PER_PEEP2 + 1)
!     ofs -= MAX_INSNS_PER_PEEP2 + 1;
  
    gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
  
--- 3018,3024 ----
  
    gcc_assert (ofs < MAX_INSNS_PER_PEEP2 + 1);
  
!   ofs = peep2_buf_position (peep2_current + ofs);
  
    gcc_assert (peep2_insn_data[ofs].insn != NULL_RTX);
  
***************
*** 3045,3056 ****
    gcc_assert (from < MAX_INSNS_PER_PEEP2 + 1);
    gcc_assert (to < MAX_INSNS_PER_PEEP2 + 1);
  
!   from += peep2_current;
!   if (from >= MAX_INSNS_PER_PEEP2 + 1)
!     from -= MAX_INSNS_PER_PEEP2 + 1;
!   to += peep2_current;
!   if (to >= MAX_INSNS_PER_PEEP2 + 1)
!     to -= MAX_INSNS_PER_PEEP2 + 1;
  
    gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
    REG_SET_TO_HARD_REG_SET (live, peep2_insn_data[from].live_before);
--- 3053,3060 ----
    gcc_assert (from < MAX_INSNS_PER_PEEP2 + 1);
    gcc_assert (to < MAX_INSNS_PER_PEEP2 + 1);
  
!   from = peep2_buf_position (peep2_current + from);
!   to = peep2_buf_position (peep2_current + to);
  
    gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
    REG_SET_TO_HARD_REG_SET (live, peep2_insn_data[from].live_before);
***************
*** 3059,3066 ****
      {
        HARD_REG_SET this_live;
  
!       if (++from >= MAX_INSNS_PER_PEEP2 + 1)
! 	from = 0;
        gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
        REG_SET_TO_HARD_REG_SET (this_live, peep2_insn_data[from].live_before);
        IOR_HARD_REG_SET (live, this_live);
--- 3063,3069 ----
      {
        HARD_REG_SET this_live;
  
!       from = peep2_buf_position (from + 1);
        gcc_assert (peep2_insn_data[from].insn != NULL_RTX);
        REG_SET_TO_HARD_REG_SET (this_live, peep2_insn_data[from].live_before);
        IOR_HARD_REG_SET (live, this_live);
***************
*** 3153,3388 ****
    COPY_REG_SET (peep2_insn_data[MAX_INSNS_PER_PEEP2].live_before, live);
  }
  
! /* Perform the peephole2 optimization pass.  */
  
! static void
! peephole2_optimize (void)
  {
-   rtx insn, prev;
-   bitmap live;
    int i;
!   basic_block bb;
!   bool do_cleanup_cfg = false;
!   bool do_rebuild_jump_labels = false;
  
!   df_set_flags (DF_LR_RUN_DCE);
!   df_analyze ();
  
!   /* Initialize the regsets we're going to use.  */
!   for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
!     peep2_insn_data[i].live_before = BITMAP_ALLOC (&reg_obstack);
!   live = BITMAP_ALLOC (&reg_obstack);
  
!   FOR_EACH_BB_REVERSE (bb)
!     {
!       rtl_profile_for_bb (bb);
  
!       /* Start up propagation.  */
!       bitmap_copy (live, DF_LR_OUT (bb));
!       df_simulate_initialize_backwards (bb, live);
!       peep2_reinit_state (live);
  
!       for (insn = BB_END (bb); ; insn = prev)
  	{
! 	  prev = PREV_INSN (insn);
! 	  if (NONDEBUG_INSN_P (insn))
  	    {
! 	      rtx attempt, before_try, x;
! 	      int match_len;
! 	      rtx note;
! 	      bool was_call = false;
  
! 	      /* Record this insn.  */
! 	      if (--peep2_current < 0)
! 		peep2_current = MAX_INSNS_PER_PEEP2;
! 	      if (peep2_current_count < MAX_INSNS_PER_PEEP2
! 		  && peep2_insn_data[peep2_current].insn == NULL_RTX)
! 		peep2_current_count++;
! 	      peep2_insn_data[peep2_current].insn = insn;
! 	      df_simulate_one_insn_backwards (bb, insn, live);
! 	      COPY_REG_SET (peep2_insn_data[peep2_current].live_before, live);
  
! 	      if (RTX_FRAME_RELATED_P (insn))
! 		{
! 		  /* If an insn has RTX_FRAME_RELATED_P set, peephole
! 		     substitution would lose the
! 		     REG_FRAME_RELATED_EXPR that is attached.  */
! 		  peep2_reinit_state (live);
! 		  attempt = NULL;
! 		}
! 	      else
! 		/* Match the peephole.  */
! 		attempt = peephole2_insns (PATTERN (insn), insn, &match_len);
  
! 	      if (attempt != NULL)
! 		{
! 		  /* If we are splitting a CALL_INSN, look for the CALL_INSN
! 		     in SEQ and copy our CALL_INSN_FUNCTION_USAGE and other
! 		     cfg-related call notes.  */
! 		  for (i = 0; i <= match_len; ++i)
! 		    {
! 		      int j;
! 		      rtx old_insn, new_insn, note;
  
! 		      j = i + peep2_current;
! 		      if (j >= MAX_INSNS_PER_PEEP2 + 1)
! 			j -= MAX_INSNS_PER_PEEP2 + 1;
! 		      old_insn = peep2_insn_data[j].insn;
! 		      if (!CALL_P (old_insn))
! 			continue;
! 		      was_call = true;
  
! 		      new_insn = attempt;
! 		      while (new_insn != NULL_RTX)
! 			{
! 			  if (CALL_P (new_insn))
! 			    break;
! 			  new_insn = NEXT_INSN (new_insn);
! 			}
  
! 		      gcc_assert (new_insn != NULL_RTX);
  
! 		      CALL_INSN_FUNCTION_USAGE (new_insn)
! 			= CALL_INSN_FUNCTION_USAGE (old_insn);
  
! 		      for (note = REG_NOTES (old_insn);
! 			   note;
! 			   note = XEXP (note, 1))
! 			switch (REG_NOTE_KIND (note))
! 			  {
! 			  case REG_NORETURN:
! 			  case REG_SETJMP:
! 			    add_reg_note (new_insn, REG_NOTE_KIND (note),
! 					  XEXP (note, 0));
! 			    break;
! 			  default:
! 			    /* Discard all other reg notes.  */
! 			    break;
! 			  }
  
! 		      /* Croak if there is another call in the sequence.  */
! 		      while (++i <= match_len)
! 			{
! 			  j = i + peep2_current;
! 			  if (j >= MAX_INSNS_PER_PEEP2 + 1)
! 			    j -= MAX_INSNS_PER_PEEP2 + 1;
! 			  old_insn = peep2_insn_data[j].insn;
! 			  gcc_assert (!CALL_P (old_insn));
! 			}
! 		      break;
! 		    }
  
! 		  i = match_len + peep2_current;
! 		  if (i >= MAX_INSNS_PER_PEEP2 + 1)
! 		    i -= MAX_INSNS_PER_PEEP2 + 1;
  
! 		  note = find_reg_note (peep2_insn_data[i].insn,
! 					REG_EH_REGION, NULL_RTX);
  
! 		  /* Replace the old sequence with the new.  */
! 		  attempt = emit_insn_after_setloc (attempt,
! 						    peep2_insn_data[i].insn,
! 				       INSN_LOCATOR (peep2_insn_data[i].insn));
! 		  before_try = PREV_INSN (insn);
! 		  delete_insn_chain (insn, peep2_insn_data[i].insn, false);
  
! 		  /* Re-insert the EH_REGION notes.  */
! 		  if (note || (was_call && nonlocal_goto_handler_labels))
! 		    {
! 		      edge eh_edge;
! 		      edge_iterator ei;
  
! 		      FOR_EACH_EDGE (eh_edge, ei, bb->succs)
! 			if (eh_edge->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
! 			  break;
  
! 		      if (note)
! 			copy_reg_eh_region_note_backward (note, attempt,
! 							  before_try);
  
! 		      if (eh_edge)
! 			for (x = attempt ; x != before_try ; x = PREV_INSN (x))
! 			  if (x != BB_END (bb)
! 			      && (can_throw_internal (x)
! 				  || can_nonlocal_goto (x)))
! 			    {
! 			      edge nfte, nehe;
! 			      int flags;
  
! 			      nfte = split_block (bb, x);
! 			      flags = (eh_edge->flags
! 				       & (EDGE_EH | EDGE_ABNORMAL));
! 			      if (CALL_P (x))
! 				flags |= EDGE_ABNORMAL_CALL;
! 			      nehe = make_edge (nfte->src, eh_edge->dest,
! 						flags);
  
! 			      nehe->probability = eh_edge->probability;
! 			      nfte->probability
! 				= REG_BR_PROB_BASE - nehe->probability;
  
! 			      do_cleanup_cfg |= purge_dead_edges (nfte->dest);
! 			      bb = nfte->src;
! 			      eh_edge = nehe;
! 			    }
  
! 		      /* Converting possibly trapping insn to non-trapping is
! 			 possible.  Zap dummy outgoing edges.  */
! 		      do_cleanup_cfg |= purge_dead_edges (bb);
! 		    }
  
! 		  if (targetm.have_conditional_execution ())
! 		    {
! 		      for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
! 			peep2_insn_data[i].insn = NULL_RTX;
! 		      peep2_insn_data[peep2_current].insn = PEEP2_EOB;
! 		      peep2_current_count = 0;
! 		    }
! 		  else
! 		    {
! 		      /* Back up lifetime information past the end of the
! 			 newly created sequence.  */
! 		      if (++i >= MAX_INSNS_PER_PEEP2 + 1)
! 			i = 0;
! 		      bitmap_copy (live, peep2_insn_data[i].live_before);
  
! 		      /* Update life information for the new sequence.  */
! 		      x = attempt;
! 		      do
! 			{
! 			  if (INSN_P (x))
! 			    {
! 			      if (--i < 0)
! 				i = MAX_INSNS_PER_PEEP2;
! 			      if (peep2_current_count < MAX_INSNS_PER_PEEP2
! 				  && peep2_insn_data[i].insn == NULL_RTX)
! 				peep2_current_count++;
! 			      peep2_insn_data[i].insn = x;
! 			      df_insn_rescan (x);
! 			      df_simulate_one_insn_backwards (bb, x, live);
! 			      bitmap_copy (peep2_insn_data[i].live_before,
! 					   live);
! 			    }
! 			  x = PREV_INSN (x);
! 			}
! 		      while (x != prev);
  
! 		      peep2_current = i;
! 		    }
  
! 		  /* If we generated a jump instruction, it won't have
! 		     JUMP_LABEL set.  Recompute after we're done.  */
! 		  for (x = attempt; x != before_try; x = PREV_INSN (x))
! 		    if (JUMP_P (x))
! 		      {
! 		        do_rebuild_jump_labels = true;
! 			break;
! 		      }
! 		}
  	    }
  
! 	  if (insn == BB_HEAD (bb))
  	    break;
  	}
      }
  
--- 3156,3451 ----
    COPY_REG_SET (peep2_insn_data[MAX_INSNS_PER_PEEP2].live_before, live);
  }
  
! /* While scanning basic block BB, we found a match of length MATCH_LEN,
!    starting at INSN.  Perform the replacement, removing the old insns and
!    replacing them with ATTEMPT.  Returns the last insn emitted.  */
  
! static rtx
! peep2_attempt (basic_block bb, rtx insn, int match_len, rtx attempt)
  {
    int i;
!   rtx last, note, before_try, x;
!   bool was_call = false;
  
!   /* If we are splitting a CALL_INSN, look for the CALL_INSN
!      in SEQ and copy our CALL_INSN_FUNCTION_USAGE and other
!      cfg-related call notes.  */
!   for (i = 0; i <= match_len; ++i)
!     {
!       int j;
!       rtx old_insn, new_insn, note;
  
!       j = peep2_buf_position (peep2_current + i);
!       old_insn = peep2_insn_data[j].insn;
!       if (!CALL_P (old_insn))
! 	continue;
!       was_call = true;
  
!       new_insn = attempt;
!       while (new_insn != NULL_RTX)
! 	{
! 	  if (CALL_P (new_insn))
! 	    break;
! 	  new_insn = NEXT_INSN (new_insn);
! 	}
  
!       gcc_assert (new_insn != NULL_RTX);
  
!       CALL_INSN_FUNCTION_USAGE (new_insn)
! 	= CALL_INSN_FUNCTION_USAGE (old_insn);
! 
!       for (note = REG_NOTES (old_insn);
! 	   note;
! 	   note = XEXP (note, 1))
! 	switch (REG_NOTE_KIND (note))
! 	  {
! 	  case REG_NORETURN:
! 	  case REG_SETJMP:
! 	    add_reg_note (new_insn, REG_NOTE_KIND (note),
! 			  XEXP (note, 0));
! 	    break;
! 	  default:
! 	    /* Discard all other reg notes.  */
! 	    break;
! 	  }
! 
!       /* Croak if there is another call in the sequence.  */
!       while (++i <= match_len)
  	{
! 	  j = peep2_buf_position (peep2_current + i);
! 	  old_insn = peep2_insn_data[j].insn;
! 	  gcc_assert (!CALL_P (old_insn));
! 	}
!       break;
!     }
! 
!   i = peep2_buf_position (peep2_current + match_len);
! 
!   note = find_reg_note (peep2_insn_data[i].insn, REG_EH_REGION, NULL_RTX);
! 
!   /* Replace the old sequence with the new.  */
!   last = emit_insn_after_setloc (attempt,
! 				 peep2_insn_data[i].insn,
! 				 INSN_LOCATOR (peep2_insn_data[i].insn));
!   before_try = PREV_INSN (insn);
!   delete_insn_chain (insn, peep2_insn_data[i].insn, false);
! 
!   /* Re-insert the EH_REGION notes.  */
!   if (note || (was_call && nonlocal_goto_handler_labels))
!     {
!       edge eh_edge;
!       edge_iterator ei;
! 
!       FOR_EACH_EDGE (eh_edge, ei, bb->succs)
! 	if (eh_edge->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
! 	  break;
! 
!       if (note)
! 	copy_reg_eh_region_note_backward (note, last, before_try);
! 
!       if (eh_edge)
! 	for (x = last; x != before_try; x = PREV_INSN (x))
! 	  if (x != BB_END (bb)
! 	      && (can_throw_internal (x)
! 		  || can_nonlocal_goto (x)))
  	    {
! 	      edge nfte, nehe;
! 	      int flags;
  
! 	      nfte = split_block (bb, x);
! 	      flags = (eh_edge->flags
! 		       & (EDGE_EH | EDGE_ABNORMAL));
! 	      if (CALL_P (x))
! 		flags |= EDGE_ABNORMAL_CALL;
! 	      nehe = make_edge (nfte->src, eh_edge->dest,
! 				flags);
  
! 	      nehe->probability = eh_edge->probability;
! 	      nfte->probability
! 		= REG_BR_PROB_BASE - nehe->probability;
  
! 	      peep2_do_cleanup_cfg |= purge_dead_edges (nfte->dest);
! 	      bb = nfte->src;
! 	      eh_edge = nehe;
! 	    }
  
!       /* Converting possibly trapping insn to non-trapping is
! 	 possible.  Zap dummy outgoing edges.  */
!       peep2_do_cleanup_cfg |= purge_dead_edges (bb);
!     }
  
!   /* If we generated a jump instruction, it won't have
!      JUMP_LABEL set.  Recompute after we're done.  */
!   for (x = last; x != before_try; x = PREV_INSN (x))
!     if (JUMP_P (x))
!       {
! 	peep2_do_rebuild_jump_labels = true;
! 	break;
!       }
  
!   return last;
! }
  
! /* After performing a replacement in basic block BB, fix up the life
!    information in our buffer.  LAST is the last of the insns that we
!    emitted as a replacement.  PREV is the insn before the start of
!    the replacement.  MATCH_LEN is the number of instructions that were
!    matched, and which now need to be replaced in the buffer.  */
  
! static void
! peep2_update_life (basic_block bb, int match_len, rtx last, rtx prev)
! {
!   int i = peep2_buf_position (peep2_current + match_len + 1);
!   rtx x;
!   regset_head live;
  
!   INIT_REG_SET (&live);
!   COPY_REG_SET (&live, peep2_insn_data[i].live_before);
  
!   gcc_assert (peep2_current_count >= match_len + 1);
!   peep2_current_count -= match_len + 1;
  
!   x = last;
!   do
!     {
!       if (INSN_P (x))
! 	{
! 	  df_insn_rescan (x);
! 	  if (peep2_current_count < MAX_INSNS_PER_PEEP2)
! 	    {
! 	      peep2_current_count++;
! 	      if (--i < 0)
! 		i = MAX_INSNS_PER_PEEP2;
! 	      peep2_insn_data[i].insn = x;
! 	      df_simulate_one_insn_backwards (bb, x, &live);
! 	      COPY_REG_SET (peep2_insn_data[i].live_before, &live);
! 	    }
! 	}
!       x = PREV_INSN (x);
!     }
!   while (x != prev);
!   CLEAR_REG_SET (&live);
  
!   peep2_current = i;
! }
  
! /* Add INSN, which is in BB, at the end of the peep2 insn buffer if possible.
!    Return true if we added it, false otherwise.  The caller will try to match
!    peepholes against the buffer if we return false; otherwise it will try to
!    add more instructions to the buffer.  */
  
! static bool
! peep2_fill_buffer (basic_block bb, rtx insn, regset live)
! {
!   int pos;
  
!   /* Once we have filled the maximum number of insns the buffer can hold,
!      allow the caller to match the insns against peepholes.  We wait until
!      the buffer is full in case the target has similar peepholes of different
!      length; we always want to match the longest if possible.  */
!   if (peep2_current_count == MAX_INSNS_PER_PEEP2)
!     return false;
  
!   /* If an insn has RTX_FRAME_RELATED_P set, peephole substitution would lose
!      the REG_FRAME_RELATED_EXPR that is attached.  */
!   if (RTX_FRAME_RELATED_P (insn))
!     {
!       /* Let the buffer drain first.  */
!       if (peep2_current_count > 0)
! 	return false;
!       /* Step over the insn then return true without adding the insn
! 	 to the buffer; this will cause us to process the next
! 	 insn.  */
!       df_simulate_one_insn_forwards (bb, insn, live);
!       return true;
!     }
  
!   pos = peep2_buf_position (peep2_current + peep2_current_count);
!   peep2_insn_data[pos].insn = insn;
!   COPY_REG_SET (peep2_insn_data[pos].live_before, live);
!   peep2_current_count++;
  
!   df_simulate_one_insn_forwards (bb, insn, live);
!   return true;
! }
  
! /* Perform the peephole2 optimization pass.  */
  
! static void
! peephole2_optimize (void)
! {
!   rtx insn;
!   bitmap live;
!   int i;
!   basic_block bb;
  
!   peep2_do_cleanup_cfg = false;
!   peep2_do_rebuild_jump_labels = false;
  
!   df_set_flags (DF_LR_RUN_DCE);
!   df_note_add_problem ();
!   df_analyze ();
  
!   /* Initialize the regsets we're going to use.  */
!   for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
!     peep2_insn_data[i].live_before = BITMAP_ALLOC (&reg_obstack);
!   live = BITMAP_ALLOC (&reg_obstack);
  
!   FOR_EACH_BB_REVERSE (bb)
!     {
!       bool past_end = false;
!       int pos;
! 
!       rtl_profile_for_bb (bb);
! 
!       /* Start up propagation.  */
!       bitmap_copy (live, DF_LR_IN (bb));
!       df_simulate_initialize_forwards (bb, live);
!       peep2_reinit_state (live);
! 
!       insn = BB_HEAD (bb);
!       for (;;)
! 	{
! 	  rtx attempt, head;
! 	  int match_len;
! 
! 	  if (!past_end && !NONDEBUG_INSN_P (insn))
! 	    {
! 	    next_insn:
! 	      insn = NEXT_INSN (insn);
! 	      if (insn == NEXT_INSN (BB_END (bb)))
! 		past_end = true;
! 	      continue;
  	    }
+ 	  if (!past_end && peep2_fill_buffer (bb, insn, live))
+ 	    goto next_insn;
  
! 	  /* If we did not fill an empty buffer, it signals the end of the
! 	     block.  */
! 	  if (peep2_current_count == 0)
  	    break;
+ 
+ 	  /* The buffer filled to the current maximum, so try to match.  */
+ 
+ 	  pos = peep2_buf_position (peep2_current + peep2_current_count);
+ 	  peep2_insn_data[pos].insn = PEEP2_EOB;
+ 	  COPY_REG_SET (peep2_insn_data[pos].live_before, live);
+ 
+ 	  /* Match the peephole.  */
+ 	  head = peep2_insn_data[peep2_current].insn;
+ 	  attempt = peephole2_insns (PATTERN (head), head, &match_len);
+ 	  if (attempt != NULL)
+ 	    {
+ 	      rtx last;
+ 	      last = peep2_attempt (bb, head, match_len, attempt);
+ 	      peep2_update_life (bb, match_len, last, PREV_INSN (attempt));
+ 	    }
+ 	  else
+ 	    {
+ 	      /* If no match, advance the buffer by one insn.  */
+ 	      peep2_current = peep2_buf_position (peep2_current + 1);
+ 	      peep2_current_count--;
+ 	    }
  	}
      }
  
***************
*** 3390,3396 ****
    for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
      BITMAP_FREE (peep2_insn_data[i].live_before);
    BITMAP_FREE (live);
!   if (do_rebuild_jump_labels)
      rebuild_jump_labels (get_insns ());
  }
  #endif /* HAVE_peephole2 */
--- 3453,3459 ----
    for (i = 0; i < MAX_INSNS_PER_PEEP2 + 1; ++i)
      BITMAP_FREE (peep2_insn_data[i].live_before);
    BITMAP_FREE (live);
!   if (peep2_do_rebuild_jump_labels)
      rebuild_jump_labels (get_insns ());
  }
  #endif /* HAVE_peephole2 */
Index: config/i386/i386.md
===================================================================
*** config/i386/i386.md	(revision 161371)
--- config/i386/i386.md	(working copy)
***************
*** 17558,17572 ****
  ;;  leal    (%edx,%eax,4), %eax
  
  (define_peephole2
!   [(parallel [(set (match_operand 0 "register_operand" "")
  		   (ashift (match_operand 1 "register_operand" "")
  			   (match_operand 2 "const_int_operand" "")))
  	       (clobber (reg:CC FLAGS_REG))])
!    (set (match_operand 3 "register_operand")
!         (match_operand 4 "x86_64_general_operand" ""))
!    (parallel [(set (match_operand 5 "register_operand" "")
! 		   (plus (match_operand 6 "register_operand" "")
! 			 (match_operand 7 "register_operand" "")))
  		   (clobber (reg:CC FLAGS_REG))])]
    "INTVAL (operands[2]) >= 0 && INTVAL (operands[2]) <= 3
     /* Validate MODE for lea.  */
--- 17558,17571 ----
  ;;  leal    (%edx,%eax,4), %eax
  
  (define_peephole2
!   [(match_scratch:SI 5 "r")
!    (parallel [(set (match_operand 0 "register_operand" "")
  		   (ashift (match_operand 1 "register_operand" "")
  			   (match_operand 2 "const_int_operand" "")))
  	       (clobber (reg:CC FLAGS_REG))])
!    (parallel [(set (match_operand 3 "register_operand" "")
! 		   (plus (match_dup 0)
! 			 (match_operand 4 "x86_64_general_operand" "")))
  		   (clobber (reg:CC FLAGS_REG))])]
    "INTVAL (operands[2]) >= 0 && INTVAL (operands[2]) <= 3
     /* Validate MODE for lea.  */
***************
*** 17576,17605 ****
         || GET_MODE (operands[0]) == SImode
         || (TARGET_64BIT && GET_MODE (operands[0]) == DImode))
     /* We reorder load and the shift.  */
!    && !rtx_equal_p (operands[1], operands[3])
!    && !reg_overlap_mentioned_p (operands[0], operands[4])
!    /* Last PLUS must consist of operand 0 and 3.  */
!    && !rtx_equal_p (operands[0], operands[3])
!    && (rtx_equal_p (operands[3], operands[6])
!        || rtx_equal_p (operands[3], operands[7]))
!    && (rtx_equal_p (operands[0], operands[6])
!        || rtx_equal_p (operands[0], operands[7]))
!    /* The intermediate operand 0 must die or be same as output.  */
!    && (rtx_equal_p (operands[0], operands[5])
!        || peep2_reg_dead_p (3, operands[0]))"
!   [(set (match_dup 3) (match_dup 4))
     (set (match_dup 0) (match_dup 1))]
  {
!   enum machine_mode mode = GET_MODE (operands[5]) == DImode ? DImode : SImode;
    int scale = 1 << INTVAL (operands[2]);
    rtx index = gen_lowpart (Pmode, operands[1]);
!   rtx base = gen_lowpart (Pmode, operands[3]);
!   rtx dest = gen_lowpart (mode, operands[5]);
  
    operands[1] = gen_rtx_PLUS (Pmode, base,
    			      gen_rtx_MULT (Pmode, index, GEN_INT (scale)));
    if (mode != Pmode)
      operands[1] = gen_rtx_SUBREG (mode, operands[1], 0);
    operands[0] = dest;
  })
  \f
--- 17575,17595 ----
         || GET_MODE (operands[0]) == SImode
         || (TARGET_64BIT && GET_MODE (operands[0]) == DImode))
     /* We reorder load and the shift.  */
!    && !reg_overlap_mentioned_p (operands[0], operands[4])"
!   [(set (match_dup 5) (match_dup 4))
     (set (match_dup 0) (match_dup 1))]
  {
!   enum machine_mode mode = GET_MODE (operands[1]) == DImode ? DImode : SImode;
    int scale = 1 << INTVAL (operands[2]);
    rtx index = gen_lowpart (Pmode, operands[1]);
!   rtx base = gen_lowpart (Pmode, operands[5]);
!   rtx dest = gen_lowpart (mode, operands[3]);
  
    operands[1] = gen_rtx_PLUS (Pmode, base,
    			      gen_rtx_MULT (Pmode, index, GEN_INT (scale)));
    if (mode != Pmode)
      operands[1] = gen_rtx_SUBREG (mode, operands[1], 0);
+   operands[5] = base;
    operands[0] = dest;
  })
  \f
Index: genrecog.c
===================================================================
*** genrecog.c	(revision 161371)
--- genrecog.c	(working copy)
***************
*** 1761,1780 ****
    int odepth = strlen (oldpos);
    int ndepth = strlen (newpos);
    int depth;
-   int old_has_insn, new_has_insn;
  
    /* Pop up as many levels as necessary.  */
    for (depth = odepth; strncmp (oldpos, newpos, depth) != 0; --depth)
      continue;
  
-   /* Hunt for the last [A-Z] in both strings.  */
-   for (old_has_insn = odepth - 1; old_has_insn >= 0; --old_has_insn)
-     if (ISUPPER (oldpos[old_has_insn]))
-       break;
-   for (new_has_insn = ndepth - 1; new_has_insn >= 0; --new_has_insn)
-     if (ISUPPER (newpos[new_has_insn]))
-       break;
- 
    /* Go down to desired level.  */
    while (depth < ndepth)
      {
--- 1761,1771 ----

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-29 14:31         ` Bernd Schmidt
@ 2010-06-29 18:28           ` Richard Henderson
  2010-06-30  2:01           ` Andrew Pinski
  2010-06-30  7:46           ` H.J. Lu
  2 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2010-06-29 18:28 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 06/29/2010 06:22 AM, Bernd Schmidt wrote:
> On 06/29/2010 04:32 AM, Richard Henderson wrote:
>>> length, but that won't always work if we start to match against
>>> partially filled buffers.  It's 2am so I may be blind but I don't see
>>> how it causes more calls into df_simulate or other useless work?  There
>>> should be exactly one such call per insn as it is added to the buffer
>>> (plus any updates necessary after we match something and have to
>>> regenerate life information).  Life information is stored in the buffer
>>> so we don't have to do anything when advancing the buffer start.
>>
>> The case in which we have matches is what I had in mind.  Say the
>> buffer has length 10.  If we fill the buffer then match the first
>> insn, that invalidates the life information for the entire buffer
>> does it not?  In which case we've wasted effort on 9 insns.
> 
> No, those parts of the buffer that weren't part of the match remain
> unaffected, we keep both the insns and their life information.  We only
> rebuild life for the new insns produced by the match.
> 
> Here's a new version with a few more comments and a few remnants of old
> code removed.  I've also removed some dead code found in genrecog.c (got
> sidetracked today into debugging the current peephole2 code again...);
> this was left in after your r34208 patch.
> 
> Bootstrapping now.

Thanks for the extra comments.
Ok if that boot succeeds.


r~
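
For readers following the buffer discussion quoted above, here is a
minimal sketch in plain C of why advancing the buffer start needs no
extra liveness work.  It is not the GCC code.  It assumes a toy slot
type with an integer insn id and a bitmask standing in for live_before;
MAX_INSNS, buf_push, buf_advance and buf_at are names invented for
illustration.  Only buf_position is modelled on the patch, following the
"ofs += peep2_current; if (ofs >= MAX_INSNS_PER_PEEP2 + 1) ..." wrap
sequences that the patch replaces with calls to peep2_buf_position.

```c
#include <assert.h>

/* Hypothetical stand-in for MAX_INSNS_PER_PEEP2; the real value is
   generated per target.  */
#define MAX_INSNS 4
#define BUF_SIZE (MAX_INSNS + 1)   /* one extra slot for the end marker */

struct slot
{
  int insn;        /* toy insn id */
  unsigned live;   /* toy stand-in for the live_before regset */
};

static struct slot buf[BUF_SIZE];
static int current;   /* index of the oldest buffered insn */
static int count;     /* number of buffered insns */

/* Wrap N, which is at most one lap past the end, back into the
   buffer.  */
static int
buf_position (int n)
{
  if (n >= BUF_SIZE)
    n -= BUF_SIZE;
  return n;
}

/* Append INSN together with its live-before set.  Return 0 without
   storing anything once the buffer holds MAX_INSNS entries.  */
static int
buf_push (int insn, unsigned live)
{
  int pos;

  if (count == MAX_INSNS)
    return 0;
  pos = buf_position (current + count);
  buf[pos].insn = insn;
  buf[pos].live = live;
  count++;
  return 1;
}

/* No match: drop the oldest insn.  Only the start index and the count
   change; every other slot keeps its insn and its life information.  */
static void
buf_advance (void)
{
  current = buf_position (current + 1);
  count--;
}

/* Return the slot OFS insns after the start of the buffer.  */
static struct slot *
buf_at (int ofs)
{
  assert (ofs < count);
  return &buf[buf_position (current + ofs)];
}
```

In this toy model, pushing four insns, calling buf_advance once and then
reading buf_at (0) yields the second insn with the life set that was
stored when it was pushed, which is the behaviour Bernd describes for
the parts of the buffer that were not part of a match.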

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-29 14:31         ` Bernd Schmidt
  2010-06-29 18:28           ` Richard Henderson
@ 2010-06-30  2:01           ` Andrew Pinski
  2010-06-30  2:03             ` Andrew Pinski
  2010-06-30  6:10             ` H.J. Lu
  2010-06-30  7:46           ` H.J. Lu
  2 siblings, 2 replies; 29+ messages in thread
From: Andrew Pinski @ 2010-06-30  2:01 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Henderson, GCC Patches

On Tue, Jun 29, 2010 at 6:22 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> No, those parts of the buffer that weren't part of the match remain
> unaffected, we keep both the insns and their life information.  We only
> rebuild life for the new insns produced by the match.
>
> Here's a new version with a few more comments and a few remnants of old
> code removed.  I've also removed some dead code found in genrecog.c (got
> sidetracked today into debugging the current peephole2 code again...);
> this was left in after your r34208 patch.

I think this causes a bootstrap failure on x86_64-linux-gnu:
/home/apinski/src/gcc-fsf/local/gcc/objdir/./prev-gcc/xgcc
-B/home/apinski/src/gcc-fsf/local/gcc/objdir/./prev-gcc/
-B/home/apinski/local-gcc/x86_64-unknown-linux-gnu/bin/
-B/home/apinski/local-gcc/x86_64-unknown-linux-gnu/bin/
-B/home/apinski/local-gcc/x86_64-unknown-linux-gnu/lib/ -isystem
/home/apinski/local-gcc/x86_64-unknown-linux-gnu/include -isystem
/home/apinski/local-gcc/x86_64-unknown-linux-gnu/sys-include    -c
-g -O2 -gtoggle -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual
-Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute
-pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
-Werror -Wold-style-definition -Wc++-compat -fno-common
-DHAVE_CONFIG_H -I. -I. -I/home/apinski/src/gcc-fsf/local//gcc/gcc
-I/home/apinski/src/gcc-fsf/local//gcc/gcc/.
-I/home/apinski/src/gcc-fsf/local//gcc/gcc/../include
-I/home/apinski/src/gcc-fsf/local//gcc/gcc/../libcpp/include
-I/home/apinski/src/gcc-fsf/local//gcc/gcc/../libdecnumber
-I/home/apinski/src/gcc-fsf/local//gcc/gcc/../libdecnumber/bid
-I../libdecnumber  -DCLOOG_PPL_BACKEND
/home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c -o coverage.o
/home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c: In function
‘htab_counts_entry_hash’:
/home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: error:
unrecognizable insn:
(insn 25 7 26 2
/home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:150 (set (reg:DI 1
dx)
        (mem/s:SI (plus:DI (reg/v/f:DI 5 di [orig:64 of ] [64])
                (const_int 4 [0x4])) [15 entry_2->ctr+0 S4 A32])) -1 (nil))
/home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: internal
compiler error: in extract_insn, at recog.c:2127
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30  2:01           ` Andrew Pinski
@ 2010-06-30  2:03             ` Andrew Pinski
  2010-06-30  6:24               ` H.J. Lu
  2010-06-30  6:10             ` H.J. Lu
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Pinski @ 2010-06-30  2:03 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Henderson, GCC Patches

On Tue, Jun 29, 2010 at 5:41 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Jun 29, 2010 at 6:22 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> No, those parts of the buffer that weren't part of the match remain
>> unaffected, we keep both the insns and their life information.  We only
>> rebuild life for the new insns produced by the match.
>>
>> Here's a new version with a few more comments and a few remnants of old
>> code removed.  I've also removed some dead code found in genrecog.c (got
>> sidetracked today into debugging the current peephole2 code again...);
>> this was left in after your r34208 patch.
>
> I think this causes a bootstrap failure on x86_64-linux-gnu:
> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: error:
> unrecognizable insn:
> (insn 25 7 26 2
> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:150 (set (reg:DI 1
> dx)
>        (mem/s:SI (plus:DI (reg/v/f:DI 5 di [orig:64 of ] [64])
>                (const_int 4 [0x4])) [15 entry_2->ctr+0 S4 A32])) -1 (nil))
> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: internal
> compiler error: in extract_insn, at recog.c:2127

+  [(match_scratch:SI 5 "r")

I think the :SI part is incorrect; on x86_64 we need DImode rather
than SImode.

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30  2:01           ` Andrew Pinski
  2010-06-30  2:03             ` Andrew Pinski
@ 2010-06-30  6:10             ` H.J. Lu
  1 sibling, 0 replies; 29+ messages in thread
From: H.J. Lu @ 2010-06-30  6:10 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Bernd Schmidt, Richard Henderson, GCC Patches

On Tue, Jun 29, 2010 at 5:41 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Jun 29, 2010 at 6:22 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> No, those parts of the buffer that weren't part of the match remain
>> unaffected, we keep both the insns and their life information.  We only
>> rebuild life for the new insns produced by the match.
>>
>> Here's a new version with a few more comments and a few remnants of old
>> code removed.  I've also removed some dead code found in genrecog.c (got
>> sidetracked today into debugging the current peephole2 code again...);
>> this was left in after your r34208 patch.
>
> I think this causes a bootstrap failure on x86_64-linux-gnu:
> /home/apinski/src/gcc-fsf/local/gcc/objdir/./prev-gcc/xgcc
> -B/home/apinski/src/gcc-fsf/local/gcc/objdir/./prev-gcc/
> -B/home/apinski/local-gcc/x86_64-unknown-linux-gnu/bin/
> -B/home/apinski/local-gcc/x86_64-unknown-linux-gnu/bin/
> -B/home/apinski/local-gcc/x86_64-unknown-linux-gnu/lib/ -isystem
> /home/apinski/local-gcc/x86_64-unknown-linux-gnu/include -isystem
> /home/apinski/local-gcc/x86_64-unknown-linux-gnu/sys-include    -c
> -g -O2 -gtoggle -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual
> -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute
> -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings
> -Werror -Wold-style-definition -Wc++-compat -fno-common
> -DHAVE_CONFIG_H -I. -I. -I/home/apinski/src/gcc-fsf/local//gcc/gcc
> -I/home/apinski/src/gcc-fsf/local//gcc/gcc/.
> -I/home/apinski/src/gcc-fsf/local//gcc/gcc/../include
> -I/home/apinski/src/gcc-fsf/local//gcc/gcc/../libcpp/include
> -I/home/apinski/src/gcc-fsf/local//gcc/gcc/../libdecnumber
> -I/home/apinski/src/gcc-fsf/local//gcc/gcc/../libdecnumber/bid
> -I../libdecnumber  -DCLOOG_PPL_BACKEND
> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c -o coverage.o
> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c: In function
> ‘htab_counts_entry_hash’:
> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: error:
> unrecognizable insn:
> (insn 25 7 26 2
> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:150 (set (reg:DI 1
> dx)
>        (mem/s:SI (plus:DI (reg/v/f:DI 5 di [orig:64 of ] [64])
>                (const_int 4 [0x4])) [15 entry_2->ctr+0 S4 A32])) -1 (nil))
> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: internal
> compiler error: in extract_insn, at recog.c:2127
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.

I opened a bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44721


-- 
H.J.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30  2:03             ` Andrew Pinski
@ 2010-06-30  6:24               ` H.J. Lu
  2010-06-30  7:22                 ` H.J. Lu
  0 siblings, 1 reply; 29+ messages in thread
From: H.J. Lu @ 2010-06-30  6:24 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Bernd Schmidt, Richard Henderson, GCC Patches

On Tue, Jun 29, 2010 at 5:43 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Tue, Jun 29, 2010 at 5:41 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Tue, Jun 29, 2010 at 6:22 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>>> No, those parts of the buffer that weren't part of the match remain
>>> unaffected, we keep both the insns and their life information.  We only
>>> rebuild life for the new insns produced by the match.
>>>
>>> Here's a new version with a few more comments and a few remnants of old
>>> code removed.  I've also removed some dead code found in genrecog.c (got
>>> sidetracked today into debugging the current peephole2 code again...);
>>> this was left in after your r34208 patch.
>>
>> I think this causes a bootstrap failure on x86_64-linux-gnu:
>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: error:
>> unrecognizable insn:
>> (insn 25 7 26 2
>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:150 (set (reg:DI 1
>> dx)
>>        (mem/s:SI (plus:DI (reg/v/f:DI 5 di [orig:64 of ] [64])
>>                (const_int 4 [0x4])) [15 entry_2->ctr+0 S4 A32])) -1 (nil))
>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: internal
>> compiler error: in extract_insn, at recog.c:2127
>
> +  [(match_scratch:SI 5 "r")
>
> I think the :SI part is incorrect, we need a DI mode on x86_64 rather
> than a SImode.
>

Like this?

-- 
H.J.
---
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 161586)
+++ gcc/config/i386/i386.md	(working copy)
@@ -17558,7 +17558,7 @@
 ;;  leal    (%edx,%eax,4), %eax

 (define_peephole2
-  [(match_scratch:SI 5 "r")
+  [(match_scratch:P 5 "r")
    (parallel [(set (match_operand 0 "register_operand" "")
 		   (ashift (match_operand 1 "register_operand" "")
 			   (match_operand 2 "const_int_operand" "")))


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30  6:24               ` H.J. Lu
@ 2010-06-30  7:22                 ` H.J. Lu
  2010-06-30  8:49                   ` H.J. Lu
  0 siblings, 1 reply; 29+ messages in thread
From: H.J. Lu @ 2010-06-30  7:22 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Bernd Schmidt, Richard Henderson, GCC Patches

On Tue, Jun 29, 2010 at 7:26 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jun 29, 2010 at 5:43 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Tue, Jun 29, 2010 at 5:41 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Tue, Jun 29, 2010 at 6:22 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>>>> No, those parts of the buffer that weren't part of the match remain
>>>> unaffected, we keep both the insns and their life information.  We only
>>>> rebuild life for the new insns produced by the match.
>>>>
>>>> Here's a new version with a few more comments and a few remnants of old
>>>> code removed.  I've also removed some dead code found in genrecog.c (got
>>>> sidetracked today into debugging the current peephole2 code again...);
>>>> this was left in after your r34208 patch.
>>>
>>> I think this causes a bootstrap failure on x86_64-linux-gnu:
>>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: error:
>>> unrecognizable insn:
>>> (insn 25 7 26 2
>>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:150 (set (reg:DI 1
>>> dx)
>>>        (mem/s:SI (plus:DI (reg/v/f:DI 5 di [orig:64 of ] [64])
>>>                (const_int 4 [0x4])) [15 entry_2->ctr+0 S4 A32])) -1 (nil))
>>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: internal
>>> compiler error: in extract_insn, at recog.c:2127
>>
>> +  [(match_scratch:SI 5 "r")
>>
>> I think the :SI part is incorrect, we need a DI mode on x86_64 rather
>> than a SImode.
>>
>
> Like this?
>
> --
> H.J.
> ---
> Index: gcc/config/i386/i386.md
> ===================================================================
> --- gcc/config/i386/i386.md     (revision 161586)
> +++ gcc/config/i386/i386.md     (working copy)
> @@ -17558,7 +17558,7 @@
>  ;;  leal    (%edx,%eax,4), %eax
>
>  (define_peephole2
> -  [(match_scratch:SI 5 "r")
> +  [(match_scratch:P 5 "r")
>    (parallel [(set (match_operand 0 "register_operand" "")
>                   (ashift (match_operand 1 "register_operand" "")
>                           (match_operand 2 "const_int_operand" "")))
>

It doesn't work.

-- 
H.J.


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-29 14:31         ` Bernd Schmidt
  2010-06-29 18:28           ` Richard Henderson
  2010-06-30  2:01           ` Andrew Pinski
@ 2010-06-30  7:46           ` H.J. Lu
  2 siblings, 0 replies; 29+ messages in thread
From: H.J. Lu @ 2010-06-30  7:46 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Henderson, GCC Patches

On Tue, Jun 29, 2010 at 6:22 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 06/29/2010 04:32 AM, Richard Henderson wrote:
>>> length, but that won't always work if we start to match against
>>> partially filled buffers.  It's 2am so I may be blind but I don't see
>>> how it causes more calls into df_simulate or other useless work?  There
>>> should be exactly one such call per insn as it is added to the buffer
>>> (plus any updates necessary after we match something and have to
>>> regenerate life information).  Life information is stored in the buffer
>>> so we don't have to do anything when advancing the buffer start.
>>
>> The case in which we have matches is what I had in mind.  Say the
>> buffer has length 10.  If we fill the buffer then match the first
>> insn, that invalidates the life information for the entire buffer
>> does it not?  In which case we've wasted effort on 9 insns.
>
> No, those parts of the buffer that weren't part of the match remain
> unaffected, we keep both the insns and their life information.  We only
> rebuild life for the new insns produced by the match.
>
> Here's a new version with a few more comments and a few remnants of old
> code removed.  I've also removed some dead code found in genrecog.c (got
> sidetracked today into debugging the current peephole2 code again...);
> this was left in after your r34208 patch.
>

The i386 peephole2 change breaks bootstrap on Linux/x86-64.  Also, the
comments:

;; After splitting up read-modify operations, array accesses with memory
;; operands might end up in form:
;;  sall    $2, %eax
;;  movl    4(%esp), %edx
;;  addl    %edx, %eax
;; instead of pre-splitting:
;;  sall    $2, %eax
;;  addl    4(%esp), %eax
;; Turn it into:
;;  movl    4(%esp), %edx
;;  leal    (%edx,%eax,4), %eax

no longer apply to the new peephole2 pattern.


-- 
H.J.


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30  7:22                 ` H.J. Lu
@ 2010-06-30  8:49                   ` H.J. Lu
  2010-06-30 10:15                     ` Bernd Schmidt
  0 siblings, 1 reply; 29+ messages in thread
From: H.J. Lu @ 2010-06-30  8:49 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Bernd Schmidt, Richard Henderson, GCC Patches

On Tue, Jun 29, 2010 at 8:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jun 29, 2010 at 7:26 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Jun 29, 2010 at 5:43 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>> On Tue, Jun 29, 2010 at 5:41 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>>>> On Tue, Jun 29, 2010 at 6:22 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>>>>> No, those parts of the buffer that weren't part of the match remain
>>>>> unaffected, we keep both the insns and their life information.  We only
>>>>> rebuild life for the new insns produced by the match.
>>>>>
>>>>> Here's a new version with a few more comments and a few remnants of old
>>>>> code removed.  I've also removed some dead code found in genrecog.c (got
>>>>> sidetracked today into debugging the current peephole2 code again...);
>>>>> this was left in after your r34208 patch.
>>>>
>>>> I think this causes a bootstrap failure on x86_64-linux-gnu:
>>>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: error:
>>>> unrecognizable insn:
>>>> (insn 25 7 26 2
>>>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:150 (set (reg:DI 1
>>>> dx)
>>>>        (mem/s:SI (plus:DI (reg/v/f:DI 5 di [orig:64 of ] [64])
>>>>                (const_int 4 [0x4])) [15 entry_2->ctr+0 S4 A32])) -1 (nil))
>>>> /home/apinski/src/gcc-fsf/local//gcc/gcc/coverage.c:151:1: internal
>>>> compiler error: in extract_insn, at recog.c:2127
>>>
>>> +  [(match_scratch:SI 5 "r")
>>>
>>> I think the :SI part is incorrect, we need a DI mode on x86_64 rather
>>> than a SImode.
>>>
>>
>> Like this?
>>
>> --
>> H.J.
>> ---
>> Index: gcc/config/i386/i386.md
>> ===================================================================
>> --- gcc/config/i386/i386.md     (revision 161586)
>> +++ gcc/config/i386/i386.md     (working copy)
>> @@ -17558,7 +17558,7 @@
>>  ;;  leal    (%edx,%eax,4), %eax
>>
>>  (define_peephole2
>> -  [(match_scratch:SI 5 "r")
>> +  [(match_scratch:P 5 "r")
>>    (parallel [(set (match_operand 0 "register_operand" "")
>>                   (ashift (match_operand 1 "register_operand" "")
>>                           (match_operand 2 "const_int_operand" "")))
>>
>
> It doesn't work.
>
> --
> H.J.
>

This seems to work.


-- 
H.J.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7003f52..c450c38 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17558,7 +17558,7 @@
 ;;  leal    (%edx,%eax,4), %eax

 (define_peephole2
-  [(match_scratch:SI 5 "r")
+  [(match_scratch:P 5 "r")
    (parallel [(set (match_operand 0 "register_operand" "")
 		   (ashift (match_operand 1 "register_operand" "")
 			   (match_operand 2 "const_int_operand" "")))
@@ -17587,9 +17587,12 @@

   operands[1] = gen_rtx_PLUS (Pmode, base,
   			      gen_rtx_MULT (Pmode, index, GEN_INT (scale)));
-  if (mode != Pmode)
-    operands[1] = gen_rtx_SUBREG (mode, operands[1], 0);
   operands[5] = base;
+  if (mode != Pmode)
+    {
+      operands[1] = gen_rtx_SUBREG (mode, operands[1], 0);
+      operands[5] = gen_rtx_SUBREG (mode, operands[5], 0);
+    }
   operands[0] = dest;
 })


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30  8:49                   ` H.J. Lu
@ 2010-06-30 10:15                     ` Bernd Schmidt
  2010-06-30 10:52                       ` Richard Guenther
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-30 10:15 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Andrew Pinski, Richard Henderson, GCC Patches

On 06/30/2010 06:56 AM, H.J. Lu wrote:

> This seems to work.

Please commit then.  Sorry about the breakage, I guess one has to test
changes to i386.md on both targets.


Bernd


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 10:15                     ` Bernd Schmidt
@ 2010-06-30 10:52                       ` Richard Guenther
  2010-06-30 11:09                         ` Bernd Schmidt
  2010-06-30 16:51                         ` H.J. Lu
  0 siblings, 2 replies; 29+ messages in thread
From: Richard Guenther @ 2010-06-30 10:52 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: H.J. Lu, Andrew Pinski, Richard Henderson, GCC Patches

On Wed, Jun 30, 2010 at 10:50 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 06/30/2010 06:56 AM, H.J. Lu wrote:
>
>> This seems to work.
>
> Please commit then.  Sorry about the breakage, I guess one has to test
> changes to i386.md on both targets.

I have committed it.  Bootstrap continues until libjava which still
fails for me running out of virtual memory (16GB ulimit):

virtual memory exhausted: Cannot allocate memory
make[3]: *** [gnu/javax/imageio/jpeg.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
virtual memory exhausted: Cannot allocate memory
make[3]: *** [javax/swing/text.lo] Error 1

Richard.

>
> Bernd
>


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 10:52                       ` Richard Guenther
@ 2010-06-30 11:09                         ` Bernd Schmidt
  2010-06-30 11:24                           ` Richard Guenther
  2010-06-30 16:51                         ` H.J. Lu
  1 sibling, 1 reply; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-30 11:09 UTC (permalink / raw)
  To: Richard Guenther; +Cc: H.J. Lu, Andrew Pinski, Richard Henderson, GCC Patches

On 06/30/2010 11:48 AM, Richard Guenther wrote:
> On Wed, Jun 30, 2010 at 10:50 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> On 06/30/2010 06:56 AM, H.J. Lu wrote:
>>
>>> This seems to work.
>>
>> Please commit then.  Sorry about the breakage, I guess one has to test
>> changes to i386.md on both targets.
> 
> I have committed it.  Bootstrap continues until libjava which still
> fails for me running out of virtual memory (16GB ulimit):

Is that new with the peephole2 patch?


Bernd


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 11:09                         ` Bernd Schmidt
@ 2010-06-30 11:24                           ` Richard Guenther
  2010-06-30 11:34                             ` Richard Guenther
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Guenther @ 2010-06-30 11:24 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: H.J. Lu, Andrew Pinski, Richard Henderson, GCC Patches

On Wed, Jun 30, 2010 at 11:49 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 06/30/2010 11:48 AM, Richard Guenther wrote:
>> On Wed, Jun 30, 2010 at 10:50 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>>> On 06/30/2010 06:56 AM, H.J. Lu wrote:
>>>
>>>> This seems to work.
>>>
>>> Please commit then.  Sorry about the breakage, I guess one has to test
>>> changes to i386.md on both targets.
>>
>> I have committed it.  Bootstrap continues until libjava which still
>> fails for me running out of virtual memory (16GB ulimit):
>
> Is that new with the peephole2 patch?

No.  It is in fact

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44722

Richard.

>
> Bernd
>


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 11:24                           ` Richard Guenther
@ 2010-06-30 11:34                             ` Richard Guenther
  2010-06-30 11:52                               ` Bernd Schmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Guenther @ 2010-06-30 11:34 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: H.J. Lu, Andrew Pinski, Richard Henderson, GCC Patches

On Wed, Jun 30, 2010 at 11:52 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jun 30, 2010 at 11:49 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> On 06/30/2010 11:48 AM, Richard Guenther wrote:
>>> On Wed, Jun 30, 2010 at 10:50 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>>>> On 06/30/2010 06:56 AM, H.J. Lu wrote:
>>>>
>>>>> This seems to work.
>>>>
>>>> Please commit then.  Sorry about the breakage, I guess one has to test
>>>> changes to i386.md on both targets.
>>>
>>> I have committed it.  Bootstrap continues until libjava which still
>>> fails for me running out of virtual memory (16GB ulimit):
>>
>> Is that new with the peephole2 patch?
>
> No.  It is in fact
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44722

We're peepholing in circles after your patch.  The peephole2
dump isn't very informative - how can I see which peephole2s
matched?

Richard.

> Richard.
>
>>
>> Bernd
>>
>


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 11:52                               ` Bernd Schmidt
@ 2010-06-30 11:52                                 ` Richard Guenther
  2010-06-30 11:53                                   ` Bernd Schmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Guenther @ 2010-06-30 11:52 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: H.J. Lu, Andrew Pinski, Richard Henderson, GCC Patches

On Wed, Jun 30, 2010 at 12:19 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 06/30/2010 12:00 PM, Richard Guenther wrote:
>
>> We're peepholing in circles after your patch.  The peephole2
>> dump isn't very informative - how can I see which peephole2s
>> matched?
>
> Not easily.  Maybe breakpoints on all the gen_peephole2_ functions; we
> should probably add that to the dumps.
>
> Seems to be something involving FIX insns, I suspect
>
> ;; Shorten x87->SSE reload sequences of fix_trunc?f?i_sse patterns.
> (define_peephole2
>  [(set (match_operand:MODEF 0 "register_operand" "")
>        (match_operand:MODEF 1 "memory_operand" ""))
>   (set (match_operand:SSEMODEI24 2 "register_operand" "")
>        (fix:SSEMODEI24 (match_dup 0)))]
>  "TARGET_SHORTEN_X87_SSE
>   && peep2_reg_dead_p (2, operands[0])"
>  [(set (match_dup 2) (fix:SSEMODEI24 (match_dup 1)))]
>  "")

I suppose adding && !TARGET_AVOID_VECTOR_DECODE
would fix it.  Testing that.

> and
>
> (define_peephole2
>  [(match_scratch:SF 2 "x")
>   (set (match_operand:SSEMODEI24 0 "register_operand" "")
>        (fix:SSEMODEI24 (match_operand:SF 1 "memory_operand" "")))]
>  "TARGET_AVOID_VECTOR_DECODE && optimize_insn_for_speed_p ()"
>  [(set (match_dup 2) (match_dup 1))
>   (set (match_dup 0) (fix:SSEMODEI24 (match_dup 2)))]
>  "")
>
> are a loop.
>
>
> Bernd
>


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 11:34                             ` Richard Guenther
@ 2010-06-30 11:52                               ` Bernd Schmidt
  2010-06-30 11:52                                 ` Richard Guenther
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-30 11:52 UTC (permalink / raw)
  To: Richard Guenther; +Cc: H.J. Lu, Andrew Pinski, Richard Henderson, GCC Patches

On 06/30/2010 12:00 PM, Richard Guenther wrote:

> We're peepholing in circles after your patch.  The peephole2
> dump isn't very informative - how can I see which peephole2s
> matched?

Not easily.  Maybe breakpoints on all the gen_peephole2_ functions; we
should probably add that to the dumps.

Seems to be something involving FIX insns, I suspect

;; Shorten x87->SSE reload sequences of fix_trunc?f?i_sse patterns.
(define_peephole2
  [(set (match_operand:MODEF 0 "register_operand" "")
	(match_operand:MODEF 1 "memory_operand" ""))
   (set (match_operand:SSEMODEI24 2 "register_operand" "")
	(fix:SSEMODEI24 (match_dup 0)))]
  "TARGET_SHORTEN_X87_SSE
   && peep2_reg_dead_p (2, operands[0])"
  [(set (match_dup 2) (fix:SSEMODEI24 (match_dup 1)))]
  "")

and

(define_peephole2
  [(match_scratch:SF 2 "x")
   (set (match_operand:SSEMODEI24 0 "register_operand" "")
	(fix:SSEMODEI24 (match_operand:SF 1 "memory_operand" "")))]
  "TARGET_AVOID_VECTOR_DECODE && optimize_insn_for_speed_p ()"
  [(set (match_dup 2) (match_dup 1))
   (set (match_dup 0) (fix:SSEMODEI24 (match_dup 2)))]
  "")

are a loop.


Bernd


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 11:52                                 ` Richard Guenther
@ 2010-06-30 11:53                                   ` Bernd Schmidt
  2010-06-30 12:10                                     ` Richard Guenther
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schmidt @ 2010-06-30 11:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: H.J. Lu, Andrew Pinski, Richard Henderson, GCC Patches

On 06/30/2010 12:31 PM, Richard Guenther wrote:
> On Wed, Jun 30, 2010 at 12:19 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> On 06/30/2010 12:00 PM, Richard Guenther wrote:
>>
>>> We're peepholing in circles after your patch.  The peephole2
>>> dump isn't very informative - how can I see which peephole2s
>>> matched?
>>
>> Not easily.  Maybe breakpoints on all the gen_peephole2_ functions; we
>> should probably add that to the dumps.
>>
>> Seems to be something involving FIX insns, I suspect
>>
>> ;; Shorten x87->SSE reload sequences of fix_trunc?f?i_sse patterns.
>> (define_peephole2
>>  [(set (match_operand:MODEF 0 "register_operand" "")
>>        (match_operand:MODEF 1 "memory_operand" ""))
>>   (set (match_operand:SSEMODEI24 2 "register_operand" "")
>>        (fix:SSEMODEI24 (match_dup 0)))]
>>  "TARGET_SHORTEN_X87_SSE
>>   && peep2_reg_dead_p (2, operands[0])"
>>  [(set (match_dup 2) (fix:SSEMODEI24 (match_dup 1)))]
>>  "")
> 
> I suppose adding && !TARGET_AVOID_VECTOR_DECODE
> would fix it.  Testing that.

I just added "0 &&" to one of the patterns, it also fixed the Fortran
testcase.  Also running a bootstrap...

Do you want to take it from here with your version?


Bernd


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 11:53                                   ` Bernd Schmidt
@ 2010-06-30 12:10                                     ` Richard Guenther
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Guenther @ 2010-06-30 12:10 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: H.J. Lu, Andrew Pinski, Richard Henderson, GCC Patches

On Wed, Jun 30, 2010 at 12:41 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 06/30/2010 12:31 PM, Richard Guenther wrote:
>> On Wed, Jun 30, 2010 at 12:19 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>>> On 06/30/2010 12:00 PM, Richard Guenther wrote:
>>>
>>>> We're peepholing in circles after your patch.  The peephole2
>>>> dump isn't very informative - how can I see which peephole2s
>>>> matched?
>>>
>>> Not easily.  Maybe breakpoints on all the gen_peephole2_ functions; we
>>> should probably add that to the dumps.
>>>
>>> Seems to be something involving FIX insns, I suspect
>>>
>>> ;; Shorten x87->SSE reload sequences of fix_trunc?f?i_sse patterns.
>>> (define_peephole2
>>>  [(set (match_operand:MODEF 0 "register_operand" "")
>>>        (match_operand:MODEF 1 "memory_operand" ""))
>>>   (set (match_operand:SSEMODEI24 2 "register_operand" "")
>>>        (fix:SSEMODEI24 (match_dup 0)))]
>>>  "TARGET_SHORTEN_X87_SSE
>>>   && peep2_reg_dead_p (2, operands[0])"
>>>  [(set (match_dup 2) (fix:SSEMODEI24 (match_dup 1)))]
>>>  "")
>>
>> I suppose adding && !TARGET_AVOID_VECTOR_DECODE
>> would fix it.  Testing that.
>
> I just added "0 &&" to one of the patterns, it also fixed the Fortran
> testcase.  Also running a bootstrap...
>
> Do you want to take it from here with your version?

Yes.  I'm through bootstrap and into testing right now.

Richard.

>
> Bernd
>


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 10:52                       ` Richard Guenther
  2010-06-30 11:09                         ` Bernd Schmidt
@ 2010-06-30 16:51                         ` H.J. Lu
  2010-07-01  9:26                           ` Bernd Schmidt
  1 sibling, 1 reply; 29+ messages in thread
From: H.J. Lu @ 2010-06-30 16:51 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Bernd Schmidt, Andrew Pinski, Richard Henderson, GCC Patches

On Wed, Jun 30, 2010 at 2:48 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jun 30, 2010 at 10:50 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> On 06/30/2010 06:56 AM, H.J. Lu wrote:
>>
>>> This seems to work.
>>
>> Please commit then.  Sorry about the breakage, I guess one has to test
>> changes to i386.md on both targets.
>
> I have committed it.  Bootstrap continues until libjava which still
> fails for me running out of virtual memory (16GB ulimit):
>
> virtual memory exhausted: Cannot allocate memory
> make[3]: *** [gnu/javax/imageio/jpeg.lo] Error 1
> make[3]: *** Waiting for unfinished jobs....
> virtual memory exhausted: Cannot allocate memory
> make[3]: *** [javax/swing/text.lo] Error 1
>

I got a bootstrap failure when configured with --with-cpu=atom:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44727

-- 
H.J.


* Re: Resubmit/ping: peephole2 vs cond-exec vs df
  2010-06-30 16:51                         ` H.J. Lu
@ 2010-07-01  9:26                           ` Bernd Schmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Bernd Schmidt @ 2010-07-01  9:26 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Andrew Pinski, Richard Henderson, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 175 bytes --]

On 06/30/2010 05:48 PM, H.J. Lu wrote:

> I got a bootstrap failure when configured with --with-cpu=atom:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44727

Fixed.


Bernd

[-- Attachment #2: atom-peep.diff --]
[-- Type: text/plain, Size: 1011 bytes --]

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 161655)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2010-07-01  Bernd Schmidt  <bernds@codesourcery.com>
+
+	PR target/44727
+	* config/i386/i386.md (peephole2 for arithmetic ops with memory):
+	Make sure operand 0 dies.
+
 2010-07-01  Richard Guenther  <rguenther@suse.de>
 
 	PR middle-end/42834
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 161655)
+++ config/i386/i386.md	(working copy)
@@ -17575,6 +17575,8 @@ (define_peephole2
 	    || GET_MODE (operands[0]) == HImode))
        || GET_MODE (operands[0]) == SImode
        || (TARGET_64BIT && GET_MODE (operands[0]) == DImode))
+   && (rtx_equal_p (operands[0], operands[3])
+       || peep2_reg_dead_p (2, operands[0]))
    /* We reorder load and the shift.  */
    && !reg_overlap_mentioned_p (operands[0], operands[4])"
   [(set (match_dup 5) (match_dup 4))


Thread overview: 29+ messages
2010-04-22 16:36 peephole2 vs cond-exec vs df Bernd Schmidt
2010-06-07 14:46 ` Resubmit/ping: " Bernd Schmidt
2010-06-14 10:17   ` Bernd Schmidt
2010-06-21 14:14     ` Ping^5: " Bernd Schmidt
2010-06-28 12:37       ` Ping^6: " Bernd Schmidt
2010-06-29  4:46   ` Resubmit/ping: " Richard Henderson
2010-06-29  8:34     ` Bernd Schmidt
2010-06-29  8:58       ` Richard Henderson
2010-06-29 14:31         ` Bernd Schmidt
2010-06-29 18:28           ` Richard Henderson
2010-06-30  2:01           ` Andrew Pinski
2010-06-30  2:03             ` Andrew Pinski
2010-06-30  6:24               ` H.J. Lu
2010-06-30  7:22                 ` H.J. Lu
2010-06-30  8:49                   ` H.J. Lu
2010-06-30 10:15                     ` Bernd Schmidt
2010-06-30 10:52                       ` Richard Guenther
2010-06-30 11:09                         ` Bernd Schmidt
2010-06-30 11:24                           ` Richard Guenther
2010-06-30 11:34                             ` Richard Guenther
2010-06-30 11:52                               ` Bernd Schmidt
2010-06-30 11:52                                 ` Richard Guenther
2010-06-30 11:53                                   ` Bernd Schmidt
2010-06-30 12:10                                     ` Richard Guenther
2010-06-30 16:51                         ` H.J. Lu
2010-07-01  9:26                           ` Bernd Schmidt
2010-06-30  6:10             ` H.J. Lu
2010-06-30  7:46           ` H.J. Lu
2010-06-14 12:28 ` Paolo Bonzini
