public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] splits after reload
@ 2004-02-04  9:16 Rakesh Kumar - Software, Noida
  2004-02-04  9:29 ` Richard Henderson
  2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
  0 siblings, 2 replies; 14+ messages in thread
From: Rakesh Kumar - Software, Noida @ 2004-02-04  9:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: Joern Rennecke

[-- Attachment #1: Type: text/plain, Size: 2184 bytes --]

Hi,

The problem has been previously discussed under
following threads

http://gcc.gnu.org/ml/gcc/2003-12/msg00700.html

http://gcc.gnu.org/ml/gcc/2004-01/msg00042.html

After little thought and some experimentation,
I realized that peephole2 is not the solution to
this problem. Peepholes work on the patterns of
insns and hence the insn chain can't be
traversed for recombinations.
At the same time, I agree that recording insns
at the time of split and hacking
peephole is fragile.

I'd like to advocate defining the target hook for
recombining splits in split_all_insns (). If an INSN
is split, the target hook is called with
PREV_INSN (INSN) and the last insn generated
in split as arguments. The target can check for
each insn in split sequence as possible candidate
for recombination. The default value for the
target hook is 0, so it would not burden the
targets which do not override it. Certainly,
it might help the targets which got interested
towards recombining splits.

I implemented the hook on SH4 target. I'm
combining only ADD insns, since that is
requirement for SH. There are a hoard of other
operations possible.

The attached patch is fully tested with GCC
testsuite and stress-1.17. I'm getting the
following size gains with -O2 -ml -m4 switches.

Stress		Mainline	With Split		%age
Testcase		GCC		Recombination	Gain
--------		--------	-------------	-----
ls_step.i		3968		3744			5.64
gentest.c		1184		1120			5.4
decode.i		1380		1316			4.63
dct64.i		1792		1728			3.57
navion_gear.i	1912		1848			3.35
ls_accel.i		3232		3136			2.97
scanline.i		2304		2240			2.77
vorbis_mdct.i	2688		2624			2.38
layer3.i		19952		19568			1.92
clip.i		14012		13788			1.6
tonal.i		10928		10768			1.46
layer2.i		2400		2368			1.33
g_misc.i		22140		21852			1.3
slalom.i		9376		9280			1.02

Other test cases showing below 1% improvement are:
attstub.i, avl.i, compress.i, gems.i, g_func.i,
imdecode.i, index.c, l3psy.i, linpackc.i, lndsub.i,
ls_aux.i, mathutil.i, player.i, quantize.i, rengine.i,
revolt.i, segment.i, shear.i, slib.i, smallft.i,
sound.i, xvalg.i

Please review the attached patch and let me know your
valuable comments. I'd be thankful.

Regards and Thanks,
Rakesh Kumar


[-- Attachment #2: ChangeLog --]
[-- Type: application/octet-stream, Size: 709 bytes --]

2004-01-29  Rakesh Kumar  <rakeshku@noida.hcltech.com>

	* target.h (recombine_splits): New target hook in struct gcc_target

	* target-def.h (TARGET_SPLIT_RECOMBINE): New macro corresponding to
	recombine_splits.

	* recog.c: Include target.h
	(split_all_insns): Call target hook for recombining splits.

	* config/sh/sh.c (try_recombine): New function to recombine
	the instruction with the instructions in the same basic block.
	(sh_split_recombine): New function to recognize the insns,
	which are potential candidates for recombination, from the
	split sequence and pass them on to try_split.
	(TARGET_SPLIT_RECOMBINE): Define it to sh_split_recombine.

	* doc/tm.texi (TARGET_SPLIT_RECOMBINE): Document

[-- Attachment #3: patch_recombine --]
[-- Type: application/octet-stream, Size: 11797 bytes --]

Index: gcc/target.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/target.h,v
retrieving revision 1.76
diff -c -3 -p -r1.76 target.h
*** gcc/target.h	23 Jan 2004 21:05:18 -0000	1.76
--- gcc/target.h	4 Feb 2004 09:08:06 -0000
*************** struct gcc_target
*** 453,458 ****
--- 453,461 ----
         targetm.calls.strict_argument_naming().  */
      bool (*pretend_outgoing_varargs_named) (CUMULATIVE_ARGS *ca);
    } calls;
+ 
+   /* Function for recombining splits.  */
+   void (*recombine_splits) (rtx, rtx);
  };
  
  extern struct gcc_target targetm;
Index: gcc/target-def.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/target-def.h,v
retrieving revision 1.64
diff -c -3 -p -r1.64 target-def.h
*** gcc/target-def.h	23 Jan 2004 21:05:18 -0000	1.64
--- gcc/target-def.h	4 Feb 2004 09:08:19 -0000
*************** Foundation, 59 Temple Place - Suite 330,
*** 301,306 ****
--- 301,307 ----
  #define TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P hook_bool_tree_false
  #define TARGET_MS_BITFIELD_LAYOUT_P hook_bool_tree_false
  #define TARGET_RTX_COSTS hook_bool_rtx_int_int_intp_false
+ #define TARGET_SPLIT_RECOMBINE 0
  
  #ifndef TARGET_INIT_LIBFUNCS
  #define TARGET_INIT_LIBFUNCS hook_void_void
*************** Foundation, 59 Temple Place - Suite 330,
*** 398,403 ****
--- 399,405 ----
    TARGET_ASM_FILE_START_APP_OFF,		\
    TARGET_ASM_FILE_START_FILE_DIRECTIVE,		\
    TARGET_CALLS,					\
+   TARGET_SPLIT_RECOMBINE,			\
  }
  
  #include "hooks.h"
Index: gcc/recog.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/recog.c,v
retrieving revision 1.195
diff -c -3 -p -r1.195 recog.c
*** gcc/recog.c	23 Jan 2004 23:49:36 -0000	1.195
--- gcc/recog.c	4 Feb 2004 09:08:31 -0000
*************** Software Foundation, 59 Temple Place - S
*** 38,43 ****
--- 38,44 ----
  #include "toplev.h"
  #include "basic-block.h"
  #include "output.h"
+ #include "target.h"
  #include "reload.h"
  
  #ifndef STACK_PUSH_CODE
*************** split_all_insns (int upd_life)
*** 2728,2734 ****
  
    FOR_EACH_BB_REVERSE (bb)
      {
!       rtx insn, next;
        bool finish = false;
  
        for (insn = BB_HEAD (bb); !finish ; insn = next)
--- 2729,2735 ----
  
    FOR_EACH_BB_REVERSE (bb)
      {
!       rtx insn, next, prev;
        bool finish = false;
  
        for (insn = BB_HEAD (bb); !finish ; insn = next)
*************** split_all_insns (int upd_life)
*** 2736,2741 ****
--- 2737,2743 ----
  	  /* Can't use `next_real_insn' because that might go across
  	     CODE_LABELS and short-out basic blocks.  */
  	  next = NEXT_INSN (insn);
+           prev = PREV_INSN (insn);
  	  finish = (insn == BB_END (bb));
  	  if (INSN_P (insn))
  	    {
*************** split_all_insns (int upd_life)
*** 2776,2781 ****
--- 2778,2793 ----
  		      while (GET_CODE (last) == BARRIER)
  			last = PREV_INSN (last);
  		      SET_BIT (blocks, bb->index);
+ 
+ 		      /* In SH4, the double precision load/stores are splitted
+ 			 into a sequence of three instruction - two single
+ 			 precision load/stores and one address arithmetic insn.
+ 			 This target hook tries to recombine the address
+ 			 arithmetic insn generated. Other targets may use this
+                          hook for target dependent operations on split sequences.  */
+                       if (*targetm.recombine_splits)
+                         (*targetm.recombine_splits)(last, prev);
+ 
  		      changed = true;
  		    }
  		}
Index: gcc/config/sh/sh.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/sh/sh.c,v
retrieving revision 1.250
diff -c -3 -p -r1.250 sh.c
*** gcc/config/sh/sh.c	23 Jan 2004 15:29:09 -0000	1.250
--- gcc/config/sh/sh.c	4 Feb 2004 09:08:45 -0000
*************** static void sh_setup_incoming_varargs (C
*** 249,254 ****
--- 249,256 ----
  static bool sh_strict_argument_naming (CUMULATIVE_ARGS *);
  static bool sh_pretend_outgoing_varargs_named (CUMULATIVE_ARGS *);
  static tree sh_build_builtin_va_list (void);
+ static void sh_split_recombine (rtx, rtx);
+ static void try_recombine (rtx, rtx, basic_block);
  
  \f
  /* Initialize the GCC target structure.  */
*************** static tree sh_build_builtin_va_list (vo
*** 319,324 ****
--- 321,329 ----
  #undef TARGET_ADDRESS_COST
  #define TARGET_ADDRESS_COST sh_address_cost
  
+ #undef TARGET_SPLIT_RECOMBINE
+ #define TARGET_SPLIT_RECOMBINE sh_split_recombine
+ 
  #undef TARGET_MACHINE_DEPENDENT_REORG
  #define TARGET_MACHINE_DEPENDENT_REORG sh_reorg
  
*************** check_use_sfunc_addr (rtx insn, rtx reg)
*** 9229,9234 ****
--- 9234,9383 ----
        return rtx_equal_p (extract_sfunc_addr (insn), reg);
      }
    abort ();
+ }
+ 
+ /* Try recombining the address arithmetic insns.
+    REG_RTX is the register used for address arithmetic
+    INSN is the insn for address arithmetic
+    BB is the basic block containing INSN.  */
+ static void
+ try_recombine (reg_rtx, insn, bb)
+    rtx reg_rtx;
+    rtx insn;
+    basic_block bb;
+ {
+   rtx scan, end_insn;
+   int move;
+   int inc = INTVAL (XEXP (SET_SRC (PATTERN (insn)), 1));
+ 
+   /* The addition of negative values is recombined with the
+      insns ahead in the instruction stream, and those of
+      positive values is recombined with prior insns. This
+      is an optimization based on the fact that negative add
+      is emitted in case of double precision loads and positive
+      add is emitted for double precision stores. In most cases
+      the REG_RTX is mentioned in preceding insn (for load) or
+      following insn (for store). We are not crossing the basic
+      blocks in any case.  */
+   if (inc < 0)
+     {
+       move = 1;
+       end_insn = BB_END (bb);
+     }
+   else if (inc > 0)
+     {
+       move = -1;
+       end_insn = BB_HEAD (bb);
+     }
+   else
+     return;
+ 
+   scan = (move == -1 ? PREV_INSN (insn) : NEXT_INSN (insn));
+ 
+   /* If insn is last insn in the function.  */
+   if (!scan)
+     return;
+ 
+   do
+     {
+        if (GET_CODE (scan) == CODE_LABEL || GET_CODE (scan) == BARRIER)
+          break;
+ 
+        if (NOTE_INSN_BASIC_BLOCK_P (scan))
+          break;
+ 
+        if (reg_mentioned_p (reg_rtx, scan))
+          {
+            if (GET_CODE (PATTERN (scan)) != PARALLEL && single_set (scan))
+              {
+                rtx set = single_set (scan);
+                rtx set_src = SET_SRC (set);
+                rtx set_dest = SET_DEST (set);
+ 
+                /* scan should be of form ADD #x, REG.  */
+                if (GET_CODE (set_dest) == REG && GET_CODE (set_src) == PLUS
+                       && GET_CODE (XEXP (set_src, 0)) == REG
+                       && REGNO (set_dest) == REGNO (XEXP (set_src, 0))
+                       && GET_CODE (XEXP (set_src, 1)) == CONST_INT)
+                  {
+                     rtx prev_scan = scan;
+                     int new_inc = INTVAL (XEXP (set_src, 1)) + inc;
+ 
+                     /* Recombination should not be costly. Any constant
+                        not satisfying this condition will go in literal
+                        pool.  */
+                     if (CONST_OK_FOR_I08 (new_inc))
+                       {
+                         XEXP (SET_SRC (PATTERN (insn)), 1) = GEN_INT (new_inc);
+                         inc = new_inc;
+ 
+                         scan = (move == -1 ? PREV_INSN (scan) : NEXT_INSN (scan));
+ 			/* Make the insn a NOTE.  */
+                         PUT_CODE (prev_scan, NOTE);
+                         NOTE_LINE_NUMBER (prev_scan) = NOTE_INSN_DELETED;
+                         NOTE_SOURCE_FILE (prev_scan) = 0;
+                         continue;
+                       }
+                   }
+                }
+              break;
+            }
+ 
+         scan = (move == -1 ? PREV_INSN (scan) : NEXT_INSN (scan));
+ 
+       } while (scan && scan != end_insn);
+ }
+ 
+ /* This funtion is called from split_all_insns in recog.c.
+    It looks for the potential candidates for the recombination
+    and passes them on to try_recombine. The recombination is
+    actually done by try_recombine.  */
+ static void
+ sh_split_recombine (rtx split_seq, rtx prev)
+ {
+   /* The optimization is local to the block.  */
+   basic_block bb = BLOCK_FOR_INSN (prev);
+ 
+   if (!reload_completed || !TARGET_SH4 || TARGET_FMOVD)
+     return;
+ 
+ /* Scan all instructions in split sequence for potential
+    recombination candidates.  */
+   while (split_seq != prev)
+     {
+       rtx prev_insn = PREV_INSN (split_seq);
+ 
+       if (INSN_P (split_seq))
+         {
+            rtx pat = PATTERN (split_seq);
+            rtx set = single_set (split_seq);
+ 
+            if (GET_CODE (pat) != PARALLEL && set)
+              {
+                rtx set_src = SET_SRC (set);
+                rtx set_dest = SET_DEST (set);
+ 
+                /* Look for the desired insn. recog can be used here,
+                   but that would be more expensive.  */
+                if (GET_CODE (set_dest) == REG && GET_CODE (set_src) == PLUS
+                      && GET_CODE (XEXP (set_src, 0)) == REG
+                      && REGNO (XEXP (set_src, 0)) == REGNO (set_dest)
+                      && GET_CODE (XEXP (set_src, 1)) == CONST_INT)
+                  {
+                     try_recombine (set_dest, split_seq, bb);
+ 
+                     /* Do not delete the insn with null effect. Make it a note.  */
+                     if (INTVAL (XEXP (SET_SRC (PATTERN (split_seq)), 1)) == 0)
+                       {
+                         PUT_CODE (split_seq, NOTE);
+                         NOTE_LINE_NUMBER (split_seq) = NOTE_INSN_DELETED;
+                         NOTE_SOURCE_FILE (split_seq) = 0;
+                       }
+                  }
+               }
+          }
+        split_seq = prev_insn;
+     }
  }
  
  #include "gt-sh.h"
Index: gcc/doc/tm.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v
retrieving revision 1.288
diff -c -3 -p -r1.288 tm.texi
*** gcc/doc/tm.texi	23 Jan 2004 21:05:21 -0000	1.288
--- gcc/doc/tm.texi	4 Feb 2004 09:09:06 -0000
*************** more than this number of multiplications
*** 9219,9221 ****
--- 9219,9240 ----
  system library's @code{pow}, @code{powf} or @code{powl} routines.
  The default value places no upper bound on the multiplication count.
  @end defmac
+ 
+ @deftypefn {Target Hook} void TARGET_SPLIT_RECOMBINE (rtx @var{prev}, rtx @var{last})
+ This target hook provides the optimization for recombining splits.
+ 
+ The insn splitting happens multiple times during many optimization
+ passes. Combiner pass runs only once, even before splitter runs for
+ the first time. This hook provides an opportunity for recombining
+ the splits with the existing insn chain. @var{prev} is insn prior
+ to the insn that is splitted and @var{last} is the last insn emitted
+ in the split sequence. This hook is called in @var{split_all_insns}.
+ 
+ As an example, SH does not support @code{DFmode} loads/stores. Hence
+ after reload, it has to split @code{DFmode} load/store insn into
+ three insns - two @code{SFmode} loads/stores and one address
+ arithmetic insn. The address arithmetic insn is a potential candidate
+ for recombination here.
+ 
+ If you do not define the hook, the default value is 0.
+ @end deftypefn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] splits after reload
  2004-02-04  9:16 [PATCH] splits after reload Rakesh Kumar - Software, Noida
@ 2004-02-04  9:29 ` Richard Henderson
  2004-02-04 14:48   ` Joern Rennecke
  2004-02-21 13:45   ` Richard Henderson
  2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
  1 sibling, 2 replies; 14+ messages in thread
From: Richard Henderson @ 2004-02-04  9:29 UTC (permalink / raw)
  To: Rakesh Kumar - Software, Noida; +Cc: gcc-patches, Joern Rennecke

On Wed, Feb 04, 2004 at 02:50:21PM +0530, Rakesh Kumar - Software, Noida wrote:
> I'd like to advocate defining the target hook for
> recombining splits in split_all_insns ().

No.  Absolutely not.  This is not the purpose of split_all_insns.

It kinda sounds like you want reload-cse to run later than it
currently does.  That might be a quite reasonable solution, but
it would certainly have to be tested on many targets.


r~

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] splits after reload
  2004-02-04  9:29 ` Richard Henderson
@ 2004-02-04 14:48   ` Joern Rennecke
  2004-02-05  5:33     ` Richard Henderson
  2004-02-21 13:45     ` Joern Rennecke
  2004-02-21 13:45   ` Richard Henderson
  1 sibling, 2 replies; 14+ messages in thread
From: Joern Rennecke @ 2004-02-04 14:48 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Rakesh Kumar - Software Noida, gcc-patches, Joern Rennecke

> It kinda sounds like you want reload-cse to run later than it
> currently does.  That might be a quite reasonable solution, but
> it would certainly have to be tested on many targets.

If reload_cse does the job remains subject to benchmarking.  But even
if it didn't, a new, separate pass would certainly be cleaner, and pathologic
time behaviour can be better avoided or rectified by appropriate algorithms
and data strctures.  What might be slightly useful to avoid spending
too much time on insns that have been ground through all the optimizers
before is keeping track of cfun->emit->x_cur_insn_uid before the last
split pass.  OTOH, if we have a pass that keeps track of values in registers,
it can't really just ignore the old insns.

We could enable this reload_cse pass with a new -f option, and default
it to off unless the default is overridden by the targets OPTIMIZATION_OPTIONS.
This way, we avoid compile-time and correctness regressions in ports that
don't specifically enable the new pass.
(It makes still sense to test that the new option works on other targets,
 but any failure there wouldn't be a regression.)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] splits after reload
  2004-02-04 14:48   ` Joern Rennecke
@ 2004-02-05  5:33     ` Richard Henderson
  2004-02-21 13:45       ` Richard Henderson
  2004-02-21 13:45     ` Joern Rennecke
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2004-02-05  5:33 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Rakesh Kumar - Software Noida, gcc-patches

On Wed, Feb 04, 2004 at 02:47:59PM +0000, Joern Rennecke wrote:
> If reload_cse does the job remains subject to benchmarking.  But even
> if it didn't, a new, separate pass would certainly be cleaner, and pathologic
> time behaviour can be better avoided or rectified by appropriate algorithms
> and data strctures.

I would certainly not object to reload_cse being completely replaced
on all targets.  Doing something special on only some targets seems
not really the right thing.


r~

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] splits after reload
  2004-02-04  9:16 [PATCH] splits after reload Rakesh Kumar - Software, Noida
  2004-02-04  9:29 ` Richard Henderson
@ 2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
  1 sibling, 0 replies; 14+ messages in thread
From: Rakesh Kumar - Software, Noida @ 2004-02-21 13:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: Joern Rennecke

[-- Attachment #1: Type: text/plain, Size: 2184 bytes --]

Hi,

The problem has been previously discussed under
following threads

http://gcc.gnu.org/ml/gcc/2003-12/msg00700.html

http://gcc.gnu.org/ml/gcc/2004-01/msg00042.html

After little thought and some experimentation,
I realized that peephole2 is not the solution to
this problem. Peepholes work on the patterns of
insns and hence the insn chain can't be
traversed for recombinations.
At the same time, I agree that recording insns
at the time of split and hacking
peephole is fragile.

I'd like to advocate defining the target hook for
recombining splits in split_all_insns (). If an INSN
is split, the target hook is called with
PREV_INSN (INSN) and the last insn generated
in split as arguments. The target can check for
each insn in split sequence as possible candidate
for recombination. The default value for the
target hook is 0, so it would not burden the
targets which do not override it. Certainly,
it might help the targets which got interested
towards recombining splits.

I implemented the hook on SH4 target. I'm
combining only ADD insns, since that is
requirement for SH. There are a hoard of other
operations possible.

The attached patch is fully tested with GCC
testsuite and stress-1.17. I'm getting the
following size gains with -O2 -ml -m4 switches.

Stress		Mainline	With Split		%age
Testcase		GCC		Recombination	Gain
--------		--------	-------------	-----
ls_step.i		3968		3744			5.64
gentest.c		1184		1120			5.4
decode.i		1380		1316			4.63
dct64.i		1792		1728			3.57
navion_gear.i	1912		1848			3.35
ls_accel.i		3232		3136			2.97
scanline.i		2304		2240			2.77
vorbis_mdct.i	2688		2624			2.38
layer3.i		19952		19568			1.92
clip.i		14012		13788			1.6
tonal.i		10928		10768			1.46
layer2.i		2400		2368			1.33
g_misc.i		22140		21852			1.3
slalom.i		9376		9280			1.02

Other test cases showing below 1% improvement are:
attstub.i, avl.i, compress.i, gems.i, g_func.i,
imdecode.i, index.c, l3psy.i, linpackc.i, lndsub.i,
ls_aux.i, mathutil.i, player.i, quantize.i, rengine.i,
revolt.i, segment.i, shear.i, slib.i, smallft.i,
sound.i, xvalg.i

Please review the attached patch and let me know your
valuable comments. I'd be thankful.

Regards and Thanks,
Rakesh Kumar


[-- Attachment #2: ChangeLog --]
[-- Type: application/octet-stream, Size: 709 bytes --]

2004-01-29  Rakesh Kumar  <rakeshku@noida.hcltech.com>

	* target.h (recombine_splits): New target hook in struct gcc_target

	* target-def.h (TARGET_SPLIT_RECOMBINE): New macro corresponding to
	recombine_splits.

	* recog.c: Include target.h
	(split_all_insns): Call target hook for recombining splits.

	* config/sh/sh.c (try_recombine): New function to recombine
	the instruction with the instructions in the same basic block.
	(sh_split_recombine): New function to recognize the insns,
	which are potential candidates for recombination, from the
	split sequence and pass them on to try_split.
	(TARGET_SPLIT_RECOMBINE): Define it to sh_split_recombine.

	* doc/tm.texi (TARGET_SPLIT_RECOMBINE): Document

[-- Attachment #3: patch_recombine --]
[-- Type: application/octet-stream, Size: 11797 bytes --]

Index: gcc/target.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/target.h,v
retrieving revision 1.76
diff -c -3 -p -r1.76 target.h
*** gcc/target.h	23 Jan 2004 21:05:18 -0000	1.76
--- gcc/target.h	4 Feb 2004 09:08:06 -0000
*************** struct gcc_target
*** 453,458 ****
--- 453,461 ----
         targetm.calls.strict_argument_naming().  */
      bool (*pretend_outgoing_varargs_named) (CUMULATIVE_ARGS *ca);
    } calls;
+ 
+   /* Function for recombining splits.  */
+   void (*recombine_splits) (rtx, rtx);
  };
  
  extern struct gcc_target targetm;
Index: gcc/target-def.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/target-def.h,v
retrieving revision 1.64
diff -c -3 -p -r1.64 target-def.h
*** gcc/target-def.h	23 Jan 2004 21:05:18 -0000	1.64
--- gcc/target-def.h	4 Feb 2004 09:08:19 -0000
*************** Foundation, 59 Temple Place - Suite 330,
*** 301,306 ****
--- 301,307 ----
  #define TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P hook_bool_tree_false
  #define TARGET_MS_BITFIELD_LAYOUT_P hook_bool_tree_false
  #define TARGET_RTX_COSTS hook_bool_rtx_int_int_intp_false
+ #define TARGET_SPLIT_RECOMBINE 0
  
  #ifndef TARGET_INIT_LIBFUNCS
  #define TARGET_INIT_LIBFUNCS hook_void_void
*************** Foundation, 59 Temple Place - Suite 330,
*** 398,403 ****
--- 399,405 ----
    TARGET_ASM_FILE_START_APP_OFF,		\
    TARGET_ASM_FILE_START_FILE_DIRECTIVE,		\
    TARGET_CALLS,					\
+   TARGET_SPLIT_RECOMBINE,			\
  }
  
  #include "hooks.h"
Index: gcc/recog.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/recog.c,v
retrieving revision 1.195
diff -c -3 -p -r1.195 recog.c
*** gcc/recog.c	23 Jan 2004 23:49:36 -0000	1.195
--- gcc/recog.c	4 Feb 2004 09:08:31 -0000
*************** Software Foundation, 59 Temple Place - S
*** 38,43 ****
--- 38,44 ----
  #include "toplev.h"
  #include "basic-block.h"
  #include "output.h"
+ #include "target.h"
  #include "reload.h"
  
  #ifndef STACK_PUSH_CODE
*************** split_all_insns (int upd_life)
*** 2728,2734 ****
  
    FOR_EACH_BB_REVERSE (bb)
      {
!       rtx insn, next;
        bool finish = false;
  
        for (insn = BB_HEAD (bb); !finish ; insn = next)
--- 2729,2735 ----
  
    FOR_EACH_BB_REVERSE (bb)
      {
!       rtx insn, next, prev;
        bool finish = false;
  
        for (insn = BB_HEAD (bb); !finish ; insn = next)
*************** split_all_insns (int upd_life)
*** 2736,2741 ****
--- 2737,2743 ----
  	  /* Can't use `next_real_insn' because that might go across
  	     CODE_LABELS and short-out basic blocks.  */
  	  next = NEXT_INSN (insn);
+           prev = PREV_INSN (insn);
  	  finish = (insn == BB_END (bb));
  	  if (INSN_P (insn))
  	    {
*************** split_all_insns (int upd_life)
*** 2776,2781 ****
--- 2778,2793 ----
  		      while (GET_CODE (last) == BARRIER)
  			last = PREV_INSN (last);
  		      SET_BIT (blocks, bb->index);
+ 
+ 		      /* In SH4, the double precision load/stores are splitted
+ 			 into a sequence of three instruction - two single
+ 			 precision load/stores and one address arithmetic insn.
+ 			 This target hook tries to recombine the address
+ 			 arithmetic insn generated. Other targets may use this
+                          hook for target dependent operations on split sequences.  */
+                       if (*targetm.recombine_splits)
+                         (*targetm.recombine_splits)(last, prev);
+ 
  		      changed = true;
  		    }
  		}
Index: gcc/config/sh/sh.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/sh/sh.c,v
retrieving revision 1.250
diff -c -3 -p -r1.250 sh.c
*** gcc/config/sh/sh.c	23 Jan 2004 15:29:09 -0000	1.250
--- gcc/config/sh/sh.c	4 Feb 2004 09:08:45 -0000
*************** static void sh_setup_incoming_varargs (C
*** 249,254 ****
--- 249,256 ----
  static bool sh_strict_argument_naming (CUMULATIVE_ARGS *);
  static bool sh_pretend_outgoing_varargs_named (CUMULATIVE_ARGS *);
  static tree sh_build_builtin_va_list (void);
+ static void sh_split_recombine (rtx, rtx);
+ static void try_recombine (rtx, rtx, basic_block);
  
  \f
  /* Initialize the GCC target structure.  */
*************** static tree sh_build_builtin_va_list (vo
*** 319,324 ****
--- 321,329 ----
  #undef TARGET_ADDRESS_COST
  #define TARGET_ADDRESS_COST sh_address_cost
  
+ #undef TARGET_SPLIT_RECOMBINE
+ #define TARGET_SPLIT_RECOMBINE sh_split_recombine
+ 
  #undef TARGET_MACHINE_DEPENDENT_REORG
  #define TARGET_MACHINE_DEPENDENT_REORG sh_reorg
  
*************** check_use_sfunc_addr (rtx insn, rtx reg)
*** 9229,9234 ****
--- 9234,9383 ----
        return rtx_equal_p (extract_sfunc_addr (insn), reg);
      }
    abort ();
+ }
+ 
+ /* Try recombining the address arithmetic insns.
+    REG_RTX is the register used for address arithmetic
+    INSN is the insn for address arithmetic
+    BB is the basic block containing INSN.  */
+ static void
+ try_recombine (reg_rtx, insn, bb)
+    rtx reg_rtx;
+    rtx insn;
+    basic_block bb;
+ {
+   rtx scan, end_insn;
+   int move;
+   int inc = INTVAL (XEXP (SET_SRC (PATTERN (insn)), 1));
+ 
+   /* The addition of negative values is recombined with the
+      insns ahead in the instruction stream, and those of
+      positive values is recombined with prior insns. This
+      is an optimization based on the fact that negative add
+      is emitted in case of double precision loads and positive
+      add is emitted for double precision stores. In most cases
+      the REG_RTX is mentioned in preceding insn (for load) or
+      following insn (for store). We are not crossing the basic
+      blocks in any case.  */
+   if (inc < 0)
+     {
+       move = 1;
+       end_insn = BB_END (bb);
+     }
+   else if (inc > 0)
+     {
+       move = -1;
+       end_insn = BB_HEAD (bb);
+     }
+   else
+     return;
+ 
+   scan = (move == -1 ? PREV_INSN (insn) : NEXT_INSN (insn));
+ 
+   /* If insn is last insn in the function.  */
+   if (!scan)
+     return;
+ 
+   do
+     {
+        if (GET_CODE (scan) == CODE_LABEL || GET_CODE (scan) == BARRIER)
+          break;
+ 
+        if (NOTE_INSN_BASIC_BLOCK_P (scan))
+          break;
+ 
+        if (reg_mentioned_p (reg_rtx, scan))
+          {
+            if (GET_CODE (PATTERN (scan)) != PARALLEL && single_set (scan))
+              {
+                rtx set = single_set (scan);
+                rtx set_src = SET_SRC (set);
+                rtx set_dest = SET_DEST (set);
+ 
+                /* scan should be of form ADD #x, REG.  */
+                if (GET_CODE (set_dest) == REG && GET_CODE (set_src) == PLUS
+                       && GET_CODE (XEXP (set_src, 0)) == REG
+                       && REGNO (set_dest) == REGNO (XEXP (set_src, 0))
+                       && GET_CODE (XEXP (set_src, 1)) == CONST_INT)
+                  {
+                     rtx prev_scan = scan;
+                     int new_inc = INTVAL (XEXP (set_src, 1)) + inc;
+ 
+                     /* Recombination should not be costly. Any constant
+                        not satisfying this condition will go in literal
+                        pool.  */
+                     if (CONST_OK_FOR_I08 (new_inc))
+                       {
+                         XEXP (SET_SRC (PATTERN (insn)), 1) = GEN_INT (new_inc);
+                         inc = new_inc;
+ 
+                         scan = (move == -1 ? PREV_INSN (scan) : NEXT_INSN (scan));
+ 			/* Make the insn a NOTE.  */
+                         PUT_CODE (prev_scan, NOTE);
+                         NOTE_LINE_NUMBER (prev_scan) = NOTE_INSN_DELETED;
+                         NOTE_SOURCE_FILE (prev_scan) = 0;
+                         continue;
+                       }
+                   }
+                }
+              break;
+            }
+ 
+         scan = (move == -1 ? PREV_INSN (scan) : NEXT_INSN (scan));
+ 
+       } while (scan && scan != end_insn);
+ }
+ 
+ /* This funtion is called from split_all_insns in recog.c.
+    It looks for the potential candidates for the recombination
+    and passes them on to try_recombine. The recombination is
+    actually done by try_recombine.  */
+ static void
+ sh_split_recombine (rtx split_seq, rtx prev)
+ {
+   /* The optimization is local to the block.  */
+   basic_block bb = BLOCK_FOR_INSN (prev);
+ 
+   if (!reload_completed || !TARGET_SH4 || TARGET_FMOVD)
+     return;
+ 
+ /* Scan all instructions in split sequence for potential
+    recombination candidates.  */
+   while (split_seq != prev)
+     {
+       rtx prev_insn = PREV_INSN (split_seq);
+ 
+       if (INSN_P (split_seq))
+         {
+            rtx pat = PATTERN (split_seq);
+            rtx set = single_set (split_seq);
+ 
+            if (GET_CODE (pat) != PARALLEL && set)
+              {
+                rtx set_src = SET_SRC (set);
+                rtx set_dest = SET_DEST (set);
+ 
+                /* Look for the desired insn. recog can be used here,
+                   but that would be more expensive.  */
+                if (GET_CODE (set_dest) == REG && GET_CODE (set_src) == PLUS
+                      && GET_CODE (XEXP (set_src, 0)) == REG
+                      && REGNO (XEXP (set_src, 0)) == REGNO (set_dest)
+                      && GET_CODE (XEXP (set_src, 1)) == CONST_INT)
+                  {
+                     try_recombine (set_dest, split_seq, bb);
+ 
+                     /* Do not delete the insn with null effect. Make it a note.  */
+                     if (INTVAL (XEXP (SET_SRC (PATTERN (split_seq)), 1)) == 0)
+                       {
+                         PUT_CODE (split_seq, NOTE);
+                         NOTE_LINE_NUMBER (split_seq) = NOTE_INSN_DELETED;
+                         NOTE_SOURCE_FILE (split_seq) = 0;
+                       }
+                  }
+               }
+          }
+        split_seq = prev_insn;
+     }
  }
  
  #include "gt-sh.h"
Index: gcc/doc/tm.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v
retrieving revision 1.288
diff -c -3 -p -r1.288 tm.texi
*** gcc/doc/tm.texi	23 Jan 2004 21:05:21 -0000	1.288
--- gcc/doc/tm.texi	4 Feb 2004 09:09:06 -0000
*************** more than this number of multiplications
*** 9219,9221 ****
--- 9219,9240 ----
  system library's @code{pow}, @code{powf} or @code{powl} routines.
  The default value places no upper bound on the multiplication count.
  @end defmac
+ 
+ @deftypefn {Target Hook} void TARGET_SPLIT_RECOMBINE (rtx @var{prev}, rtx @var{last})
+ This target hook provides the optimization for recombining splits.
+ 
+ The insn splitting happens multiple times during many optimization
+ passes. Combiner pass runs only once, even before splitter runs for
+ the first time. This hook provides an opportunity for recombining
+ the splits with the existing insn chain. @var{prev} is insn prior
+ to the insn that is splitted and @var{last} is the last insn emitted
+ in the split sequence. This hook is called in @var{split_all_insns}.
+ 
+ As an example, SH does not support @code{DFmode} loads/stores. Hence
+ after reload, it has to split @code{DFmode} load/store insn into
+ three insns - two @code{SFmode} loads/stores and one address
+ arithmetic insn. The address arithmetic insn is a potential candidate
+ for recombination here.
+ 
+ If you do not define the hook, the default value is 0.
+ @end deftypefn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] splits after reload
  2004-02-04 14:48   ` Joern Rennecke
  2004-02-05  5:33     ` Richard Henderson
@ 2004-02-21 13:45     ` Joern Rennecke
  1 sibling, 0 replies; 14+ messages in thread
From: Joern Rennecke @ 2004-02-21 13:45 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Rakesh Kumar - Software Noida, gcc-patches, Joern Rennecke

> It kinda sounds like you want reload-cse to run later than it
> currently does.  That might be a quite reasonable solution, but
> it would certainly have to be tested on many targets.

If reload_cse does the job remains subject to benchmarking.  But even
if it didn't, a new, separate pass would certainly be cleaner, and pathologic
time behaviour can be better avoided or rectified by appropriate algorithms
and data strctures.  What might be slightly useful to avoid spending
too much time on insns that have been ground through all the optimizers
before is keeping track of cfun->emit->x_cur_insn_uid before the last
split pass.  OTOH, if we have a pass that keeps track of values in registers,
it can't really just ignore the old insns.

We could enable this reload_cse pass with a new -f option, and default
it to off unless the default is overridden by the targets OPTIMIZATION_OPTIONS.
This way, we avoid compile-time and correctness regressions in ports that
don't specifically enable the new pass.
(It makes still sense to test that the new option works on other targets,
 but any failure there wouldn't be a regression.)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] splits after reload
  2004-02-05  5:33     ` Richard Henderson
@ 2004-02-21 13:45       ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2004-02-21 13:45 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Rakesh Kumar - Software Noida, gcc-patches

On Wed, Feb 04, 2004 at 02:47:59PM +0000, Joern Rennecke wrote:
> If reload_cse does the job remains subject to benchmarking.  But even
> if it didn't, a new, separate pass would certainly be cleaner, and pathologic
> time behaviour can be better avoided or rectified by appropriate algorithms
> and data strctures.

I would certainly not object to reload_cse being completely replaced
on all targets.  Doing something special on only some targets seems
not really the right thing.


r~

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] splits after reload
  2004-02-04  9:29 ` Richard Henderson
  2004-02-04 14:48   ` Joern Rennecke
@ 2004-02-21 13:45   ` Richard Henderson
  1 sibling, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2004-02-21 13:45 UTC (permalink / raw)
  To: Rakesh Kumar - Software, Noida; +Cc: gcc-patches, Joern Rennecke

On Wed, Feb 04, 2004 at 02:50:21PM +0530, Rakesh Kumar - Software, Noida wrote:
> I'd like to advocate defining the target hook for
> recombining splits in split_all_insns ().

No.  Absolutely not.  This is not the purpose of split_all_insns.

It kinda sounds like you want reload-cse to run later than it
currently does.  That might be a quite reasonable solution, but
it would certainly have to be tested on many targets.


r~

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] splits after reload
  2004-02-05 16:58 ` Joern Rennecke
@ 2004-02-21 13:45   ` Joern Rennecke
  0 siblings, 0 replies; 14+ messages in thread
From: Joern Rennecke @ 2004-02-21 13:45 UTC (permalink / raw)
  To: Rakesh Kumar - Software, Noida
  Cc: Joern Rennecke, rth, Rakesh Kumar - Software Noida, gcc-patches

> To avoid compile time overhead for other targets, a target hook
> looked reasonable. Other targets can make use of that hook
> (if some are facing similar problems) and solve their "target specific
> problem in a target specific way".

I don't think the objection was against a target hook in general, but
against putting it into the splitting machinery and making it do something 
that is not actually splitting.

> Running a pass over all insns generated by the splitter looks expensive.

No, not necessarily.  When you do a full pass over all insns you can
keep track of some information in a way that is impossible if you
have to look ad-hoc at a lot of individual instructions.

Also, instead of scanning ahead from individual instructions, you can
just remember what you might want to fix up and keep on scanning in
a single sweep.  Thus, while the hook solution os O(n*n) for the
size of basic blocks, the single-pass can be done in something
similar to O(n*log(n))

> It sounds like if SH needs some optimization it has to pay "tax"
> going through all the condition checks which are bound to be
> the part of generic optimizer. 

If the SH does some optimization on its own, you have to pay "tax"
maintaining it separately as the compiler code evolves.
The cost of doing something in a general way is usually just a linear factor,
while algorithmic improvements can be dramatic, and they tend to require
tighter coupling of data structures to the rest of the compiler than
is easily maintainable in a target port.

So, if you can do the optimization in a machine-independent, generally
usable way, go for it.

Moreover, even in cases where you think that your approach is a little
simpler than the one that the reviewer suggests, if the alternate approach
is workable, and gives you the same code quality, it is more constructive
just to implement the alternate approach than to drag on the discussion.
Consider how many people read this list.
If you come against problems with an alternate approach, we can discuss
these, and maybe everyone learns something from the effort.

> reload_cse is not able to perform this optimization, even if
> you run it after flow2. There is no code written in reload_cse for handling
> such problems. Please correct me if I'm wrong.

reload_combine and reload_cse_move2add are O(n) subpasses of
reload_cse_regs.  It seems your optimizations can be added to either
of them with little effort.

reload_combine scans backwards over instructions, keeping track of how
registers are being used.
reload_cse_move2add scans forward over instructions, keeping track of
some values in registers, i.e. constants and sums of registers and
constants.

> If reload_cse () ought to do so, 
> I feel that needs rewriting of reload_cse which certainly is not a
> trivial task. The target hook should be acceptable in the meantime.

I think we are talking about about a screenful of code here.
The main work, of course, is the testing and benchmarking.

> I agree that recombination is not the purpose of split_all_insns ().
> But a target hook helps without harming any other target.
> Please have a look at the patch and let me know what is wrong with it.

Most design decisions that involve style or cleanliness are right or wrong
only when you make the design principles explicit, which would be a lot of
work for gcc.
What we can objectively say is that the hook solution is slower on large
basic blocks than an implementation in reload_combine or reload_cse_move2add,
and the hook solution also leads to more code that is harder to maintain.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] splits after reload
  2004-02-04 11:21 Rakesh Kumar - Software, Noida
@ 2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
  0 siblings, 0 replies; 14+ messages in thread
From: Rakesh Kumar - Software, Noida @ 2004-02-21 13:45 UTC (permalink / raw)
  To: Richard Henderson, Rakesh Kumar - Software, Noida
  Cc: gcc-patches, Joern Rennecke



> -----Original Message-----
> From: Richard Henderson [mailto:rth@redhat.com]
> Sent: Wednesday, February 04, 2004 3:00 PM
> To: Rakesh Kumar - Software, Noida
> Cc: gcc-patches@gcc.gnu.org; Joern Rennecke
> Subject: Re: [PATCH] splits after reload
> 
> 
> On Wed, Feb 04, 2004 at 02:50:21PM +0530, Rakesh Kumar - 
> Software, Noida wrote:
> > I'd like to advocate defining the target hook for
> > recombining splits in split_all_insns ().
> 
> No.  Absolutely not.  This is not the purpose of split_all_insns.
> 
> It kinda sounds like you want reload-cse to run later than it
> currently does.  That might be a quite reasonable solution, but
> it would certainly have to be tested on many targets.
> 

No, that won't serve the purpose. reload_cse won't be able to
recognize the patterns I'm trying to recombine. Consider some
target splitting a complex insn into simpler insns. These 
simpler insns can be recombined as soon as the split happens.
In case of SH, a DFmode load/store insn is split into three insns
after reload. For example

fmov.d @r1, dr2	/* a 64-bit load.  */

This insn is split into

fmov.s @r1+, fr3	/* 32-bit load.  */
fmov.s @r1, fr2
add #-4, r1		/* Address arithmetic insn.  */

The problem is with the 3rd insn generated. There are a number of
cases where it can be combined with other ADD insns in the same
basic block (as evident from the size gains). It can help
eliminate similar problems on other targets as well. There is a
hoard of arithmetic operations like ADD, OR, XOR,
SHIFT, SUB, etc. which are possible candidates.

Targets generates the complex insns initially and later splits them
to simpler ones. But this approach suffers from side effects.
The combiner pass runs even before the splitter runs for the
very first time. GCC doesn't get the opportunity for combining
the insns those would be generated after combiner pass.
Almost every optimization is repeated after reload (be it CSE,
GCSE, or Scheduler). But combiner misses its share.

An alternative pass, peephole2 comes for rescue. But it certainly
is not the substitute for combiner accounting to its limitations.
Particulary, it requires window size to be limited.
There is no way I could run a peephole on whole basic block.

I understand that re-running combiner pass would be an overkill and
certainly
not favoring the idea. But the problem can't be denied. Here's
an alternative mechanism that lets the targets decide action after split.

Thanks for looking at the patch.

Regards,
Rakesh Kumar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] splits after reload
  2004-02-05 11:44 Rakesh Kumar - Software, Noida
  2004-02-05 16:58 ` Joern Rennecke
@ 2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
  1 sibling, 0 replies; 14+ messages in thread
From: Rakesh Kumar - Software, Noida @ 2004-02-21 13:45 UTC (permalink / raw)
  To: Joern Rennecke, rth; +Cc: Rakesh Kumar - Software, Noida, gcc-patches

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org
> [mailto:gcc-patches-owner@gcc.gnu.org]On Behalf Of Joern Rennecke
> Sent: Wednesday, February 04, 2004 8:18 PM
> To: rth@redhat.com
> Cc: rakeshku@noida.hcltech.com; gcc-patches@gcc.gnu.org;
> joern.rennecke@superh.com
> Subject: Re: [PATCH] splits after reload
> If reload_cse does the job remains subject to benchmarking. But even
> if it didn't, a new, separate pass would certainly be 
> cleaner, and pathologic
> time behaviour can be better avoided or rectified by 
> appropriate algorithms
> and data strctures. 

The problem I'm talking about is acute on SH. Please have
a look at the following thread.

http://gcc.gnu.org/ml/gcc/2003-12/msg00700.html 

To avoid compile time overhead for other targets, a target hook
looked reasonable. Other targets can make use of that hook
(if some are facing similar problems) and solve their "target specific
problem in a target specific way".

Running a pass over all insns generated by the splitter looks expensive.
It sounds like if SH needs some optimization it has to pay "tax"
going through all the condition checks which are bound to be
the part of generic optimizer. 

reload_cse is not able to perform this optimization, even if
you run it after flow2. There is no code written in reload_cse for handling
such problems. Please correct me if I'm wrong.

If reload_cse () ought to do so, 
I feel that needs rewriting of reload_cse which certainly is not a
trivial task. The target hook should be acceptable in the meantime.

I agree that recombination is not the purpose of split_all_insns ().
But a target hook helps without harming any other target.
Please have a look at the patch and let me know what is wrong with it.

I'm thankful to you for your valuable time and suggestions.

Regards,
Rakesh Kumar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] splits after reload
  2004-02-05 11:44 Rakesh Kumar - Software, Noida
@ 2004-02-05 16:58 ` Joern Rennecke
  2004-02-21 13:45   ` Joern Rennecke
  2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
  1 sibling, 1 reply; 14+ messages in thread
From: Joern Rennecke @ 2004-02-05 16:58 UTC (permalink / raw)
  To: Rakesh Kumar - Software, Noida
  Cc: Joern Rennecke, rth, Rakesh Kumar - Software Noida, gcc-patches

> To avoid compile time overhead for other targets, a target hook
> looked reasonable. Other targets can make use of that hook
> (if some are facing similar problems) and solve their "target specific
> problem in a target specific way".

I don't think the objection was against a target hook in general, but
against putting it into the splitting machinery and making it do something 
that is not actually splitting.

> Running a pass over all insns generated by the splitter looks expensive.

No, not necessarily.  When you do a full pass over all insns you can
keep track of some information in a way that is impossible if you
have to look ad-hoc at a lot of individual instructions.

Also, instead of scanning ahead from individual instructions, you can
just remember what you might want to fix up and keep on scanning in
a single sweep.  Thus, while the hook solution os O(n*n) for the
size of basic blocks, the single-pass can be done in something
similar to O(n*log(n))

> It sounds like if SH needs some optimization it has to pay "tax"
> going through all the condition checks which are bound to be
> the part of generic optimizer. 

If the SH does some optimization on its own, you have to pay "tax"
maintaining it separately as the compiler code evolves.
The cost of doing something in a general way is usually just a linear factor,
while algorithmic improvements can be dramatic, and they tend to require
tighter coupling of data structures to the rest of the compiler than
is easily maintainable in a target port.

So, if you can do the optimization in a machine-independent, generally
usable way, go for it.

Moreover, even in cases where you think that your approach is a little
simpler than the one that the reviewer suggests, if the alternate approach
is workable, and gives you the same code quality, it is more constructive
just to implement the alternate approach than to drag on the discussion.
Consider how many people read this list.
If you come against problems with an alternate approach, we can discuss
these, and maybe everyone learns something from the effort.

> reload_cse is not able to perform this optimization, even if
> you run it after flow2. There is no code written in reload_cse for handling
> such problems. Please correct me if I'm wrong.

reload_combine and reload_cse_move2add are O(n) subpasses of
reload_cse_regs.  It seems your optimizations can be added to either
of them with little effort.

reload_combine scans backwards over instructions, keeping track of how
registers are being used.
reload_cse_move2add scans forward over instructions, keeping track of
some values in registers, i.e. constants and sums of registers and
constants.

> If reload_cse () ought to do so, 
> I feel that needs rewriting of reload_cse which certainly is not a
> trivial task. The target hook should be acceptable in the meantime.

I think we are talking about about a screenful of code here.
The main work, of course, is the testing and benchmarking.

> I agree that recombination is not the purpose of split_all_insns ().
> But a target hook helps without harming any other target.
> Please have a look at the patch and let me know what is wrong with it.

Most design decisions that involve style or cleanliness are right or wrong
only when you make the design principles explicit, which would be a lot of
work for gcc.
What we can objectively say is that the hook solution is slower on large
basic blocks than an implementation in reload_combine or reload_cse_move2add,
and the hook solution also leads to more code that is harder to maintain.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] splits after reload
@ 2004-02-05 11:44 Rakesh Kumar - Software, Noida
  2004-02-05 16:58 ` Joern Rennecke
  2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
  0 siblings, 2 replies; 14+ messages in thread
From: Rakesh Kumar - Software, Noida @ 2004-02-05 11:44 UTC (permalink / raw)
  To: Joern Rennecke, rth; +Cc: Rakesh Kumar - Software, Noida, gcc-patches

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org
> [mailto:gcc-patches-owner@gcc.gnu.org]On Behalf Of Joern Rennecke
> Sent: Wednesday, February 04, 2004 8:18 PM
> To: rth@redhat.com
> Cc: rakeshku@noida.hcltech.com; gcc-patches@gcc.gnu.org;
> joern.rennecke@superh.com
> Subject: Re: [PATCH] splits after reload
> If reload_cse does the job remains subject to benchmarking. But even
> if it didn't, a new, separate pass would certainly be 
> cleaner, and pathologic
> time behaviour can be better avoided or rectified by 
> appropriate algorithms
> and data strctures. 

The problem I'm talking about is acute on SH. Please have
a look at the following thread.

http://gcc.gnu.org/ml/gcc/2003-12/msg00700.html 

To avoid compile time overhead for other targets, a target hook
looked reasonable. Other targets can make use of that hook
(if some are facing similar problems) and solve their "target specific
problem in a target specific way".

Running a pass over all insns generated by the splitter looks expensive.
It sounds like if SH needs some optimization it has to pay "tax"
going through all the condition checks which are bound to be
the part of generic optimizer. 

reload_cse is not able to perform this optimization, even if
you run it after flow2. There is no code written in reload_cse for handling
such problems. Please correct me if I'm wrong.

If reload_cse () ought to do so, 
I feel that needs rewriting of reload_cse which certainly is not a
trivial task. The target hook should be acceptable in the meantime.

I agree that recombination is not the purpose of split_all_insns ().
But a target hook helps without harming any other target.
Please have a look at the patch and let me know what is wrong with it.

I'm thankful to you for your valuable time and suggestions.

Regards,
Rakesh Kumar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] splits after reload
@ 2004-02-04 11:21 Rakesh Kumar - Software, Noida
  2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
  0 siblings, 1 reply; 14+ messages in thread
From: Rakesh Kumar - Software, Noida @ 2004-02-04 11:21 UTC (permalink / raw)
  To: Richard Henderson, Rakesh Kumar - Software, Noida
  Cc: gcc-patches, Joern Rennecke



> -----Original Message-----
> From: Richard Henderson [mailto:rth@redhat.com]
> Sent: Wednesday, February 04, 2004 3:00 PM
> To: Rakesh Kumar - Software, Noida
> Cc: gcc-patches@gcc.gnu.org; Joern Rennecke
> Subject: Re: [PATCH] splits after reload
> 
> 
> On Wed, Feb 04, 2004 at 02:50:21PM +0530, Rakesh Kumar - 
> Software, Noida wrote:
> > I'd like to advocate defining the target hook for
> > recombining splits in split_all_insns ().
> 
> No.  Absolutely not.  This is not the purpose of split_all_insns.
> 
> It kinda sounds like you want reload-cse to run later than it
> currently does.  That might be a quite reasonable solution, but
> it would certainly have to be tested on many targets.
> 

No, that won't serve the purpose. reload_cse won't be able to
recognize the patterns I'm trying to recombine. Consider some
target splitting a complex insn into simpler insns. These 
simpler insns can be recombined as soon as the split happens.
In case of SH, a DFmode load/store insn is split into three insns
after reload. For example

fmov.d @r1, dr2	/* a 64-bit load.  */

This insn is split into

fmov.s @r1+, fr3	/* 32-bit load.  */
fmov.s @r1, fr2
add #-4, r1		/* Address arithmetic insn.  */

The problem is with the 3rd insn generated. There are a number of
cases where it can be combined with other ADD insns in the same
basic block (as evident from the size gains). It can help
eliminate similar problems on other targets as well. There is a
hoard of arithmetic operations like ADD, OR, XOR,
SHIFT, SUB, etc. which are possible candidates.

Targets generates the complex insns initially and later splits them
to simpler ones. But this approach suffers from side effects.
The combiner pass runs even before the splitter runs for the
very first time. GCC doesn't get the opportunity for combining
the insns those would be generated after combiner pass.
Almost every optimization is repeated after reload (be it CSE,
GCSE, or Scheduler). But combiner misses its share.

An alternative pass, peephole2 comes for rescue. But it certainly
is not the substitute for combiner accounting to its limitations.
Particulary, it requires window size to be limited.
There is no way I could run a peephole on whole basic block.

I understand that re-running combiner pass would be an overkill and
certainly
not favoring the idea. But the problem can't be denied. Here's
an alternative mechanism that lets the targets decide action after split.

Thanks for looking at the patch.

Regards,
Rakesh Kumar

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2004-02-05 16:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-02-04  9:16 [PATCH] splits after reload Rakesh Kumar - Software, Noida
2004-02-04  9:29 ` Richard Henderson
2004-02-04 14:48   ` Joern Rennecke
2004-02-05  5:33     ` Richard Henderson
2004-02-21 13:45       ` Richard Henderson
2004-02-21 13:45     ` Joern Rennecke
2004-02-21 13:45   ` Richard Henderson
2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
2004-02-04 11:21 Rakesh Kumar - Software, Noida
2004-02-21 13:45 ` Rakesh Kumar - Software, Noida
2004-02-05 11:44 Rakesh Kumar - Software, Noida
2004-02-05 16:58 ` Joern Rennecke
2004-02-21 13:45   ` Joern Rennecke
2004-02-21 13:45 ` Rakesh Kumar - Software, Noida

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).