public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH][MIPS] Enable load-load/store-store bonding
       [not found] <38C8F1E431EDD94A82971C543A11B4FEE0A588@PUMAIL01.pu.imgtec.org>
@ 2014-06-21 16:25 ` Richard Sandiford
  2014-06-23  9:18   ` Sameera Deshpande
  2014-06-24 10:42   ` Sameera Deshpande
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Sandiford @ 2014-06-21 16:25 UTC (permalink / raw)
  To: Sameera Deshpande; +Cc: Matthew Fortune, gcc-patches

Hi Sameera,

Thanks for the patch.

Sameera Deshpande <Sameera.Deshpande@imgtec.com> writes:
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index b5b5ba7..9804ef2 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -18813,6 +18813,9 @@ mips_option_override (void)
>    if (TARGET_MICROMIPS && TARGET_MIPS16)
>      error ("unsupported combination: %s", "-mips16 -mmicromips");
>  
> +  if (TARGET_FIX_24K && TUNE_P5600)
> +    error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
> +
>    /* Save the base compression state and process flags as though we
>       were generating uncompressed code.  */
>    mips_base_compression_flags = TARGET_COMPRESSION;

Although it's a bit of an odd combination, we need to accept
-mfix-24k -mtune=p5600 and continue to implement the 24k workarounds.
The idea is that a distributor can build for a common base architecture,
add -mfix- options for processors that might run the code, and add -mtune=
for the processor that's most of interest optimisation-wise.

We should just make the pairing of stores conditional on !TARGET_FIX_24K.

> diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
> index b9cfd62..d4135cf 100644
> --- a/gcc/config/mips/mips.opt
> +++ b/gcc/config/mips/mips.opt
> @@ -445,3 +445,7 @@ Enum(mips_lib_setting) String(tiny) Value(MIPS_LIB_TINY)
>  
>  msched-weight
>  Target Report Var(TARGET_SCHED_WEIGHT) Undocumented
> +
> +mld-st-pairing
> +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING)
> +Enable load/store pairing

Other options are just "TARGET_" + the captialised form of the option name,
so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be misleading
since it's an abbreviation for "load" rather than the LD instruction.
Maybe -mload-store-pairs, since plurals are more common than "-ing"?
Not sure that's a great suggestion though.

If we want a user-level option then it needs to be documented in
invoke.texi.

> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> index 86ca419..4478f81e 100644
> --- a/gcc/config/mips/mips.h
> +++ b/gcc/config/mips/mips.h
> @@ -3184,3 +3184,6 @@ extern GTY(()) struct target_globals *mips16_globals;
>     with arguments ARGS.  */
>  #define PMODE_INSN(NAME, ARGS) \
>    (Pmode == SImode ? NAME ## _si ARGS : NAME ## _di ARGS)
> +
> +#define ENABLE_LD_ST_PAIRING \
> +  (TARGET_ENABLE_LD_ST_PAIRING && TUNE_P5600)

The patch requires -mld-st-pairing to be passed explicitly even for
-mtune=p5600.  Is that because it's not a consistent enough win for us
to enable it by default?  It sounded from the description like it should
be an improvement more often that not.

We should allow pairing even without -mtune=p5600.

> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> index 7229e8f..05605c5 100644
> --- a/gcc/config/mips/mips.md
> +++ b/gcc/config/mips/mips.md
> @@ -780,6 +780,7 @@
>  
>  (define_mode_iterator MOVEP1 [SI SF])
>  (define_mode_iterator MOVEP2 [SI SF])
> +(define_mode_iterator JOINLDST1 [SI SF DF])

Maybe:

(define_mode_iterator JOIN_MODE [
  SI
  (DI "TARGET_64BIT")
  (SF "TARGET_HARD_FLOAT")
  (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])

and then extend:

> @@ -883,6 +884,8 @@
>  (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
>  (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
>  
> +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
> +
>  ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
>  ;; are different.  Some forms of unextended addiu have an 8-bit immediate
>  ;; field but the equivalent daddiu has only a 5-bit field.

this accordingly.

> @@ -7442,6 +7445,153 @@
>    { return MIPS_CALL ("jal", operands, 0, -1); }
>    [(set_attr "type" "call")
>     (set_attr "insn_count" "3")])
> +
> +(define_insn "*join2_load_store<JOINLDST1:mode>"
> +  [(parallel
> +    [(set (match_operand:JOINLDST1 0 "nonimmediate_operand" "=<reg>,m")
> +	  (match_operand:JOINLDST1 1 "nonimmediate_operand" "m,<reg>"))
> +     (set (match_operand:JOINLDST1 2 "nonimmediate_operand" "=<reg>,m")
> +	  (match_operand:JOINLDST1 3 "nonimmediate_operand" "m,<reg>"))])]
> +  "ENABLE_LD_ST_PAIRING && reload_completed"
> +  {
> +    output_asm_insn (mips_output_move (operands[0], operands[1]), operands);
> +    output_asm_insn (mips_output_move (operands[2], operands[3]), &operands[2]);
> +    return "";
> +  }
> +  [(set_attr "move_type" "<insn_type>load,<insn_type>store")
> +   (set_attr_alternative "insn_count"
> +	[(mult (symbol_ref "mips_load_store_insns (operands[1], insn)")
> +	       (const_int 2))
> +	 (mult (symbol_ref "mips_load_store_insns (operands[0], insn)")
> +	       (const_int 2))])])

Outer (parallel ...)s are redundant in a define_insn.

It would be better to add the mips_load_store_insns for each operand
rather than multiplying one of them by 2.  Or see the next bit
for an alternative.

> +;;2 SI/SF/DF loads are joined.
> +(define_peephole2
> +  [(set (match_operand:JOINLDST1 0 "register_operand")
> +	(mem:JOINLDST1 (plus:SI (match_operand:SI 1 "register_operand")
> +				(match_operand:SI 2 "const_int_operand"))))
> +   (set (match_operand:JOINLDST1 3 "register_operand")
> +	(mem:JOINLDST1 (plus:SI (match_dup 1)
> +				(match_operand:SI 4 "const_int_operand"))))]
> +  "ENABLE_LD_ST_PAIRING &&
> +   REGNO_REG_CLASS (REGNO (operands[0]))
> +	== REGNO_REG_CLASS (REGNO (operands[3])) &&
> +   (abs (INTVAL (operands[2]) - INTVAL (operands[4]))
> +	== GET_MODE_SIZE (<JOINLDST1:MODE>mode))"
> +
> +  [(parallel [(set (match_dup 0)
> +		   (match_dup 2))
> +	      (set (match_dup 3)
> +		   (match_dup 4))])]
> +  {
> +    operands[4] = SET_SRC (PATTERN (next_active_insn (curr_insn)));
> +    operands[2] = SET_SRC (PATTERN (curr_insn));
> +  })

The operands in the new pattern should come directly from the operands
in the original, rather than going via curr_insn.  That would mean
matching them as memory_operands rather than matching their addresses.

Also, matching (plus ... (const_int)) will miss the important case of
a register with no offset.  It would be better to add a function such
as mips_join_load_store_pair_p that analyses the addresses (via
mips_classify_address) and says whether the pair is OK.

mips_join_load_store_pair_p should also make sure that the destination
of the first load does not overlap the base register, as it does in
something like:

        lw      $2,0($2)
        lw      $3,4($2)

This would be a false optimisation anyway, but more importantly,
the rtl semantics of the single parallel define_insn would be
different from the original, since parallel sets read all their
inputs before modifying any outputs.  You also need to make
sure that the destinations of the two loads don't overlap.

For this reason, the insn patterns should also check
mips_join_load_store_pair_p in their condition.

mips_join_load_store_pair_p could also check whether:

    mips_address_insns (..., mode, false) == 1

We're only really interested in cases where the load or store will be a
single instruction.  If you check that up-front you could get rid of the
mips_load_store_insns stuff and just set the insn_count to 2.

It's possible for SI to be stored in FPRs and SF to be stored in GPRs.
You should either make the define_insn handle that or make the peephole
check for the right class of register.  (Either's OK with me FWIW.)
The REGNO_REG_CLASS check above only checks whether the two registers
are the same class, not whether they're "d" for integer modes and "f"
for floating-point.

> +(define_insn "*join2_storehi"
> +  [(parallel
> +    [(set (match_operand:HI 0 "memory_operand" "=m")
> +	  (match_operand:HI 1 "register_operand" "r"))
> +     (set (match_operand:HI 2 "memory_operand" "=m")
> +	  (match_operand:HI 3 "register_operand" "r"))])]
> +  "ENABLE_LD_ST_PAIRING && reload_completed"
> +  {
> +    output_asm_insn (mips_output_move (operands[0], operands[1]), operands);
> +    output_asm_insn (mips_output_move (operands[2], operands[3]), &operands[2]);
> +    return "";
> +  }
> +  [(set_attr "move_type" "store")
> +   (set_attr_alternative "insn_count"
> +	[(mult	(symbol_ref "mips_load_store_insns (operands[0], insn)")
> +		(const_int 2))])])
> +
> +(define_insn "*join2_loadhi"
> +  [(parallel
> +    [(set (match_operand:SI 0 "register_operand" "=r")
> +	  (any_extend:SI (match_operand:HI 1 "memory_operand" "m")))
> +     (set (match_operand:SI 2 "register_operand" "=r")
> +	  (any_extend:SI (match_operand:HI 3 "memory_operand" "m")))])]
> +  "ENABLE_LD_ST_PAIRING && reload_completed"
> +  {
> +    output_asm_insn ("lh<u>\t%0,%1", operands);
> +    output_asm_insn ("lh<u>\t%2,%3", operands);
> +    return "";
> +  }
> +  [(set_attr "move_type" "load")
> +   (set_attr_alternative "insn_count"
> +	[(mult	(symbol_ref "mips_load_store_insns (operands[1], insn)")
> +		(const_int 2))])])
> +
> +
> +;; 2 16 bit integer loads are joined.
> +(define_peephole2
> +  [(set (match_operand:SI 0 "register_operand")
> +	(any_extend:SI (mem:HI (plus:SI
> +				  (match_operand:SI 1 "register_operand")
> +				  (match_operand:SI 2 "const_int_operand")))))
> +   (set (match_operand:SI 3 "register_operand")
> +	(any_extend:SI (mem:HI (plus:SI
> +				  (match_dup 1)
> +				  (match_operand:SI 4 "const_int_operand")))))]
> +  "ENABLE_LD_ST_PAIRING &&
> +   REGNO_REG_CLASS (REGNO (operands[0]))
> +	== REGNO_REG_CLASS (REGNO (operands[3])) &&
> +   (abs (INTVAL (operands[2]) - INTVAL (operands[4]))
> +	== GET_MODE_SIZE (HImode))"
> +
> +  [(parallel [(set (match_dup 0)
> +		   (match_dup 2))
> +	      (set (match_dup 3)
> +		   (match_dup 4))])]
> +  {
> +    operands[4] = SET_SRC (PATTERN (next_active_insn (curr_insn)));
> +    operands[2] = SET_SRC (PATTERN (curr_insn));
> +  })
> +
> +;; 2 16 bit integer stores are joined.
> +(define_peephole2
> +  [(set (mem:HI (plus:SI (match_operand:SI 0 "register_operand")
> +			 (match_operand:SI 1 "const_int_operand")))
> +	(match_operand:HI 2 "register_operand"))
> +   (set (mem:HI (plus:SI (match_dup 0)
> +			 (match_operand:SI 3 "const_int_operand")))
> +	(match_operand:HI 4 "register_operand"))]
> +  "ENABLE_LD_ST_PAIRING &&
> +   REGNO_REG_CLASS (REGNO (operands[2]))
> +	== REGNO_REG_CLASS (REGNO (operands[4])) &&
> +   (abs (INTVAL (operands[1]) - INTVAL (operands[3]))
> +	== GET_MODE_SIZE (HImode))"
> +  [(parallel [(set (match_dup 1)
> +		   (match_dup 2))
> +	      (set (match_dup 3)
> +		   (match_dup 4))])]
> +  {
> +    operands[3] = SET_DEST (PATTERN (next_active_insn (curr_insn)));
> +    operands[1] = SET_DEST (PATTERN (curr_insn));
> +  })
> +

Please instead add HI to the define_mode_iterator so that we can use
the same peephole and define_insn.

Are QImodes not paired in the same way?  If so, it'd be worth adding
a comment above the define_mode_iterator saying that QI is deliberately
excluded.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH][MIPS] Enable load-load/store-store bonding
  2014-06-21 16:25 ` [PATCH][MIPS] Enable load-load/store-store bonding Richard Sandiford
@ 2014-06-23  9:18   ` Sameera Deshpande
  2014-06-23  9:41     ` Richard Sandiford
  2014-06-24 10:42   ` Sameera Deshpande
  1 sibling, 1 reply; 15+ messages in thread
From: Sameera Deshpande @ 2014-06-23  9:18 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Matthew Fortune, gcc-patches

Hi Richard,

Thanks for your comments. I am working on the review comments, and will share the reworked patch soon.
However, here is clarification on some of the issues raised.

> > +  if (TARGET_FIX_24K && TUNE_P5600)
> > +    error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
> > +
> >    /* Save the base compression state and process flags as though we
> >       were generating uncompressed code.  */
> >    mips_base_compression_flags = TARGET_COMPRESSION;
> 
> Although it's a bit of an odd combination, we need to accept -mfix-24k -
> mtune=p5600 and continue to implement the 24k workarounds.
> The idea is that a distributor can build for a common base architecture, add -
> mfix- options for processors that might run the code, and add -mtune= for
> the processor that's most of interest optimisation-wise.
> 
> We should just make the pairing of stores conditional on !TARGET_FIX_24K.
We had offline discussion based on your comment. There is additional view on the same.
Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining ISAs do not support P5600. 
For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is implemented separately, and hence, as you suggested, P5600 Ld-ST bonding optimization should not be enabled for them.
So, is it fine if I emit error for any ISAs other than mips32r2, mips32r3 and mips32r5 when P5600 is enabled, or the compilation should continue by emitting warning and disabling P5600?
Also, the optimization will be enabled only if !TARGET_FIX_24K && !TARGET_MICTOMIPS as suggested by you.

> > +
> > +#define ENABLE_LD_ST_PAIRING \
> > +  (TARGET_ENABLE_LD_ST_PAIRING && TUNE_P5600)
> 
> The patch requires -mld-st-pairing to be passed explicitly even for -
> mtune=p5600.  Is that because it's not a consistent enough win for us to
> enable it by default?  It sounded from the description like it should be an
> improvement more often that not.
> 
> We should allow pairing even without -mtune=p5600.
Performance testing for this patch is not yet done. 
If the patch proves beneficial in most of the testcases (which we believe will do on P5600) we will enable this optimization by default for P5600 - in which case this option can be removed.

> 
> Are QImodes not paired in the same way?  If so, it'd be worth adding a
> comment above the define_mode_iterator saying that QI is deliberately
> excluded.
The P5600 datasheet mentions bonding of load/stores in HI, SI, SF and DF modes only. Hence QI mode is excluded. I will add the comment on the iterator.

- Thanks and regards,
   Sameera D.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][MIPS] Enable load-load/store-store bonding
  2014-06-23  9:18   ` Sameera Deshpande
@ 2014-06-23  9:41     ` Richard Sandiford
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2014-06-23  9:41 UTC (permalink / raw)
  To: Sameera Deshpande; +Cc: Matthew Fortune, gcc-patches

Sameera Deshpande <Sameera.Deshpande@imgtec.com> writes:
>> > +  if (TARGET_FIX_24K && TUNE_P5600)
>> > +    error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
>> > +
>> >    /* Save the base compression state and process flags as though we
>> >       were generating uncompressed code.  */
>> >    mips_base_compression_flags = TARGET_COMPRESSION;
>> 
>> Although it's a bit of an odd combination, we need to accept -mfix-24k -
>> mtune=p5600 and continue to implement the 24k workarounds.
>> The idea is that a distributor can build for a common base architecture, add -
>> mfix- options for processors that might run the code, and add -mtune= for
>> the processor that's most of interest optimisation-wise.
>> 
>> We should just make the pairing of stores conditional on !TARGET_FIX_24K.
> We had offline discussion based on your comment. There is additional
> view on the same.
> Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining ISAs
> do not support P5600.
> For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
> implemented separately, and hence, as you suggested, P5600 Ld-ST bonding
> optimization should not be enabled for them.
> So, is it fine if I emit error for any ISAs other than mips32r2,
> mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
> continue by emitting warning and disabling P5600?

No, the point is that we have two separate concepts: ISA and optimisation
target.  -mipsN and -march=N control the ISA (which instructions are
available) and -mtune=M controls optimisation decisions within the
constraints of that N, such as scheduling and the cost of things like
multiplication and division.

E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS
II-compatible code, optimise it for p5600, but make sure that 24k
workarounds are used.  The code would run correctly on any MIPS
II-compatible processor without known errata and also on the 24k.

>> > +
>> > +#define ENABLE_LD_ST_PAIRING \
>> > +  (TARGET_ENABLE_LD_ST_PAIRING && TUNE_P5600)
>> 
>> The patch requires -mld-st-pairing to be passed explicitly even for -
>> mtune=p5600.  Is that because it's not a consistent enough win for us to
>> enable it by default?  It sounded from the description like it should be an
>> improvement more often that not.
>> 
>> We should allow pairing even without -mtune=p5600.
> Performance testing for this patch is not yet done. 
> If the patch proves beneficial in most of the testcases (which we
> believe will do on P5600) we will enable this optimization by default
> for P5600 - in which case this option can be removed.

OK.  Sending the patch for comments before performance testing is fine,
but I think it'd be better to commit the patch only after the testing
is done, since otherwise the patch might need to be tweaked.

I don't see any problem with keeping the option in case people want to
experiment with it.  I just think the patch should only go in once it
can be enabled by default for p5600.  I.e. the option would exist to
turn off the pairing.

Not having the option is fine too of course.

>> Are QImodes not paired in the same way?  If so, it'd be worth adding a
>> comment above the define_mode_iterator saying that QI is deliberately
>> excluded.
> The P5600 datasheet mentions bonding of load/stores in HI, SI, SF and DF
> modes only. Hence QI mode is excluded. I will add the comment on the
> iterator.

Thanks.

Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH][MIPS] Enable load-load/store-store bonding
  2014-06-21 16:25 ` [PATCH][MIPS] Enable load-load/store-store bonding Richard Sandiford
  2014-06-23  9:18   ` Sameera Deshpande
@ 2014-06-24 10:42   ` Sameera Deshpande
  2015-03-30 11:28     ` sameera
  1 sibling, 1 reply; 15+ messages in thread
From: Sameera Deshpande @ 2014-06-24 10:42 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Matthew Fortune, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5412 bytes --]

Hi Richard,

Thanks for the review.
Please find attached updated patch after your review comments.

Changelog:
gcc/
	* config/mips/mips.md (JOIN_MODE): New mode iterator.
	(join2_load_Store<JOIN_MODE:mode>): New pattern.
	(join2_loadhi): Likewise.
	(define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
	load-load and store-stores.
	* config/mips/mips.opt (mload-store-pairs): New option.
	(TARGET_LOAD_STORE_PAIRS): New macro.
	*config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
	*config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
	*config/mips/mips.c(mips_load_store_bonding_p): New function.

The change is tested with dejagnu with additional options -mload-store-pairs and -mtune=p5600.
The perf measurement is yet to finish.

> > We had offline discussion based on your comment. There is additional
> > view on the same.
> > Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining
> > ISAs do not support P5600.
> > For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
> > implemented separately, and hence, as you suggested, P5600 Ld-ST
> > bonding optimization should not be enabled for them.
> > So, is it fine if I emit error for any ISAs other than mips32r2,
> > mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
> > continue by emitting warning and disabling P5600?
> 
> No, the point is that we have two separate concepts: ISA and optimisation
> target.  -mipsN and -march=N control the ISA (which instructions are
> available) and -mtune=M controls optimisation decisions within the
> constraints of that N, such as scheduling and the cost of things like
> multiplication and division.
> 
> E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
> compatible code, optimise it for p5600, but make sure that 24k workarounds
> are used.  The code would run correctly on any MIPS II-compatible processor
> without known errata and also on the 24k.
Ok, disabled the peephole pattern for fix-24k and micromips - to allow specific patterns to be matched.

> > +
> > +mld-st-pairing
> > +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
> > +pairing
> 
> Other options are just "TARGET_" + the captialised form of the option name,
> so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
> misleading since it's an abbreviation for "load" rather than the LD instruction.
> Maybe -mload-store-pairs, since plurals are more common than "-ing"?
> Not sure that's a great suggestion though.
Renamed the option and corresponding macro as suggested.

> > Performance testing for this patch is not yet done.
> > If the patch proves beneficial in most of the testcases (which we
> > believe will do on P5600) we will enable this optimization by default
> > for P5600 - in which case this option can be removed.
> 
> OK.  Sending the patch for comments before performance testing is fine, but
> I think it'd be better to commit the patch only after the testing is done, since
> otherwise the patch might need to be tweaked.
> 
> I don't see any problem with keeping the option in case people want to
> experiment with it.  I just think the patch should only go in once it can be
> enabled by default for p5600.  I.e. the option would exist to turn off the
> pairing.
> 
> Not having the option is fine too of course.
Yes, after perf analysis, I will share the results across, and then depending upon the impact, the decision can be made - whether to make the option as default or not, and then the patch will be submitted.

> We should allow pairing even without -mtune=p5600.
The load-store pairing is currently attribute of P5600, so I have not enabled the pairing without mtune=5600. If need be, can enable that without mtune=p5600.

> 
> (define_mode_iterator JOIN_MODE [
>   SI
>   (DI "TARGET_64BIT")
>   (SF "TARGET_HARD_FLOAT")
>   (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
>
Done this change.
 
> and then extend:
> 
> > @@ -883,6 +884,8 @@
> >  (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
> > (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
> >
> > +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
> > +
> >  ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
> > ;; are different.  Some forms of unextended addiu have an 8-bit
> > immediate  ;; field but the equivalent daddiu has only a 5-bit field.
> 
> this accordingly.
In order to allow d/f for both register classes, the pattern join2_load_store<mode> was altered a bit which eliminated this mode iterator.

> 
> Outer (parallel ...)s are redundant in a define_insn.
Removed.

> 
> It would be better to add the mips_load_store_insns for each operand
> rather than multiplying one of them by 2.  Or see the next bit for an
> alternative.
Using the alternative method as you suggested, so this change is not needed.

> Please instead add HI to the define_mode_iterator so that we can use the
> same peephole and define_insn.
Added HI in the mode iterator to eliminate join2_storehi pattern and corresponding peephole2.
As arithmetic operations on HImode is not supported, we generate zero or sign extended loads in such cases. 
To handle that case, join2_loadhi pattern is kept.

- Thanks and regards,
   Sameera D.


[-- Attachment #2: load-store-pairing.patch --]
[-- Type: application/octet-stream, Size: 6630 bytes --]

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 3c3be1c..84a6653 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -372,6 +372,7 @@ extern bool mips_epilogue_uses (unsigned int);
 extern void mips_final_prescan_insn (rtx, rtx *, int);
 extern int mips_trampoline_code_size (void);
 extern void mips_function_profiler (FILE *);
+extern bool mips_load_store_bonding_p (rtx *, enum machine_mode, bool);
 
 typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
 #ifdef RTX_CODE
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index b5b5ba7..e1b2864 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -19764,6 +19764,59 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
   return true;
 }
 
+bool
+mips_load_store_bonding_p (rtx *operands, enum machine_mode mode, bool load_p)
+{
+  rtx reg1, reg2, mem1, mem2, base1, base2;
+  long int offset1, offset2;
+
+  if (load_p)
+    {
+      reg1 = operands[0];
+      reg2 = operands[2];
+      mem1 = operands[1];
+      mem2 = operands[3];
+    }
+  else
+    {
+      reg1 = operands[1];
+      reg2 = operands[3];
+      mem1 = operands[0];
+      mem2 = operands[2];
+    }
+
+  if (!mips_address_insns (XEXP (mem1, 0), mode, false)
+      || !mips_address_insns (XEXP (mem2, 0), mode, false))
+    return false;
+
+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
+
+  /* Base regs do not match. */
+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
+    return false;
+
+  /* Either of the loads is clobbering base register. */
+  if (load_p
+      && (REGNO (reg1) == REGNO (base1)
+          || (REGNO (reg2) == REGNO (base1))))
+    return false;
+
+  /* Loading in same registers. */
+  if (load_p
+      && REGNO (reg1) == REGNO (reg2))
+    return false;
+
+  /* The loads/stores are not of same type. */
+  if (REGNO_REG_CLASS (REGNO (reg1)) != REGNO_REG_CLASS (REGNO (reg2)))
+    return false;
+
+  if (abs(offset1 - offset2) != GET_MODE_SIZE (mode))
+    return false;
+
+  return true;
+}
+
 /* OPERANDS describes the operands to a pair of SETs, in the order
    dest1, src1, dest2, src2.  Return true if the operands can be used
    in an LWP or SWP instruction; LOAD_P says which.  */
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 86ca419..44a127e 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3184,3 +3184,7 @@ extern GTY(()) struct target_globals *mips16_globals;
    with arguments ARGS.  */
 #define PMODE_INSN(NAME, ARGS) \
   (Pmode == SImode ? NAME ## _si ARGS : NAME ## _di ARGS)
+
+#define ENABLE_P5600_LD_ST_PAIRS \
+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 && \
+   !TARGET_MICROMIPS && !TARGET_FIX_24K)
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 7229e8f..4865022 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -780,6 +780,11 @@
 
 (define_mode_iterator MOVEP1 [SI SF])
 (define_mode_iterator MOVEP2 [SI SF])
+(define_mode_iterator JOIN_MODE [
+				 HI
+				 SI
+				 (SF "TARGET_HARD_FLOAT")
+				 (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
 
 ;; This mode iterator allows :HILO to be used as the mode of the
 ;; concatenated HI and LO registers.
@@ -7442,6 +7447,80 @@
   { return MIPS_CALL ("jal", operands, 0, -1); }
   [(set_attr "type" "call")
    (set_attr "insn_count" "3")])
+
+(define_insn "*join2_load_store<JOIN_MODE:mode>"
+  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 1 "nonimmediate_operand" "m,m,d,f"))
+   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 3 "nonimmediate_operand" "m,m,d,f"))]
+  "ENABLE_P5600_LD_ST_PAIRS && reload_completed"
+  {
+	output_asm_insn (mips_output_move (operands[0], operands[1]), operands);
+	output_asm_insn (mips_output_move (operands[2], operands[3]), &operands[2]);
+	return "";
+  }
+  [(set_attr "move_type" "load,fpload,store,fpstore")
+   (set_attr "insn_count" "2,2,2,2")])
+
+;; 2 HI/SI/SF/DF loads are joined.
+;; P5600 does not support bonding of two LBs, hence QI mode is not included.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "register_operand")
+	(match_operand:JOIN_MODE 1 "memory_operand"))
+   (set (match_operand:JOIN_MODE 2 "register_operand")
+	(match_operand:JOIN_MODE 3 "memory_operand"))]
+  "ENABLE_P5600_LD_ST_PAIRS && 
+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, true)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+;; 2 HI/SI/SF/DF stores are joined.
+;; P5600 does not support bonding of two SBs, hence QI mode is not included.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "memory_operand")
+	(match_operand:JOIN_MODE 1 "register_operand"))
+   (set (match_operand:JOIN_MODE 2 "memory_operand")
+	(match_operand:JOIN_MODE 3 "register_operand"))]
+  "ENABLE_P5600_LD_ST_PAIRS &&
+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, false)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+(define_insn "*join2_loadhi"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 1 "memory_operand" "m")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 3 "memory_operand" "m")))]
+  "ENABLE_P5600_LD_ST_PAIRS && reload_completed"
+  {
+    output_asm_insn ("lh<u>\t%0,%1", operands);
+    output_asm_insn ("lh<u>\t%2,%3", operands);
+    return "";
+  }
+  [(set_attr "move_type" "load")
+   (set_attr "insn_count" "2")])
+
+
+;; 2 16 bit integer loads are joined.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(any_extend:SI (match_operand:HI 1 "memory_operand")))
+   (set (match_operand:SI 2 "register_operand")
+	(any_extend:SI (match_operand:HI 3 "memory_operand")))]
+  "ENABLE_P5600_LD_ST_PAIRS &&
+   mips_load_store_bonding_p (operands, HImode, true)"
+  [(parallel [(set (match_dup 0)
+		   (any_extend:SI (match_dup 1)))
+	      (set (match_dup 2)
+		   (any_extend:SI (match_dup 3)))])]
+  "")
+
 \f
 ;; Synchronization instructions.
 
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index b9cfd62..3a9488f 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -445,3 +445,7 @@ Enum(mips_lib_setting) String(tiny) Value(MIPS_LIB_TINY)
 
 msched-weight
 Target Report Var(TARGET_SCHED_WEIGHT) Undocumented
+
+mload-store-pairs
+Target Report Var(TARGET_LOAD_STORE_PAIRS)
+Enable load/store bonding.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][MIPS] Enable load-load/store-store bonding
  2014-06-24 10:42   ` Sameera Deshpande
@ 2015-03-30 11:28     ` sameera
  2015-04-20  5:09       ` sameera
  0 siblings, 1 reply; 15+ messages in thread
From: sameera @ 2015-03-30 11:28 UTC (permalink / raw)
  To: Matthew Fortune, clm, echristo; +Cc: Richard Sandiford, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6783 bytes --]

Hi!

Sorry for delay in sending this patch for review.
Please find attached updated patch.

In P5600, 2 consecutive loads/stores of same type which access contiguous memory locations are bonded together by instruction issue unit to dispatch 
single load/store instruction which accesses both locations. This allows 2X improvement in memory intensive code. This optimization can be performed 
for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions.

This patch adds peephole2 patterns to identify such loads/stores, and put them in parallel, so that the scheduler will not split it - thereby 
guaranteeing h/w level load/store bonding.

The patch is tested with dejagnu for correctness, and tested on hardware for performance.
Ok for trunk?

Changelog:
gcc/
         * config/mips/mips.md (JOIN_MODE): New mode iterator.
	(join2_load_Store<JOIN_MODE:mode>): New pattern.
	(join2_loadhi): Likewise.
	(define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
	load-load and store-stores.
	* config/mips/mips.opt (mload-store-pairs): New option.
	(TARGET_LOAD_STORE_PAIRS): New macro.
	*config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
	*config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
	*config/mips/mips.c(mips_load_store_bonding_p): New function.

- Thanks and regards,
   Sameera D.

On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote:
> Hi Richard,
>
> Thanks for the review.
> Please find attached updated patch after your review comments.
>
> Changelog:
> gcc/
> 	* config/mips/mips.md (JOIN_MODE): New mode iterator.
> 	(join2_load_Store<JOIN_MODE:mode>): New pattern.
> 	(join2_loadhi): Likewise.
> 	(define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
> 	load-load and store-stores.
> 	* config/mips/mips.opt (mload-store-pairs): New option.
> 	(TARGET_LOAD_STORE_PAIRS): New macro.
> 	*config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
> 	*config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
> 	*config/mips/mips.c(mips_load_store_bonding_p): New function.
>
> The change is tested with dejagnu with additional options -mload-store-pairs and -mtune=p5600.
> The perf measurement is yet to finish.
>
>>> We had offline discussion based on your comment. There is additional
>>> view on the same.
>>> Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining
>>> ISAs do not support P5600.
>>> For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
>>> implemented separately, and hence, as you suggested, P5600 Ld-ST
>>> bonding optimization should not be enabled for them.
>>> So, is it fine if I emit error for any ISAs other than mips32r2,
>>> mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
>>> continue by emitting warning and disabling P5600?
>>
>> No, the point is that we have two separate concepts: ISA and optimisation
>> target.  -mipsN and -march=N control the ISA (which instructions are
>> available) and -mtune=M controls optimisation decisions within the
>> constraints of that N, such as scheduling and the cost of things like
>> multiplication and division.
>>
>> E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
>> compatible code, optimise it for p5600, but make sure that 24k workarounds
>> are used.  The code would run correctly on any MIPS II-compatible processor
>> without known errata and also on the 24k.
> Ok, disabled the peephole pattern for fix-24k and micromips - to allow specific patterns to be matched.
>
>>> +
>>> +mld-st-pairing
>>> +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
>>> +pairing
>>
>> Other options are just "TARGET_" + the captialised form of the option name,
>> so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
>> misleading since it's an abbreviation for "load" rather than the LD instruction.
>> Maybe -mload-store-pairs, since plurals are more common than "-ing"?
>> Not sure that's a great suggestion though.
> Renamed the option and corresponding macro as suggested.
>
>>> Performance testing for this patch is not yet done.
>>> If the patch proves beneficial in most of the testcases (which we
>>> believe will do on P5600) we will enable this optimization by default
>>> for P5600 - in which case this option can be removed.
>>
>> OK.  Sending the patch for comments before performance testing is fine, but
>> I think it'd be better to commit the patch only after the testing is done, since
>> otherwise the patch might need to be tweaked.
>>
>> I don't see any problem with keeping the option in case people want to
>> experiment with it.  I just think the patch should only go in once it can be
>> enabled by default for p5600.  I.e. the option would exist to turn off the
>> pairing.
>>
>> Not having the option is fine too of course.
> Yes, after perf analysis, I will share the results across, and then depending upon the impact, the decision can be made - whether to make the option as default or not, and then the patch will be submitted.
>
>> We should allow pairing even without -mtune=p5600.
> The load-store pairing is currently attribute of P5600, so I have not enabled the pairing without mtune=5600. If need be, can enable that without mtune=p5600.
>
>>
>> (define_mode_iterator JOIN_MODE [
>>    SI
>>    (DI "TARGET_64BIT")
>>    (SF "TARGET_HARD_FLOAT")
>>    (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
>>
> Done this change.
>
>> and then extend:
>>
>>> @@ -883,6 +884,8 @@
>>>   (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
>>> (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
>>>
>>> +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
>>> +
>>>   ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
>>> ;; are different.  Some forms of unextended addiu have an 8-bit
>>> immediate  ;; field but the equivalent daddiu has only a 5-bit field.
>>
>> this accordingly.
> In order to allow d/f for both register classes, the pattern join2_load_store<mode> was altered a bit which eliminated this mode iterator.
>
>>
>> Outer (parallel ...)s are redundant in a define_insn.
> Removed.
>
>>
>> It would be better to add the mips_load_store_insns for each operand
>> rather than multiplying one of them by 2.  Or see the next bit for an
>> alternative.
> Using the alternative method as you suggested, so this change is not needed.
>
>> Please instead add HI to the define_mode_iterator so that we can use the
>> same peephole and define_insn.
> Added HI in the mode iterator to eliminate join2_storehi pattern and corresponding peephole2.
> As arithmetic operations on HImode is not supported, we generate zero or sign extended loads in such cases.
> To handle that case, join2_loadhi pattern is kept.
>
> - Thanks and regards,
>     Sameera D.
>

[-- Attachment #2: load-store-bonding.patch --]
[-- Type: text/x-patch, Size: 7777 bytes --]

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index b48e04f..244eb8d 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
 extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
 extern int mips_trampoline_code_size (void);
 extern void mips_function_profiler (FILE *);
+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
 
 typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
 #ifdef RTX_CODE
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 1733457..85f0591 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18241,6 +18241,64 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
   return true;
 }
 
+bool
+mips_load_store_bonding_p (rtx *operands, enum machine_mode mode, bool load_p)
+{
+  rtx reg1, reg2, mem1, mem2, base1, base2;
+  enum reg_class rc1, rc2;
+  HOST_WIDE_INT offset1, offset2;
+
+  if (load_p)
+    {
+      reg1 = operands[0];
+      reg2 = operands[2];
+      mem1 = operands[1];
+      mem2 = operands[3];
+    }
+  else
+    {
+      reg1 = operands[1];
+      reg2 = operands[3];
+      mem1 = operands[0];
+      mem2 = operands[2];
+    }
+
+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
+      || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
+    return false;
+
+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
+
+  /* Base regs do not match.  */
+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
+    return false;
+
+  /* Either of the loads is clobbering base register.  */
+  if (load_p
+      && (REGNO (reg1) == REGNO (base1)
+	  || (REGNO (reg2) == REGNO (base1))))
+    return false;
+
+  /* Loading in same registers.  */
+  if (load_p
+      && REGNO (reg1) == REGNO (reg2))
+    return false;
+
+  /* The loads/stores are not of same type.  */
+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
+  rc2 = REGNO_REG_CLASS (REGNO (reg2));
+  if (rc1 != rc2
+      && !reg_class_subset_p (rc1, rc2)
+      && !reg_class_subset_p (rc2, rc1))
+    return false;
+
+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
+    return false;
+
+  return true;
+}
+
 /* OPERANDS describes the operands to a pair of SETs, in the order
    dest1, src1, dest2, src2.  Return true if the operands can be used
    in an LWP or SWP instruction; LOAD_P says which.  */
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index ec69ed5..1bd0dae 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3147,3 +3147,7 @@ extern GTY(()) struct target_globals *mips16_globals;
 #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
 #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
 #endif
+
+#define ENABLE_LD_ST_PAIRS \
+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
+   && !TARGET_MICROMIPS && !TARGET_FIX_24K)
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 3672c0b..cfdb750 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -754,6 +754,11 @@
 
 (define_mode_iterator MOVEP1 [SI SF])
 (define_mode_iterator MOVEP2 [SI SF])
+(define_mode_iterator JOIN_MODE [HI
+				 SI
+				 (SF "TARGET_HARD_FLOAT")
+				 (DF "TARGET_HARD_FLOAT
+				      && TARGET_DOUBLE_FLOAT")])
 
 ;; This mode iterator allows :HILO to be used as the mode of the
 ;; concatenated HI and LO registers.
@@ -7419,6 +7424,108 @@
   { return MIPS_CALL ("jal", operands, 0, -1); }
   [(set_attr "type" "call")
    (set_attr "insn_count" "3")])
+
+(define_insn "*join2_load_store<JOIN_MODE:mode>"
+  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 1 "nonimmediate_operand" "m,m,d,f"))
+   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 3 "nonimmediate_operand" "m,m,d,f"))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    bool load_p = (which_alternative == 0 || which_alternative == 1);
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, order of the loads need to be checked for correctness.  */
+    if (!load_p || !reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+	output_asm_insn (mips_output_move (operands[0], operands[1]),
+			 operands);
+	output_asm_insn (mips_output_move (operands[2], operands[3]),
+			 &operands[2]);
+      }
+    else
+      {
+	output_asm_insn (mips_output_move (operands[2], operands[3]),
+			 &operands[2]);
+	output_asm_insn (mips_output_move (operands[0], operands[1]),
+			 operands);
+      }
+    return "";
+  }
+  [(set_attr "move_type" "load,fpload,store,fpstore")
+   (set_attr "insn_count" "2,2,2,2")])
+
+;; 2 HI/SI/SF/DF loads are joined.
+;; P5600 does not support bonding of two LBs, hence QI mode is not included.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "register_operand")
+	(match_operand:JOIN_MODE 1 "non_volatile_mem_operand"))
+   (set (match_operand:JOIN_MODE 2 "register_operand")
+	(match_operand:JOIN_MODE 3 "non_volatile_mem_operand"))]
+  "ENABLE_LD_ST_PAIRS &&
+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, true)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+;; 2 HI/SI/SF/DF stores are joined.
+;; P5600 does not support bonding of two SBs, hence QI mode is not included.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "memory_operand")
+	(match_operand:JOIN_MODE 1 "register_operand"))
+   (set (match_operand:JOIN_MODE 2 "memory_operand")
+	(match_operand:JOIN_MODE 3 "register_operand"))]
+  "ENABLE_LD_ST_PAIRS &&
+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, false)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+(define_insn "*join2_loadhi"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, order of the loads need to be checked for correctness.  */
+    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+	output_asm_insn ("lh<u>\t%0,%1", operands);
+	output_asm_insn ("lh<u>\t%2,%3", operands);
+      }
+    else
+      {
+	output_asm_insn ("lh<u>\t%2,%3", operands);
+	output_asm_insn ("lh<u>\t%0,%1", operands);
+      }
+
+    return "";
+  }
+  [(set_attr "move_type" "load")
+   (set_attr "insn_count" "2")])
+
+
+;; 2 16 bit integer loads are joined.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand")))
+   (set (match_operand:SI 2 "register_operand")
+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand")))]
+  "ENABLE_LD_ST_PAIRS &&
+   mips_load_store_bonding_p (operands, HImode, true)"
+  [(parallel [(set (match_dup 0)
+		   (any_extend:SI (match_dup 1)))
+	      (set (match_dup 2)
+		   (any_extend:SI (match_dup 3)))])]
+  "")
+
 \f
 ;; Synchronization instructions.
 
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 9e89aa9..a9baebe 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -418,3 +418,7 @@ Enable use of odd-numbered single-precision registers
 
 noasmopt
 Driver
+
+mload-store-pairs
+Target Report Var(TARGET_LOAD_STORE_PAIRS) Init(1)
+Enable load/store bonding.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-03-30 11:28     ` sameera
@ 2015-04-20  5:09       ` sameera
  2015-04-20 18:30         ` Mike Stump
  2015-04-20 19:09         ` Matthew Fortune
  0 siblings, 2 replies; 15+ messages in thread
From: sameera @ 2015-04-20  5:09 UTC (permalink / raw)
  To: Matthew Fortune, clm, echristo; +Cc: Richard Sandiford, gcc-patches

Gentle reminder!

- Thanks and regards,
   Sameera D.

On Monday 30 March 2015 04:58 PM, sameera wrote:
> Hi!
>
> Sorry for delay in sending this patch for review.
> Please find attached updated patch.
>
> In P5600, 2 consecutive loads/stores of same type which access contiguous memory locations are bonded together by instruction issue unit to dispatch
> single load/store instruction which accesses both locations. This allows 2X improvement in memory intensive code. This optimization can be performed
> for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions.
>
> This patch adds peephole2 patterns to identify such loads/stores, and put them in parallel, so that the scheduler will not split it - thereby
> guaranteeing h/w level load/store bonding.
>
> The patch is tested with dejagnu for correctness, and tested on hardware for performance.
> Ok for trunk?
>
> Changelog:
> gcc/
>          * config/mips/mips.md (JOIN_MODE): New mode iterator.
>      (join2_load_Store<JOIN_MODE:mode>): New pattern.
>      (join2_loadhi): Likewise.
>      (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
>      load-load and store-stores.
>      * config/mips/mips.opt (mload-store-pairs): New option.
>      (TARGET_LOAD_STORE_PAIRS): New macro.
>      *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
>      *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
>      *config/mips/mips.c(mips_load_store_bonding_p): New function.
>
> - Thanks and regards,
>    Sameera D.
>
> On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote:
>> Hi Richard,
>>
>> Thanks for the review.
>> Please find attached updated patch after your review comments.
>>
>> Changelog:
>> gcc/
>>     * config/mips/mips.md (JOIN_MODE): New mode iterator.
>>     (join2_load_Store<JOIN_MODE:mode>): New pattern.
>>     (join2_loadhi): Likewise.
>>     (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
>>     load-load and store-stores.
>>     * config/mips/mips.opt (mload-store-pairs): New option.
>>     (TARGET_LOAD_STORE_PAIRS): New macro.
>>     *config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
>>     *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
>>     *config/mips/mips.c(mips_load_store_bonding_p): New function.
>>
>> The change is tested with dejagnu with additional options -mload-store-pairs and -mtune=p5600.
>> The perf measurement is yet to finish.
>>
>>>> We had offline discussion based on your comment. There is additional
>>>> view on the same.
>>>> Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining
>>>> ISAs do not support P5600.
>>>> For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
>>>> implemented separately, and hence, as you suggested, P5600 Ld-ST
>>>> bonding optimization should not be enabled for them.
>>>> So, is it fine if I emit error for any ISAs other than mips32r2,
>>>> mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
>>>> continue by emitting warning and disabling P5600?
>>>
>>> No, the point is that we have two separate concepts: ISA and optimisation
>>> target.  -mipsN and -march=N control the ISA (which instructions are
>>> available) and -mtune=M controls optimisation decisions within the
>>> constraints of that N, such as scheduling and the cost of things like
>>> multiplication and division.
>>>
>>> E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
>>> compatible code, optimise it for p5600, but make sure that 24k workarounds
>>> are used.  The code would run correctly on any MIPS II-compatible processor
>>> without known errata and also on the 24k.
>> Ok, disabled the peephole pattern for fix-24k and micromips - to allow specific patterns to be matched.
>>
>>>> +
>>>> +mld-st-pairing
>>>> +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
>>>> +pairing
>>>
>>> Other options are just "TARGET_" + the captialised form of the option name,
>>> so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
>>> misleading since it's an abbreviation for "load" rather than the LD instruction.
>>> Maybe -mload-store-pairs, since plurals are more common than "-ing"?
>>> Not sure that's a great suggestion though.
>> Renamed the option and corresponding macro as suggested.
>>
>>>> Performance testing for this patch is not yet done.
>>>> If the patch proves beneficial in most of the testcases (which we
>>>> believe will do on P5600) we will enable this optimization by default
>>>> for P5600 - in which case this option can be removed.
>>>
>>> OK.  Sending the patch for comments before performance testing is fine, but
>>> I think it'd be better to commit the patch only after the testing is done, since
>>> otherwise the patch might need to be tweaked.
>>>
>>> I don't see any problem with keeping the option in case people want to
>>> experiment with it.  I just think the patch should only go in once it can be
>>> enabled by default for p5600.  I.e. the option would exist to turn off the
>>> pairing.
>>>
>>> Not having the option is fine too of course.
>> Yes, after perf analysis, I will share the results across, and then depending upon the impact, the decision can be made - whether to make the option
>> as default or not, and then the patch will be submitted.
>>
>>> We should allow pairing even without -mtune=p5600.
>> The load-store pairing is currently attribute of P5600, so I have not enabled the pairing without mtune=5600. If need be, can enable that without
>> mtune=p5600.
>>
>>>
>>> (define_mode_iterator JOIN_MODE [
>>>    SI
>>>    (DI "TARGET_64BIT")
>>>    (SF "TARGET_HARD_FLOAT")
>>>    (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
>>>
>> Done this change.
>>
>>> and then extend:
>>>
>>>> @@ -883,6 +884,8 @@
>>>>   (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
>>>> (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
>>>>
>>>> +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
>>>> +
>>>>   ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
>>>> ;; are different.  Some forms of unextended addiu have an 8-bit
>>>> immediate  ;; field but the equivalent daddiu has only a 5-bit field.
>>>
>>> this accordingly.
>> In order to allow d/f for both register classes, the pattern join2_load_store<mode> was altered a bit which eliminated this mode iterator.
>>
>>>
>>> Outer (parallel ...)s are redundant in a define_insn.
>> Removed.
>>
>>>
>>> It would be better to add the mips_load_store_insns for each operand
>>> rather than multiplying one of them by 2.  Or see the next bit for an
>>> alternative.
>> Using the alternative method as you suggested, so this change is not needed.
>>
>>> Please instead add HI to the define_mode_iterator so that we can use the
>>> same peephole and define_insn.
>> Added HI in the mode iterator to eliminate join2_storehi pattern and corresponding peephole2.
>> As arithmetic operations on HImode is not supported, we generate zero or sign extended loads in such cases.
>> To handle that case, join2_loadhi pattern is kept.
>>
>> - Thanks and regards,
>>     Sameera D.
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-04-20  5:09       ` sameera
@ 2015-04-20 18:30         ` Mike Stump
  2015-04-20 19:09         ` Matthew Fortune
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Stump @ 2015-04-20 18:30 UTC (permalink / raw)
  To: sameera; +Cc: Matthew Fortune, clm, echristo, Richard Sandiford, gcc-patches

With FUSION you might get farther.  See the arm port as I recall.

The quick overview, FUSION allows instructions that are not contiguous to be paired up and fused together.  it was built for load/load store/store combining.

On Apr 19, 2015, at 10:09 PM, sameera <sameera.deshpande@imgtec.com> wrote:
> Gentle reminder!
> 
> - Thanks and regards,
>  Sameera D.
> 
> On Monday 30 March 2015 04:58 PM, sameera wrote:
>> Hi!
>> 
>> Sorry for delay in sending this patch for review.
>> Please find attached updated patch.
>> 
>> In P5600, 2 consecutive loads/stores of same type which access contiguous memory locations are bonded together by instruction issue unit to dispatch
>> single load/store instruction which accesses both locations. This allows 2X improvement in memory intensive code. This optimization can be performed
>> for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions.
>> 
>> This patch adds peephole2 patterns to identify such loads/stores, and put them in parallel, so that the scheduler will not split it - thereby
>> guaranteeing h/w level load/store bonding.
>> 
>> The patch is tested with dejagnu for correctness, and tested on hardware for performance.
>> Ok for trunk?
>> 
>> Changelog:
>> gcc/
>>         * config/mips/mips.md (JOIN_MODE): New mode iterator.
>>     (join2_load_Store<JOIN_MODE:mode>): New pattern.
>>     (join2_loadhi): Likewise.
>>     (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
>>     load-load and store-stores.
>>     * config/mips/mips.opt (mload-store-pairs): New option.
>>     (TARGET_LOAD_STORE_PAIRS): New macro.
>>     *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
>>     *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
>>     *config/mips/mips.c(mips_load_store_bonding_p): New function.
>> 
>> - Thanks and regards,
>>   Sameera D.
>> 
>> On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote:
>>> Hi Richard,
>>> 
>>> Thanks for the review.
>>> Please find attached updated patch after your review comments.
>>> 
>>> Changelog:
>>> gcc/
>>>    * config/mips/mips.md (JOIN_MODE): New mode iterator.
>>>    (join2_load_Store<JOIN_MODE:mode>): New pattern.
>>>    (join2_loadhi): Likewise.
>>>    (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
>>>    load-load and store-stores.
>>>    * config/mips/mips.opt (mload-store-pairs): New option.
>>>    (TARGET_LOAD_STORE_PAIRS): New macro.
>>>    *config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
>>>    *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
>>>    *config/mips/mips.c(mips_load_store_bonding_p): New function.
>>> 
>>> The change is tested with dejagnu with additional options -mload-store-pairs and -mtune=p5600.
>>> The perf measurement is yet to finish.
>>> 
>>>>> We had offline discussion based on your comment. There is additional
>>>>> view on the same.
>>>>> Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining
>>>>> ISAs do not support P5600.
>>>>> For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
>>>>> implemented separately, and hence, as you suggested, P5600 Ld-ST
>>>>> bonding optimization should not be enabled for them.
>>>>> So, is it fine if I emit error for any ISAs other than mips32r2,
>>>>> mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
>>>>> continue by emitting warning and disabling P5600?
>>>> 
>>>> No, the point is that we have two separate concepts: ISA and optimisation
>>>> target.  -mipsN and -march=N control the ISA (which instructions are
>>>> available) and -mtune=M controls optimisation decisions within the
>>>> constraints of that N, such as scheduling and the cost of things like
>>>> multiplication and division.
>>>> 
>>>> E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
>>>> compatible code, optimise it for p5600, but make sure that 24k workarounds
>>>> are used.  The code would run correctly on any MIPS II-compatible processor
>>>> without known errata and also on the 24k.
>>> Ok, disabled the peephole pattern for fix-24k and micromips - to allow specific patterns to be matched.
>>> 
>>>>> +
>>>>> +mld-st-pairing
>>>>> +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
>>>>> +pairing
>>>> 
>>>> Other options are just "TARGET_" + the captialised form of the option name,
>>>> so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
>>>> misleading since it's an abbreviation for "load" rather than the LD instruction.
>>>> Maybe -mload-store-pairs, since plurals are more common than "-ing"?
>>>> Not sure that's a great suggestion though.
>>> Renamed the option and corresponding macro as suggested.
>>> 
>>>>> Performance testing for this patch is not yet done.
>>>>> If the patch proves beneficial in most of the testcases (which we
>>>>> believe will do on P5600) we will enable this optimization by default
>>>>> for P5600 - in which case this option can be removed.
>>>> 
>>>> OK.  Sending the patch for comments before performance testing is fine, but
>>>> I think it'd be better to commit the patch only after the testing is done, since
>>>> otherwise the patch might need to be tweaked.
>>>> 
>>>> I don't see any problem with keeping the option in case people want to
>>>> experiment with it.  I just think the patch should only go in once it can be
>>>> enabled by default for p5600.  I.e. the option would exist to turn off the
>>>> pairing.
>>>> 
>>>> Not having the option is fine too of course.
>>> Yes, after perf analysis, I will share the results across, and then depending upon the impact, the decision can be made - whether to make the option
>>> as default or not, and then the patch will be submitted.
>>> 
>>>> We should allow pairing even without -mtune=p5600.
>>> The load-store pairing is currently attribute of P5600, so I have not enabled the pairing without mtune=5600. If need be, can enable that without
>>> mtune=p5600.
>>> 
>>>> 
>>>> (define_mode_iterator JOIN_MODE [
>>>>   SI
>>>>   (DI "TARGET_64BIT")
>>>>   (SF "TARGET_HARD_FLOAT")
>>>>   (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
>>>> 
>>> Done this change.
>>> 
>>>> and then extend:
>>>> 
>>>>> @@ -883,6 +884,8 @@
>>>>>  (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
>>>>> (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
>>>>> 
>>>>> +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
>>>>> +
>>>>>  ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
>>>>> ;; are different.  Some forms of unextended addiu have an 8-bit
>>>>> immediate  ;; field but the equivalent daddiu has only a 5-bit field.
>>>> 
>>>> this accordingly.
>>> In order to allow d/f for both register classes, the pattern join2_load_store<mode> was altered a bit which eliminated this mode iterator.
>>> 
>>>> 
>>>> Outer (parallel ...)s are redundant in a define_insn.
>>> Removed.
>>> 
>>>> 
>>>> It would be better to add the mips_load_store_insns for each operand
>>>> rather than multiplying one of them by 2.  Or see the next bit for an
>>>> alternative.
>>> Using the alternative method as you suggested, so this change is not needed.
>>> 
>>>> Please instead add HI to the define_mode_iterator so that we can use the
>>>> same peephole and define_insn.
>>> Added HI in the mode iterator to eliminate join2_storehi pattern and corresponding peephole2.
>>> As arithmetic operations on HImode is not supported, we generate zero or sign extended loads in such cases.
>>> To handle that case, join2_loadhi pattern is kept.
>>> 
>>> - Thanks and regards,
>>>    Sameera D.
>>> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-04-20  5:09       ` sameera
  2015-04-20 18:30         ` Mike Stump
@ 2015-04-20 19:09         ` Matthew Fortune
  2015-04-20 22:01           ` Moore, Catherine
  2015-05-11 11:05           ` sameera
  1 sibling, 2 replies; 15+ messages in thread
From: Matthew Fortune @ 2015-04-20 19:09 UTC (permalink / raw)
  To: Sameera Deshpande, clm; +Cc: Richard Sandiford, gcc-patches, echristo

Sameera Deshpande <Sameera.Deshpande@imgtec.com> writes:
> Gentle reminder!

Thanks Sameera. Just a couple of comments inline below and a question
for Catherine at the end.

> - Thanks and regards,
>    Sameera D.
> 
> On Monday 30 March 2015 04:58 PM, sameera wrote:
> > Hi!
> >
> > Sorry for delay in sending this patch for review.
> > Please find attached updated patch.
> >
> > In P5600, 2 consecutive loads/stores of same type which access
> > contiguous memory locations are bonded together by instruction issue
> > unit to dispatch single load/store instruction which accesses both
> locations. This allows 2X improvement in memory intensive code. This
> optimization can be performed for LH, SH, LW, SW, LWC, SWC, LDC, SDC
> instructions.
> >
> > This patch adds peephole2 patterns to identify such loads/stores, and
> > put them in parallel, so that the scheduler will not split it -
> thereby guaranteeing h/w level load/store bonding.
> >
> > The patch is tested with dejagnu for correctness, and tested on
> hardware for performance.
> > Ok for trunk?
> >
> > Changelog:
> > gcc/
> >          * config/mips/mips.md (JOIN_MODE): New mode iterator.
> >      (join2_load_Store<JOIN_MODE:mode>): New pattern.
> >      (join2_loadhi): Likewise.
> >      (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-
> mode
> >      load-load and store-stores.
> >      * config/mips/mips.opt (mload-store-pairs): New option.
> >      (TARGET_LOAD_STORE_PAIRS): New macro.
> >      *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
> >      *config/mips/mips-protos.h (mips_load_store_bonding_p): New
> prototype.
> >      *config/mips/mips.c(mips_load_store_bonding_p): New function.

I don't know if this has been corrupted by mail clients but a single
space after '*' and a space before '('. 

>diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
>index b48e04f..244eb8d 100644
>--- a/gcc/config/mips/mips-protos.h
>+++ b/gcc/config/mips/mips-protos.h
>@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
> extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
> extern int mips_trampoline_code_size (void);
> extern void mips_function_profiler (FILE *);
>+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
> 
> typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
> #ifdef RTX_CODE
>diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>index 1733457..85f0591 100644
>--- a/gcc/config/mips/mips.c
>+++ b/gcc/config/mips/mips.c
>@@ -18241,6 +18241,64 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
>   return true;
> }
> 
>+bool
>+mips_load_store_bonding_p (rtx *operands, enum machine_mode mode, bool load_p)

Remove enum from machine_mode.

>+{
>+  rtx reg1, reg2, mem1, mem2, base1, base2;
>+  enum reg_class rc1, rc2;
>+  HOST_WIDE_INT offset1, offset2;
>+
>+  if (load_p)
>+    {
>+      reg1 = operands[0];
>+      reg2 = operands[2];
>+      mem1 = operands[1];
>+      mem2 = operands[3];
>+    }
>+  else
>+    {
>+      reg1 = operands[1];
>+      reg2 = operands[3];
>+      mem1 = operands[0];
>+      mem2 = operands[2];
>+    }
>+
>+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
>+      || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
>+    return false;
>+
>+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
>+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
>+
>+  /* Base regs do not match.  */
>+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
>+    return false;
>+
>+  /* Either of the loads is clobbering base register.  */
>+  if (load_p
>+      && (REGNO (reg1) == REGNO (base1)
>+	  || (REGNO (reg2) == REGNO (base1))))
>+    return false;

Can you add a comment saying that this case does not get bonded by
any known hardware even though it could be valid to bond them if it
is the second load that clobbers the base.

>+  /* Loading in same registers.  */
>+  if (load_p
>+      && REGNO (reg1) == REGNO (reg2))
>+    return false;
>+
>+  /* The loads/stores are not of same type.  */
>+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
>+  rc2 = REGNO_REG_CLASS (REGNO (reg2));
>+  if (rc1 != rc2
>+      && !reg_class_subset_p (rc1, rc2)
>+      && !reg_class_subset_p (rc2, rc1))
>+    return false;
>+
>+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
>+    return false;
>+
>+  return true;
>+}
>+
> /* OPERANDS describes the operands to a pair of SETs, in the order
>    dest1, src1, dest2, src2.  Return true if the operands can be used
>    in an LWP or SWP instruction; LOAD_P says which.  */
>diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
>index ec69ed5..1bd0dae 100644
>--- a/gcc/config/mips/mips.h
>+++ b/gcc/config/mips/mips.h
>@@ -3147,3 +3147,7 @@ extern GTY(()) struct target_globals *mips16_globals;
> #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
> #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
> #endif
>+
>+#define ENABLE_LD_ST_PAIRS \
>+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
>+   && !TARGET_MICROMIPS && !TARGET_FIX_24K)

I've already forgotten why these extra micromips/fix24k conditions
were required. Can you add a comment?

>diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
>index 3672c0b..cfdb750 100644
>--- a/gcc/config/mips/mips.md
>+++ b/gcc/config/mips/mips.md
>@@ -754,6 +754,11 @@
> 
> (define_mode_iterator MOVEP1 [SI SF])
> (define_mode_iterator MOVEP2 [SI SF])
>+(define_mode_iterator JOIN_MODE [HI
>+				 SI
>+				 (SF "TARGET_HARD_FLOAT")
>+				 (DF "TARGET_HARD_FLOAT
>+				      && TARGET_DOUBLE_FLOAT")])
> 
> ;; This mode iterator allows :HILO to be used as the mode of the
> ;; concatenated HI and LO registers.
>@@ -7419,6 +7424,108 @@
>   { return MIPS_CALL ("jal", operands, 0, -1); }
>   [(set_attr "type" "call")
>    (set_attr "insn_count" "3")])
>+
>+(define_insn "*join2_load_store<JOIN_MODE:mode>"
>+  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand" "=d,f,m,m")
>+	(match_operand:JOIN_MODE 1 "nonimmediate_operand" "m,m,d,f"))
>+   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand" "=d,f,m,m")
>+	(match_operand:JOIN_MODE 3 "nonimmediate_operand" "m,m,d,f"))]
>+  "ENABLE_LD_ST_PAIRS && reload_completed"
>+  {
>+    bool load_p = (which_alternative == 0 || which_alternative == 1);
>+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
>+       Hardware does not bond those loads, even when they are consecutive.
>+       However, order of the loads need to be checked for correctness.  */
>+    if (!load_p || !reg_overlap_mentioned_p (operands[0], operands[1]))
>+      {
>+	output_asm_insn (mips_output_move (operands[0], operands[1]),
>+			 operands);
>+	output_asm_insn (mips_output_move (operands[2], operands[3]),
>+			 &operands[2]);
>+      }
>+    else
>+      {
>+	output_asm_insn (mips_output_move (operands[2], operands[3]),
>+			 &operands[2]);
>+	output_asm_insn (mips_output_move (operands[0], operands[1]),
>+			 operands);
>+      }
>+    return "";
>+  }
>+  [(set_attr "move_type" "load,fpload,store,fpstore")
>+   (set_attr "insn_count" "2,2,2,2")])
>+
>+;; 2 HI/SI/SF/DF loads are joined.
>+;; P5600 does not support bonding of two LBs, hence QI mode is not included.
>+(define_peephole2
>+  [(set (match_operand:JOIN_MODE 0 "register_operand")
>+	(match_operand:JOIN_MODE 1 "non_volatile_mem_operand"))
>+   (set (match_operand:JOIN_MODE 2 "register_operand")
>+	(match_operand:JOIN_MODE 3 "non_volatile_mem_operand"))]

Please can you comment that the loads must be non-volatile as they may get
re-ordered.

>+  "ENABLE_LD_ST_PAIRS &&

&& on the next line

>+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, true)"
>+  [(parallel [(set (match_dup 0)
>+		   (match_dup 1))
>+	      (set (match_dup 2)
>+		   (match_dup 3))])]
>+  "")
>+
>+;; 2 HI/SI/SF/DF stores are joined.
>+;; P5600 does not support bonding of two SBs, hence QI mode is not included.
>+(define_peephole2
>+  [(set (match_operand:JOIN_MODE 0 "memory_operand")
>+	(match_operand:JOIN_MODE 1 "register_operand"))
>+   (set (match_operand:JOIN_MODE 2 "memory_operand")
>+	(match_operand:JOIN_MODE 3 "register_operand"))]
>+  "ENABLE_LD_ST_PAIRS &&

&& on the next line

>+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, false)"
>+  [(parallel [(set (match_dup 0)
>+		   (match_dup 1))
>+	      (set (match_dup 2)
>+		   (match_dup 3))])]
>+  "")
>+
>+(define_insn "*join2_loadhi"
>+  [(set (match_operand:SI 0 "register_operand" "=r")
>+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
>+   (set (match_operand:SI 2 "register_operand" "=r")
>+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
>+  "ENABLE_LD_ST_PAIRS && reload_completed"
>+  {
>+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
>+       Hardware does not bond those loads, even when they are consecutive.
>+       However, order of the loads need to be checked for correctness.  */
>+    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
>+      {
>+	output_asm_insn ("lh<u>\t%0,%1", operands);
>+	output_asm_insn ("lh<u>\t%2,%3", operands);
>+      }
>+    else
>+      {
>+	output_asm_insn ("lh<u>\t%2,%3", operands);
>+	output_asm_insn ("lh<u>\t%0,%1", operands);
>+      }
>+
>+    return "";
>+  }
>+  [(set_attr "move_type" "load")
>+   (set_attr "insn_count" "2")])
>+
>+
>+;; 2 16 bit integer loads are joined.

2 HI mode loads

>+(define_peephole2
>+  [(set (match_operand:SI 0 "register_operand")
>+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand")))
>+   (set (match_operand:SI 2 "register_operand")
>+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand")))]
>+  "ENABLE_LD_ST_PAIRS &&
>+   mips_load_store_bonding_p (operands, HImode, true)"
>+  [(parallel [(set (match_dup 0)
>+		   (any_extend:SI (match_dup 1)))
>+	      (set (match_dup 2)
>+		   (any_extend:SI (match_dup 3)))])]
>+  "")
>+
> 
> ;; Synchronization instructions.
> 
>diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
>index 9e89aa9..a9baebe 100644
>--- a/gcc/config/mips/mips.opt
>+++ b/gcc/config/mips/mips.opt
>@@ -418,3 +418,7 @@ Enable use of odd-numbered single-precision registers
> 
> noasmopt
> Driver
>+
>+mload-store-pairs
>+Target Report Var(TARGET_LOAD_STORE_PAIRS) Init(1)
>+Enable load/store bonding.

Catherine: We have this option in place just as a get-out clause if
there are any side effects that have been missed in this patch such
that you can still tune for p5600 but with bonding disabled. Do you
think this is OK? I'm not completely against this being either
undocumented or removed entirely.

Sameera: Assuming we keep it then it needs adding to the invoke doc.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-04-20 19:09         ` Matthew Fortune
@ 2015-04-20 22:01           ` Moore, Catherine
  2015-05-11 11:05           ` sameera
  1 sibling, 0 replies; 15+ messages in thread
From: Moore, Catherine @ 2015-04-20 22:01 UTC (permalink / raw)
  To: Matthew Fortune, Sameera Deshpande
  Cc: Richard Sandiford, gcc-patches, echristo



> -----Original Message-----
> From: Matthew Fortune [mailto:Matthew.Fortune@imgtec.com]
> Sent: Monday, April 20, 2015 3:10 PM
> To: Sameera Deshpande; Moore, Catherine
> Cc: Richard Sandiford; gcc-patches@gcc.gnu.org; echristo@gmail.com
> Subject: RE: [PATCH][MIPS] Enable load-load/store-store bonding
> 
> Sameera Deshpande <Sameera.Deshpande@imgtec.com> writes:
> > Gentle reminder!
> 
> Thanks Sameera. Just a couple of comments inline below and a question for
> Catherine at the end.
> 
> > - Thanks and regards,
> >    Sameera D.
> >
> > On Monday 30 March 2015 04:58 PM, sameera wrote:
> > > Hi!
> > >
> > > Sorry for delay in sending this patch for review.
> > > Please find attached updated patch.
> > >
> > > In P5600, 2 consecutive loads/stores of same type which access
> > > contiguous memory locations are bonded together by instruction issue
> > > unit to dispatch single load/store instruction which accesses both
> > locations. This allows 2X improvement in memory intensive code. This
> > optimization can be performed for LH, SH, LW, SW, LWC, SWC, LDC, SDC
> > instructions.
> > >
> > > This patch adds peephole2 patterns to identify such loads/stores,
> > > and put them in parallel, so that the scheduler will not split it -
> > thereby guaranteeing h/w level load/store bonding.
> > >
> > > The patch is tested with dejagnu for correctness, and tested on
> > hardware for performance.
> > > Ok for trunk?
> > >
> > > Changelog:
> > > gcc/
> > >          * config/mips/mips.md (JOIN_MODE): New mode iterator.
> > >      (join2_load_Store<JOIN_MODE:mode>): New pattern.
> > >      (join2_loadhi): Likewise.
> > >      (define_peehole2): Add peephole2 patterns to join 2
> > > HI/SI/SF/DF-
> > mode
> > >      load-load and store-stores.
> > >      * config/mips/mips.opt (mload-store-pairs): New option.
> > >      (TARGET_LOAD_STORE_PAIRS): New macro.
> > >      *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
> > >      *config/mips/mips-protos.h (mips_load_store_bonding_p): New
> > prototype.
> > >      *config/mips/mips.c(mips_load_store_bonding_p): New function.
> 
> I don't know if this has been corrupted by mail clients but a single space after
> '*' and a space before '('.
> 
> >diff --git a/gcc/config/mips/mips-protos.h
> >b/gcc/config/mips/mips-protos.h index b48e04f..244eb8d 100644
> >--- a/gcc/config/mips/mips-protos.h
> >+++ b/gcc/config/mips/mips-protos.h
> >@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
> >extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);  extern
> >int mips_trampoline_code_size (void);  extern void
> >mips_function_profiler (FILE *);
> >+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
> >
> > typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);  #ifdef RTX_CODE diff
> >--git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index
> >1733457..85f0591 100644
> >--- a/gcc/config/mips/mips.c
> >+++ b/gcc/config/mips/mips.c
> >@@ -18241,6 +18241,64 @@ umips_load_store_pair_p_1 (bool load_p,
> bool swap_p,
> >   return true;
> > }
> >
> >+bool
> >+mips_load_store_bonding_p (rtx *operands, enum machine_mode
> mode, bool
> >+load_p)
> 
> Remove enum from machine_mode.
> 
> >+{
> >+  rtx reg1, reg2, mem1, mem2, base1, base2;
> >+  enum reg_class rc1, rc2;
> >+  HOST_WIDE_INT offset1, offset2;
> >+
> >+  if (load_p)
> >+    {
> >+      reg1 = operands[0];
> >+      reg2 = operands[2];
> >+      mem1 = operands[1];
> >+      mem2 = operands[3];
> >+    }
> >+  else
> >+    {
> >+      reg1 = operands[1];
> >+      reg2 = operands[3];
> >+      mem1 = operands[0];
> >+      mem2 = operands[2];
> >+    }
> >+
> >+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
> >+      || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
> >+    return false;
> >+
> >+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);  mips_split_plus
> >+ (XEXP (mem2, 0), &base2, &offset2);
> >+
> >+  /* Base regs do not match.  */
> >+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
> >+    return false;
> >+
> >+  /* Either of the loads is clobbering base register.  */
> >+  if (load_p
> >+      && (REGNO (reg1) == REGNO (base1)
> >+	  || (REGNO (reg2) == REGNO (base1))))
> >+    return false;
> 
> Can you add a comment saying that this case does not get bonded by any
> known hardware even though it could be valid to bond them if it is the
> second load that clobbers the base.
> 
> >+  /* Loading in same registers.  */
> >+  if (load_p
> >+      && REGNO (reg1) == REGNO (reg2))
> >+    return false;
> >+
> >+  /* The loads/stores are not of same type.  */
> >+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
> >+  rc2 = REGNO_REG_CLASS (REGNO (reg2));  if (rc1 != rc2
> >+      && !reg_class_subset_p (rc1, rc2)
> >+      && !reg_class_subset_p (rc2, rc1))
> >+    return false;
> >+
> >+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
> >+    return false;
> >+
> >+  return true;
> >+}
> >+
> > /* OPERANDS describes the operands to a pair of SETs, in the order
> >    dest1, src1, dest2, src2.  Return true if the operands can be used
> >    in an LWP or SWP instruction; LOAD_P says which.  */ diff --git
> >a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index
> >ec69ed5..1bd0dae 100644
> >--- a/gcc/config/mips/mips.h
> >+++ b/gcc/config/mips/mips.h
> >@@ -3147,3 +3147,7 @@ extern GTY(()) struct target_globals
> >*mips16_globals;  #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
> > #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
> > #endif
> >+
> >+#define ENABLE_LD_ST_PAIRS \
> >+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
> >+   && !TARGET_MICROMIPS && !TARGET_FIX_24K)
> 
> I've already forgotten why these extra micromips/fix24k conditions were
> required. Can you add a comment?
> 
> >diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md index
> >3672c0b..cfdb750 100644
> >--- a/gcc/config/mips/mips.md
> >+++ b/gcc/config/mips/mips.md
> >@@ -754,6 +754,11 @@
> >
> > (define_mode_iterator MOVEP1 [SI SF])
> > (define_mode_iterator MOVEP2 [SI SF])
> >+(define_mode_iterator JOIN_MODE [HI
> >+				 SI
> >+				 (SF "TARGET_HARD_FLOAT")
> >+				 (DF "TARGET_HARD_FLOAT
> >+				      && TARGET_DOUBLE_FLOAT")])
> >
> > ;; This mode iterator allows :HILO to be used as the mode of the  ;;
> >concatenated HI and LO registers.
> >@@ -7419,6 +7424,108 @@
> >   { return MIPS_CALL ("jal", operands, 0, -1); }
> >   [(set_attr "type" "call")
> >    (set_attr "insn_count" "3")])
> >+
> >+(define_insn "*join2_load_store<JOIN_MODE:mode>"
> >+  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand"
> "=d,f,m,m")
> >+	(match_operand:JOIN_MODE 1 "nonimmediate_operand"
> "m,m,d,f"))
> >+   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand"
> "=d,f,m,m")
> >+	(match_operand:JOIN_MODE 3 "nonimmediate_operand"
> "m,m,d,f"))]
> >+  "ENABLE_LD_ST_PAIRS && reload_completed"
> >+  {
> >+    bool load_p = (which_alternative == 0 || which_alternative == 1);
> >+    /* Reg-renaming pass reuses base register if it is dead after bonded
> loads.
> >+       Hardware does not bond those loads, even when they are
> consecutive.
> >+       However, order of the loads need to be checked for correctness.  */
> >+    if (!load_p || !reg_overlap_mentioned_p (operands[0], operands[1]))
> >+      {
> >+	output_asm_insn (mips_output_move (operands[0], operands[1]),
> >+			 operands);
> >+	output_asm_insn (mips_output_move (operands[2], operands[3]),
> >+			 &operands[2]);
> >+      }
> >+    else
> >+      {
> >+	output_asm_insn (mips_output_move (operands[2], operands[3]),
> >+			 &operands[2]);
> >+	output_asm_insn (mips_output_move (operands[0], operands[1]),
> >+			 operands);
> >+      }
> >+    return "";
> >+  }
> >+  [(set_attr "move_type" "load,fpload,store,fpstore")
> >+   (set_attr "insn_count" "2,2,2,2")])
> >+
> >+;; 2 HI/SI/SF/DF loads are joined.
> >+;; P5600 does not support bonding of two LBs, hence QI mode is not
> included.
> >+(define_peephole2
> >+  [(set (match_operand:JOIN_MODE 0 "register_operand")
> >+	(match_operand:JOIN_MODE 1 "non_volatile_mem_operand"))
> >+   (set (match_operand:JOIN_MODE 2 "register_operand")
> >+	(match_operand:JOIN_MODE 3 "non_volatile_mem_operand"))]
> 
> Please can you comment that the loads must be non-volatile as they may get
> re-ordered.
> 
> >+  "ENABLE_LD_ST_PAIRS &&
> 
> && on the next line
> 
> >+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode,
> true)"
> >+  [(parallel [(set (match_dup 0)
> >+		   (match_dup 1))
> >+	      (set (match_dup 2)
> >+		   (match_dup 3))])]
> >+  "")
> >+
> >+;; 2 HI/SI/SF/DF stores are joined.
> >+;; P5600 does not support bonding of two SBs, hence QI mode is not
> included.
> >+(define_peephole2
> >+  [(set (match_operand:JOIN_MODE 0 "memory_operand")
> >+	(match_operand:JOIN_MODE 1 "register_operand"))
> >+   (set (match_operand:JOIN_MODE 2 "memory_operand")
> >+	(match_operand:JOIN_MODE 3 "register_operand"))]
> >+  "ENABLE_LD_ST_PAIRS &&
> 
> && on the next line
> 
> >+   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode,
> false)"
> >+  [(parallel [(set (match_dup 0)
> >+		   (match_dup 1))
> >+	      (set (match_dup 2)
> >+		   (match_dup 3))])]
> >+  "")
> >+
> >+(define_insn "*join2_loadhi"
> >+  [(set (match_operand:SI 0 "register_operand" "=r")
> >+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand"
> "m")))
> >+   (set (match_operand:SI 2 "register_operand" "=r")
> >+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand"
> "m")))]
> >+  "ENABLE_LD_ST_PAIRS && reload_completed"
> >+  {
> >+    /* Reg-renaming pass reuses base register if it is dead after bonded
> loads.
> >+       Hardware does not bond those loads, even when they are
> consecutive.
> >+       However, order of the loads need to be checked for correctness.  */
> >+    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
> >+      {
> >+	output_asm_insn ("lh<u>\t%0,%1", operands);
> >+	output_asm_insn ("lh<u>\t%2,%3", operands);
> >+      }
> >+    else
> >+      {
> >+	output_asm_insn ("lh<u>\t%2,%3", operands);
> >+	output_asm_insn ("lh<u>\t%0,%1", operands);
> >+      }
> >+
> >+    return "";
> >+  }
> >+  [(set_attr "move_type" "load")
> >+   (set_attr "insn_count" "2")])
> >+
> >+
> >+;; 2 16 bit integer loads are joined.
> 
> 2 HI mode loads
> 
> >+(define_peephole2
> >+  [(set (match_operand:SI 0 "register_operand")
> >+	(any_extend:SI (match_operand:HI 1
> "non_volatile_mem_operand")))
> >+   (set (match_operand:SI 2 "register_operand")
> >+	(any_extend:SI (match_operand:HI 3
> "non_volatile_mem_operand")))]
> >+  "ENABLE_LD_ST_PAIRS &&
> >+   mips_load_store_bonding_p (operands, HImode, true)"
> >+  [(parallel [(set (match_dup 0)
> >+		   (any_extend:SI (match_dup 1)))
> >+	      (set (match_dup 2)
> >+		   (any_extend:SI (match_dup 3)))])]
> >+  "")
> >+
> >
> > ;; Synchronization instructions.
> >
> >diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt index
> >9e89aa9..a9baebe 100644
> >--- a/gcc/config/mips/mips.opt
> >+++ b/gcc/config/mips/mips.opt
> >@@ -418,3 +418,7 @@ Enable use of odd-numbered single-precision
> >registers
> >
> > noasmopt
> > Driver
> >+
> >+mload-store-pairs
> >+Target Report Var(TARGET_LOAD_STORE_PAIRS) Init(1) Enable load/store
> >+bonding.
> 
> Catherine: We have this option in place just as a get-out clause if there are
> any side effects that have been missed in this patch such that you can still
> tune for p5600 but with bonding disabled. Do you think this is OK? I'm not
> completely against this being either undocumented or removed entirely.

If we allow the patch to go in, then it needs to be documented.  I don't have a strong opinion either way.  If you find it helpful to allow the option, then I'm OK with that.
Sameer, will please add a test case to the patch?
Thanks,
Catherine

> 
> Sameera: Assuming we keep it then it needs adding to the invoke doc.
> 
> Thanks,
> Matthew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-04-20 19:09         ` Matthew Fortune
  2015-04-20 22:01           ` Moore, Catherine
@ 2015-05-11 11:05           ` sameera
  2015-05-11 12:13             ` Matthew Fortune
  2015-05-11 16:40             ` Mike Stump
  1 sibling, 2 replies; 15+ messages in thread
From: sameera @ 2015-05-11 11:05 UTC (permalink / raw)
  To: Matthew Fortune, clm, Mike Stump; +Cc: Richard Sandiford, gcc-patches, echristo

[-- Attachment #1: Type: text/plain, Size: 12541 bytes --]

On Tuesday 21 April 2015 12:39 AM, Matthew Fortune wrote:
> Sameera Deshpande <Sameera.Deshpande@imgtec.com> writes:
>> Gentle reminder!
>
> Thanks Sameera. Just a couple of comments inline below and a question
> for Catherine at the end.
>
>> - Thanks and regards,
>>     Sameera D.
>>
>> On Monday 30 March 2015 04:58 PM, sameera wrote:
>>> Hi!
>>>
>>> Sorry for delay in sending this patch for review.
>>> Please find attached updated patch.
>>>
>>> In P5600, 2 consecutive loads/stores of same type which access
>>> contiguous memory locations are bonded together by instruction issue
>>> unit to dispatch single load/store instruction which accesses both
>> locations. This allows 2X improvement in memory intensive code. This
>> optimization can be performed for LH, SH, LW, SW, LWC, SWC, LDC, SDC
>> instructions.
>>>
>>> This patch adds peephole2 patterns to identify such loads/stores, and
>>> put them in parallel, so that the scheduler will not split it -
>> thereby guaranteeing h/w level load/store bonding.
>>>
>>> The patch is tested with dejagnu for correctness, and tested on
>> hardware for performance.
>>> Ok for trunk?
>>>
>>> Changelog:
>>> gcc/
>>>           * config/mips/mips.md (JOIN_MODE): New mode iterator.
>>>       (join2_load_Store<JOIN_MODE:mode>): New pattern.
>>>       (join2_loadhi): Likewise.
>>>       (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-
>> mode
>>>       load-load and store-stores.
>>>       * config/mips/mips.opt (mload-store-pairs): New option.
>>>       (TARGET_LOAD_STORE_PAIRS): New macro.
>>>       *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
>>>       *config/mips/mips-protos.h (mips_load_store_bonding_p): New
>> prototype.
>>>       *config/mips/mips.c(mips_load_store_bonding_p): New function.
>
> I don't know if this has been corrupted by mail clients but a single
> space after '*' and a space before '('.
>
>> diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
>> index b48e04f..244eb8d 100644
>> --- a/gcc/config/mips/mips-protos.h
>> +++ b/gcc/config/mips/mips-protos.h
>> @@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
>> extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
>> extern int mips_trampoline_code_size (void);
>> extern void mips_function_profiler (FILE *);
>> +extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
>>
>> typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
>> #ifdef RTX_CODE
>> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>> index 1733457..85f0591 100644
>> --- a/gcc/config/mips/mips.c
>> +++ b/gcc/config/mips/mips.c
>> @@ -18241,6 +18241,64 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
>>    return true;
>> }
>>
>> +bool
>> +mips_load_store_bonding_p (rtx *operands, enum machine_mode mode, bool load_p)
>
> Remove enum from machine_mode.
>
>> +{
>> +  rtx reg1, reg2, mem1, mem2, base1, base2;
>> +  enum reg_class rc1, rc2;
>> +  HOST_WIDE_INT offset1, offset2;
>> +
>> +  if (load_p)
>> +    {
>> +      reg1 = operands[0];
>> +      reg2 = operands[2];
>> +      mem1 = operands[1];
>> +      mem2 = operands[3];
>> +    }
>> +  else
>> +    {
>> +      reg1 = operands[1];
>> +      reg2 = operands[3];
>> +      mem1 = operands[0];
>> +      mem2 = operands[2];
>> +    }
>> +
>> +  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
>> +      || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
>> +    return false;
>> +
>> +  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
>> +  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
>> +
>> +  /* Base regs do not match.  */
>> +  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
>> +    return false;
>> +
>> +  /* Either of the loads is clobbering base register.  */
>> +  if (load_p
>> +      && (REGNO (reg1) == REGNO (base1)
>> +	  || (REGNO (reg2) == REGNO (base1))))
>> +    return false;
>
> Can you add a comment saying that this case does not get bonded by
> any known hardware even though it could be valid to bond them if it
> is the second load that clobbers the base.
>
>> +  /* Loading in same registers.  */
>> +  if (load_p
>> +      && REGNO (reg1) == REGNO (reg2))
>> +    return false;
>> +
>> +  /* The loads/stores are not of same type.  */
>> +  rc1 = REGNO_REG_CLASS (REGNO (reg1));
>> +  rc2 = REGNO_REG_CLASS (REGNO (reg2));
>> +  if (rc1 != rc2
>> +      && !reg_class_subset_p (rc1, rc2)
>> +      && !reg_class_subset_p (rc2, rc1))
>> +    return false;
>> +
>> +  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
>> +    return false;
>> +
>> +  return true;
>> +}
>> +
>> /* OPERANDS describes the operands to a pair of SETs, in the order
>>     dest1, src1, dest2, src2.  Return true if the operands can be used
>>     in an LWP or SWP instruction; LOAD_P says which.  */
>> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
>> index ec69ed5..1bd0dae 100644
>> --- a/gcc/config/mips/mips.h
>> +++ b/gcc/config/mips/mips.h
>> @@ -3147,3 +3147,7 @@ extern GTY(()) struct target_globals *mips16_globals;
>> #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
>> #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
>> #endif
>> +
>> +#define ENABLE_LD_ST_PAIRS \
>> +  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
>> +   && !TARGET_MICROMIPS && !TARGET_FIX_24K)
>
> I've already forgotten why these extra micromips/fix24k conditions
> were required. Can you add a comment?
>
>> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
>> index 3672c0b..cfdb750 100644
>> --- a/gcc/config/mips/mips.md
>> +++ b/gcc/config/mips/mips.md
>> @@ -754,6 +754,11 @@
>>
>> (define_mode_iterator MOVEP1 [SI SF])
>> (define_mode_iterator MOVEP2 [SI SF])
>> +(define_mode_iterator JOIN_MODE [HI
>> +				 SI
>> +				 (SF "TARGET_HARD_FLOAT")
>> +				 (DF "TARGET_HARD_FLOAT
>> +				      && TARGET_DOUBLE_FLOAT")])
>>
>> ;; This mode iterator allows :HILO to be used as the mode of the
>> ;; concatenated HI and LO registers.
>> @@ -7419,6 +7424,108 @@
>>    { return MIPS_CALL ("jal", operands, 0, -1); }
>>    [(set_attr "type" "call")
>>     (set_attr "insn_count" "3")])
>> +
>> +(define_insn "*join2_load_store<JOIN_MODE:mode>"
>> +  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand" "=d,f,m,m")
>> +	(match_operand:JOIN_MODE 1 "nonimmediate_operand" "m,m,d,f"))
>> +   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand" "=d,f,m,m")
>> +	(match_operand:JOIN_MODE 3 "nonimmediate_operand" "m,m,d,f"))]
>> +  "ENABLE_LD_ST_PAIRS && reload_completed"
>> +  {
>> +    bool load_p = (which_alternative == 0 || which_alternative == 1);
>> +    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
>> +       Hardware does not bond those loads, even when they are consecutive.
>> +       However, order of the loads need to be checked for correctness.  */
>> +    if (!load_p || !reg_overlap_mentioned_p (operands[0], operands[1]))
>> +      {
>> +	output_asm_insn (mips_output_move (operands[0], operands[1]),
>> +			 operands);
>> +	output_asm_insn (mips_output_move (operands[2], operands[3]),
>> +			 &operands[2]);
>> +      }
>> +    else
>> +      {
>> +	output_asm_insn (mips_output_move (operands[2], operands[3]),
>> +			 &operands[2]);
>> +	output_asm_insn (mips_output_move (operands[0], operands[1]),
>> +			 operands);
>> +      }
>> +    return "";
>> +  }
>> +  [(set_attr "move_type" "load,fpload,store,fpstore")
>> +   (set_attr "insn_count" "2,2,2,2")])
>> +
>> +;; 2 HI/SI/SF/DF loads are joined.
>> +;; P5600 does not support bonding of two LBs, hence QI mode is not included.
>> +(define_peephole2
>> +  [(set (match_operand:JOIN_MODE 0 "register_operand")
>> +	(match_operand:JOIN_MODE 1 "non_volatile_mem_operand"))
>> +   (set (match_operand:JOIN_MODE 2 "register_operand")
>> +	(match_operand:JOIN_MODE 3 "non_volatile_mem_operand"))]
>
> Please can you comment that the loads must be non-volatile as they may get
> re-ordered.
>
>> +  "ENABLE_LD_ST_PAIRS &&
>
> && on the next line
>
>> +   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, true)"
>> +  [(parallel [(set (match_dup 0)
>> +		   (match_dup 1))
>> +	      (set (match_dup 2)
>> +		   (match_dup 3))])]
>> +  "")
>> +
>> +;; 2 HI/SI/SF/DF stores are joined.
>> +;; P5600 does not support bonding of two SBs, hence QI mode is not included.
>> +(define_peephole2
>> +  [(set (match_operand:JOIN_MODE 0 "memory_operand")
>> +	(match_operand:JOIN_MODE 1 "register_operand"))
>> +   (set (match_operand:JOIN_MODE 2 "memory_operand")
>> +	(match_operand:JOIN_MODE 3 "register_operand"))]
>> +  "ENABLE_LD_ST_PAIRS &&
>
> && on the next line
>
>> +   mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, false)"
>> +  [(parallel [(set (match_dup 0)
>> +		   (match_dup 1))
>> +	      (set (match_dup 2)
>> +		   (match_dup 3))])]
>> +  "")
>> +
>> +(define_insn "*join2_loadhi"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
>> +   (set (match_operand:SI 2 "register_operand" "=r")
>> +	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
>> +  "ENABLE_LD_ST_PAIRS && reload_completed"
>> +  {
>> +    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
>> +       Hardware does not bond those loads, even when they are consecutive.
>> +       However, order of the loads need to be checked for correctness.  */
>> +    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
>> +      {
>> +	output_asm_insn ("lh<u>\t%0,%1", operands);
>> +	output_asm_insn ("lh<u>\t%2,%3", operands);
>> +      }
>> +    else
>> +      {
>> +	output_asm_insn ("lh<u>\t%2,%3", operands);
>> +	output_asm_insn ("lh<u>\t%0,%1", operands);
>> +      }
>> +
>> +    return "";
>> +  }
>> +  [(set_attr "move_type" "load")
>> +   (set_attr "insn_count" "2")])
>> +
>> +
>> +;; 2 16 bit integer loads are joined.
>
> 2 HI mode loads
>
>> +(define_peephole2
>> +  [(set (match_operand:SI 0 "register_operand")
>> +	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand")))
>> +   (set (match_operand:SI 2 "register_operand")
>> +	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand")))]
>> +  "ENABLE_LD_ST_PAIRS &&
>> +   mips_load_store_bonding_p (operands, HImode, true)"
>> +  [(parallel [(set (match_dup 0)
>> +		   (any_extend:SI (match_dup 1)))
>> +	      (set (match_dup 2)
>> +		   (any_extend:SI (match_dup 3)))])]
>> +  "")
>> +
>>
>> ;; Synchronization instructions.
>>
>> diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
>> index 9e89aa9..a9baebe 100644
>> --- a/gcc/config/mips/mips.opt
>> +++ b/gcc/config/mips/mips.opt
>> @@ -418,3 +418,7 @@ Enable use of odd-numbered single-precision registers
>>
>> noasmopt
>> Driver
>> +
>> +mload-store-pairs
>> +Target Report Var(TARGET_LOAD_STORE_PAIRS) Init(1)
>> +Enable load/store bonding.
>
> Catherine: We have this option in place just as a get-out clause if
> there are any side effects that have been missed in this patch such
> that you can still tune for p5600 but with bonding disabled. Do you
> think this is OK? I'm not completely against this being either
> undocumented or removed entirely.
>
> Sameera: Assuming we keep it then it needs adding to the invoke doc.
>
> Thanks,
> Matthew
>

Hi Matthew, Catherine and Mike,

Thanks for your comments.

Please find attached updated patch. I have added a testcase for bonding. However, unlike other architectures, we do not generate single instruction 
for bonded pair, because of which it is difficult to check if bonding is happening or not. Hence, an assembly file is generated with debug dumps, and 
the bonded loads/stores are identified by their pattern names.

I am trying FUSION for MIPS as suggested by Mike, and testing the perf impact of it along with other mips specific options.

Changelog:
gcc/
         * config/mips/mips.md (JOIN_MODE): New mode iterator.
         (join2_load_Store<JOIN_MODE:mode>): New pattern.
         (join2_loadhi): Likewise.
         (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
         load-load and store-stores.
         * config/mips/mips.opt (mload-store-pairs): New option.
         (TARGET_LOAD_STORE_PAIRS): New macro.
         * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
         * config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
         * config/mips/mips.c (mips_load_store_bonding_p): New function.

gcc/testsuite/
         * gcc.target/mips/p5600-bonding.c : New testcase to test bonding.

- Thanks and regards,
   Sameera D.

[-- Attachment #2: load-store-bonding-tot.patch --]
[-- Type: text/x-patch, Size: 9095 bytes --]

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index b48e04f..244eb8d 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
 extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
 extern int mips_trampoline_code_size (void);
 extern void mips_function_profiler (FILE *);
+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
 
 typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
 #ifdef RTX_CODE
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bf69850..4fc15c4 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18241,6 +18241,66 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
   return true;
 }
 
+bool
+mips_load_store_bonding_p (rtx *operands, machine_mode mode, bool load_p)
+{
+  rtx reg1, reg2, mem1, mem2, base1, base2;
+  enum reg_class rc1, rc2;
+  HOST_WIDE_INT offset1, offset2;
+
+  if (load_p)
+    {
+      reg1 = operands[0];
+      reg2 = operands[2];
+      mem1 = operands[1];
+      mem2 = operands[3];
+    }
+  else
+    {
+      reg1 = operands[1];
+      reg2 = operands[3];
+      mem1 = operands[0];
+      mem2 = operands[2];
+    }
+
+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
+      || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
+    return false;
+
+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
+
+  /* Base regs do not match.  */
+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
+    return false;
+
+  /* Either of the loads is clobbering base register.  It is legitimate to bond
+     loads if second load clobbers base register.  However, hardware does not
+     support such bonding.  */
+  if (load_p
+      && (REGNO (reg1) == REGNO (base1)
+	  || (REGNO (reg2) == REGNO (base1))))
+    return false;
+
+  /* Loading in same registers.  */
+  if (load_p
+      && REGNO (reg1) == REGNO (reg2))
+    return false;
+
+  /* The loads/stores are not of same type.  */
+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
+  rc2 = REGNO_REG_CLASS (REGNO (reg2));
+  if (rc1 != rc2
+      && !reg_class_subset_p (rc1, rc2)
+      && !reg_class_subset_p (rc2, rc1))
+    return false;
+
+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
+    return false;
+
+  return true;
+}
+
 /* OPERANDS describes the operands to a pair of SETs, in the order
    dest1, src1, dest2, src2.  Return true if the operands can be used
    in an LWP or SWP instruction; LOAD_P says which.  */
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 4bd83f5..c9accd1 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3162,3 +3162,10 @@ extern GTY(()) struct target_globals *mips16_globals;
 #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
 #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
 #endif
+
+/* Load store bonding is not supported by micromips and fix_24k.  The
+   performance can be degraded for those targets.  Hence, do not bond for
+   micromips or fix_24k.  */
+#define ENABLE_LD_ST_PAIRS \
+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
+   && !TARGET_MICROMIPS && !TARGET_FIX_24K)
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index ed4c0ba..0e2b172 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -754,6 +754,11 @@
 
 (define_mode_iterator MOVEP1 [SI SF])
 (define_mode_iterator MOVEP2 [SI SF])
+(define_mode_iterator JOIN_MODE [HI
+				 SI
+				 (SF "TARGET_HARD_FLOAT")
+				 (DF "TARGET_HARD_FLOAT
+				      && TARGET_DOUBLE_FLOAT")])
 
 ;; This mode iterator allows :HILO to be used as the mode of the
 ;; concatenated HI and LO registers.
@@ -7407,6 +7412,112 @@
   { return MIPS_CALL ("jal", operands, 0, -1); }
   [(set_attr "type" "call")
    (set_attr "insn_count" "3")])
+
+;; Match paired HI/SI/SF/DFmode load/stores.
+(define_insn "*join2_load_store<JOIN_MODE:mode>"
+  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 1 "nonimmediate_operand" "m,m,d,f"))
+   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 3 "nonimmediate_operand" "m,m,d,f"))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    bool load_p = (which_alternative == 0 || which_alternative == 1);
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, order of the loads need to be checked for correctness.  */
+    if (!load_p || !reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+	output_asm_insn (mips_output_move (operands[0], operands[1]),
+			 operands);
+	output_asm_insn (mips_output_move (operands[2], operands[3]),
+			 &operands[2]);
+      }
+    else
+      {
+	output_asm_insn (mips_output_move (operands[2], operands[3]),
+			 &operands[2]);
+	output_asm_insn (mips_output_move (operands[0], operands[1]),
+			 operands);
+      }
+    return "";
+  }
+  [(set_attr "move_type" "load,fpload,store,fpstore")
+   (set_attr "insn_count" "2,2,2,2")])
+
+;; 2 HI/SI/SF/DF loads are joined.
+;; P5600 does not support bonding of two LBs, hence QI mode is not included.
+;; The loads must be non-volatile as they might be reordered at the time of asm
+;; generation.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "register_operand")
+	(match_operand:JOIN_MODE 1 "non_volatile_mem_operand"))
+   (set (match_operand:JOIN_MODE 2 "register_operand")
+	(match_operand:JOIN_MODE 3 "non_volatile_mem_operand"))]
+  "ENABLE_LD_ST_PAIRS
+   && mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, true)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+;; 2 HI/SI/SF/DF stores are joined.
+;; P5600 does not support bonding of two SBs, hence QI mode is not included.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "memory_operand")
+	(match_operand:JOIN_MODE 1 "register_operand"))
+   (set (match_operand:JOIN_MODE 2 "memory_operand")
+	(match_operand:JOIN_MODE 3 "register_operand"))]
+  "ENABLE_LD_ST_PAIRS
+   && mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, false)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+;; Match paired HImode loads.
+(define_insn "*join2_loadhi"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, order of the loads need to be checked for correctness.  */
+    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+	output_asm_insn ("lh<u>\t%0,%1", operands);
+	output_asm_insn ("lh<u>\t%2,%3", operands);
+      }
+    else
+      {
+	output_asm_insn ("lh<u>\t%2,%3", operands);
+	output_asm_insn ("lh<u>\t%0,%1", operands);
+      }
+
+    return "";
+  }
+  [(set_attr "move_type" "load")
+   (set_attr "insn_count" "2")])
+
+
+;; 2 HI loads are joined.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand")))
+   (set (match_operand:SI 2 "register_operand")
+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand")))]
+  "ENABLE_LD_ST_PAIRS
+   && mips_load_store_bonding_p (operands, HImode, true)"
+  [(parallel [(set (match_dup 0)
+		   (any_extend:SI (match_dup 1)))
+	      (set (match_dup 2)
+		   (any_extend:SI (match_dup 3)))])]
+  "")
+
 \f
 ;; Synchronization instructions.
 
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 9e89aa9..a9baebe 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -418,3 +418,7 @@ Enable use of odd-numbered single-precision registers
 
 noasmopt
 Driver
+
+mload-store-pairs
+Target Report Var(TARGET_LOAD_STORE_PAIRS) Init(1)
+Enable load/store bonding.
diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
new file mode 100644
index 0000000..122b9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
+/* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */
+typedef int VINT32 __attribute__ ((vector_size((16))));
+
+void memory_operation_fun2_si(void * __restrict src, void * __restrict dest, int num)
+{
+    VINT32 *vsrc = (VINT32 *)src;
+    VINT32 *vdest = (VINT32 *)dest;
+    int i;
+
+    for (i = 0; i < num - 1; i+=2)
+    {
+      vdest[i] = (vdest[i] + vsrc[i]);
+      vdest[i + 1] = vdest[i + 1] + vsrc[i + 1];
+    }
+}
+/* { dg-final { scan-assembler "join2_" } }  */
+

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-05-11 11:05           ` sameera
@ 2015-05-11 12:13             ` Matthew Fortune
  2015-05-11 12:39               ` sameera
  2015-05-11 16:40             ` Mike Stump
  1 sibling, 1 reply; 15+ messages in thread
From: Matthew Fortune @ 2015-05-11 12:13 UTC (permalink / raw)
  To: Sameera Deshpande, clm, Mike Stump
  Cc: Richard Sandiford, gcc-patches, echristo

Hi Sameera,

Sameera Deshpande <Sameera.Deshpande@imgtec.com> writes: 
> Changelog:
> gcc/
>          * config/mips/mips.md (JOIN_MODE): New mode iterator.
>          (join2_load_Store<JOIN_MODE:mode>): New pattern.
>          (join2_loadhi): Likewise.
>          (define_peehole2): Add peephole2 patterns to join 2
> HI/SI/SF/DF-mode
>          load-load and store-stores.
>          * config/mips/mips.opt (mload-store-pairs): New option.
>          (TARGET_LOAD_STORE_PAIRS): New macro.
>          * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
>          * config/mips/mips-protos.h (mips_load_store_bonding_p): New
> prototype.
>          * config/mips/mips.c (mips_load_store_bonding_p): New function.
> 
> gcc/testsuite/
>          * gcc.target/mips/p5600-bonding.c : New testcase to test
> bonding.

Just 'New file.' is fine for the changelog.

>diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
>new file mode 100644
>index 0000000..122b9f8
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
>@@ -0,0 +1,19 @@
>+/* { dg-do compile } */
>+/* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
>+/* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */
>+typedef int VINT32 __attribute__ ((vector_size((16))));
>+
>+void memory_operation_fun2_si(void * __restrict src, void * __restrict dest, int num)

Code style applies for testcases too, return type on line above, space
after function name, line length.

>+{
>+    VINT32 *vsrc = (VINT32 *)src;

Indentation.

>+    VINT32 *vdest = (VINT32 *)dest;
>+    int i;
>+
>+    for (i = 0; i < num - 1; i+=2)
>+    {

Indentation

>+      vdest[i] = (vdest[i] + vsrc[i]);

Unnecessary brackets.

>+      vdest[i + 1] = vdest[i + 1] + vsrc[i + 1];
>+    }
>+}
>+/* { dg-final { scan-assembler "join2_" } }  */
>+

OK with those changes.

Thanks,
Matthew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-05-11 12:13             ` Matthew Fortune
@ 2015-05-11 12:39               ` sameera
  0 siblings, 0 replies; 15+ messages in thread
From: sameera @ 2015-05-11 12:39 UTC (permalink / raw)
  To: Matthew Fortune, clm, Mike Stump; +Cc: Richard Sandiford, gcc-patches, echristo

[-- Attachment #1: Type: text/plain, Size: 2960 bytes --]

On Monday 11 May 2015 05:43 PM, Matthew Fortune wrote:
> Hi Sameera,
>
> Sameera Deshpande <Sameera.Deshpande@imgtec.com> writes:
>> Changelog:
>> gcc/
>>           * config/mips/mips.md (JOIN_MODE): New mode iterator.
>>           (join2_load_Store<JOIN_MODE:mode>): New pattern.
>>           (join2_loadhi): Likewise.
>>           (define_peehole2): Add peephole2 patterns to join 2
>> HI/SI/SF/DF-mode
>>           load-load and store-stores.
>>           * config/mips/mips.opt (mload-store-pairs): New option.
>>           (TARGET_LOAD_STORE_PAIRS): New macro.
>>           * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
>>           * config/mips/mips-protos.h (mips_load_store_bonding_p): New
>> prototype.
>>           * config/mips/mips.c (mips_load_store_bonding_p): New function.
>>
>> gcc/testsuite/
>>           * gcc.target/mips/p5600-bonding.c : New testcase to test
>> bonding.
>
> Just 'New file.' is fine for the changelog.
>
>> diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
>> new file mode 100644
>> index 0000000..122b9f8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
>> +/* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */
>> +typedef int VINT32 __attribute__ ((vector_size((16))));
>> +
>> +void memory_operation_fun2_si(void * __restrict src, void * __restrict dest, int num)
>
> Code style applies for testcases too, return type on line above, space
> after function name, line length.
>
>> +{
>> +    VINT32 *vsrc = (VINT32 *)src;
>
> Indentation.
>
>> +    VINT32 *vdest = (VINT32 *)dest;
>> +    int i;
>> +
>> +    for (i = 0; i < num - 1; i+=2)
>> +    {
>
> Indentation
>
>> +      vdest[i] = (vdest[i] + vsrc[i]);
>
> Unnecessary brackets.
>
>> +      vdest[i + 1] = vdest[i + 1] + vsrc[i + 1];
>> +    }
>> +}
>> +/* { dg-final { scan-assembler "join2_" } }  */
>> +
>
> OK with those changes.
>
> Thanks,
> Matthew
>
Hi Matthew,

Thanks for the comments.
Please find attached updated patch.

I do not have permissions to apply the patch in GCC.
Can you please submit the patch for me?

Changelog:
gcc/
         * config/mips/mips.md (JOIN_MODE): New mode iterator.
         (join2_load_Store<JOIN_MODE:mode>): New pattern.
         (join2_loadhi): Likewise.
         (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
         load-load and store-stores.
         * config/mips/mips.opt (mload-store-pairs): New option.
         (TARGET_LOAD_STORE_PAIRS): New macro.
         * config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
         * config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
         * config/mips/mips.c (mips_load_store_bonding_p): New function.

gcc/testsuite/
         * gcc.target/mips/p5600-bonding.c : New file.

[-- Attachment #2: load-store-bonding-tot.patch --]
[-- Type: text/x-patch, Size: 9075 bytes --]

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index b48e04f..244eb8d 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -360,6 +360,7 @@ extern bool mips_epilogue_uses (unsigned int);
 extern void mips_final_prescan_insn (rtx_insn *, rtx *, int);
 extern int mips_trampoline_code_size (void);
 extern void mips_function_profiler (FILE *);
+extern bool mips_load_store_bonding_p (rtx *, machine_mode, bool);
 
 typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
 #ifdef RTX_CODE
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bf69850..4fc15c4 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18241,6 +18241,66 @@ umips_load_store_pair_p_1 (bool load_p, bool swap_p,
   return true;
 }
 
+bool
+mips_load_store_bonding_p (rtx *operands, machine_mode mode, bool load_p)
+{
+  rtx reg1, reg2, mem1, mem2, base1, base2;
+  enum reg_class rc1, rc2;
+  HOST_WIDE_INT offset1, offset2;
+
+  if (load_p)
+    {
+      reg1 = operands[0];
+      reg2 = operands[2];
+      mem1 = operands[1];
+      mem2 = operands[3];
+    }
+  else
+    {
+      reg1 = operands[1];
+      reg2 = operands[3];
+      mem1 = operands[0];
+      mem2 = operands[2];
+    }
+
+  if (mips_address_insns (XEXP (mem1, 0), mode, false) == 0
+      || mips_address_insns (XEXP (mem2, 0), mode, false) == 0)
+    return false;
+
+  mips_split_plus (XEXP (mem1, 0), &base1, &offset1);
+  mips_split_plus (XEXP (mem2, 0), &base2, &offset2);
+
+  /* Base regs do not match.  */
+  if (!REG_P (base1) || !rtx_equal_p (base1, base2))
+    return false;
+
+  /* Either of the loads is clobbering base register.  It is legitimate to bond
+     loads if second load clobbers base register.  However, hardware does not
+     support such bonding.  */
+  if (load_p
+      && (REGNO (reg1) == REGNO (base1)
+	  || (REGNO (reg2) == REGNO (base1))))
+    return false;
+
+  /* Loading in same registers.  */
+  if (load_p
+      && REGNO (reg1) == REGNO (reg2))
+    return false;
+
+  /* The loads/stores are not of same type.  */
+  rc1 = REGNO_REG_CLASS (REGNO (reg1));
+  rc2 = REGNO_REG_CLASS (REGNO (reg2));
+  if (rc1 != rc2
+      && !reg_class_subset_p (rc1, rc2)
+      && !reg_class_subset_p (rc2, rc1))
+    return false;
+
+  if (abs (offset1 - offset2) != GET_MODE_SIZE (mode))
+    return false;
+
+  return true;
+}
+
 /* OPERANDS describes the operands to a pair of SETs, in the order
    dest1, src1, dest2, src2.  Return true if the operands can be used
    in an LWP or SWP instruction; LOAD_P says which.  */
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 4bd83f5..c9accd1 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3162,3 +3162,10 @@ extern GTY(()) struct target_globals *mips16_globals;
 #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
 #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
 #endif
+
+/* Load store bonding is not supported by micromips and fix_24k.  The
+   performance can be degraded for those targets.  Hence, do not bond for
+   micromips or fix_24k.  */
+#define ENABLE_LD_ST_PAIRS \
+  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
+   && !TARGET_MICROMIPS && !TARGET_FIX_24K)
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index ed4c0ba..0e2b172 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -754,6 +754,11 @@
 
 (define_mode_iterator MOVEP1 [SI SF])
 (define_mode_iterator MOVEP2 [SI SF])
+(define_mode_iterator JOIN_MODE [HI
+				 SI
+				 (SF "TARGET_HARD_FLOAT")
+				 (DF "TARGET_HARD_FLOAT
+				      && TARGET_DOUBLE_FLOAT")])
 
 ;; This mode iterator allows :HILO to be used as the mode of the
 ;; concatenated HI and LO registers.
@@ -7407,6 +7412,112 @@
   { return MIPS_CALL ("jal", operands, 0, -1); }
   [(set_attr "type" "call")
    (set_attr "insn_count" "3")])
+
+;; Match paired HI/SI/SF/DFmode load/stores.
+(define_insn "*join2_load_store<JOIN_MODE:mode>"
+  [(set (match_operand:JOIN_MODE 0 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 1 "nonimmediate_operand" "m,m,d,f"))
+   (set (match_operand:JOIN_MODE 2 "nonimmediate_operand" "=d,f,m,m")
+	(match_operand:JOIN_MODE 3 "nonimmediate_operand" "m,m,d,f"))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    bool load_p = (which_alternative == 0 || which_alternative == 1);
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, order of the loads need to be checked for correctness.  */
+    if (!load_p || !reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+	output_asm_insn (mips_output_move (operands[0], operands[1]),
+			 operands);
+	output_asm_insn (mips_output_move (operands[2], operands[3]),
+			 &operands[2]);
+      }
+    else
+      {
+	output_asm_insn (mips_output_move (operands[2], operands[3]),
+			 &operands[2]);
+	output_asm_insn (mips_output_move (operands[0], operands[1]),
+			 operands);
+      }
+    return "";
+  }
+  [(set_attr "move_type" "load,fpload,store,fpstore")
+   (set_attr "insn_count" "2,2,2,2")])
+
+;; 2 HI/SI/SF/DF loads are joined.
+;; P5600 does not support bonding of two LBs, hence QI mode is not included.
+;; The loads must be non-volatile as they might be reordered at the time of asm
+;; generation.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "register_operand")
+	(match_operand:JOIN_MODE 1 "non_volatile_mem_operand"))
+   (set (match_operand:JOIN_MODE 2 "register_operand")
+	(match_operand:JOIN_MODE 3 "non_volatile_mem_operand"))]
+  "ENABLE_LD_ST_PAIRS
+   && mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, true)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+;; 2 HI/SI/SF/DF stores are joined.
+;; P5600 does not support bonding of two SBs, hence QI mode is not included.
+(define_peephole2
+  [(set (match_operand:JOIN_MODE 0 "memory_operand")
+	(match_operand:JOIN_MODE 1 "register_operand"))
+   (set (match_operand:JOIN_MODE 2 "memory_operand")
+	(match_operand:JOIN_MODE 3 "register_operand"))]
+  "ENABLE_LD_ST_PAIRS
+   && mips_load_store_bonding_p (operands, <JOIN_MODE:MODE>mode, false)"
+  [(parallel [(set (match_dup 0)
+		   (match_dup 1))
+	      (set (match_dup 2)
+		   (match_dup 3))])]
+  "")
+
+;; Match paired HImode loads.
+(define_insn "*join2_loadhi"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, order of the loads need to be checked for correctness.  */
+    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+	output_asm_insn ("lh<u>\t%0,%1", operands);
+	output_asm_insn ("lh<u>\t%2,%3", operands);
+      }
+    else
+      {
+	output_asm_insn ("lh<u>\t%2,%3", operands);
+	output_asm_insn ("lh<u>\t%0,%1", operands);
+      }
+
+    return "";
+  }
+  [(set_attr "move_type" "load")
+   (set_attr "insn_count" "2")])
+
+
+;; 2 HI loads are joined.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand")))
+   (set (match_operand:SI 2 "register_operand")
+	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand")))]
+  "ENABLE_LD_ST_PAIRS
+   && mips_load_store_bonding_p (operands, HImode, true)"
+  [(parallel [(set (match_dup 0)
+		   (any_extend:SI (match_dup 1)))
+	      (set (match_dup 2)
+		   (any_extend:SI (match_dup 3)))])]
+  "")
+
 \f
 ;; Synchronization instructions.
 
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 9e89aa9..a9baebe 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -418,3 +418,7 @@ Enable use of odd-numbered single-precision registers
 
 noasmopt
 Driver
+
+mload-store-pairs
+Target Report Var(TARGET_LOAD_STORE_PAIRS) Init(1)
+Enable load/store bonding.
diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
new file mode 100644
index 0000000..0890ffa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
+/* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */
+typedef int VINT32 __attribute__ ((vector_size((16))));
+
+void
+memory_operation (void * __restrict src, void * __restrict dest, int num)
+{
+  VINT32 *vsrc = (VINT32 *) src;
+  VINT32 *vdest = (VINT32 *) dest;
+  int i;
+
+  for (i = 0; i < num - 1; i += 2)
+  {
+    vdest[i] = vdest[i] + vsrc[i];
+    vdest[i + 1] = vdest[i + 1] + vsrc[i + 1];
+  }
+}
+/* { dg-final { scan-assembler "join2_" } }  */
+

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-05-11 11:05           ` sameera
  2015-05-11 12:13             ` Matthew Fortune
@ 2015-05-11 16:40             ` Mike Stump
  2015-05-13  7:39               ` sameera
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Stump @ 2015-05-11 16:40 UTC (permalink / raw)
  To: sameera; +Cc: Matthew Fortune, clm, Richard Sandiford, gcc-patches, echristo

On May 11, 2015, at 4:05 AM, sameera <sameera.deshpande@imgtec.com> wrote:
>>> +(define_insn "*join2_loadhi"
>>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>>> +	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
>>> +   (set (match_operand:SI 2 "register_operand" "=r")
>>> +	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
>>> +  "ENABLE_LD_ST_PAIRS && reload_completed"
>>> +  {
>>> +    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
>>> +       Hardware does not bond those loads, even when they are consecutive.
>>> +       However, order of the loads need to be checked for correctness.  */
>>> +    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
>>> +      {
>>> +	output_asm_insn ("lh<u>\t%0,%1", operands);
>>> +	output_asm_insn ("lh<u>\t%2,%3", operands);
>>> +      }
>>> +    else
>>> +      {
>>> +	output_asm_insn ("lh<u>\t%2,%3", operands);
>>> +	output_asm_insn ("lh<u>\t%0,%1", operands);
>>> +      }
>>> +
>>> +    return "";
>>> +  }
>>> +  [(set_attr "move_type" "load")
>>> +   (set_attr "insn_count" "2")])

> However, unlike other architectures, we do not generate single instruction for bonded pair,

Actually, you do.  The above is 1 instruction pattern.  Doesn’t matter much what it prints as or what the CPU thinks of it.

> because of which it is difficult to check if bonding is happening or not. Hence, an assembly file is generated with debug dumps, and the bonded loads/stores are identified by their pattern names.

Nothing wrong with that approach.  Also, in the assembly, one can look for sequences of instruction if they way.  
See gcc/testsuite/gcc.target/aarch64/fuse_adrp_add_1.c:

  /* { dg-final { scan-assembler "adrp\tx.*, fixed_regs\n\tadd\tx.*, x.*fixed_regs" } } */

in the test suite for example.

> I am trying FUSION for MIPS as suggested by Mike, and testing the perf impact of it along with other mips specific options.

I think you will discover it is virtually what you have now, and works better.  The fusion just can peephole over greater distances, that’s the only real difference.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][MIPS] Enable load-load/store-store bonding
  2015-05-11 16:40             ` Mike Stump
@ 2015-05-13  7:39               ` sameera
  0 siblings, 0 replies; 15+ messages in thread
From: sameera @ 2015-05-13  7:39 UTC (permalink / raw)
  To: Mike Stump; +Cc: Matthew Fortune, clm, Richard Sandiford, gcc-patches, echristo

Hi Mike,

Thanks for your comments.
Please find my comments inlined.

- Thanks and regards,
   Sameera D.

On Monday 11 May 2015 10:09 PM, Mike Stump wrote:
> On May 11, 2015, at 4:05 AM, sameera <sameera.deshpande@imgtec.com> wrote:
>>>> +(define_insn "*join2_loadhi"
>>>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>>>> +	(any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
>>>> +   (set (match_operand:SI 2 "register_operand" "=r")
>>>> +	(any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
>>>> +  "ENABLE_LD_ST_PAIRS && reload_completed"
>>>> +  {
>>>> +    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
>>>> +       Hardware does not bond those loads, even when they are consecutive.
>>>> +       However, order of the loads need to be checked for correctness.  */
>>>> +    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
>>>> +      {
>>>> +	output_asm_insn ("lh<u>\t%0,%1", operands);
>>>> +	output_asm_insn ("lh<u>\t%2,%3", operands);
>>>> +      }
>>>> +    else
>>>> +      {
>>>> +	output_asm_insn ("lh<u>\t%2,%3", operands);
>>>> +	output_asm_insn ("lh<u>\t%0,%1", operands);
>>>> +      }
>>>> +
>>>> +    return "";
>>>> +  }
>>>> +  [(set_attr "move_type" "load")
>>>> +   (set_attr "insn_count" "2")])
>
>> However, unlike other architectures, we do not generate single instruction for bonded pair,
>
> Actually, you do.  The above is 1 instruction pattern.  Doesn’t matter much what it prints as or what the CPU thinks of it.
The pattern is single, however, the asm code will have multiple instructions generated for the pattern.
>
>> because of which it is difficult to check if bonding is happening or not. Hence, an assembly file is generated with debug dumps, and the bonded loads/stores are identified by their pattern names.
>
> Nothing wrong with that approach.  Also, in the assembly, one can look for sequences of instruction if they way.
Load/store bonding is not just contiguous load/store instructions, but they also need to have same base register and offset with specific difference. 
Hence, The way you suggested might not be useful always. Hence, I am comparing the pattern name instead.
> See gcc/testsuite/gcc.target/aarch64/fuse_adrp_add_1.c:
>
>    /* { dg-final { scan-assembler "adrp\tx.*, fixed_regs\n\tadd\tx.*, x.*fixed_regs" } } */
>
> in the test suite for example.
>
>> I am trying FUSION for MIPS as suggested by Mike, and testing the perf impact of it along with other mips specific options.
>
> I think you will discover it is virtually what you have now, and works better.  The fusion just can peephole over greater distances, that’s the only real difference.
Yes, in many cases I see clear improvement. However, it also tries to bring loads/stores together, which were split intentionally by msched-weight 
option, introduced for MIPS. I need to measure performance and do perf tuning (if needed) for that option before sending it for review.
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH][MIPS] Enable load-load/store-store bonding
@ 2014-06-19  9:41 Sameera Deshpande
  0 siblings, 0 replies; 15+ messages in thread
From: Sameera Deshpande @ 2014-06-19  9:41 UTC (permalink / raw)
  To: rdsandiford; +Cc: Matthew Fortune, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1348 bytes --]

Hi Richard,

Please find attached the patch implementing load-load/store-store bonding supported by P5600.

In P5600, 2 consecutive loads/stores of same type which access contiguous memory locations are bonded together by instruction issue unit to dispatch single load/store instruction which accesses both locations. This allows 2X improvement in memory intensive code. This optimization can be performed for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions.

This patch adds peephole2 patterns to identify such loads/stores, and put them in parallel, so that the scheduler will not split it - thereby guarantying h/w level load/store bonding.

The patch is tested with dejagnu for correctness.
Local testing on hardware for perf is  currently going on.
Ok for trunk? 

Changelog:
gcc/
	* config/mips/mips.md (JOINLDST1): New mode iterator.
	(insn_type): New mode attribute.
	(reg): Update mode attribute.
	(join2_load_Store<JOINLDST1:mode>): New pattern.
	(join2_loadhi): Likewise.
	(join2_storehi): Likewise.
	(define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
	load-load and store-stores.
	* config/mips/mips.opt (mld-st-pairing): New option.
	* config/mips/mips.c (mips_option_override): New exception.
	*config/mips/mips.h (ENABLE_LD_ST_PAIRING): New macro.

- Thanks and regards,
   Sameera D.


[-- Attachment #2: load-store-pairing.patch --]
[-- Type: application/octet-stream, Size: 8218 bytes --]

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index b5b5ba7..9804ef2 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18813,6 +18813,9 @@ mips_option_override (void)
   if (TARGET_MICROMIPS && TARGET_MIPS16)
     error ("unsupported combination: %s", "-mips16 -mmicromips");
 
+  if (TARGET_FIX_24K && TUNE_P5600)
+    error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
+
   /* Save the base compression state and process flags as though we
      were generating uncompressed code.  */
   mips_base_compression_flags = TARGET_COMPRESSION;
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 86ca419..4478f81e 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3184,3 +3184,6 @@ extern GTY(()) struct target_globals *mips16_globals;
    with arguments ARGS.  */
 #define PMODE_INSN(NAME, ARGS) \
   (Pmode == SImode ? NAME ## _si ARGS : NAME ## _di ARGS)
+
+#define ENABLE_LD_ST_PAIRING \
+  (TARGET_ENABLE_LD_ST_PAIRING && TUNE_P5600)
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 7229e8f..05605c5 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -780,6 +780,7 @@
 
 (define_mode_iterator MOVEP1 [SI SF])
 (define_mode_iterator MOVEP2 [SI SF])
+(define_mode_iterator JOINLDST1 [SI SF DF])
 
 ;; This mode iterator allows :HILO to be used as the mode of the
 ;; concatenated HI and LO registers.
@@ -883,6 +884,8 @@
 (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
 (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
 
+(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
+
 ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
 ;; are different.  Some forms of unextended addiu have an 8-bit immediate
 ;; field but the equivalent daddiu has only a 5-bit field.
@@ -890,7 +893,7 @@
 
 ;; This attribute gives the best constraint to use for registers of
 ;; a given mode.
-(define_mode_attr reg [(SI "d") (DI "d") (CC "z") (CCF "f")])
+(define_mode_attr reg [(SI "d") (DI "d") (CC "z") (CCF "f") (SF "f") (DF "f")])
 
 ;; This attribute gives the format suffix for floating-point operations.
 (define_mode_attr fmt [(SF "s") (DF "d") (V2SF "ps")])
@@ -7442,6 +7445,153 @@
   { return MIPS_CALL ("jal", operands, 0, -1); }
   [(set_attr "type" "call")
    (set_attr "insn_count" "3")])
+
+(define_insn "*join2_load_store<JOINLDST1:mode>"
+  [(parallel
+    [(set (match_operand:JOINLDST1 0 "nonimmediate_operand" "=<reg>,m")
+	  (match_operand:JOINLDST1 1 "nonimmediate_operand" "m,<reg>"))
+     (set (match_operand:JOINLDST1 2 "nonimmediate_operand" "=<reg>,m")
+	  (match_operand:JOINLDST1 3 "nonimmediate_operand" "m,<reg>"))])]
+  "ENABLE_LD_ST_PAIRING && reload_completed"
+  {
+    output_asm_insn (mips_output_move (operands[0], operands[1]), operands);
+    output_asm_insn (mips_output_move (operands[2], operands[3]), &operands[2]);
+    return "";
+  }
+  [(set_attr "move_type" "<insn_type>load,<insn_type>store")
+   (set_attr_alternative "insn_count"
+	[(mult (symbol_ref "mips_load_store_insns (operands[1], insn)")
+	       (const_int 2))
+	 (mult (symbol_ref "mips_load_store_insns (operands[0], insn)")
+	       (const_int 2))])])
+
+;;2 SI/SF/DF loads are joined.
+(define_peephole2
+  [(set (match_operand:JOINLDST1 0 "register_operand")
+	(mem:JOINLDST1 (plus:SI (match_operand:SI 1 "register_operand")
+				(match_operand:SI 2 "const_int_operand"))))
+   (set (match_operand:JOINLDST1 3 "register_operand")
+	(mem:JOINLDST1 (plus:SI (match_dup 1)
+				(match_operand:SI 4 "const_int_operand"))))]
+  "ENABLE_LD_ST_PAIRING &&
+   REGNO_REG_CLASS (REGNO (operands[0]))
+	== REGNO_REG_CLASS (REGNO (operands[3])) &&
+   (abs (INTVAL (operands[2]) - INTVAL (operands[4]))
+	== GET_MODE_SIZE (<JOINLDST1:MODE>mode))"
+
+  [(parallel [(set (match_dup 0)
+		   (match_dup 2))
+	      (set (match_dup 3)
+		   (match_dup 4))])]
+  {
+    operands[4] = SET_SRC (PATTERN (next_active_insn (curr_insn)));
+    operands[2] = SET_SRC (PATTERN (curr_insn));
+  })
+
+;; 2 SI/SF/DF stores are joined.
+(define_peephole2
+  [(set (mem:JOINLDST1 (plus:SI (match_operand:SI 0 "register_operand")
+				(match_operand:SI 1 "const_int_operand")))
+	(match_operand:JOINLDST1 2 "register_operand"))
+   (set (mem:JOINLDST1 (plus:SI (match_dup 0)
+				(match_operand:SI 3 "const_int_operand")))
+	(match_operand:JOINLDST1 4 "register_operand"))]
+  "ENABLE_LD_ST_PAIRING &&
+   REGNO_REG_CLASS (REGNO (operands[2]))
+	== REGNO_REG_CLASS (REGNO (operands[4])) &&
+   (abs (INTVAL (operands[1]) - INTVAL (operands[3]))
+	== GET_MODE_SIZE (<JOINLDST1:MODE>mode))"
+  [(parallel [(set (match_dup 1)
+		   (match_dup 2))
+	      (set (match_dup 3)
+		   (match_dup 4))])]
+  {
+    operands[3] = SET_DEST (PATTERN (next_active_insn (curr_insn)));
+    operands[1] = SET_DEST (PATTERN (curr_insn));
+  })
+
+(define_insn "*join2_storehi"
+  [(parallel
+    [(set (match_operand:HI 0 "memory_operand" "=m")
+	  (match_operand:HI 1 "register_operand" "r"))
+     (set (match_operand:HI 2 "memory_operand" "=m")
+	  (match_operand:HI 3 "register_operand" "r"))])]
+  "ENABLE_LD_ST_PAIRING && reload_completed"
+  {
+    output_asm_insn (mips_output_move (operands[0], operands[1]), operands);
+    output_asm_insn (mips_output_move (operands[2], operands[3]), &operands[2]);
+    return "";
+  }
+  [(set_attr "move_type" "store")
+   (set_attr_alternative "insn_count"
+	[(mult	(symbol_ref "mips_load_store_insns (operands[0], insn)")
+		(const_int 2))])])
+
+(define_insn "*join2_loadhi"
+  [(parallel
+    [(set (match_operand:SI 0 "register_operand" "=r")
+	  (any_extend:SI (match_operand:HI 1 "memory_operand" "m")))
+     (set (match_operand:SI 2 "register_operand" "=r")
+	  (any_extend:SI (match_operand:HI 3 "memory_operand" "m")))])]
+  "ENABLE_LD_ST_PAIRING && reload_completed"
+  {
+    output_asm_insn ("lh<u>\t%0,%1", operands);
+    output_asm_insn ("lh<u>\t%2,%3", operands);
+    return "";
+  }
+  [(set_attr "move_type" "load")
+   (set_attr_alternative "insn_count"
+	[(mult	(symbol_ref "mips_load_store_insns (operands[1], insn)")
+		(const_int 2))])])
+
+
+;; 2 16 bit integer loads are joined.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(any_extend:SI (mem:HI (plus:SI
+				  (match_operand:SI 1 "register_operand")
+				  (match_operand:SI 2 "const_int_operand")))))
+   (set (match_operand:SI 3 "register_operand")
+	(any_extend:SI (mem:HI (plus:SI
+				  (match_dup 1)
+				  (match_operand:SI 4 "const_int_operand")))))]
+  "ENABLE_LD_ST_PAIRING &&
+   REGNO_REG_CLASS (REGNO (operands[0]))
+	== REGNO_REG_CLASS (REGNO (operands[3])) &&
+   (abs (INTVAL (operands[2]) - INTVAL (operands[4]))
+	== GET_MODE_SIZE (HImode))"
+
+  [(parallel [(set (match_dup 0)
+		   (match_dup 2))
+	      (set (match_dup 3)
+		   (match_dup 4))])]
+  {
+    operands[4] = SET_SRC (PATTERN (next_active_insn (curr_insn)));
+    operands[2] = SET_SRC (PATTERN (curr_insn));
+  })
+
+;; 2 16 bit integer stores are joined.
+(define_peephole2
+  [(set (mem:HI (plus:SI (match_operand:SI 0 "register_operand")
+			 (match_operand:SI 1 "const_int_operand")))
+	(match_operand:HI 2 "register_operand"))
+   (set (mem:HI (plus:SI (match_dup 0)
+			 (match_operand:SI 3 "const_int_operand")))
+	(match_operand:HI 4 "register_operand"))]
+  "ENABLE_LD_ST_PAIRING &&
+   REGNO_REG_CLASS (REGNO (operands[2]))
+	== REGNO_REG_CLASS (REGNO (operands[4])) &&
+   (abs (INTVAL (operands[1]) - INTVAL (operands[3]))
+	== GET_MODE_SIZE (HImode))"
+  [(parallel [(set (match_dup 1)
+		   (match_dup 2))
+	      (set (match_dup 3)
+		   (match_dup 4))])]
+  {
+    operands[3] = SET_DEST (PATTERN (next_active_insn (curr_insn)));
+    operands[1] = SET_DEST (PATTERN (curr_insn));
+  })
+
 \f
 ;; Synchronization instructions.
 
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index b9cfd62..d4135cf 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -445,3 +445,7 @@ Enum(mips_lib_setting) String(tiny) Value(MIPS_LIB_TINY)
 
 msched-weight
 Target Report Var(TARGET_SCHED_WEIGHT) Undocumented
+
+mld-st-pairing
+Target Report Var(TARGET_ENABLE_LD_ST_PAIRING)
+Enable load/store pairing

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-05-13  6:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <38C8F1E431EDD94A82971C543A11B4FEE0A588@PUMAIL01.pu.imgtec.org>
2014-06-21 16:25 ` [PATCH][MIPS] Enable load-load/store-store bonding Richard Sandiford
2014-06-23  9:18   ` Sameera Deshpande
2014-06-23  9:41     ` Richard Sandiford
2014-06-24 10:42   ` Sameera Deshpande
2015-03-30 11:28     ` sameera
2015-04-20  5:09       ` sameera
2015-04-20 18:30         ` Mike Stump
2015-04-20 19:09         ` Matthew Fortune
2015-04-20 22:01           ` Moore, Catherine
2015-05-11 11:05           ` sameera
2015-05-11 12:13             ` Matthew Fortune
2015-05-11 12:39               ` sameera
2015-05-11 16:40             ` Mike Stump
2015-05-13  7:39               ` sameera
2014-06-19  9:41 Sameera Deshpande

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).