Re: [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added
@ 2008-11-07  8:23 Ruan Beihong
  2008-11-10 22:16 ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Ruan Beihong @ 2008-11-07  8:23 UTC (permalink / raw)
  To: rdsandiford; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

2008/11/6, Ruan Beihong <ruanbeihong@gmail.com>:
> Hi Richard,
> It has been so many days since last talk. I have signed a assignment
> now (due to my fault that I'd given a wrong address and also the slow
> mailing service between USA and China), and the assignment is now on
> its way to Boston for a few days.
> So can the patch be accept now?
> I still think that the loongson2ef's <d>multu.g should be divided into
> a new type of insn (I call it mul3nc), since they don't change or use
> the HI/LO register.
>
> --
> James Ruan
>

I've got the reply from FSF. My assignment is completed.
-- 
James Ruan

[-- Attachment #2: new_i_insn_and_prefetch.patch.gz --]
[-- Type: application/x-gzip, Size: 2426 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added
  2008-11-07  8:23 [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added Ruan Beihong
@ 2008-11-10 22:16 ` Richard Sandiford
  2008-11-11 13:20   ` Ruan Beihong
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2008-11-10 22:16 UTC (permalink / raw)
  To: Ruan Beihong; +Cc: gcc-patches

"Ruan Beihong" <ruanbeihong@gmail.com> writes:
> 2008/11/6, Ruan Beihong <ruanbeihong@gmail.com>:
>> Hi Richard,
>> It has been so many days since last talk. I have signed a assignment
>> now (due to my fault that I'd given a wrong address and also the slow
>> mailing service between USA and China), and the assignment is now on
>> its way to Boston for a few days.
>> So can the patch be accept now?
>> I still think that the loongson2ef's <d>multu.g should be divided into
>> a new type of insn (I call it mul3nc), since they don't change or use
>> the HI/LO register.
>>
>> --
>> James Ruan
>>
>
> I've got the reply from FSF. My assignment is completed.

Great!  Thanks for going through the process.

> +(define_insn "<u>div<mode>3"
> +  [(set (match_operand:GPR 0 "register_operand" "=d")
> +	(any_div:GPR
> +		(match_operand:GPR 1 "register_operand" "d")
> +		(match_operand:GPR 2 "register_operand" "d")))]
> +  "TARGET_LOONGSON_2EF"
> +  { return mips_output_division ("<d>div<u>.g\t%0,%1,%2", operands); }
> +  [(set_attr "type" "idiv3")
> +   (set_attr "mode" "<MODE>")]
> +)
> +
> +(define_insn "<u>mod<mode>3"
> +  [(set (match_operand:GPR 0 "register_operand" "=d")
> +	(any_mod:GPR
> +		(match_operand:GPR 1 "register_operand" "d")
> +		(match_operand:GPR 2 "register_operand" "d")))]
> +  "TARGET_LOONGSON_2EF"
> +  { return mips_output_division ("<d>mod<u>.g\t%0,%1,%2", operands); }
> +  [(set_attr "type" "idiv3")
> +   (set_attr "mode" "<MODE>")]

As Eric says, these instructions need to be earlyclobber ("=&d" rather
than "=d") because the divisor might be checked for zero after the
division itself.

> @@ -1358,7 +1365,12 @@
>    ""
>  {
>    if (ISA_HAS_<D>MUL3)
> -    emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
> +  {
> +	if(TARGET_LOONGSON_2EF)
> +		emit_insn(gen_mul<mode>3_mul3_ls2ef(operands[0],operands[1],operands[2]));
> +	else
> +    		emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
> +  }
>    else if (TARGET_FIX_R4000)
>      emit_insn (gen_mul<mode>3_r4000 (operands[0], operands[1], operands[2]));
>    else

Formatting; this should be:

    {
      if (TARGET_LOONGSON_2EF)
	emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
					      operands[2]));
      else
	emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
    }

(GCC's quite strict about formatting.)  But I don't see any reason to make
the TARGET_LOONSON_2EF code depend on ISA_HAS_<D>MUL3; it seems easier
to have:

  if (TARGET_LOONGSON_2EF)
    emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
					  operands[2]));
  else if (ISA_HAS_<D>MUL3)
    emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
  else if (TARGET_FIX_R4000)
    ...

instead.

> +(define_insn "mul<mode>3_mul3_ls2ef"
> +  [(set (match_operand:GPR 0 "register_operand" "=d")
> +        (mult:GPR (match_operand:GPR 1 "register_operand" "d")
> +                  (match_operand:GPR 2 "register_operand" "d")))]
> +  "TARGET_LOONGSON_2EF"
> +{
> +  return "<d>multu.g\t%0,%1,%2";
> +}

No need for C code here.  Just use:

  (define_insn "mul<mode>3_mul3_ls2ef"
    [(set (match_operand:GPR 0 "register_operand" "=d")
          (mult:GPR (match_operand:GPR 1 "register_operand" "d")
                    (match_operand:GPR 2 "register_operand" "d")))]
    "TARGET_LOONGSON_2EF"
    "<d>multu.g\t%0,%1,%2"

> +(define_peephole2
> +  [(set (match_operand:SI 0 "lo_operand")
> +        (mult:SI (match_operand:SI 1 "d_operand")
> +                 (match_operand:SI 2 "d_operand")))
> +   (set (match_operand:SI 3 "d_operand")
> +        (match_dup 0))]
> +   "TARGET_LOONGSON_2EF && peep2_reg_dead_p(2, operands[0])"
> +  [(set (match_dup 3)
> +        (mult:SI (match_dup 1)
> +                 (match_dup 2)))]

When do we need this?  How do we get a LO-target multiplication
after the mul<mode>3 change above?

> @@ -6128,6 +6163,9 @@
>  	     (match_operand 2 "const_int_operand" "n"))]
>    "ISA_HAS_PREFETCH && TARGET_EXPLICIT_RELOCS"
>  {
> +  if(TARGET_LOONGSON_2EF)
> +  /*Loongson 2[ef] use load to $0 to perform prefetching*/
> +    return "ld\t$0,%a0";

Formatting; should be:

  if (TARGET_LOONGSON_2EF)
    /* Loongson 2[ef] use load to $0 to perform prefetching.  */
    return "ld\t$0,%a0";

>  /* ISA has a three-operand multiplication instruction.  */
>  #define ISA_HAS_DMUL3		(TARGET_64BIT				\
> -				 && TARGET_OCTEON			\
> +				 && (TARGET_OCTEON			\
> +                                 ||  TARGET_LOONGSON_2EF)		\
>  				 && !TARGET_MIPS16)

Not needed after the change I suggested above.
 
Looks good otherwise, thanks.

Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added
  2008-11-10 22:16 ` Richard Sandiford
@ 2008-11-11 13:20   ` Ruan Beihong
  2008-11-11 22:53     ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Ruan Beihong @ 2008-11-11 13:20 UTC (permalink / raw)
  To: Ruan Beihong, gcc-patches, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 6175 bytes --]

2008/11/11 Richard Sandiford <rdsandiford@googlemail.com>:
> "Ruan Beihong" <ruanbeihong@gmail.com> writes:
>> 2008/11/6, Ruan Beihong <ruanbeihong@gmail.com>:
>>> Hi Richard,
>>> It has been so many days since last talk. I have signed a assignment
>>> now (due to my fault that I'd given a wrong address and also the slow
>>> mailing service between USA and China), and the assignment is now on
>>> its way to Boston for a few days.
>>> So can the patch be accept now?
>>> I still think that the loongson2ef's <d>multu.g should be divided into
>>> a new type of insn (I call it mul3nc), since they don't change or use
>>> the HI/LO register.
>>>
>>> --
>>> James Ruan
>>>
>>
>> I've got the reply from FSF. My assignment is completed.
>
> Great!  Thanks for going through the process.
>
>> +(define_insn "<u>div<mode>3"
>> +  [(set (match_operand:GPR 0 "register_operand" "=d")
>> +     (any_div:GPR
>> +             (match_operand:GPR 1 "register_operand" "d")
>> +             (match_operand:GPR 2 "register_operand" "d")))]
>> +  "TARGET_LOONGSON_2EF"
>> +  { return mips_output_division ("<d>div<u>.g\t%0,%1,%2", operands); }
>> +  [(set_attr "type" "idiv3")
>> +   (set_attr "mode" "<MODE>")]
>> +)
>> +
>> +(define_insn "<u>mod<mode>3"
>> +  [(set (match_operand:GPR 0 "register_operand" "=d")
>> +     (any_mod:GPR
>> +             (match_operand:GPR 1 "register_operand" "d")
>> +             (match_operand:GPR 2 "register_operand" "d")))]
>> +  "TARGET_LOONGSON_2EF"
>> +  { return mips_output_division ("<d>mod<u>.g\t%0,%1,%2", operands); }
>> +  [(set_attr "type" "idiv3")
>> +   (set_attr "mode" "<MODE>")]
>
> As Eric says, these instructions need to be earlyclobber ("=&d" rather
> than "=d") because the divisor might be checked for zero after the
> division itself.
>
>> @@ -1358,7 +1365,12 @@
>>    ""
>>  {
>>    if (ISA_HAS_<D>MUL3)
>> -    emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
>> +  {
>> +     if(TARGET_LOONGSON_2EF)
>> +             emit_insn(gen_mul<mode>3_mul3_ls2ef(operands[0],operands[1],operands[2]));
>> +     else
>> +             emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
>> +  }
>>    else if (TARGET_FIX_R4000)
>>      emit_insn (gen_mul<mode>3_r4000 (operands[0], operands[1], operands[2]));
>>    else
>
> Formatting; this should be:
>
>    {
>      if (TARGET_LOONGSON_2EF)
>        emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
>                                              operands[2]));
>      else
>        emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
>    }
>
> (GCC's quite strict about formatting.)  But I don't see any reason to make
> the TARGET_LOONSON_2EF code depend on ISA_HAS_<D>MUL3; it seems easier
> to have:
>
>  if (TARGET_LOONGSON_2EF)
>    emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
>                                          operands[2]));
>  else if (ISA_HAS_<D>MUL3)
>    emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
>  else if (TARGET_FIX_R4000)
>    ...
>
> instead.
>
>> +(define_insn "mul<mode>3_mul3_ls2ef"
>> +  [(set (match_operand:GPR 0 "register_operand" "=d")
>> +        (mult:GPR (match_operand:GPR 1 "register_operand" "d")
>> +                  (match_operand:GPR 2 "register_operand" "d")))]
>> +  "TARGET_LOONGSON_2EF"
>> +{
>> +  return "<d>multu.g\t%0,%1,%2";
>> +}
>
> No need for C code here.  Just use:
>
>  (define_insn "mul<mode>3_mul3_ls2ef"
>    [(set (match_operand:GPR 0 "register_operand" "=d")
>          (mult:GPR (match_operand:GPR 1 "register_operand" "d")
>                    (match_operand:GPR 2 "register_operand" "d")))]
>    "TARGET_LOONGSON_2EF"
>    "<d>multu.g\t%0,%1,%2"
>
>> +(define_peephole2
>> +  [(set (match_operand:SI 0 "lo_operand")
>> +        (mult:SI (match_operand:SI 1 "d_operand")
>> +                 (match_operand:SI 2 "d_operand")))
>> +   (set (match_operand:SI 3 "d_operand")
>> +        (match_dup 0))]
>> +   "TARGET_LOONGSON_2EF && peep2_reg_dead_p(2, operands[0])"
>> +  [(set (match_dup 3)
>> +        (mult:SI (match_dup 1)
>> +                 (match_dup 2)))]
>
> When do we need this?  How do we get a LO-target multiplication
> after the mul<mode>3 change above?
>
>> @@ -6128,6 +6163,9 @@
>>            (match_operand 2 "const_int_operand" "n"))]
>>    "ISA_HAS_PREFETCH && TARGET_EXPLICIT_RELOCS"
>>  {
>> +  if(TARGET_LOONGSON_2EF)
>> +  /*Loongson 2[ef] use load to $0 to perform prefetching*/
>> +    return "ld\t$0,%a0";
>
> Formatting; should be:
>
>  if (TARGET_LOONGSON_2EF)
>    /* Loongson 2[ef] use load to $0 to perform prefetching.  */
>    return "ld\t$0,%a0";
>
>>  /* ISA has a three-operand multiplication instruction.  */
>>  #define ISA_HAS_DMUL3                (TARGET_64BIT                           \
>> -                              && TARGET_OCTEON                       \
>> +                              && (TARGET_OCTEON                      \
>> +                                 ||  TARGET_LOONGSON_2EF)            \
>>                                && !TARGET_MIPS16)
>
> Not needed after the change I suggested above.
>
> Looks good otherwise, thanks.
>
> Richard
>

In traditional mips ISA, mult's result is stored in LO/HI.
If we need to get the result, we need an extra mflo.
Loongson2's multu.g is used to save the extra mflo in such case.
So I want use this peephole2 to turn insns like:
 mult $x, $y
 mflo $z
into
 multu.g $z, $x, $y
when the LO is used only once in such case.
I thought the peep2_reg_dead_p(2, operands[0]) means the LO will be
used only once, isn't it?
>> +(define_peephole2
>> +  [(set (match_operand:SI 0 "lo_operand")
>> +        (mult:SI (match_operand:SI 1 "d_operand")
>> +                 (match_operand:SI 2 "d_operand")))
>> +   (set (match_operand:SI 3 "d_operand")
>> +        (match_dup 0))]
>> +   "TARGET_LOONGSON_2EF && peep2_reg_dead_p(2, operands[0])"
>> +  [(set (match_dup 3)
>> +        (mult:SI (match_dup 1)
> +                 (match_dup 2)))]
>
>When do we need this?  How do we get a LO-target multiplication
>after the mul<mode>3 change above?

Other part has changed.
Thanks.
-- 
James Ruan

[-- Attachment #2: new_i_insn_and_prefetch.patch.gz --]
[-- Type: application/x-gzip, Size: 2271 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added
  2008-11-11 13:20   ` Ruan Beihong
@ 2008-11-11 22:53     ` Richard Sandiford
  2008-11-12 10:50       ` Ruan Beihong
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2008-11-11 22:53 UTC (permalink / raw)
  To: Ruan Beihong; +Cc: gcc-patches

Thanks for the update.

"Ruan Beihong" <ruanbeihong@gmail.com> writes:
> 2008/11/11 Richard Sandiford <rdsandiford@googlemail.com>:
>> Formatting; this should be:
>>
>>    {
>>      if (TARGET_LOONGSON_2EF)
>>        emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
>>                                              operands[2]));
>>      else
>>        emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
>>    }
>>
>> (GCC's quite strict about formatting.)  But I don't see any reason to make
>> the TARGET_LOONSON_2EF code depend on ISA_HAS_<D>MUL3; it seems easier
>> to have:
>>
>>  if (TARGET_LOONGSON_2EF)
>>    emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
>>                                          operands[2]));
>>  else if (ISA_HAS_<D>MUL3)
>>    emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
>>  else if (TARGET_FIX_R4000)
>>    ...

In the patch you had:

  if(TARGET_LOONGSON_2EF)
    emit_insn(gen_mul<mode>3_mul3_ls2ef(operands[0], operands[1], operands[2]));

instead, but the coding standards require spaces before the "("s.
(See what I meant when I said GCC was quite strict about formatting? ;))
The code really should be formatted the way I had it above.

(Lines shouldn't be longer than 80 characters, which is why I split
the call over two lines.)

I know this must seem overly picky, but bear in mind that, if a hundred
people made changes in their own preferred style, the whole thing would
become a right mess.

>>> +  if(TARGET_LOONGSON_2EF)
>>> +  /*Loongson 2[ef] use load to $0 to perform prefetching*/
>>> +    return "ld\t$0,%a0";
>>
>> Formatting; should be:
>>
>>  if (TARGET_LOONGSON_2EF)
>>    /* Loongson 2[ef] use load to $0 to perform prefetching.  */
>>    return "ld\t$0,%a0";

Looks like you forgot this bit.  (Comments should start with a
captial letter and end with ".  ".)

>>> +(define_peephole2
>>> +  [(set (match_operand:SI 0 "lo_operand")
>>> +        (mult:SI (match_operand:SI 1 "d_operand")
>>> +                 (match_operand:SI 2 "d_operand")))
>>> +   (set (match_operand:SI 3 "d_operand")
>>> +        (match_dup 0))]
>>> +   "TARGET_LOONGSON_2EF && peep2_reg_dead_p(2, operands[0])"
>>> +  [(set (match_dup 3)
>>> +        (mult:SI (match_dup 1)
>>> +                 (match_dup 2)))]
>>
>> When do we need this?  How do we get a LO-target multiplication
>> after the mul<mode>3 change above?
>
> In traditional mips ISA, mult's result is stored in LO/HI.
> If we need to get the result, we need an extra mflo.
> Loongson2's multu.g is used to save the extra mflo in such case.
> So I want use this peephole2 to turn insns like:
>  mult $x, $y
>  mflo $z
> into
>  multu.g $z, $x, $y
> when the LO is used only once in such case.

Sure, but what I meant was: define_peephole2s operate on instructions
that have already been generated, and that already match some define_insn.
So this define_peephole2 is only necessary if we can still sometimes
generate a "normal" LO-based multiplication INSN for Loongson.
So how could that happen after your changes to the mul<mode>3 expanders?
Surely we should always be using mul<mode>3_mul3_ls2ef for Loongson,
in which case every multiplication will already write to a GPR
instead of LO.

Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added
  2008-11-11 22:53     ` Richard Sandiford
@ 2008-11-12 10:50       ` Ruan Beihong
  2008-11-13 23:11         ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Ruan Beihong @ 2008-11-12 10:50 UTC (permalink / raw)
  To: Ruan Beihong, gcc-patches, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 3768 bytes --]

I understand that.
Formatting is very important.
I think  this update attached will be nicer.
As to the peephole2, did you mean that every mult would be
 replace with multu.g unless it's the LO is the demanding target,
so that no peephole2 is need for it? If so, I deleted this part.

2008/11/12, Richard Sandiford <rdsandiford@googlemail.com>:
> Thanks for the update.
>
> "Ruan Beihong" <ruanbeihong@gmail.com> writes:
>> 2008/11/11 Richard Sandiford <rdsandiford@googlemail.com>:
>>> Formatting; this should be:
>>>
>>>    {
>>>      if (TARGET_LOONGSON_2EF)
>>>        emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
>>>                                              operands[2]));
>>>      else
>>>        emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1],
>>> operands[2]));
>>>    }
>>>
>>> (GCC's quite strict about formatting.)  But I don't see any reason to
>>> make
>>> the TARGET_LOONSON_2EF code depend on ISA_HAS_<D>MUL3; it seems easier
>>> to have:
>>>
>>>  if (TARGET_LOONGSON_2EF)
>>>    emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
>>>                                          operands[2]));
>>>  else if (ISA_HAS_<D>MUL3)
>>>    emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1],
>>> operands[2]));
>>>  else if (TARGET_FIX_R4000)
>>>    ...
>
> In the patch you had:
>
>   if(TARGET_LOONGSON_2EF)
>     emit_insn(gen_mul<mode>3_mul3_ls2ef(operands[0], operands[1],
> operands[2]));
>
> instead, but the coding standards require spaces before the "("s.
> (See what I meant when I said GCC was quite strict about formatting? ;))
> The code really should be formatted the way I had it above.
>
> (Lines shouldn't be longer than 80 characters, which is why I split
> the call over two lines.)
>
> I know this must seem overly picky, but bear in mind that, if a hundred
> people made changes in their own preferred style, the whole thing would
> become a right mess.
>
>>>> +  if(TARGET_LOONGSON_2EF)
>>>> +  /*Loongson 2[ef] use load to $0 to perform prefetching*/
>>>> +    return "ld\t$0,%a0";
>>>
>>> Formatting; should be:
>>>
>>>  if (TARGET_LOONGSON_2EF)
>>>    /* Loongson 2[ef] use load to $0 to perform prefetching.  */
>>>    return "ld\t$0,%a0";
>
> Looks like you forgot this bit.  (Comments should start with a
> captial letter and end with ".  ".)
>
>>>> +(define_peephole2
>>>> +  [(set (match_operand:SI 0 "lo_operand")
>>>> +        (mult:SI (match_operand:SI 1 "d_operand")
>>>> +                 (match_operand:SI 2 "d_operand")))
>>>> +   (set (match_operand:SI 3 "d_operand")
>>>> +        (match_dup 0))]
>>>> +   "TARGET_LOONGSON_2EF && peep2_reg_dead_p(2, operands[0])"
>>>> +  [(set (match_dup 3)
>>>> +        (mult:SI (match_dup 1)
>>>> +                 (match_dup 2)))]
>>>
>>> When do we need this?  How do we get a LO-target multiplication
>>> after the mul<mode>3 change above?
>>
>> In traditional mips ISA, mult's result is stored in LO/HI.
>> If we need to get the result, we need an extra mflo.
>> Loongson2's multu.g is used to save the extra mflo in such case.
>> So I want use this peephole2 to turn insns like:
>>  mult $x, $y
>>  mflo $z
>> into
>>  multu.g $z, $x, $y
>> when the LO is used only once in such case.
>
> Sure, but what I meant was: define_peephole2s operate on instructions
> that have already been generated, and that already match some define_insn.
> So this define_peephole2 is only necessary if we can still sometimes
> generate a "normal" LO-based multiplication INSN for Loongson.
> So how could that happen after your changes to the mul<mode>3 expanders?
> Surely we should always be using mul<mode>3_mul3_ls2ef for Loongson,
> in which case every multiplication will already write to a GPR
> instead of LO.
>
> Richard
>


-- 
James Ruan

[-- Attachment #2: new_i_insn_and_prefetch.patch.gz --]
[-- Type: application/x-gzip, Size: 2158 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added
  2008-11-12 10:50       ` Ruan Beihong
@ 2008-11-13 23:11         ` Richard Sandiford
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Sandiford @ 2008-11-13 23:11 UTC (permalink / raw)
  To: Ruan Beihong; +Cc: gcc-patches

"Ruan Beihong" <ruanbeihong@gmail.com> writes:
> I understand that.
> Formatting is very important.
> I think  this update attached will be nicer.

Applied, thanks.  And thanks for your patience.

For the record...

>>>>> +  if(TARGET_LOONGSON_2EF)
>>>>> +  /*Loongson 2[ef] use load to $0 to perform prefetching*/
>>>>> +    return "ld\t$0,%a0";
>>>>
>>>> Formatting; should be:
>>>>
>>>>  if (TARGET_LOONGSON_2EF)
>>>>    /* Loongson 2[ef] use load to $0 to perform prefetching.  */
>>>>    return "ld\t$0,%a0";
>>
>> Looks like you forgot this bit.  (Comments should start with a
>> captial letter and end with ".  ".)

...this still wasn't right; the new patch had:

  /*Loongson 2[ef] use load to $0 to perform prefetching*/

but it should have been:

  /* Loongson 2[ef] use load to $0 to perform prefetching.  */

See the "Coding Standards" part of:

  http://gcc.gnu.org/contribute.html

for details about stuff like this.

Also, the "imul3nc" change to the "length" attribute wasn't needed;
that bit of code is there to work around VR4120 DMULT errata.
(The existing "imul3" case is redundant too.  It was added as part of a
mechanical split of "imul" into "imul" and "imul3".  I'll try to fix it
sometime.)

I fixed the two problems above, tweaked a few other bits of formatting
and wrote a changelog.  I've attached the final (applied) version.

I also added a couple of testcases, also attached.

> As to the peephole2, did you mean that every mult would be
>  replace with multu.g unless it's the LO is the demanding target,
> so that no peephole2 is need for it? If so, I deleted this part.

No, what I meant was: GCC is not allowed to (or should not be allowed to)
use MULT in _any_ circumstances on a Loongson target, even without the
define_peephole2.

In other words, the peephole2 looked for the following sequence of
instructions:

  I1: a multiplication that sets LO
  I2: a move from LO to a GPR

But how would we ever get I1 for Loongson targets?  When compiling for a
Loongson target, GCC is supposed to use your new mul<mode>3_mul3_ls2ef
define_insn for _every_ multiplication.  And your define_insn requires
the target to be a GPR; it does not allow multiplications to set LO.

I think you based the define_peephole2 on this one:

  ;; If a register gets allocated to LO, and we spill to memory, the reload
  ;; will include a move from LO to a GPR.  Merge it into the multiplication
  ;; if it can set the GPR directly.
  ;;
  ;; Operand 0: LO
  ;; Operand 1: GPR (1st multiplication operand)
  ;; Operand 2: GPR (2nd multiplication operand)
  ;; Operand 3: GPR (destination)
  (define_peephole2
    [(parallel
         [(set (match_operand:SI 0 "lo_operand")
               (mult:SI (match_operand:SI 1 "d_operand")
                        (match_operand:SI 2 "d_operand")))
          (clobber (scratch:SI))])
     (set (match_operand:SI 3 "d_operand")
          (match_dup 0))]
    "ISA_HAS_MUL3 && peep2_reg_dead_p (2, operands[0])"
    [(parallel
         [(set (match_dup 3)
               (mult:SI (match_dup 1)
                        (match_dup 2)))
          (clobber (match_dup 0))])])

But this peephole2 is for the following define_insn:

  (define_insn "mul<mode>3_mul3"
    [(set (match_operand:GPR 0 "register_operand" "=d,l")
          (mult:GPR (match_operand:GPR 1 "register_operand" "d,d")
                    (match_operand:GPR 2 "register_operand" "d,d")))
     (clobber (match_scratch:GPR 3 "=l,X"))]
    "ISA_HAS_<D>MUL3"
  {
    if (which_alternative == 1)
      return "<d>mult\t%1,%2";
    if (<MODE>mode == SImode && TARGET_MIPS3900)
      return "mult\t%0,%1,%2";
    return "<d>mul\t%0,%1,%2";
  }
    [(set_attr "type" "imul3,imul")
     (set_attr "mode" "<MODE>")])

Notice how this pattern allows the destination be either a GPR (the first
alternative) _or_ LO (the second alternative).  So the define_insn allows
GCC to generate either MUL _or_ MULT.

Sometimes GCC won't make a good decision: sometimes it will choose
the MULT alternative and then move the result from LO to a GPR.
(Hopefully this is rare.)  The define_peephole2 then optimises
"MULT; MFLO" into "MUL".

On the other hand, your new Loongson pattern is:

  (define_insn "mul<mode>3_mul3_ls2ef"
    [(set (match_operand:GPR 0 "register_operand" "=d")
          (mult:GPR (match_operand:GPR 1 "register_operand" "d")
                    (match_operand:GPR 2 "register_operand" "d")))]
    "TARGET_LOONGSON_2EF"
    "<d>multu.g\t%0,%1,%2"
    [(set_attr "type" "imul3nc")
     (set_attr "mode" "<MODE>")])

This pattern has only one alternative.  The instruction must be <D>MULTU.G,
and the destination must be a GPR.  It is not possible (or should not be
possible) for GCC to decide to use <D>MULT instead.  And if it isn't possible
for GCC to use <D>MULT instead, there should be no "<D>MULT; MFLO" sequences
for the define_peephole2 to optimise.

Richard


gcc/
2008-11-13  Ruan Beihong  <ruanbeihong@gmail.com>

	* config/mips/loongson.md (<u>div<mode>3, <u>mod<mode>3): New patterns.
	* config/mips/loongson2ef.md (ls2_imult): Handle imul3nc.
	(ls2_idiv): Likewise idiv3.
	(ls2_prefetch): New reservation.
	* config/mips/mips.h (ISA_HAS_PREFETCH): Add TARGET_LOONGSON_2EF.
	* config/mips/mips.md (type): Add imul3nc and idiv3.
	(length): Handle idiv3.
	(any_mod): New code_iterator.
	(u): Handle MOD and UMOD.
	(mul<mode>3): Generate mul<mode>3_mul3_ls2ef on Loongson targets.
	(prefetch): Handle TARGET_LOONGSON_2EF.

Index: gcc/config/mips/loongson.md
===================================================================
--- gcc/config/mips/loongson.md	(révision 141833)
+++ gcc/config/mips/loongson.md	(copie de travail)
@@ -473,3 +473,23 @@ (define_insn "vec_interleave_low<mode>"
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
   "punpckl<V_stretch_half_suffix>\t%0,%1,%2"
   [(set_attr "type" "fdiv")])
+
+;; Integer division and modulus.
+
+(define_insn "<u>div<mode>3"
+  [(set (match_operand:GPR 0 "register_operand" "=&d")
+	(any_div:GPR (match_operand:GPR 1 "register_operand" "d")
+		     (match_operand:GPR 2 "register_operand" "d")))]
+  "TARGET_LOONGSON_2EF"
+  { return mips_output_division ("<d>div<u>.g\t%0,%1,%2", operands); }
+  [(set_attr "type" "idiv3")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "<u>mod<mode>3"
+  [(set (match_operand:GPR 0 "register_operand" "=&d")
+	(any_mod:GPR (match_operand:GPR 1 "register_operand" "d")
+		     (match_operand:GPR 2 "register_operand" "d")))]
+  "TARGET_LOONGSON_2EF"
+  { return mips_output_division ("<d>mod<u>.g\t%0,%1,%2", operands); }
+  [(set_attr "type" "idiv3")
+   (set_attr "mode" "<MODE>")])
Index: gcc/config/mips/loongson2ef.md
===================================================================
--- gcc/config/mips/loongson2ef.md	(révision 141833)
+++ gcc/config/mips/loongson2ef.md	(copie de travail)
@@ -160,14 +160,14 @@ (define_insn_reservation "ls2_branch" 2
 ;; Reservation for integer multiplication instructions.
 (define_insn_reservation "ls2_imult" 5
   (and (eq_attr "cpu" "loongson_2e,loongson_2f")
-       (eq_attr "type" "imul,imul3"))
+       (eq_attr "type" "imul,imul3nc"))
   "ls2_alu2,ls2_alu2_core")
 
 ;; Reservation for integer division / remainder instructions.
 ;; These instructions use the SRT algorithm and hence take 2-38 cycles.
 (define_insn_reservation "ls2_idiv" 20
   (and (eq_attr "cpu" "loongson_2e,loongson_2f")
-       (eq_attr "type" "idiv"))
+       (eq_attr "type" "idiv,idiv3"))
   "ls2_alu2,ls2_alu2_core*18")
 
 ;; Reservation for memory load instructions.
@@ -176,6 +176,11 @@ (define_insn_reservation "ls2_load" 5
        (eq_attr "type" "load,fpload,mfc,mtc"))
   "ls2_mem")
 
+(define_insn_reservation "ls2_prefetch" 0
+  (and (eq_attr "cpu" "loongson_2e,loongson_2f")
+       (eq_attr "type" "prefetch,prefetchx"))
+  "ls2_mem")
+
 ;; Reservation for memory store instructions.
 ;; With stores we assume they don't alias with dependent loads.
 ;; Therefore we set the latency to zero.
Index: gcc/config/mips/mips.h
===================================================================
--- gcc/config/mips/mips.h	(révision 141833)
+++ gcc/config/mips/mips.h	(copie de travail)
@@ -919,6 +919,7 @@ #define ISA_HAS_ROR		((ISA_MIPS32R2				\
 
 /* ISA has data prefetch instructions.  This controls use of 'pref'.  */
 #define ISA_HAS_PREFETCH	((ISA_MIPS4				\
+				  || TARGET_LOONGSON_2EF		\
 				  || ISA_MIPS32				\
 				  || ISA_MIPS32R2			\
 				  || ISA_MIPS64				\
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(révision 141833)
+++ gcc/config/mips/mips.md	(copie de travail)
@@ -351,8 +351,10 @@ (define_attr "dword_mode" "no,yes"
 ;; trap		trap if instructions
 ;; imul		integer multiply 2 operands
 ;; imul3	integer multiply 3 operands
+;; imul3nc	integer multiply 3 operands without clobbering HI/LO
 ;; imadd	integer multiply-add
-;; idiv		integer divide
+;; idiv		integer divide 2 operands
+;; idiv3	integer divide 3 operands
 ;; move		integer register move ({,D}ADD{,U} with rt = 0)
 ;; fmove	floating point register move
 ;; fadd		floating point add/subtract
@@ -376,9 +378,9 @@ (define_attr "dword_mode" "no,yes"
 (define_attr "type"
   "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore,
    prefetch,prefetchx,condmove,mtc,mfc,mthilo,mfhilo,const,arith,logical,
-   shift,slt,signext,clz,pop,trap,imul,imul3,imadd,idiv,move,fmove,fadd,fmul,
-   fmadd,fdiv,frdiv,frdiv1,frdiv2,fabs,fneg,fcmp,fcvt,fsqrt,frsqrt,frsqrt1,
-   frsqrt2,multi,nop,ghost"
+   shift,slt,signext,clz,pop,trap,imul,imul3,imul3nc,imadd,idiv,idiv3,move,
+   fmove,fadd,fmul,fmadd,fdiv,frdiv,frdiv1,frdiv2,fabs,fneg,fcmp,fcvt,fsqrt,
+   frsqrt,frsqrt1,frsqrt2,multi,nop,ghost"
   (cond [(eq_attr "jal" "!unset") (const_string "call")
 	 (eq_attr "got" "load") (const_string "load")
 
@@ -553,7 +555,7 @@ (define_attr "length" ""
 		    (ne (symbol_ref "TARGET_FIX_VR4120") (const_int 0))))
 	  (const_int 8)
 
-	  (eq_attr "type" "idiv")
+	  (eq_attr "type" "idiv,idiv3")
 	  (symbol_ref "mips_idiv_insns () * 4")
 	  ] (const_int 4)))
 
@@ -787,6 +789,10 @@ (define_code_iterator any_shift [ashift 
 ;; from the same template.
 (define_code_iterator any_div [div udiv])
 
+;; This code iterator allows unsigned and signed modulus to be generated
+;; from the same template.
+(define_code_iterator any_mod [mod umod])
+
 ;; This code iterator allows all native floating-point comparisons to be
 ;; generated from the same template.
 (define_code_iterator fcond [unordered uneq unlt unle eq lt le])
@@ -809,6 +815,7 @@ (define_code_iterator any_le [le leu])
 ;; "u" when doing an unsigned operation.
 (define_code_attr u [(sign_extend "") (zero_extend "u")
 		     (div "") (udiv "u")
+		     (mod "") (umod "u")
 		     (gt "") (gtu "u")
 		     (ge "") (geu "u")
 		     (lt "") (ltu "u")
@@ -1357,7 +1364,10 @@ (define_expand "mul<mode>3"
 		  (match_operand:GPR 2 "register_operand")))]
   ""
 {
-  if (ISA_HAS_<D>MUL3)
+  if (TARGET_LOONGSON_2EF)
+    emit_insn (gen_mul<mode>3_mul3_ls2ef (operands[0], operands[1],
+                                          operands[2]));
+  else if (ISA_HAS_<D>MUL3)
     emit_insn (gen_mul<mode>3_mul3 (operands[0], operands[1], operands[2]));
   else if (TARGET_FIX_R4000)
     emit_insn (gen_mul<mode>3_r4000 (operands[0], operands[1], operands[2]));
@@ -1367,6 +1377,15 @@ (define_expand "mul<mode>3"
   DONE;
 })
 
+(define_insn "mul<mode>3_mul3_ls2ef"
+  [(set (match_operand:GPR 0 "register_operand" "=d")
+        (mult:GPR (match_operand:GPR 1 "register_operand" "d")
+                  (match_operand:GPR 2 "register_operand" "d")))]
+  "TARGET_LOONGSON_2EF"
+  "<d>multu.g\t%0,%1,%2"
+  [(set_attr "type" "imul3nc")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "mul<mode>3_mul3"
   [(set (match_operand:GPR 0 "register_operand" "=d,l")
 	(mult:GPR (match_operand:GPR 1 "register_operand" "d,d")
@@ -6128,6 +6147,9 @@ (define_insn "prefetch"
 	     (match_operand 2 "const_int_operand" "n"))]
   "ISA_HAS_PREFETCH && TARGET_EXPLICIT_RELOCS"
 {
+  if (TARGET_LOONGSON_2EF)
+    /* Loongson 2[ef] use load to $0 to perform prefetching.  */
+    return "ld\t$0,%a0";
   operands[1] = mips_prefetch_cookie (operands[1], operands[2]);
   return "pref\t%1,%a0";
 }



gcc/testsuite/
	* gcc.target/mips/loongson-muldiv-1.c: New test.
	* gcc.target/mips/loongson-muldiv-2.c: Likewise.

Index: gcc/testsuite/gcc.target/mips/loongson-muldiv-1.c
===================================================================
--- /dev/null	2008-11-13 19:44:41.580098750 +0000
+++ gcc/testsuite/gcc.target/mips/loongson-muldiv-1.c	2008-11-13 22:35:42.000000000 +0000
@@ -0,0 +1,16 @@
+/* { dg-mips-options "-O2 -march=loongson2e" } */
+
+typedef int st;
+typedef unsigned int ut;
+
+NOMIPS16 st smul (st x, st y) { return x * y; }
+NOMIPS16 st sdiv (st x, st y) { return x / y + x % y; }
+
+NOMIPS16 ut umul (ut x, ut y) { return x * y; }
+NOMIPS16 ut udiv (ut x, ut y) { return x / y + x % y; }
+
+/* { dg-final { scan-assembler-times "\tmultu.g\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tdivu.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tmodu.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tdiv.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tmod.g\t" 1 } } */
Index: gcc/testsuite/gcc.target/mips/loongson-muldiv-2.c
===================================================================
--- /dev/null	2008-11-13 19:44:41.580098750 +0000
+++ gcc/testsuite/gcc.target/mips/loongson-muldiv-2.c	2008-11-13 22:35:24.000000000 +0000
@@ -0,0 +1,16 @@
+/* { dg-mips-options "-O2 -march=loongson2e -mgp64" } */
+
+typedef long long st;
+typedef unsigned long long ut;
+
+NOMIPS16 st smul (st x, st y) { return x * y; }
+NOMIPS16 st sdiv (st x, st y) { return x / y + x % y; }
+
+NOMIPS16 ut umul (ut x, ut y) { return x * y; }
+NOMIPS16 ut udiv (ut x, ut y) { return x / y + x % y; }
+
+/* { dg-final { scan-assembler-times "\tdmultu.g\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tddivu.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tdmodu.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tddiv.g\t" 1 } } */
+/* { dg-final { scan-assembler-times "\tdmod.g\t" 1 } } */

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-11-13 22:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-07  8:23 [PATCH] [MIPS] [Loongson] Specific integer instructions and prefetch added Ruan Beihong
2008-11-10 22:16 ` Richard Sandiford
2008-11-11 13:20   ` Ruan Beihong
2008-11-11 22:53     ` Richard Sandiford
2008-11-12 10:50       ` Ruan Beihong
2008-11-13 23:11         ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).