public inbox for gcc-patches@gcc.gnu.org
* [AArch64][PR65375] Fix RTX cost for vector SET
@ 2015-03-16  5:36 Kugan
  2015-03-16 10:02 ` Kyrill Tkachov
  0 siblings, 1 reply; 27+ messages in thread
From: Kugan @ 2015-03-16  5:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Marcus Shawcroft, Richard Earnshaw, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]


The AArch64 RTX cost for a vector SET is causing PR65375. The lower-subreg
pass uses this rtx_cost to compute the cost of moves, splitting anything
larger than word size (64 bits in this case). The aarch64 rtx_costs is
returning 2 * COSTS_N_INSNS (1) for vector moves, so they get split.
The attached patch fixes this.

With the patch the testcase in the PR:

#include <arm_neon.h>
void hello_vst2(float* fout, float *fin)
{
  float32x4x2_t a;
  a = vld2q_f32 (fin);
  vst2q_f32 (fout, a);
}

Changes to:

 hello_vst2:
-	ld2	{v0.4s - v1.4s}, [x1]
-	sub	sp, sp, #32
-	umov	x1, v0.d[0]
-	umov	x2, v0.d[1]
-	str	q1, [sp, 16]
-	mov	x5, x1
-	stp	x5, x2, [sp]
-	ld1	{v0.16b - v1.16b}, [sp]
+	ld2	{v2.4s - v3.4s}, [x1]
+	orr	v0.16b, v2.16b, v2.16b
+	orr	v1.16b, v3.16b, v3.16b
 	st2	{v0.4s - v1.4s}, [x0]
-	add	sp, sp, 32
 	ret


lower-subreg.c:compute_costs() only cares about the cost of a (set (reg)
(const_int )) move, but I think the intention, at least for now, is to
return extra_cost->vect.alu for all vector operations.

Regression tested on aarch64-linux-gnu with no new regressions. Is this
OK for trunk?

Thanks,
Kugan


gcc/ChangeLog:

2015-03-16  Kugan Vivekanandarajah  <kuganv@linaro.org>
            Jim Wilson  <jim.wilson@linaro.org>

	PR target/65375
	* config/aarch64/aarch64.c (aarch64_rtx_costs): Return
	extra_cost->vect.alu for SET.

[-- Attachment #2: p.txt --]
[-- Type: text/plain, Size: 537 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..db69979 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5517,6 +5517,13 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
       op0 = SET_DEST (x);
       op1 = SET_SRC (x);
 
+      /* Sets don't have a mode, so we must recompute this here.  */
+      if (VECTOR_MODE_P (GET_MODE (op0)))
+	{
+	  *cost += extra_cost->vect.alu;
+	  return true;
+	}
+
       switch (GET_CODE (op0))
 	{
 	case MEM:

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-03-16  5:36 [AArch64][PR65375] Fix RTX cost for vector SET Kugan
@ 2015-03-16 10:02 ` Kyrill Tkachov
  2015-03-16 12:33   ` Kugan
  0 siblings, 1 reply; 27+ messages in thread
From: Kyrill Tkachov @ 2015-03-16 10:02 UTC (permalink / raw)
  To: Kugan, gcc-patches; +Cc: Marcus Shawcroft, Richard Earnshaw, Jim Wilson


On 16/03/15 05:36, Kugan wrote:

Hi Kugan,

> AArch64 RTX cost for vector SET is causing PR65375. Lower subreg is
> using this rtx_cost to compute the cost of moves, and splitting anything
> larger than word size, 64-bits in this case. The aarch64 rtx_costs is
> returning 2 * COST_N_INSNS(1) for vector moves, so they get split.
> Attach patch fixes this.
>
> With the patch the testcase in the PR:
>
> #include <arm_neon.h>
> void hello_vst2(float* fout, float *fin)
> {
>    float32x4x2_t a;
>    a = vld2q_f32 (fin);
>    vst2q_f32 (fout, a);
> }
>
> Changes to:
>
>   hello_vst2:
> -	ld2	{v0.4s - v1.4s}, [x1]
> -	sub	sp, sp, #32
> -	umov	x1, v0.d[0]
> -	umov	x2, v0.d[1]
> -	str	q1, [sp, 16]
> -	mov	x5, x1
> -	stp	x5, x2, [sp]
> -	ld1	{v0.16b - v1.16b}, [sp]
> +	ld2	{v2.4s - v3.4s}, [x1]
> +	orr	v0.16b, v2.16b, v2.16b
> +	orr	v1.16b, v3.16b, v3.16b
>   	st2	{v0.4s - v1.4s}, [x0]
> -	add	sp, sp, 32
>   	ret
>
>
> lower-subreg.c:compute_costs() only cares about the cost of a (set (reg)
> (const_int )) move but I think the intention, at least for now, is to
> return extra_cost->vect.alu for all the vector operations.

Almost: what we want at the moment is COSTS_N_INSNS (1) +
extra_cost->vect.alu.
>
> Regression tested on aarch64-linux-gnu with no new regression. Is this
> OK for trunk?

Are you sure it's a (set (reg) (const_int)) that's being costed here? I 
thought for moves into vector registers it would be a (set (reg) 
(const_vector)), which we don't handle in our rtx costs currently. I 
think the correct approach would be to extend the aarch64_rtx_costs 
switch statement to handle the CONST_VECTOR case. I believe you can use 
aarch64_simd_valid_immediate to check whether x is a valid immediate for 
a SIMD instruction and give it a cost of extra_cost->vect.alu. The logic 
should be similar to the CONST_INT case.

Thanks,
Kyrill
>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2015-03-16  Kugan Vivekanandarajah  <kuganv@linaro.org>
>              Jim Wilson  <jim.wilson@linaro.org>
>
> 	PR target/65375
> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Return
> 	extra_cost->vect.alu for SET.



* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-03-16 10:02 ` Kyrill Tkachov
@ 2015-03-16 12:33   ` Kugan
  2015-03-16 13:15     ` Kugan
  0 siblings, 1 reply; 27+ messages in thread
From: Kugan @ 2015-03-16 12:33 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Marcus Shawcroft, Richard Earnshaw, Jim Wilson

>> lower-subreg.c:compute_costs() only cares about the cost of a (set (reg)
>> (const_int )) move but I think the intention, at least for now, is to
>> return extra_cost->vect.alu for all the vector operations.
> 
> Almost, what we want at the moment is COSTS_N_INSNS (1) +
> extra_cost->vect.alu

Thanks Kyrill for the review.

>> Regression tested on aarch64-linux-gnu with no new regression. Is this
>> OK for trunk?
> 
> Are you sure it's a (set (reg) (const_int)) that's being costed here? I
> thought for moves into vecto registers it would be a (set (reg)
> (const_vector)) which we don't handle in our rtx costs currently. I
> think the correct approach would be to extend the aarch64_rtx_costs
> switch statement to handle the CONST_VECT case. I believe you can use
> aarch64_simd_valid_immediate to check whether x is a valid immediate for
> a simd instruction and give it a cost of extra_cost->vect.alu. The logic
> should be similar to the CONST_INT case.

Sorry about the (set (reg) (const_int)) above. The actual RTL that

(insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
         (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
/home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
800 {*aarch64_simd_movv4sf}
      (nil))

Also, if we return an RTX cost above COSTS_N_INSNS (1), the move will be
split and it doesn't recover from there. Therefore we need something like
the change below to prevent that from happening.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..d5c80f1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5544,10 +5544,14 @@ aarch64_rtx_costs (rtx x, int code, int outer
ATTRIBUTE_UNUSED,

 	  /* Fall through.  */
 	case REG:
+	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+	    {
+	      *cost = COSTS_N_INSNS (1);
+	    }
 	  /* const0_rtx is in general free, but we will use an
 	     instruction to set a register to 0.  */
-          if (REG_P (op1) || op1 == const0_rtx)
-            {
+	  else if (REG_P (op1) || op1 == const0_rtx)
+	    {
               /* The cost is 1 per register copied.  */
               int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
 			      / UNITS_PER_WORD;




Thanks,
Kugan


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-03-16 12:33   ` Kugan
@ 2015-03-16 13:15     ` Kugan
  2015-03-16 16:42       ` Jim Wilson
  2015-03-16 16:49       ` Kyrill Tkachov
  0 siblings, 2 replies; 27+ messages in thread
From: Kugan @ 2015-03-16 13:15 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Marcus Shawcroft, Richard Earnshaw, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 1713 bytes --]

On 16/03/15 23:32, Kugan wrote:
>>> lower-subreg.c:compute_costs() only cares about the cost of a (set (reg)
>>> (const_int )) move but I think the intention, at least for now, is to
>>> return extra_cost->vect.alu for all the vector operations.
>>
>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>> extra_cost->vect.alu
> 
> Thanks Kyrill for the review.
> 
>>> Regression tested on aarch64-linux-gnu with no new regression. Is this
>>> OK for trunk?
>>
>> Are you sure it's a (set (reg) (const_int)) that's being costed here? I
>> thought for moves into vecto registers it would be a (set (reg)
>> (const_vector)) which we don't handle in our rtx costs currently. I
>> think the correct approach would be to extend the aarch64_rtx_costs
>> switch statement to handle the CONST_VECT case. I believe you can use
>> aarch64_simd_valid_immediate to check whether x is a valid immediate for
>> a simd instruction and give it a cost of extra_cost->vect.alu. The logic
>> should be similar to the CONST_INT case.
> 
> Sorry about the (set (reg) (const_int)) above. But the actual RTL that
> is being split at 220r.subreg2 is
> 
> (insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
>          (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
> /home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
> 800 {*aarch64_simd_movv4sf}
>       (nil))
> 
> And also, if we return RTX cost above COSTS_N_INSNS (1), it will be
> split and it dosent recover from there. Therefore we need something like
> the below to prevent that happening.
> 

Hi Kyrill,

How about the attached patch? It is similar to what is currently done
for scalar register moves.

Thanks,
Kugan

[-- Attachment #2: p.txt --]
[-- Type: text/plain, Size: 964 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..b9db3ac 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
 	  /* Fall through.  */
 	case REG:
+	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+	    {
+              /* The cost is 1 per register copied.  */
+              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+			      / GET_MODE_SIZE (V4SImode);
+              *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	    }
 	  /* const0_rtx is in general free, but we will use an
 	     instruction to set a register to 0.  */
-          if (REG_P (op1) || op1 == const0_rtx)
-            {
+	  else if (REG_P (op1) || op1 == const0_rtx)
+	    {
               /* The cost is 1 per register copied.  */
               int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
 			      / UNITS_PER_WORD;


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-03-16 13:15     ` Kugan
@ 2015-03-16 16:42       ` Jim Wilson
  2015-03-16 16:49       ` Kyrill Tkachov
  1 sibling, 0 replies; 27+ messages in thread
From: Jim Wilson @ 2015-03-16 16:42 UTC (permalink / raw)
  To: Kugan; +Cc: Kyrill Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw

Resending, now that I've figured out how to make gmail send text email
instead of html.

> >> Almost, what we want at the moment is COSTS_N_INSNS (1) +
> >> extra_cost->vect.alu

This won't work, because extra_cost->vect.alu is COSTS_N_INSNS (1),
which means the total is COSTS_N_INSNS (2).

The lower-subreg pass makes a decision on whether to split based on
cost >= (word_move_cost * size/word_mode_size).  Vectors are twice the
size of word mode, and a word move costs COSTS_N_INSNS (1).  Setting
the vector move cost to COSTS_N_INSNS (2) means we have COSTS_N_INSNS
(2) >= COSTS_N_INSNS (2), so vector moves are split, which is bad for
vector register allocation.  This calculation happens in compute_costs
in lower-subreg.c.

> How about the attached patch? It is similar to what is currently done
> for scalar register move.

I like this approach of using the vector register size instead of word
size when we have a vector mode.

Jim


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-03-16 13:15     ` Kugan
  2015-03-16 16:42       ` Jim Wilson
@ 2015-03-16 16:49       ` Kyrill Tkachov
  2015-03-17  1:20         ` Kugan
  1 sibling, 1 reply; 27+ messages in thread
From: Kyrill Tkachov @ 2015-03-16 16:49 UTC (permalink / raw)
  To: Kugan, gcc-patches; +Cc: Marcus Shawcroft, Richard Earnshaw, Jim Wilson


On 16/03/15 13:15, Kugan wrote:
> On 16/03/15 23:32, Kugan wrote:
>>>> lower-subreg.c:compute_costs() only cares about the cost of a (set (reg)
>>>> (const_int )) move but I think the intention, at least for now, is to
>>>> return extra_cost->vect.alu for all the vector operations.
>>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>>> extra_cost->vect.alu
>> Thanks Kyrill for the review.
>>
>>>> Regression tested on aarch64-linux-gnu with no new regression. Is this
>>>> OK for trunk?
>>> Are you sure it's a (set (reg) (const_int)) that's being costed here? I
>>> thought for moves into vecto registers it would be a (set (reg)
>>> (const_vector)) which we don't handle in our rtx costs currently. I
>>> think the correct approach would be to extend the aarch64_rtx_costs
>>> switch statement to handle the CONST_VECT case. I believe you can use
>>> aarch64_simd_valid_immediate to check whether x is a valid immediate for
>>> a simd instruction and give it a cost of extra_cost->vect.alu. The logic
>>> should be similar to the CONST_INT case.
>> Sorry about the (set (reg) (const_int)) above. But the actual RTL that
>> is being split at 220r.subreg2 is
>>
>> (insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
>>           (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
>> /home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
>> 800 {*aarch64_simd_movv4sf}
>>        (nil))
>>
>> And also, if we return RTX cost above COSTS_N_INSNS (1), it will be
>> split and it dosent recover from there. Therefore we need something like
>> the below to prevent that happening.
>>
> Hi Kyrill,
>
> How about the attached patch? It is similar to what is currently done
> for scalar register move.

Hi Kugan,
yeah, I think this is a better approach, though I can't approve.

Kyrill

>
> Thanks,
> Kugan



* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-03-16 16:49       ` Kyrill Tkachov
@ 2015-03-17  1:20         ` Kugan
  2015-03-26  7:22           ` Kugan
  0 siblings, 1 reply; 27+ messages in thread
From: Kugan @ 2015-03-17  1:20 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Marcus Shawcroft, Richard Earnshaw, Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 2311 bytes --]



On 17/03/15 03:48, Kyrill Tkachov wrote:
> 
> On 16/03/15 13:15, Kugan wrote:
>> On 16/03/15 23:32, Kugan wrote:
>>>>> lower-subreg.c:compute_costs() only cares about the cost of a (set
>>>>> (reg)
>>>>> (const_int )) move but I think the intention, at least for now, is to
>>>>> return extra_cost->vect.alu for all the vector operations.
>>>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>>>> extra_cost->vect.alu
>>> Thanks Kyrill for the review.
>>>
>>>>> Regression tested on aarch64-linux-gnu with no new regression. Is this
>>>>> OK for trunk?
>>>> Are you sure it's a (set (reg) (const_int)) that's being costed here? I
>>>> thought for moves into vecto registers it would be a (set (reg)
>>>> (const_vector)) which we don't handle in our rtx costs currently. I
>>>> think the correct approach would be to extend the aarch64_rtx_costs
>>>> switch statement to handle the CONST_VECT case. I believe you can use
>>>> aarch64_simd_valid_immediate to check whether x is a valid immediate
>>>> for
>>>> a simd instruction and give it a cost of extra_cost->vect.alu. The
>>>> logic
>>>> should be similar to the CONST_INT case.
>>> Sorry about the (set (reg) (const_int)) above. But the actual RTL that
>>> is being split at 220r.subreg2 is
>>>
>>> (insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
>>>           (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
>>> /home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
>>>
>>> 800 {*aarch64_simd_movv4sf}
>>>        (nil))
>>>
>>> And also, if we return RTX cost above COSTS_N_INSNS (1), it will be
>>> split and it dosent recover from there. Therefore we need something like
>>> the below to prevent that happening.
>>>
>> Hi Kyrill,
>>
>> How about the attached patch? It is similar to what is currently done
>> for scalar register move.
> 
> Hi Kugan,
> yeah, I think this is a better approach, though I can't approve.
> 

Here is the patch with a minor comment update. Regression tested on
aarch64-linux-gnu with no new regressions. Is this OK for trunk?

Thanks,
Kugan

gcc/ChangeLog:

2015-03-17  Kugan Vivekanandarajah  <kuganv@linaro.org>
            Jim Wilson  <jim.wilson@linaro.org>

	PR target/65375
	* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle
	vector register copies.




[-- Attachment #2: p.txt --]
[-- Type: text/plain, Size: 971 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..d6ad0af 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
 	  /* Fall through.  */
 	case REG:
+	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+	    {
+              /* The cost is 1 per vector-register copied.  */
+              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+			      / GET_MODE_SIZE (V4SImode);
+              *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	    }
 	  /* const0_rtx is in general free, but we will use an
 	     instruction to set a register to 0.  */
-          if (REG_P (op1) || op1 == const0_rtx)
-            {
+	  else if (REG_P (op1) || op1 == const0_rtx)
+	    {
               /* The cost is 1 per register copied.  */
               int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
 			      / UNITS_PER_WORD;


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-03-17  1:20         ` Kugan
@ 2015-03-26  7:22           ` Kugan
  2015-04-14 22:09             ` Kugan
  0 siblings, 1 reply; 27+ messages in thread
From: Kugan @ 2015-03-26  7:22 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Marcus Shawcroft, Richard Earnshaw, Jim Wilson

ping?

Thanks,
Kugan

On 17/03/15 12:19, Kugan wrote:
> 
> 
> On 17/03/15 03:48, Kyrill Tkachov wrote:
>>
>> On 16/03/15 13:15, Kugan wrote:
>>> On 16/03/15 23:32, Kugan wrote:
>>>>>> lower-subreg.c:compute_costs() only cares about the cost of a (set
>>>>>> (reg)
>>>>>> (const_int )) move but I think the intention, at least for now, is to
>>>>>> return extra_cost->vect.alu for all the vector operations.
>>>>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>>>>> extra_cost->vect.alu
>>>> Thanks Kyrill for the review.
>>>>
>>>>>> Regression tested on aarch64-linux-gnu with no new regression. Is this
>>>>>> OK for trunk?
>>>>> Are you sure it's a (set (reg) (const_int)) that's being costed here? I
>>>>> thought for moves into vecto registers it would be a (set (reg)
>>>>> (const_vector)) which we don't handle in our rtx costs currently. I
>>>>> think the correct approach would be to extend the aarch64_rtx_costs
>>>>> switch statement to handle the CONST_VECT case. I believe you can use
>>>>> aarch64_simd_valid_immediate to check whether x is a valid immediate
>>>>> for
>>>>> a simd instruction and give it a cost of extra_cost->vect.alu. The
>>>>> logic
>>>>> should be similar to the CONST_INT case.
>>>> Sorry about the (set (reg) (const_int)) above. But the actual RTL that
>>>> is being split at 220r.subreg2 is
>>>>
>>>> (insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
>>>>           (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
>>>> /home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
>>>>
>>>> 800 {*aarch64_simd_movv4sf}
>>>>        (nil))
>>>>
>>>> And also, if we return RTX cost above COSTS_N_INSNS (1), it will be
>>>> split and it dosent recover from there. Therefore we need something like
>>>> the below to prevent that happening.
>>>>
>>> Hi Kyrill,
>>>
>>> How about the attached patch? It is similar to what is currently done
>>> for scalar register move.
>>
>> Hi Kugan,
>> yeah, I think this is a better approach, though I can't approve.
>>
> 
> Here is the patch with minor comment update. Regression tested on
> aarch64-linux-gnu with no new regression. Is this
> OK for trunk?
> 
> Thanks,
> Kugan
> 
> gcc/ChangeLog:
> 
> 2015-03-17  Kugan Vivekanandarajah  <kuganv@linaro.org>
>             Jim Wilson  <jim.wilson@linaro.org>
> 
> 	PR target/65375
> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle
> 	vector register copies.
> 
> 
> 


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-03-26  7:22           ` Kugan
@ 2015-04-14 22:09             ` Kugan
  2015-04-15  9:25               ` James Greenhalgh
  0 siblings, 1 reply; 27+ messages in thread
From: Kugan @ 2015-04-14 22:09 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Marcus Shawcroft, Richard Earnshaw, Jim Wilson, james.greenhalgh

Ping?

Now that Stage 1 is open, is this OK for trunk?

Thanks,
Kugan

On 26/03/15 18:21, Kugan wrote:
> ping?
> 
> Thanks,
> Kugan
> 
> On 17/03/15 12:19, Kugan wrote:
>>
>>
>> On 17/03/15 03:48, Kyrill Tkachov wrote:
>>>
>>> On 16/03/15 13:15, Kugan wrote:
>>>> On 16/03/15 23:32, Kugan wrote:
>>>>>>> lower-subreg.c:compute_costs() only cares about the cost of a (set
>>>>>>> (reg)
>>>>>>> (const_int )) move but I think the intention, at least for now, is to
>>>>>>> return extra_cost->vect.alu for all the vector operations.
>>>>>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>>>>>> extra_cost->vect.alu
>>>>> Thanks Kyrill for the review.
>>>>>
>>>>>>> Regression tested on aarch64-linux-gnu with no new regression. Is this
>>>>>>> OK for trunk?
>>>>>> Are you sure it's a (set (reg) (const_int)) that's being costed here? I
>>>>>> thought for moves into vecto registers it would be a (set (reg)
>>>>>> (const_vector)) which we don't handle in our rtx costs currently. I
>>>>>> think the correct approach would be to extend the aarch64_rtx_costs
>>>>>> switch statement to handle the CONST_VECT case. I believe you can use
>>>>>> aarch64_simd_valid_immediate to check whether x is a valid immediate
>>>>>> for
>>>>>> a simd instruction and give it a cost of extra_cost->vect.alu. The
>>>>>> logic
>>>>>> should be similar to the CONST_INT case.
>>>>> Sorry about the (set (reg) (const_int)) above. But the actual RTL that
>>>>> is being split at 220r.subreg2 is
>>>>>
>>>>> (insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
>>>>>           (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
>>>>> /home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
>>>>>
>>>>> 800 {*aarch64_simd_movv4sf}
>>>>>        (nil))
>>>>>
>>>>> And also, if we return RTX cost above COSTS_N_INSNS (1), it will be
>>>>> split and it dosent recover from there. Therefore we need something like
>>>>> the below to prevent that happening.
>>>>>
>>>> Hi Kyrill,
>>>>
>>>> How about the attached patch? It is similar to what is currently done
>>>> for scalar register move.
>>>
>>> Hi Kugan,
>>> yeah, I think this is a better approach, though I can't approve.
>>>
>>
>> Here is the patch with minor comment update. Regression tested on
>> aarch64-linux-gnu with no new regression. Is this
>> OK for trunk?
>>
>> Thanks,
>> Kugan
>>
>> gcc/ChangeLog:
>>
>> 2015-03-17  Kugan Vivekanandarajah  <kuganv@linaro.org>
>>             Jim Wilson  <jim.wilson@linaro.org>
>>
>> 	PR target/65375
>> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle
>> 	vector register copies.
>>
>>
>>


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-14 22:09             ` Kugan
@ 2015-04-15  9:25               ` James Greenhalgh
  2015-04-15 10:14                 ` Kyrill Tkachov
  2015-04-15 10:45                 ` Kugan
  0 siblings, 2 replies; 27+ messages in thread
From: James Greenhalgh @ 2015-04-15  9:25 UTC (permalink / raw)
  To: Kugan
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

On Tue, Apr 14, 2015 at 11:08:55PM +0100, Kugan wrote:
> Now that Stage1 is open, is this OK for trunk.

Hi Kugan,

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index cba3c1a..d6ad0af 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>  
>  	  /* Fall through.  */
>  	case REG:
> +	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
> +	    {
> +              /* The cost is 1 per vector-register copied.  */
> +              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
> +			      / GET_MODE_SIZE (V4SImode);
> +              *cost = COSTS_N_INSNS (n_minus_1 + 1);
> +	    }
>  	  /* const0_rtx is in general free, but we will use an
>  	     instruction to set a register to 0.  */
> -          if (REG_P (op1) || op1 == const0_rtx)
> -            {
> +	  else if (REG_P (op1) || op1 == const0_rtx)
> +	    {
>                /* The cost is 1 per register copied.  */
>                int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>  			      / UNITS_PER_WORD;

I would not have expected control flow to reach this point, as we have:

>  /* TODO: The cost infrastructure currently does not handle
>     vector operations.  Assume that all vector operations
>     are equally expensive.  */
>  if (VECTOR_MODE_P (mode))
>    {
>      if (speed)
>	*cost += extra_cost->vect.alu;
>      return true;
>    }

But, I see that this check is broken for a set RTX (which has no mode).
So, your patch works, but only due to a bug in my original implementation.
This leaves the code with quite a messy design.

There are two ways I see that we could clean things up, both of which
require some reworking of your patch.

Either we remove my check above and teach the RTX costs how to properly
cost vector operations, or we fix my check to catch all vector RTX
and add the special cases for the small subset of things we understand
up there.

The correct approach in the long term is to fix the RTX costs to correctly
understand vector operations, so I'd much prefer to see a patch along
these lines, though I appreciate that is a substantially more invasive
piece of work.

Thanks,
James


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-15  9:25               ` James Greenhalgh
@ 2015-04-15 10:14                 ` Kyrill Tkachov
  2015-04-15 11:05                   ` James Greenhalgh
  2015-04-15 10:45                 ` Kugan
  1 sibling, 1 reply; 27+ messages in thread
From: Kyrill Tkachov @ 2015-04-15 10:14 UTC (permalink / raw)
  To: James Greenhalgh, Kugan
  Cc: gcc-patches, Marcus Shawcroft, Richard Earnshaw, Jim Wilson


On 15/04/15 10:25, James Greenhalgh wrote:
> On Tue, Apr 14, 2015 at 11:08:55PM +0100, Kugan wrote:
>> Now that Stage1 is open, is this OK for trunk.
> Hi Kugan,
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index cba3c1a..d6ad0af 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>>   
>>   	  /* Fall through.  */
>>   	case REG:
>> +	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
>> +	    {
>> +              /* The cost is 1 per vector-register copied.  */
>> +              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>> +			      / GET_MODE_SIZE (V4SImode);
>> +              *cost = COSTS_N_INSNS (n_minus_1 + 1);
>> +	    }
>>   	  /* const0_rtx is in general free, but we will use an
>>   	     instruction to set a register to 0.  */
>> -          if (REG_P (op1) || op1 == const0_rtx)
>> -            {
>> +	  else if (REG_P (op1) || op1 == const0_rtx)
>> +	    {
>>                 /* The cost is 1 per register copied.  */
>>                 int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>>   			      / UNITS_PER_WORD;
> I would not have expected control flow to reach this point, as we have:
>
>>   /* TODO: The cost infrastructure currently does not handle
>>      vector operations.  Assume that all vector operations
>>      are equally expensive.  */
>>   if (VECTOR_MODE_P (mode))
>>     {
>>       if (speed)
>> 	*cost += extra_cost->vect.alu;
>>       return true;
>>     }
> But, I see that this check is broken for a set RTX (which has no mode).
> So, your patch works, but only due to a bug in my original implementation.
> This leaves the code with quite a messy design.
>
> There are two ways I see that we could clean things up, both of which
> require some reworking of your patch.
>
> Either we remove my check above and teach the RTX costs how to properly
> cost vector operations, or we fix my check to catch all vector RTX
> and add the special cases for the small subset of things we understand
> up there.
>
> The correct approach in the long term is to fix the RTX costs to correctly
> understand vector operations, so I'd much prefer to see a patch along
> these lines, though I appreciate that is a substantially more invasive
> piece of work.


Would we want to catch all vector RTXes in that check at the top
and have special vector rtx handling there? (Perhaps even in a function
of its own like aarch64_vector_rtx_costs?). Or do you think it would
be cleaner to handle them in the aarch64_rtx_costs giant switch?
Vector-specific RTX codes like vec_concat, vec_select would integrate
cleanly, but handling other common rtxen could potentially be messy?

Kyrill

>
> Thanks,
> James
>


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-15  9:25               ` James Greenhalgh
  2015-04-15 10:14                 ` Kyrill Tkachov
@ 2015-04-15 10:45                 ` Kugan
  2015-04-15 11:18                   ` James Greenhalgh
  1 sibling, 1 reply; 27+ messages in thread
From: Kugan @ 2015-04-15 10:45 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson



On 15/04/15 19:25, James Greenhalgh wrote:
> On Tue, Apr 14, 2015 at 11:08:55PM +0100, Kugan wrote:
>> Now that Stage1 is open, is this OK for trunk.
> 
> Hi Kugan,
> 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index cba3c1a..d6ad0af 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>>  
>>  	  /* Fall through.  */
>>  	case REG:
>> +	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
>> +	    {
>> +              /* The cost is 1 per vector-register copied.  */
>> +              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>> +			      / GET_MODE_SIZE (V4SImode);
>> +              *cost = COSTS_N_INSNS (n_minus_1 + 1);
>> +	    }
>>  	  /* const0_rtx is in general free, but we will use an
>>  	     instruction to set a register to 0.  */
>> -          if (REG_P (op1) || op1 == const0_rtx)
>> -            {
>> +	  else if (REG_P (op1) || op1 == const0_rtx)
>> +	    {
>>                /* The cost is 1 per register copied.  */
>>                int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>>  			      / UNITS_PER_WORD;
> 
> I would not have expected control flow to reach this point, as we have:

It does when mode == VOIDmode. The RTL X is, for example:

(set (reg:V8DI 67 virtual-incoming-args)
    (reg:V8DI 68 virtual-stack-vars))

> 
>>  /* TODO: The cost infrastructure currently does not handle
>>     vector operations.  Assume that all vector operations
>>     are equally expensive.  */
>>  if (VECTOR_MODE_P (mode))
>>    {
>>      if (speed)
>> 	*cost += extra_cost->vect.alu;
>>      return true;
>>    }
> 
> But, I see that this check is broken for a set RTX (which has no mode).
> So, your patch works, but only due to a bug in my original implementation.
> This leaves the code with quite a messy design.
> 
> There are two ways I see that we could clean things up, both of which
> require some reworking of your patch.
> 
> Either we remove my check above and teach the RTX costs how to properly
> cost vector operations, or we fix my check to catch all vector RTX
> and add the special cases for the small subset of things we understand
> up there.
> 
> The correct approach in the long term is to fix the RTX costs to correctly
> understand vector operations, so I'd much prefer to see a patch along
> these lines, though I appreciate that is a substantially more invasive
> piece of work.
> 


I agree that the rtx cost for vectors is not handled right now. We might
not be able to separate it out completely as Kyrill suggested; we still
need the vector SET with VOIDmode to be handled inline, and this patch
covers that part. We can work on the others as a separate function if you
prefer; I am happy to look at that as a separate patch.


Thanks,
Kugan


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-15 10:14                 ` Kyrill Tkachov
@ 2015-04-15 11:05                   ` James Greenhalgh
  2015-04-15 11:17                     ` Kyrill Tkachov
  0 siblings, 1 reply; 27+ messages in thread
From: James Greenhalgh @ 2015-04-15 11:05 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Kugan, gcc-patches, Marcus Shawcroft, Richard Earnshaw, Jim Wilson

On Wed, Apr 15, 2015 at 11:14:11AM +0100, Kyrill Tkachov wrote:
> 
> On 15/04/15 10:25, James Greenhalgh wrote:
> > On Tue, Apr 14, 2015 at 11:08:55PM +0100, Kugan wrote:
> >> Now that Stage1 is open, is this OK for trunk.
> > Hi Kugan,
> >
> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >> index cba3c1a..d6ad0af 100644
> >> --- a/gcc/config/aarch64/aarch64.c
> >> +++ b/gcc/config/aarch64/aarch64.c
> >> @@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
> >>   
> >>   	  /* Fall through.  */
> >>   	case REG:
> >> +	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
> >> +	    {
> >> +              /* The cost is 1 per vector-register copied.  */
> >> +              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
> >> +			      / GET_MODE_SIZE (V4SImode);
> >> +              *cost = COSTS_N_INSNS (n_minus_1 + 1);
> >> +	    }
> >>   	  /* const0_rtx is in general free, but we will use an
> >>   	     instruction to set a register to 0.  */
> >> -          if (REG_P (op1) || op1 == const0_rtx)
> >> -            {
> >> +	  else if (REG_P (op1) || op1 == const0_rtx)
> >> +	    {
> >>                 /* The cost is 1 per register copied.  */
> >>                 int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
> >>   			      / UNITS_PER_WORD;
> > I would not have expected control flow to reach this point, as we have:
> >
> >>   /* TODO: The cost infrastructure currently does not handle
> >>      vector operations.  Assume that all vector operations
> >>      are equally expensive.  */
> >>   if (VECTOR_MODE_P (mode))
> >>     {
> >>       if (speed)
> >> 	*cost += extra_cost->vect.alu;
> >>       return true;
> >>     }
> > But, I see that this check is broken for a set RTX (which has no mode).
> > So, your patch works, but only due to a bug in my original implementation.
> > This leaves the code with quite a messy design.
> >
> > There are two ways I see that we could clean things up, both of which
> > require some reworking of your patch.
> >
> > Either we remove my check above and teach the RTX costs how to properly
> > cost vector operations, or we fix my check to catch all vector RTX
> > and add the special cases for the small subset of things we understand
> > up there.
> >
> > The correct approach in the long term is to fix the RTX costs to correctly
> > understand vector operations, so I'd much prefer to see a patch along
> > these lines, though I appreciate that is a substantially more invasive
> > piece of work.
> 
> 
> Would we want to catch all vector RTXes in that check at the top
> and have special vector rtx handling there? (Perhaps even in a function
> of its own like aarch64_vector_rtx_costs?).

No, I think this would necessitate duplicating all of the idiom
recognition and RTX walking code from aarch64_rtx_costs. However, this
would be the easiest way to fix this PR in the short term.

> Or do you think it would be cleaner to handle them in the aarch64_rtx_costs
> giant switch?  Vector-specific RTX codes like vec_concat, vec_select would
> integrate cleanly, but handling other common rtxen could potentially be
> messy?

Well, if I'm allowed to dream for a bit...

To reduce the need for spaghetti code a little, what I would really like to
see is a logical split between the recognition of the instruction and the
costing of individual modes of that instruction. So we would invent a
function like aarch64_classify_rtx which would return "You gave me something
which looks like an add immediate" - then we would leave switching on modes
to aarch64_rtx_costs.

If I can dream even more - I don't see why it makes sense for us to have a
hand-rolled instruction recognizer in the back-end and I'd like to find
a way to reuse common recog infrastructure, and then add
something like what sched1 does to guess at likely register allocations
and to then extract the type attribute. For that to work, we would need
to change a huge amount of infrastructure to ensure that a register
allocation guess was available whenever someone wanted a cost estimate - 
a huge, huge problem when a Gimple pass speculatively forms some
invalid RTX and hands it off to rtx_costs. So I think this is not a
realistic plan!

Those are huge refactoring tasks which I'm not going to get a chance to
look at any time soon, so I think we have to be pragmatic about what can
be achieved.

Adding to the common RTX recognisers will potentially be messy, but it
is a neater approach than duplicating the logic (have a look at the
amount of effort we go to to spot a non-fused Multiply Add operation -
we certainly don't want to duplicate that out for vectors).

Thanks,
James


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-15 11:05                   ` James Greenhalgh
@ 2015-04-15 11:17                     ` Kyrill Tkachov
  0 siblings, 0 replies; 27+ messages in thread
From: Kyrill Tkachov @ 2015-04-15 11:17 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kugan, gcc-patches, Marcus Shawcroft, Richard Earnshaw, Jim Wilson


On 15/04/15 12:05, James Greenhalgh wrote:
> On Wed, Apr 15, 2015 at 11:14:11AM +0100, Kyrill Tkachov wrote:
>> On 15/04/15 10:25, James Greenhalgh wrote:
>>> On Tue, Apr 14, 2015 at 11:08:55PM +0100, Kugan wrote:
>>>> Now that Stage1 is open, is this OK for trunk.
>>> Hi Kugan,
>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index cba3c1a..d6ad0af 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>>>>    
>>>>    	  /* Fall through.  */
>>>>    	case REG:
>>>> +	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
>>>> +	    {
>>>> +              /* The cost is 1 per vector-register copied.  */
>>>> +              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>>>> +			      / GET_MODE_SIZE (V4SImode);
>>>> +              *cost = COSTS_N_INSNS (n_minus_1 + 1);
>>>> +	    }
>>>>    	  /* const0_rtx is in general free, but we will use an
>>>>    	     instruction to set a register to 0.  */
>>>> -          if (REG_P (op1) || op1 == const0_rtx)
>>>> -            {
>>>> +	  else if (REG_P (op1) || op1 == const0_rtx)
>>>> +	    {
>>>>                  /* The cost is 1 per register copied.  */
>>>>                  int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>>>>    			      / UNITS_PER_WORD;
>>> I would not have expected control flow to reach this point, as we have:
>>>
>>>>    /* TODO: The cost infrastructure currently does not handle
>>>>       vector operations.  Assume that all vector operations
>>>>       are equally expensive.  */
>>>>    if (VECTOR_MODE_P (mode))
>>>>      {
>>>>        if (speed)
>>>> 	*cost += extra_cost->vect.alu;
>>>>        return true;
>>>>      }
>>> But, I see that this check is broken for a set RTX (which has no mode).
>>> So, your patch works, but only due to a bug in my original implementation.
>>> This leaves the code with quite a messy design.
>>>
>>> There are two ways I see that we could clean things up, both of which
>>> require some reworking of your patch.
>>>
>>> Either we remove my check above and teach the RTX costs how to properly
>>> cost vector operations, or we fix my check to catch all vector RTX
>>> and add the special cases for the small subset of things we understand
>>> up there.
>>>
>>> The correct approach in the long term is to fix the RTX costs to correctly
>>> understand vector operations, so I'd much prefer to see a patch along
>>> these lines, though I appreciate that is a substantially more invasive
>>> piece of work.
>>
>> Would we want to catch all vector RTXes in that check at the top
>> and have special vector rtx handling there? (Perhaps even in a function
>> of its own like aarch64_vector_rtx_costs?).
> No, I think this would necessitate duplicating all of the idiom
> recognition and RTX walking code from aarch64_rtx_costs. However, this
> would be the easiest way to fix this PR in the short term.
>
>> Or do you think it would be cleaner to handle them in the aarch64_rtx_costs
>> giant switch?  Vector-specific RTX codes like vec_concat, vec_select would
>> integrate cleanly, but handling other common rtxen could potentially be
>> messy?
> Well, if I'm allowed to dream for a bit...
>
> To reduce the need for spaghetti code a little, what I would really like to
> see is a logical split between the recognition of the instruction and the
> costing of individual modes of that instruction. So we would invent a
> function like aarch64_classify_rtx which would return "You gave me something
> which looks like an add immediate" - then we would leave switching on modes
> to aarch64_rtx_costs.
>
> If I can dream even more - I don't see why it makes sense for us to have a
> hand-rolled instruction recognizer in the back-end and I'd like to find
> a way to reuse common recog infrastructure, and then add
> something like what sched1 does to guess at likely register allocations
> and to then extract the type attribute. For that to work, we would need
> to change a huge amount of infrastructure to ensure that a register
> allocation guess was available whenever someone wanted a cost estimate -
> a huge, huge problem when a Gimple pass speculatively forms some
> invalid RTX and hands it off to rtx_costs. So I think this is not a
> realistic plan!

(Unrelated to this patch) The worst offender in this regard is expmed,
which generates rtx instances of every single integer mode from QImode to
EImode with common codes like PLUS, ASHIFT, MULT etc. and asks the backend
rtx costs to assign them a number, which forces us to handle them even
though they are invalid and have no patterns that match them.
I'm working on some patches to remedy that, though there are also some
tree-ssa passes that generate explicit rtxes that may not be valid.

Kyrill

>
> Those are huge refactoring tasks which I'm not going to get a chance to
> look at any time soon, so I think we have to be pragmatic about what can
> be achieved.
>
> Adding to the common RTX recognisers will potentially be messy, but it
> is a neater approach than duplicating the logic (have a look at the
> amount of effort we go to to spot a non-fused Multiply Add operation -
> we certainly don't want to duplicate that out for vectors).
>
> Thanks,
> James
>


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-15 10:45                 ` Kugan
@ 2015-04-15 11:18                   ` James Greenhalgh
  2015-04-15 11:33                     ` Kugan
  2015-04-15 11:35                     ` Maxim Kuvyrkov
  0 siblings, 2 replies; 27+ messages in thread
From: James Greenhalgh @ 2015-04-15 11:18 UTC (permalink / raw)
  To: Kugan
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

On Wed, Apr 15, 2015 at 11:45:36AM +0100, Kugan wrote:
> > There are two ways I see that we could clean things up, both of which
> > require some reworking of your patch.
> > 
> > Either we remove my check above and teach the RTX costs how to properly
> > cost vector operations, or we fix my check to catch all vector RTX
> > and add the special cases for the small subset of things we understand
> > up there.
> > 
> > The correct approach in the long term is to fix the RTX costs to correctly
> > understand vector operations, so I'd much prefer to see a patch along
> > these lines, though I appreciate that is a substantially more invasive
> > piece of work.
> > 
> 
> 
> I agree that rtx cost for vector is not handled right now. We might not
> be able to completely separate as Kyrill suggested.  We still need the
> vector SET with VOIDmode to be handled inline. This patch is that part.
> We can work on the others as a separate function, if you prefer that. I
> am happy to look this as a separate patch.

My point is that adding your patch while keeping the logic at the top
which claims to catch ALL vector operations makes for less readable
code.

At the very least you'll need to update this comment:

  /* TODO: The cost infrastructure currently does not handle
     vector operations.  Assume that all vector operations
     are equally expensive.  */

to make it clear that this doesn't catch vector set operations.

But fixing the comment doesn't improve the messy code so I'd certainly
prefer to see one of the other approaches which have been discussed.

Thanks,
James


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-15 11:18                   ` James Greenhalgh
@ 2015-04-15 11:33                     ` Kugan
  2015-04-17 11:19                       ` Kugan
  2015-04-15 11:35                     ` Maxim Kuvyrkov
  1 sibling, 1 reply; 27+ messages in thread
From: Kugan @ 2015-04-15 11:33 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson



On 15/04/15 21:18, James Greenhalgh wrote:
> On Wed, Apr 15, 2015 at 11:45:36AM +0100, Kugan wrote:
>>> There are two ways I see that we could clean things up, both of which
>>> require some reworking of your patch.
>>>
>>> Either we remove my check above and teach the RTX costs how to properly
>>> cost vector operations, or we fix my check to catch all vector RTX
>>> and add the special cases for the small subset of things we understand
>>> up there.
>>>
>>> The correct approach in the long term is to fix the RTX costs to correctly
>>> understand vector operations, so I'd much prefer to see a patch along
>>> these lines, though I appreciate that is a substantially more invasive
>>> piece of work.
>>>
>>
>>
>> I agree that rtx cost for vector is not handled right now. We might not
>> be able to completely separate as Kyrill suggested.  We still need the
>> vector SET with VOIDmode to be handled inline. This patch is that part.
>> We can work on the others as a separate function, if you prefer that. I
>> am happy to look this as a separate patch.
> 
> My point is that adding your patch while keeping the logic at the top
> which claims to catch ALL vector operations makes for less readable
> code.
> 
> At the very least you'll need to update this comment:
> 
>   /* TODO: The cost infrastructure currently does not handle
>      vector operations.  Assume that all vector operations
>      are equally expensive.  */
> 
> to make it clear that this doesn't catch vector set operations.
> 
> But fixing the comment doesn't improve the messy code so I'd certainly
> prefer to see one of the other approaches which have been discussed.

I see your point. Let me work on this based on your suggestions above.

Thanks,
Kugan


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-15 11:18                   ` James Greenhalgh
  2015-04-15 11:33                     ` Kugan
@ 2015-04-15 11:35                     ` Maxim Kuvyrkov
  1 sibling, 0 replies; 27+ messages in thread
From: Maxim Kuvyrkov @ 2015-04-15 11:35 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kugan, Kyrylo Tkachov, gcc-patches, Marcus Shawcroft,
	Richard Earnshaw, Jim Wilson

> On Apr 15, 2015, at 2:18 PM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> 
> On Wed, Apr 15, 2015 at 11:45:36AM +0100, Kugan wrote:
>>> There are two ways I see that we could clean things up, both of which
>>> require some reworking of your patch.
>>> 
>>> Either we remove my check above and teach the RTX costs how to properly
>>> cost vector operations, or we fix my check to catch all vector RTX
>>> and add the special cases for the small subset of things we understand
>>> up there.
>>> 
>>> The correct approach in the long term is to fix the RTX costs to correctly
>>> understand vector operations, so I'd much prefer to see a patch along
>>> these lines, though I appreciate that is a substantially more invasive
>>> piece of work.
>>> 
>> 
>> 
>> I agree that rtx cost for vector is not handled right now. We might not
>> be able to completely separate as Kyrill suggested.  We still need the
>> vector SET with VOIDmode to be handled inline. This patch is that part.
>> We can work on the others as a separate function, if you prefer that. I
>> am happy to look this as a separate patch.
> 
> My point is that adding your patch while keeping the logic at the top
> which claims to catch ALL vector operations makes for less readable
> code.
> 
> At the very least you'll need to update this comment:
> 
>  /* TODO: The cost infrastructure currently does not handle
>     vector operations.  Assume that all vector operations
>     are equally expensive.  */
> 
> to make it clear that this doesn't catch vector set operations.
> 
> But fixing the comment doesn't improve the messy code so I'd certainly
> prefer to see one of the other approaches which have been discussed.

While I am for cleaning up messy code, I want to avoid Kugan's patch being
held hostage until all the proper refactorings and cleanups are done.  If we
consider the patch on its own merits: Is it a worthwhile improvement?
-- [Probably, "yes".]  Does it make current spaghetti code significantly
more difficult to understand? -- [Probably, "no", if we update the current
comments.]

Let's discuss the effort of cleaning RTX costs as a separate task.  It can be either a joint effort for ARM and Linaro, or one of us can tackle it.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org


* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-15 11:33                     ` Kugan
@ 2015-04-17 11:19                       ` Kugan
  2015-04-17 11:25                         ` Kyrill Tkachov
  2015-04-20 20:22                         ` James Greenhalgh
  0 siblings, 2 replies; 27+ messages in thread
From: Kugan @ 2015-04-17 11:19 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 869 bytes --]

>> My point is that adding your patch while keeping the logic at the top
>> which claims to catch ALL vector operations makes for less readable
>> code.
>>
>> At the very least you'll need to update this comment:
>>
>>   /* TODO: The cost infrastructure currently does not handle
>>      vector operations.  Assume that all vector operations
>>      are equally expensive.  */
>>
>> to make it clear that this doesn't catch vector set operations.
>>
>> But fixing the comment doesn't improve the messy code so I'd certainly
>> prefer to see one of the other approaches which have been discussed.
> 
> I see your point. Let me work on this based on your suggestions above.

Hi James,

Here is an attempt along those lines. Is this what you have in mind?
I am trying to keep the functionality the same as before so that we can
tune the parameters later. It is not fully tested yet.

Thanks,
Kugan

[-- Attachment #2: cost.txt --]
[-- Type: text/plain, Size: 14879 bytes --]

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
index ae2b547..ed9432e 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -121,7 +121,9 @@ const struct cpu_cost_table thunderx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* Alu.  */
+    COSTS_N_INSNS (1),	/* Alu.  */
+    COSTS_N_INSNS (1),	/* Load.  */
+    COSTS_N_INSNS (1)	/* Store.  */
   }
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..c2d4a53 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5499,16 +5499,6 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
      above this default.  */
   *cost = COSTS_N_INSNS (1);
 
-  /* TODO: The cost infrastructure currently does not handle
-     vector operations.  Assume that all vector operations
-     are equally expensive.  */
-  if (VECTOR_MODE_P (mode))
-    {
-      if (speed)
-	*cost += extra_cost->vect.alu;
-      return true;
-    }
-
   switch (code)
     {
     case SET:
@@ -5523,7 +5513,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	  if (speed)
 	    {
 	      rtx address = XEXP (op0, 0);
-	      if (GET_MODE_CLASS (mode) == MODE_INT)
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.store;
+	      else if (GET_MODE_CLASS (mode) == MODE_INT)
 		*cost += extra_cost->ldst.store;
 	      else if (mode == SFmode)
 		*cost += extra_cost->ldst.storef;
@@ -5544,10 +5536,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
 	  /* Fall through.  */
 	case REG:
+	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+	    {
+              /* The cost is 1 per vector-register copied.  */
+              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+			      / GET_MODE_SIZE (V4SImode);
+              *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	    }
 	  /* const0_rtx is in general free, but we will use an
 	     instruction to set a register to 0.  */
-          if (REG_P (op1) || op1 == const0_rtx)
-            {
+	  else if (REG_P (op1) || op1 == const0_rtx)
+	    {
               /* The cost is 1 per register copied.  */
               int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
 			      / UNITS_PER_WORD;
@@ -5570,6 +5569,7 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	      && (GET_MODE_BITSIZE (GET_MODE (XEXP (op1, 0)))
 		  >= INTVAL (XEXP (op0, 1))))
 	    op1 = XEXP (op1, 0);
+	  gcc_assert (!VECTOR_MODE_P (mode));
 
           if (CONST_INT_P (op1))
             {
@@ -5621,8 +5621,10 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
     case CONST_DOUBLE:
       if (speed)
 	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
 	  /* mov[df,sf]_aarch64.  */
-	  if (aarch64_float_const_representable_p (x))
+	  else if (aarch64_float_const_representable_p (x))
 	    /* FMOV (scalar immediate).  */
 	    *cost += extra_cost->fp[mode == DFmode].fpconst;
 	  else if (!aarch64_float_const_zero_rtx_p (x))
@@ -5650,7 +5652,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	     approximation for the additional cost of the addressing
 	     mode.  */
 	  rtx address = XEXP (x, 0);
-	  if (GET_MODE_CLASS (mode) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.load;
+	  else if (GET_MODE_CLASS (mode) == MODE_INT)
 	    *cost += extra_cost->ldst.load;
 	  else if (mode == SFmode)
 	    *cost += extra_cost->ldst.loadf;
@@ -5705,7 +5709,12 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
     case CLRSB:
     case CLZ:
       if (speed)
-        *cost += extra_cost->alu.clz;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.clz;
+	}
 
       return false;
 
@@ -5790,6 +5799,13 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
           return false;
         }
 
+      /* VCMP.  */
+      if (VECTOR_MODE_P (mode))
+	{
+	  if (speed)
+	    *cost += extra_cost->vect.alu;
+	  return true;
+	}
       return false;
 
     case MINUS:
@@ -5808,8 +5824,13 @@ cost_minus:
 	    *cost += rtx_cost (op0, MINUS, 0, speed);
 
 	    if (speed)
-	      /* SUB(S) (immediate).  */
-	      *cost += extra_cost->alu.arith;
+	      {
+		if (VECTOR_MODE_P (mode))
+		  *cost += extra_cost->vect.alu;
+		/* SUB(S) (immediate).  */
+		else
+		  *cost += extra_cost->alu.arith;
+	      }
 	    return true;
 
 	  }
@@ -5818,8 +5839,12 @@ cost_minus:
         if (aarch64_rtx_arith_op_extract_p (op1, mode))
 	  {
 	    if (speed)
-	      *cost += extra_cost->alu.arith_shift;
-
+	      {
+		if (VECTOR_MODE_P (mode))
+		  *cost += extra_cost->vect.alu;
+		else
+		  *cost += extra_cost->alu.arith_shift;
+	      }
 	    *cost += rtx_cost (XEXP (XEXP (op1, 0), 0),
 			       (enum rtx_code) GET_CODE (op1),
 			       0, speed);
@@ -5844,7 +5869,10 @@ cost_minus:
 
 	if (speed)
 	  {
-	    if (GET_MODE_CLASS (mode) == MODE_INT)
+	    if (VECTOR_MODE_P (mode))
+	      /* Vector SUB.  */
+	      *cost += extra_cost->vect.alu;
+	    else if (GET_MODE_CLASS (mode) == MODE_INT)
 	      /* SUB(S).  */
 	      *cost += extra_cost->alu.arith;
 	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
@@ -5878,8 +5906,13 @@ cost_plus:
 	    *cost += rtx_cost (op0, PLUS, 0, speed);
 
 	    if (speed)
-	      /* ADD (immediate).  */
-	      *cost += extra_cost->alu.arith;
+	      {
+		if (VECTOR_MODE_P (mode))
+		  *cost += extra_cost->vect.alu;
+		/* ADD (immediate).  */
+		else
+		  *cost += extra_cost->alu.arith;
+	      }
 	    return true;
 	  }
 
@@ -5887,8 +5920,12 @@ cost_plus:
         if (aarch64_rtx_arith_op_extract_p (op0, mode))
 	  {
 	    if (speed)
-	      *cost += extra_cost->alu.arith_shift;
-
+	      {
+		if (VECTOR_MODE_P (mode))
+		  *cost += extra_cost->vect.alu;
+		else
+		  *cost += extra_cost->alu.arith_shift;
+	      }
 	    *cost += rtx_cost (XEXP (XEXP (op0, 0), 0),
 			       (enum rtx_code) GET_CODE (op0),
 			       0, speed);
@@ -5913,7 +5950,10 @@ cost_plus:
 
 	if (speed)
 	  {
-	    if (GET_MODE_CLASS (mode) == MODE_INT)
+	    if (VECTOR_MODE_P (mode))
+	      /* Vector ADD.  */
+	      *cost += extra_cost->vect.alu;
+	    else if (GET_MODE_CLASS (mode) == MODE_INT)
 	      /* ADD.  */
 	      *cost += extra_cost->alu.arith;
 	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
@@ -5927,8 +5967,12 @@ cost_plus:
       *cost = COSTS_N_INSNS (1);
 
       if (speed)
-        *cost += extra_cost->alu.rev;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.rev;
+	}
       return false;
 
     case IOR:
@@ -5936,10 +5980,14 @@ cost_plus:
         {
           *cost = COSTS_N_INSNS (1);
 
-          if (speed)
-            *cost += extra_cost->alu.rev;
-
-          return true;
+	  if (speed)
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->alu.rev;
+	    }
+	  return true;
         }
     /* Fall through.  */
     case XOR:
@@ -5948,6 +5996,13 @@ cost_plus:
       op0 = XEXP (x, 0);
       op1 = XEXP (x, 1);
 
+      if (VECTOR_MODE_P (mode))
+	{
+	  if (speed)
+	    *cost += extra_cost->vect.alu;
+	  return true;
+	}
+
       if (code == AND
           && GET_CODE (op0) == MULT
           && CONST_INT_P (XEXP (op0, 1))
@@ -6013,10 +6068,15 @@ cost_plus:
       return false;
 
     case NOT:
-      /* MVN.  */
       if (speed)
-	*cost += extra_cost->alu.logical;
-
+	{
+	  /* VNEG.  */
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  /* MVN.  */
+	  else
+	    *cost += extra_cost->alu.logical;
+	}
       /* The logical instruction could have the shifted register form,
          but the cost is the same if the shift is processed as a separate
          instruction, so we don't bother with it here.  */
@@ -6057,13 +6117,18 @@ cost_plus:
 
       /* UXTB/UXTH.  */
       if (speed)
-	*cost += extra_cost->alu.extend;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.extend;
+	}
       return false;
 
     case SIGN_EXTEND:
       if (MEM_P (XEXP (x, 0)))
 	{
+	  gcc_assert (!VECTOR_MODE_P (mode));
 	  /* LDRSH.  */
 	  if (speed)
 	    {
@@ -6078,7 +6143,12 @@ cost_plus:
 	}
 
       if (speed)
-	*cost += extra_cost->alu.extend;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.extend;
+	}
       return false;
 
     case ASHIFT:
@@ -6087,10 +6157,16 @@ cost_plus:
 
       if (CONST_INT_P (op1))
         {
-	  /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
-	     aliases.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift;
+	    {
+	      /* VSHL (immediate).  */
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
+		 aliases.  */
+	      else
+		*cost += extra_cost->alu.shift;
+	    }
 
           /* We can incorporate zero/sign extend for free.  */
           if (GET_CODE (op0) == ZERO_EXTEND
@@ -6102,10 +6178,15 @@ cost_plus:
         }
       else
         {
-	  /* LSLV.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift_reg;
-
+	    {
+	      /* VSHL (register).  */
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      /* LSLV.  */
+	      else
+		*cost += extra_cost->alu.shift_reg;
+	    }
 	  return false;  /* All arguments need to be in registers.  */
         }
 
@@ -6120,18 +6201,27 @@ cost_plus:
 	{
 	  /* ASR (immediate) and friends.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift;
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->alu.shift;
+	    }
 
 	  *cost += rtx_cost (op0, (enum rtx_code) code, 0, speed);
 	  return true;
 	}
       else
 	{
-
-	  /* ASR (register) and friends.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift_reg;
-
+	    {
+	      /* VSHR (register).  */
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      /* ASR (register) and friends.  */
+	      else
+		*cost += extra_cost->alu.shift_reg;
+	    }
 	  return false;  /* All arguments need to be in registers.  */
 	}
 
@@ -6179,7 +6269,12 @@ cost_plus:
     case SIGN_EXTRACT:
       /* UBFX/SBFX.  */
       if (speed)
-	*cost += extra_cost->alu.bfx;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.bfx;
+	}
 
       /* We can trust that the immediates used will be correct (there
 	 are no by-register forms), so we need only cost op0.  */
@@ -6196,7 +6291,9 @@ cost_plus:
     case UMOD:
       if (speed)
 	{
-	  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
 	    *cost += (extra_cost->mult[GET_MODE (x) == DImode].add
 		      + extra_cost->mult[GET_MODE (x) == DImode].idiv);
 	  else if (GET_MODE (x) == DFmode)
@@ -6213,7 +6310,9 @@ cost_plus:
     case SQRT:
       if (speed)
 	{
-	  if (GET_MODE_CLASS (mode) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else if (GET_MODE_CLASS (mode) == MODE_INT)
 	    /* There is no integer SQRT, so only DIV and UDIV can get
 	       here.  */
 	    *cost += extra_cost->mult[mode == DImode].idiv;
@@ -6245,7 +6344,12 @@ cost_plus:
       op2 = XEXP (x, 2);
 
       if (speed)
-	*cost += extra_cost->fp[mode == DFmode].fma;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].fma;
+	}
 
       /* FMSUB, FNMADD, and FNMSUB are free.  */
       if (GET_CODE (op0) == NEG)
@@ -6285,7 +6389,13 @@ cost_plus:
 
     case FLOAT_EXTEND:
       if (speed)
-	*cost += extra_cost->fp[mode == DFmode].widen;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    /* Vector conversion.  */
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].widen;
+	}
       return false;
 
     case FLOAT_TRUNCATE:
@@ -6311,8 +6421,13 @@ cost_plus:
         }
 
       if (speed)
-        *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
-
+	{
+	  /* FCVT.  */
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
+	}
       *cost += rtx_cost (x, (enum rtx_code) code, 0, speed);
       return true;
 
@@ -6321,7 +6436,12 @@ cost_plus:
 	{
 	  /* FABS and FNEG are analogous.  */
 	  if (speed)
-	    *cost += extra_cost->fp[mode == DFmode].neg;
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->fp[mode == DFmode].neg;
+	    }
 	}
       else
 	{
@@ -6338,10 +6458,13 @@ cost_plus:
     case SMIN:
       if (speed)
 	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
 	  /* FMAXNM/FMINNM/FMAX/FMIN.
 	     TODO: This may not be accurate for all implementations, but
 	     we do not model this in the cost tables.  */
-	  *cost += extra_cost->fp[mode == DFmode].addsub;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].addsub;
 	}
       return false;
 
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 3ee7ebf..c8e1d2e 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -124,6 +124,8 @@ struct fp_cost_table
 struct vector_cost_table
 {
   const int alu;
+  const int load;
+  const int store;
 };
 
 struct cpu_cost_table
diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
index 05e96a9..257902c 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -119,7 +119,9 @@ const struct cpu_cost_table generic_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (1),	/* Load.  */
+    COSTS_N_INSNS (1)	/* Store.  */
   }
 };
 
@@ -220,7 +222,9 @@ const struct cpu_cost_table cortexa53_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (1),	/* Load.  */
+    COSTS_N_INSNS (1)	/* Store.  */
   }
 };
 
@@ -321,7 +325,9 @@ const struct cpu_cost_table cortexa57_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (1),  /* Load.  */
+    COSTS_N_INSNS (1)   /* Store.  */
   }
 };
 
@@ -422,7 +428,9 @@ const struct cpu_cost_table xgene1_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (2)  /* alu.  */
+    COSTS_N_INSNS (2),  /* alu.  */
+    COSTS_N_INSNS (1),  /* Load.  */
+    COSTS_N_INSNS (1)   /* Store.  */
   }
 };
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-17 11:19                       ` Kugan
@ 2015-04-17 11:25                         ` Kyrill Tkachov
  2015-04-20 20:22                         ` James Greenhalgh
  1 sibling, 0 replies; 27+ messages in thread
From: Kyrill Tkachov @ 2015-04-17 11:25 UTC (permalink / raw)
  To: Kugan, James Greenhalgh
  Cc: gcc-patches, Marcus Shawcroft, Richard Earnshaw, Jim Wilson


On 17/04/15 12:19, Kugan wrote:
> Hi James,
>
> Here is an attempt along this line. Is this what you have in mind?
> Trying to keep functionality as before so that we can tune the
> parameters later. Not fully tested yet.

Hi Kugan,
I'm not doing a full review here, just have a comment inline.

Thanks,
Kyrill

>
> Thanks,
> Kugan
>
> cost.txt
>
>
> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
> index ae2b547..ed9432e 100644
> --- a/gcc/config/aarch64/aarch64-cost-tables.h
> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> @@ -121,7 +121,9 @@ const struct cpu_cost_table thunderx_extra_costs =
>     },
>     /* Vector */
>     {
> -    COSTS_N_INSNS (1)	/* Alu.  */
> +    COSTS_N_INSNS (1),	/* Alu.  */
> +    COSTS_N_INSNS (1),	/* Load.  */
> +    COSTS_N_INSNS (1)	/* Store.  */
>     }

What about the other CPU vector costs?
Also, tune_params would need updating.

>   };
>   
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index cba3c1a..c2d4a53 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5499,16 +5499,6 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>        above this default.  */
>     *cost = COSTS_N_INSNS (1);
>   
> -  /* TODO: The cost infrastructure currently does not handle
> -     vector operations.  Assume that all vector operations
> -     are equally expensive.  */
> -  if (VECTOR_MODE_P (mode))
> -    {
> -      if (speed)
> -	*cost += extra_cost->vect.alu;
> -      return true;
> -    }
> -
>     switch (code)
>       {
>       case SET:
> @@ -5523,7 +5513,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>   	  if (speed)
>   	    {
>   	      rtx address = XEXP (op0, 0);
> -	      if (GET_MODE_CLASS (mode) == MODE_INT)
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.store;
> +	      else if (GET_MODE_CLASS (mode) == MODE_INT)
>   		*cost += extra_cost->ldst.store;
>   	      else if (mode == SFmode)
>   		*cost += extra_cost->ldst.storef;
> @@ -5544,10 +5536,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>   
>   	  /* Fall through.  */
>   	case REG:
> +	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
> +	    {
> +              /* The cost is 1 per vector-register copied.  */
> +              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
> +			      / GET_MODE_SIZE (V4SImode);
> +              *cost = COSTS_N_INSNS (n_minus_1 + 1);
> +	    }
>   	  /* const0_rtx is in general free, but we will use an
>   	     instruction to set a register to 0.  */
> -          if (REG_P (op1) || op1 == const0_rtx)
> -            {
> +	  else if (REG_P (op1) || op1 == const0_rtx)
> +	    {
>                 /* The cost is 1 per register copied.  */
>                 int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>   			      / UNITS_PER_WORD;
> @@ -5570,6 +5569,7 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>   	      && (GET_MODE_BITSIZE (GET_MODE (XEXP (op1, 0)))
>   		  >= INTVAL (XEXP (op0, 1))))
>   	    op1 = XEXP (op1, 0);
> +	  gcc_assert (!VECTOR_MODE_P (mode));

We shouldn't assert in rtx costs. If some control flow logic gets buggy,
at worst we'd return a wrong rtx cost and make a suboptimal decision.
This shouldn't ever lead to a crash, IMO.
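To illustrate the principle with a standalone sketch (the names here are hypothetical, not the real aarch64 hooks): a cost callback can always fall back to a conservative bucket for modes it doesn't model, instead of asserting.

```c
#include <assert.h>

/* Mirrors the COSTS_N_INSNS idea from rtl.h: costs in quarter-insn units.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical cost buckets, loosely modelled on cpu_cost_table.  */
struct example_costs
{
  int alu_shift;   /* Scalar shift.  */
  int vect_alu;    /* Generic vector op.  */
};

/* Cost of a shift: for vector modes, fall back to the generic vector
   cost rather than asserting.  A wrong cost is only a missed
   optimisation; a failed assert is an ICE.  */
static int
shift_cost (const struct example_costs *c, int vector_mode_p)
{
  if (vector_mode_p)
    return c->vect_alu;
  return c->alu_shift;
}
```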

>   
>             if (CONST_INT_P (op1))
>               {
> @@ -5621,8 +5621,10 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>       case CONST_DOUBLE:
>         if (speed)
>   	{
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
>   	  /* mov[df,sf]_aarch64.  */
> -	  if (aarch64_float_const_representable_p (x))
> +	  else if (aarch64_float_const_representable_p (x))
>   	    /* FMOV (scalar immediate).  */
>   	    *cost += extra_cost->fp[mode == DFmode].fpconst;
>   	  else if (!aarch64_float_const_zero_rtx_p (x))
> @@ -5650,7 +5652,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>   	     approximation for the additional cost of the addressing
>   	     mode.  */
>   	  rtx address = XEXP (x, 0);
> -	  if (GET_MODE_CLASS (mode) == MODE_INT)
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.load;
> +	  else if (GET_MODE_CLASS (mode) == MODE_INT)
>   	    *cost += extra_cost->ldst.load;
>   	  else if (mode == SFmode)
>   	    *cost += extra_cost->ldst.loadf;
> @@ -5705,7 +5709,12 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>       case CLRSB:
>       case CLZ:
>         if (speed)
> -        *cost += extra_cost->alu.clz;
> +	{
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    *cost += extra_cost->alu.clz;
> +	}
>   
>         return false;
>   
> @@ -5790,6 +5799,13 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>             return false;
>           }
>   
> +      /* VCMP.  */
> +      if (VECTOR_MODE_P (mode))
> +	{
> +	  if (speed)
> +	    *cost += extra_cost->vect.alu;
> +	  return true;
> +	}
>         return false;
>   
>       case MINUS:
> @@ -5808,8 +5824,13 @@ cost_minus:
>   	    *cost += rtx_cost (op0, MINUS, 0, speed);
>   
>   	    if (speed)
> -	      /* SUB(S) (immediate).  */
> -	      *cost += extra_cost->alu.arith;
> +	      {
> +		if (VECTOR_MODE_P (mode))
> +		  *cost += extra_cost->vect.alu;
> +		/* SUB(S) (immediate).  */
> +		else
> +		  *cost += extra_cost->alu.arith;
> +	      }
>   	    return true;
>   
>   	  }
> @@ -5818,8 +5839,12 @@ cost_minus:
>           if (aarch64_rtx_arith_op_extract_p (op1, mode))
>   	  {
>   	    if (speed)
> -	      *cost += extra_cost->alu.arith_shift;
> -
> +	      {
> +		if (VECTOR_MODE_P (mode))
> +		  *cost += extra_cost->vect.alu;
> +		else
> +		  *cost += extra_cost->alu.arith_shift;
> +	      }
>   	    *cost += rtx_cost (XEXP (XEXP (op1, 0), 0),
>   			       (enum rtx_code) GET_CODE (op1),
>   			       0, speed);
> @@ -5844,7 +5869,10 @@ cost_minus:
>   
>   	if (speed)
>   	  {
> -	    if (GET_MODE_CLASS (mode) == MODE_INT)
> +	    if (VECTOR_MODE_P (mode))
> +	      /* Vector SUB.  */
> +	      *cost += extra_cost->vect.alu;
> +	    else if (GET_MODE_CLASS (mode) == MODE_INT)
>   	      /* SUB(S).  */
>   	      *cost += extra_cost->alu.arith;
>   	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
> @@ -5878,8 +5906,13 @@ cost_plus:
>   	    *cost += rtx_cost (op0, PLUS, 0, speed);
>   
>   	    if (speed)
> -	      /* ADD (immediate).  */
> -	      *cost += extra_cost->alu.arith;
> +	      {
> +		if (VECTOR_MODE_P (mode))
> +		  *cost += extra_cost->vect.alu;
> +		/* ADD (immediate).  */
> +		else
> +		  *cost += extra_cost->alu.arith;
> +	      }
>   	    return true;
>   	  }
>   
> @@ -5887,8 +5920,12 @@ cost_plus:
>           if (aarch64_rtx_arith_op_extract_p (op0, mode))
>   	  {
>   	    if (speed)
> -	      *cost += extra_cost->alu.arith_shift;
> -
> +	      {
> +		if (VECTOR_MODE_P (mode))
> +		  *cost += extra_cost->vect.alu;
> +		else
> +		  *cost += extra_cost->alu.arith_shift;
> +	      }
>   	    *cost += rtx_cost (XEXP (XEXP (op0, 0), 0),
>   			       (enum rtx_code) GET_CODE (op0),
>   			       0, speed);
> @@ -5913,7 +5950,10 @@ cost_plus:
>   
>   	if (speed)
>   	  {
> -	    if (GET_MODE_CLASS (mode) == MODE_INT)
> +	    if (VECTOR_MODE_P (mode))
> +	      /* Vector ADD.  */
> +	      *cost += extra_cost->vect.alu;
> +	    else if (GET_MODE_CLASS (mode) == MODE_INT)
>   	      /* ADD.  */
>   	      *cost += extra_cost->alu.arith;
>   	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
> @@ -5927,8 +5967,12 @@ cost_plus:
>         *cost = COSTS_N_INSNS (1);
>   
>         if (speed)
> -        *cost += extra_cost->alu.rev;
> -
> +	{
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    *cost += extra_cost->alu.rev;
> +	}
>         return false;
>   
>       case IOR:
> @@ -5936,10 +5980,14 @@ cost_plus:
>           {
>             *cost = COSTS_N_INSNS (1);
>   
> -          if (speed)
> -            *cost += extra_cost->alu.rev;
> -
> -          return true;
> +	  if (speed)
> +	    {
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.alu;
> +	      else
> +		*cost += extra_cost->alu.rev;
> +	    }
> +	  return true;
>           }
>       /* Fall through.  */
>       case XOR:
> @@ -5948,6 +5996,13 @@ cost_plus:
>         op0 = XEXP (x, 0);
>         op1 = XEXP (x, 1);
>   
> +      if (VECTOR_MODE_P (mode))
> +	{
> +	  if (speed)
> +	    *cost += extra_cost->vect.alu;
> +	  return true;
> +	}
> +
>         if (code == AND
>             && GET_CODE (op0) == MULT
>             && CONST_INT_P (XEXP (op0, 1))
> @@ -6013,10 +6068,15 @@ cost_plus:
>         return false;
>   
>       case NOT:
> -      /* MVN.  */
>         if (speed)
> -	*cost += extra_cost->alu.logical;
> -
> +	{
> +	  /* Vector MVN.  */
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  /* MVN.  */
> +	  else
> +	    *cost += extra_cost->alu.logical;
> +	}
>         /* The logical instruction could have the shifted register form,
>            but the cost is the same if the shift is processed as a separate
>            instruction, so we don't bother with it here.  */
> @@ -6057,13 +6117,18 @@ cost_plus:
>   
>         /* UXTB/UXTH.  */
>         if (speed)
> -	*cost += extra_cost->alu.extend;
> -
> +	{
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    *cost += extra_cost->alu.extend;
> +	}
>         return false;
>   
>       case SIGN_EXTEND:
>         if (MEM_P (XEXP (x, 0)))
>   	{
> +	  gcc_assert (!VECTOR_MODE_P (mode));

Same here.

>   	  /* LDRSH.  */
>   	  if (speed)
>   	    {
> @@ -6078,7 +6143,12 @@ cost_plus:
>   	}
>   
>         if (speed)
> -	*cost += extra_cost->alu.extend;
> +	{
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    *cost += extra_cost->alu.extend;
> +	}
>         return false;
>   
>       case ASHIFT:
> @@ -6087,10 +6157,16 @@ cost_plus:
>   
>         if (CONST_INT_P (op1))
>           {
> -	  /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
> -	     aliases.  */
>   	  if (speed)
> -	    *cost += extra_cost->alu.shift;
> +	    {
> +	      /* VSHL (immediate).  */
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.alu;
> +	      /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
> +		 aliases.  */
> +	      else
> +		*cost += extra_cost->alu.shift;
> +	    }
>   
>             /* We can incorporate zero/sign extend for free.  */
>             if (GET_CODE (op0) == ZERO_EXTEND
> @@ -6102,10 +6178,15 @@ cost_plus:
>           }
>         else
>           {
> -	  /* LSLV.  */
>   	  if (speed)
> -	    *cost += extra_cost->alu.shift_reg;
> -
> +	    {
> +	      /* VSHL (register).  */
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.alu;
> +	      /* LSLV.  */
> +	      else
> +		*cost += extra_cost->alu.shift_reg;
> +	    }
>   	  return false;  /* All arguments need to be in registers.  */
>           }
>   
> @@ -6120,18 +6201,27 @@ cost_plus:
>   	{
>   	  /* ASR (immediate) and friends.  */
>   	  if (speed)
> -	    *cost += extra_cost->alu.shift;
> +	    {
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.alu;
> +	      else
> +		*cost += extra_cost->alu.shift;
> +	    }
>   
>   	  *cost += rtx_cost (op0, (enum rtx_code) code, 0, speed);
>   	  return true;
>   	}
>         else
>   	{
> -
> -	  /* ASR (register) and friends.  */
>   	  if (speed)
> -	    *cost += extra_cost->alu.shift_reg;
> -
> +	    {
> +	      /* VSHR (register).  */
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.alu;
> +	      /* ASR (register) and friends.  */
> +	      else
> +		*cost += extra_cost->alu.shift_reg;
> +	    }
>   	  return false;  /* All arguments need to be in registers.  */
>   	}
>   
> @@ -6179,7 +6269,12 @@ cost_plus:
>       case SIGN_EXTRACT:
>         /* UBFX/SBFX.  */
>         if (speed)
> -	*cost += extra_cost->alu.bfx;
> +	{
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    *cost += extra_cost->alu.bfx;
> +	}
>   
>         /* We can trust that the immediates used will be correct (there
>   	 are no by-register forms), so we need only cost op0.  */
> @@ -6196,7 +6291,9 @@ cost_plus:
>       case UMOD:
>         if (speed)
>   	{
> -	  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
>   	    *cost += (extra_cost->mult[GET_MODE (x) == DImode].add
>   		      + extra_cost->mult[GET_MODE (x) == DImode].idiv);
>   	  else if (GET_MODE (x) == DFmode)
> @@ -6213,7 +6310,9 @@ cost_plus:
>       case SQRT:
>         if (speed)
>   	{
> -	  if (GET_MODE_CLASS (mode) == MODE_INT)
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else if (GET_MODE_CLASS (mode) == MODE_INT)
>   	    /* There is no integer SQRT, so only DIV and UDIV can get
>   	       here.  */
>   	    *cost += extra_cost->mult[mode == DImode].idiv;
> @@ -6245,7 +6344,12 @@ cost_plus:
>         op2 = XEXP (x, 2);
>   
>         if (speed)
> -	*cost += extra_cost->fp[mode == DFmode].fma;
> +	{
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    *cost += extra_cost->fp[mode == DFmode].fma;
> +	}
>   
>         /* FMSUB, FNMADD, and FNMSUB are free.  */
>         if (GET_CODE (op0) == NEG)
> @@ -6285,7 +6389,13 @@ cost_plus:
>   
>       case FLOAT_EXTEND:
>         if (speed)
> -	*cost += extra_cost->fp[mode == DFmode].widen;
> +	{
> +	  if (VECTOR_MODE_P (mode))
> +	    /* Vector conversion.  */
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    *cost += extra_cost->fp[mode == DFmode].widen;
> +	}
>         return false;
>   
>       case FLOAT_TRUNCATE:
> @@ -6311,8 +6421,13 @@ cost_plus:
>           }
>   
>         if (speed)
> -        *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
> -
> +	{
> +	  /* FCVT.  */
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
> +	}
>         *cost += rtx_cost (x, (enum rtx_code) code, 0, speed);
>         return true;
>   
> @@ -6321,7 +6436,12 @@ cost_plus:
>   	{
>   	  /* FABS and FNEG are analogous.  */
>   	  if (speed)
> -	    *cost += extra_cost->fp[mode == DFmode].neg;
> +	    {
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.alu;
> +	      else
> +		*cost += extra_cost->fp[mode == DFmode].neg;
> +	    }
>   	}
>         else
>   	{
> @@ -6338,10 +6458,13 @@ cost_plus:
>       case SMIN:
>         if (speed)
>   	{
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
>   	  /* FMAXNM/FMINNM/FMAX/FMIN.
>   	     TODO: This may not be accurate for all implementations, but
>   	     we do not model this in the cost tables.  */
> -	  *cost += extra_cost->fp[mode == DFmode].addsub;
> +	  else
> +	    *cost += extra_cost->fp[mode == DFmode].addsub;
>   	}
>         return false;
>   
> diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
> index 3ee7ebf..c8e1d2e 100644
> --- a/gcc/config/arm/aarch-common-protos.h
> +++ b/gcc/config/arm/aarch-common-protos.h
> @@ -124,6 +124,8 @@ struct fp_cost_table
>   struct vector_cost_table
>   {
>     const int alu;
> +  const int load;
> +  const int store;
>   };
>   
>   struct cpu_cost_table
> diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
> index 05e96a9..257902c 100644
> --- a/gcc/config/arm/aarch-cost-tables.h
> +++ b/gcc/config/arm/aarch-cost-tables.h
> @@ -119,7 +119,9 @@ const struct cpu_cost_table generic_extra_costs =
>     },
>     /* Vector */
>     {
> -    COSTS_N_INSNS (1)	/* alu.  */
> +    COSTS_N_INSNS (1),	/* alu.  */
> +    COSTS_N_INSNS (1),	/* Load.  */
> +    COSTS_N_INSNS (1)	/* Store.  */
>     }
>   };
>   
> @@ -220,7 +222,9 @@ const struct cpu_cost_table cortexa53_extra_costs =
>     },
>     /* Vector */
>     {
> -    COSTS_N_INSNS (1)	/* alu.  */
> +    COSTS_N_INSNS (1),	/* alu.  */
> +    COSTS_N_INSNS (1),	/* Load.  */
> +    COSTS_N_INSNS (1)	/* Store.  */
>     }
>   };
>   
> @@ -321,7 +325,9 @@ const struct cpu_cost_table cortexa57_extra_costs =
>     },
>     /* Vector */
>     {
> -    COSTS_N_INSNS (1)  /* alu.  */
> +    COSTS_N_INSNS (1),  /* alu.  */
> +    COSTS_N_INSNS (1),  /* Load.  */
> +    COSTS_N_INSNS (1)   /* Store.  */
>     }
>   };
>   
> @@ -422,7 +428,9 @@ const struct cpu_cost_table xgene1_extra_costs =
>     },
>     /* Vector */
>     {
> -    COSTS_N_INSNS (2)  /* alu.  */
> +    COSTS_N_INSNS (2),  /* alu.  */
> +    COSTS_N_INSNS (1),  /* Load.  */
> +    COSTS_N_INSNS (1)   /* Store.  */
>     }
>   };
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-17 11:19                       ` Kugan
  2015-04-17 11:25                         ` Kyrill Tkachov
@ 2015-04-20 20:22                         ` James Greenhalgh
  2015-04-24 23:26                           ` Kugan
  1 sibling, 1 reply; 27+ messages in thread
From: James Greenhalgh @ 2015-04-20 20:22 UTC (permalink / raw)
  To: Kugan
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

On Fri, Apr 17, 2015 at 12:19:14PM +0100, Kugan wrote:
> >> My point is that adding your patch while keeping the logic at the top
> >> which claims to catch ALL vector operations makes for less readable
> >> code.
> >>
> >> At the very least you'll need to update this comment:
> >>
> >>   /* TODO: The cost infrastructure currently does not handle
> >>      vector operations.  Assume that all vector operations
> >>      are equally expensive.  */
> >>
> >> to make it clear that this doesn't catch vector set operations.
> >>
> >> But fixing the comment doesn't improve the messy code so I'd certainly
> >> prefer to see one of the other approaches which have been discussed.
> > 
> > I see your point. Let me work on this based on your suggestions above.
> 
> Hi James,
> 
> Here is an attempt along this line. Is this what you have in mind?
> Trying to keep functionality as before so that we can tune the
> parameters later. Not fully tested yet.

Hi Kugan,

Sorry to have dropped out of the thread for a while, I'm currently
travelling in the US.

This is along the lines of what I had in mind, thanks for digging through
and doing it. It needs a little polishing, just neaten up the rough edges
of comments and where they sit next to the new if conditionals, and of course,
testing, and I have a few comments below.

Thanks,
James

> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
> index ae2b547..ed9432e 100644
> --- a/gcc/config/aarch64/aarch64-cost-tables.h
> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> @@ -121,7 +121,9 @@ const struct cpu_cost_table thunderx_extra_costs =
>    },
>    /* Vector */
>    {
> -    COSTS_N_INSNS (1)	/* Alu.  */
> +    COSTS_N_INSNS (1),	/* Alu.  */
> +    COSTS_N_INSNS (1),	/* Load.  */
> +    COSTS_N_INSNS (1)	/* Store.  */
>    }
>  };

Can you push the Load/Stores in to the LD/ST section above and give
them a name like loadv/storev.
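Roughly like this, I mean (field names as suggested; the surrounding fields are abbreviated here, not the full mem_cost_table):

```c
#include <assert.h>

#define COSTS_N_INSNS(n) ((n) * 4)

/* Abbreviated load/store cost section with the vector entries folded
   in next to the scalar ones, instead of growing vector_cost_table.  */
struct example_mem_cost_table
{
  int load;    /* Scalar integer load.  */
  int store;   /* Scalar integer store.  */
  int loadv;   /* Vector load, e.g. LDR Qn.  */
  int storev;  /* Vector store, e.g. STR Qn.  */
};

static const struct example_mem_cost_table example_ldst =
{
  COSTS_N_INSNS (1),  /* load.  */
  COSTS_N_INSNS (1),  /* store.  */
  COSTS_N_INSNS (1),  /* loadv.  */
  COSTS_N_INSNS (1)   /* storev.  */
};
```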

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index cba3c1a..c2d4a53 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c

<snip>

> @@ -5570,6 +5569,7 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>  	      && (GET_MODE_BITSIZE (GET_MODE (XEXP (op1, 0)))
>  		  >= INTVAL (XEXP (op0, 1))))
>  	    op1 = XEXP (op1, 0);
> +	  gcc_assert (!VECTOR_MODE_P (mode));

As Kyrill asked, please drop this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-20 20:22                         ` James Greenhalgh
@ 2015-04-24 23:26                           ` Kugan
  2015-04-24 23:30                             ` [ARM] " Kugan
  2015-05-05  6:17                             ` [AArch64][PR65375] " James Greenhalgh
  0 siblings, 2 replies; 27+ messages in thread
From: Kugan @ 2015-04-24 23:26 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 3321 bytes --]


On 21/04/15 06:22, James Greenhalgh wrote:
> On Fri, Apr 17, 2015 at 12:19:14PM +0100, Kugan wrote:
>>>> My point is that adding your patch while keeping the logic at the top
>>>> which claims to catch ALL vector operations makes for less readable
>>>> code.
>>>>
>>>> At the very least you'll need to update this comment:
>>>>
>>>>   /* TODO: The cost infrastructure currently does not handle
>>>>      vector operations.  Assume that all vector operations
>>>>      are equally expensive.  */
>>>>
>>>> to make it clear that this doesn't catch vector set operations.
>>>>
>>>> But fixing the comment doesn't improve the messy code so I'd certainly
>>>> prefer to see one of the other approaches which have been discussed.
>>>
>>> I see your point. Let me work on this based on your suggestions above.
>>
>> Hi James,
>>
>> Here is an attempt along this line. Is this what you have in mind?
>> Trying to keep functionality as before so that we can tune the
>> parameters later. Not fully tested yet.
> 
> Hi Kugan,
> 
> Sorry to have dropped out of the thread for a while, I'm currently
> travelling in the US.
> 
> This is along the lines of what I had in mind, thanks for digging through
> and doing it. It needs a little polishing, just neaten up the rough edges
> of comments and where they sit next to the new if conditionals, and of course,
> testing, and I have a few comments below.
> 
> Thanks,
> James
> 
>> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
>> index ae2b547..ed9432e 100644
>> --- a/gcc/config/aarch64/aarch64-cost-tables.h
>> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
>> @@ -121,7 +121,9 @@ const struct cpu_cost_table thunderx_extra_costs =
>>    },
>>    /* Vector */
>>    {
>> -    COSTS_N_INSNS (1)	/* Alu.  */
>> +    COSTS_N_INSNS (1),	/* Alu.  */
>> +    COSTS_N_INSNS (1),	/* Load.  */
>> +    COSTS_N_INSNS (1)	/* Store.  */
>>    }
>>  };
> 
> Can you push the Load/Stores in to the LD/ST section above and give
> them a name like loadv/storev.
> 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index cba3c1a..c2d4a53 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
> 
> <snip>
> 
>> @@ -5570,6 +5569,7 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>>  	      && (GET_MODE_BITSIZE (GET_MODE (XEXP (op1, 0)))
>>  		  >= INTVAL (XEXP (op0, 1))))
>>  	    op1 = XEXP (op1, 0);
>> +	  gcc_assert (!VECTOR_MODE_P (mode));
> 
> As Kyrill asked, please drop this.



Thanks for the review. I have updated the patch based on the comments
with some other minor changes. Bootstrapped and regression tested on
aarch64-none-linux-gnu with no new regressions. Is this OK for trunk?
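For reference, the per-register cost the patch computes for vector copies reduces to a ceiling division; a standalone sketch of that arithmetic (not the GCC code itself), taking 16 bytes as the V4SImode register size:

```c
#include <assert.h>

/* As in GCC's rtl.h: costs are expressed in quarter-insn units.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Bytes in one vector register, i.e. GET_MODE_SIZE (V4SImode).  */
#define VEC_REG_BYTES 16

/* One move instruction per vector register copied: this is the
   (size - 1) / reg_size + 1 ceiling division from the SET/REG case.  */
static int
vector_copy_cost (int mode_size_bytes)
{
  int n_minus_1 = (mode_size_bytes - 1) / VEC_REG_BYTES;
  return COSTS_N_INSNS (n_minus_1 + 1);
}
```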


Thanks,
Kugan


gcc/ChangeLog:

2015-04-24  Kugan Vivekanandarajah  <kuganv@linaro.org>
	    Jim Wilson  <jim.wilson@linaro.org>

	* config/arm/aarch-common-protos.h (struct mem_cost_table): Add
	new fields loadv and storev.
	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
	Initialize loadv and storev.
	* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
	(cortexa53_extra_costs): Likewise.
	(cortexa57_extra_costs): Likewise.
	(xgene1_extra_costs): Likewise.
	* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
	rtx_costs.

[-- Attachment #2: cost2.txt --]
[-- Type: text/plain, Size: 14656 bytes --]

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
index ae2b547..939125c 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -83,7 +83,9 @@ const struct cpu_cost_table thunderx_extra_costs =
     0,			/* N/A: Stm_regs_per_insn_subsequent.  */
     0,			/* Storef.  */
     0,			/* Stored.  */
-    COSTS_N_INSNS (1)  /* Store_unaligned.  */
+    COSTS_N_INSNS (1),	/* Store_unaligned.  */
+    COSTS_N_INSNS (1),	/* Loadv.  */
+    COSTS_N_INSNS (1)	/* Storev.  */
   },
   {
     /* FP SFmode */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..13425fc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5499,16 +5499,6 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
      above this default.  */
   *cost = COSTS_N_INSNS (1);
 
-  /* TODO: The cost infrastructure currently does not handle
-     vector operations.  Assume that all vector operations
-     are equally expensive.  */
-  if (VECTOR_MODE_P (mode))
-    {
-      if (speed)
-	*cost += extra_cost->vect.alu;
-      return true;
-    }
-
   switch (code)
     {
     case SET:
@@ -5523,7 +5513,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	  if (speed)
 	    {
 	      rtx address = XEXP (op0, 0);
-	      if (GET_MODE_CLASS (mode) == MODE_INT)
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->ldst.storev;
+	      else if (GET_MODE_CLASS (mode) == MODE_INT)
 		*cost += extra_cost->ldst.store;
 	      else if (mode == SFmode)
 		*cost += extra_cost->ldst.storef;
@@ -5544,15 +5536,22 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
 	  /* Fall through.  */
 	case REG:
+	  /* The cost is one per vector-register copied.  */
+	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+	    {
+	      int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+			      / GET_MODE_SIZE (V4SImode);
+	      *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	    }
 	  /* const0_rtx is in general free, but we will use an
 	     instruction to set a register to 0.  */
-          if (REG_P (op1) || op1 == const0_rtx)
-            {
-              /* The cost is 1 per register copied.  */
-              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+	  else if (REG_P (op1) || op1 == const0_rtx)
+	    {
+	      /* The cost is 1 per register copied.  */
+	      int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
 			      / UNITS_PER_WORD;
-              *cost = COSTS_N_INSNS (n_minus_1 + 1);
-            }
+	      *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	    }
           else
 	    /* Cost is just the cost of the RHS of the set.  */
 	    *cost += rtx_cost (op1, SET, 1, speed);
@@ -5650,7 +5649,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	     approximation for the additional cost of the addressing
 	     mode.  */
 	  rtx address = XEXP (x, 0);
-	  if (GET_MODE_CLASS (mode) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->ldst.loadv;
+	  else if (GET_MODE_CLASS (mode) == MODE_INT)
 	    *cost += extra_cost->ldst.load;
 	  else if (mode == SFmode)
 	    *cost += extra_cost->ldst.loadf;
@@ -5667,6 +5668,14 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
     case NEG:
       op0 = XEXP (x, 0);
 
+      if (VECTOR_MODE_P (mode))
+	{
+	  if (speed)
+	    /* FNEG.  */
+	    *cost += extra_cost->vect.alu;
+	  return false;
+	}
+
       if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
        {
           if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
@@ -5705,7 +5714,12 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
     case CLRSB:
     case CLZ:
       if (speed)
-        *cost += extra_cost->alu.clz;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.clz;
+	}
 
       return false;
 
@@ -5790,6 +5804,20 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
           return false;
         }
 
+      if (VECTOR_MODE_P (mode))
+	{
+	  /* Vector compare.  */
+	  if (speed)
+	    *cost += extra_cost->vect.alu;
+
+	  if (aarch64_float_const_zero_rtx_p (op1))
+	    {
+	      /* Vector cm (eq|ge|gt|lt|le) supports constant 0.0 for no extra
+		 cost.  */
+	      return true;
+	    }
+	  return false;
+	}
       return false;
 
     case MINUS:
@@ -5844,7 +5872,10 @@ cost_minus:
 
 	if (speed)
 	  {
-	    if (GET_MODE_CLASS (mode) == MODE_INT)
+	    if (VECTOR_MODE_P (mode))
+	      /* Vector SUB.  */
+	      *cost += extra_cost->vect.alu;
+	    else if (GET_MODE_CLASS (mode) == MODE_INT)
 	      /* SUB(S).  */
 	      *cost += extra_cost->alu.arith;
 	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
@@ -5888,7 +5919,6 @@ cost_plus:
 	  {
 	    if (speed)
 	      *cost += extra_cost->alu.arith_shift;
-
 	    *cost += rtx_cost (XEXP (XEXP (op0, 0), 0),
 			       (enum rtx_code) GET_CODE (op0),
 			       0, speed);
@@ -5913,7 +5943,10 @@ cost_plus:
 
 	if (speed)
 	  {
-	    if (GET_MODE_CLASS (mode) == MODE_INT)
+	    if (VECTOR_MODE_P (mode))
+	      /* Vector ADD.  */
+	      *cost += extra_cost->vect.alu;
+	    else if (GET_MODE_CLASS (mode) == MODE_INT)
 	      /* ADD.  */
 	      *cost += extra_cost->alu.arith;
 	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
@@ -5927,8 +5960,12 @@ cost_plus:
       *cost = COSTS_N_INSNS (1);
 
       if (speed)
-        *cost += extra_cost->alu.rev;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.rev;
+	}
       return false;
 
     case IOR:
@@ -5936,10 +5973,14 @@ cost_plus:
         {
           *cost = COSTS_N_INSNS (1);
 
-          if (speed)
-            *cost += extra_cost->alu.rev;
-
-          return true;
+	  if (speed)
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->alu.rev;
+	    }
+	  return true;
         }
     /* Fall through.  */
     case XOR:
@@ -5948,6 +5989,13 @@ cost_plus:
       op0 = XEXP (x, 0);
       op1 = XEXP (x, 1);
 
+      if (VECTOR_MODE_P (mode))
+	{
+	  if (speed)
+	    *cost += extra_cost->vect.alu;
+	  return true;
+	}
+
       if (code == AND
           && GET_CODE (op0) == MULT
           && CONST_INT_P (XEXP (op0, 1))
@@ -6013,10 +6061,15 @@ cost_plus:
       return false;
 
     case NOT:
-      /* MVN.  */
       if (speed)
-	*cost += extra_cost->alu.logical;
-
+	{
+	  /* Vector NOT.  */
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  /* MVN.  */
+	  else
+	    *cost += extra_cost->alu.logical;
+	}
       /* The logical instruction could have the shifted register form,
          but the cost is the same if the shift is processed as a separate
          instruction, so we don't bother with it here.  */
@@ -6055,10 +6108,15 @@ cost_plus:
 	  return true;
 	}
 
-      /* UXTB/UXTH.  */
       if (speed)
-	*cost += extra_cost->alu.extend;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    /* UMOV.  */
+	    *cost += extra_cost->vect.alu;
+	  else
+	    /* UXTB/UXTH.  */
+	    *cost += extra_cost->alu.extend;
+	}
       return false;
 
     case SIGN_EXTEND:
@@ -6078,7 +6136,12 @@ cost_plus:
 	}
 
       if (speed)
-	*cost += extra_cost->alu.extend;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.extend;
+	}
       return false;
 
     case ASHIFT:
@@ -6087,10 +6150,16 @@ cost_plus:
 
       if (CONST_INT_P (op1))
         {
-	  /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
-	     aliases.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift;
+	    {
+	      /* Vector shift (immediate).  */
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      /* LSL (immediate), UBFM, UBFIZ and friends.  These are all
+		 aliases.  */
+	      else
+		*cost += extra_cost->alu.shift;
+	    }
 
           /* We can incorporate zero/sign extend for free.  */
           if (GET_CODE (op0) == ZERO_EXTEND
@@ -6102,10 +6171,15 @@ cost_plus:
         }
       else
         {
-	  /* LSLV.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift_reg;
-
+	    {
+	      /* Vector shift (register).  */
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      /* LSLV.  */
+	      else
+		*cost += extra_cost->alu.shift_reg;
+	    }
 	  return false;  /* All arguments need to be in registers.  */
         }
 
@@ -6120,7 +6194,12 @@ cost_plus:
 	{
 	  /* ASR (immediate) and friends.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift;
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->alu.shift;
+	    }
 
 	  *cost += rtx_cost (op0, (enum rtx_code) code, 0, speed);
 	  return true;
@@ -6130,8 +6209,12 @@ cost_plus:
 
 	  /* ASR (register) and friends.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift_reg;
-
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->alu.shift_reg;
+	    }
 	  return false;  /* All arguments need to be in registers.  */
 	}
 
@@ -6179,7 +6262,12 @@ cost_plus:
     case SIGN_EXTRACT:
       /* UBFX/SBFX.  */
       if (speed)
-	*cost += extra_cost->alu.bfx;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.bfx;
+	}
 
       /* We can trust that the immediates used will be correct (there
 	 are no by-register forms), so we need only cost op0.  */
@@ -6196,7 +6284,9 @@ cost_plus:
     case UMOD:
       if (speed)
 	{
-	  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
 	    *cost += (extra_cost->mult[GET_MODE (x) == DImode].add
 		      + extra_cost->mult[GET_MODE (x) == DImode].idiv);
 	  else if (GET_MODE (x) == DFmode)
@@ -6213,7 +6303,9 @@ cost_plus:
     case SQRT:
       if (speed)
 	{
-	  if (GET_MODE_CLASS (mode) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else if (GET_MODE_CLASS (mode) == MODE_INT)
 	    /* There is no integer SQRT, so only DIV and UDIV can get
 	       here.  */
 	    *cost += extra_cost->mult[mode == DImode].idiv;
@@ -6245,7 +6337,12 @@ cost_plus:
       op2 = XEXP (x, 2);
 
       if (speed)
-	*cost += extra_cost->fp[mode == DFmode].fma;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].fma;
+	}
 
       /* FMSUB, FNMADD, and FNMSUB are free.  */
       if (GET_CODE (op0) == NEG)
@@ -6285,12 +6382,24 @@ cost_plus:
 
     case FLOAT_EXTEND:
       if (speed)
-	*cost += extra_cost->fp[mode == DFmode].widen;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    /* Vector widen.  */
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].widen;
+	}
       return false;
 
     case FLOAT_TRUNCATE:
       if (speed)
-	*cost += extra_cost->fp[mode == DFmode].narrow;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    /* Vector conversion.  */
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].narrow;
+	}
       return false;
 
     case FIX:
@@ -6311,13 +6420,23 @@ cost_plus:
         }
 
       if (speed)
-        *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
+	}
       *cost += rtx_cost (x, (enum rtx_code) code, 0, speed);
       return true;
 
     case ABS:
-      if (GET_MODE_CLASS (mode) == MODE_FLOAT)
+      if (VECTOR_MODE_P (mode))
+	{
+	  /* ABS (vector).  */
+	  if (speed)
+	    *cost += extra_cost->vect.alu;
+	}
+      else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
 	{
 	  /* FABS and FNEG are analogous.  */
 	  if (speed)
@@ -6338,10 +6457,13 @@ cost_plus:
     case SMIN:
       if (speed)
 	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
 	  /* FMAXNM/FMINNM/FMAX/FMIN.
 	     TODO: This may not be accurate for all implementations, but
 	     we do not model this in the cost tables.  */
-	  *cost += extra_cost->fp[mode == DFmode].addsub;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].addsub;
 	}
       return false;
 
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 3ee7ebf..29f7c99 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -102,6 +102,8 @@ struct mem_cost_table
   const int storef;		/* SFmode.  */
   const int stored;		/* DFmode.  */
   const int store_unaligned;	/* Extra for unaligned stores.  */
+  const int loadv;		/* Vector load.  */
+  const int storev;		/* Vector store.  */
 };
 
 struct fp_cost_table
diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
index 05e96a9..809feb8 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -81,7 +81,9 @@ const struct cpu_cost_table generic_extra_costs =
     1,			/* stm_regs_per_insn_subsequent.  */
     COSTS_N_INSNS (2),	/* storef.  */
     COSTS_N_INSNS (3),	/* stored.  */
-    COSTS_N_INSNS (1)  /* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */
@@ -182,7 +184,9 @@ const struct cpu_cost_table cortexa53_extra_costs =
     2,				/* stm_regs_per_insn_subsequent.  */
     0,				/* storef.  */
     0,				/* stored.  */
-    COSTS_N_INSNS (1)		/* store_unaligned.  */
+    COSTS_N_INSNS (1),		/* store_unaligned.  */
+    COSTS_N_INSNS (1),		/* loadv.  */
+    COSTS_N_INSNS (1)		/* storev.  */
   },
   {
     /* FP SFmode */
@@ -283,7 +287,9 @@ const struct cpu_cost_table cortexa57_extra_costs =
     2,                         /* stm_regs_per_insn_subsequent.  */
     0,                         /* storef.  */
     0,                         /* stored.  */
-    COSTS_N_INSNS (1)          /* store_unaligned.  */
+    COSTS_N_INSNS (1),         /* store_unaligned.  */
+    COSTS_N_INSNS (1),         /* loadv.  */
+    COSTS_N_INSNS (1)          /* storev.  */
   },
   {
     /* FP SFmode */
@@ -385,6 +391,8 @@ const struct cpu_cost_table xgene1_extra_costs =
     0,                         /* storef.  */
     0,                         /* stored.  */
     0,                         /* store_unaligned.  */
+    COSTS_N_INSNS (1),         /* loadv.  */
+    COSTS_N_INSNS (1)          /* storev.  */
   },
   {
     /* FP SFmode */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [ARM] Fix RTX cost for vector SET
  2015-04-24 23:26                           ` Kugan
@ 2015-04-24 23:30                             ` Kugan
  2015-04-27 11:02                               ` Kyrill Tkachov
  2015-05-05  6:17                             ` [AArch64][PR65375] " James Greenhalgh
  1 sibling, 1 reply; 27+ messages in thread
From: Kugan @ 2015-04-24 23:30 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

> 
> Thanks for the review. I have updated the patch based on the comments
> with some other minor changes. Bootstrapped and regression tested on
> aarch64-none-linux-gnu with no-new regressions. Is this OK for trunk?
> 
> 
> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2015-04-24  Kugan Vivekanandarajah  <kuganv@linaro.org>
> 	    Jim Wilson  <jim.wilson@linaro.org>
> 
> 	* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
> 	new  fields loadv and storev.
> 	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
> 	Initialize loadv and storev.
> 	* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
> 	(cortexa53_extra_costs): Likewise.
> 	(cortexa57_extra_costs): Likewise.
> 	(xgene1_extra_costs): Likewise.
> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
> 	rtx_costs.
> 

Due to the struct mem_cost_table update for aarch64, the arm cost tables
also need to be updated. Please find attached the patch that does this.
Regression tested on arm-none-linux-gnu with no new regressions. Is this
OK for trunk?

Thanks,
Kugan

gcc/ChangeLog:

2015-04-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

	* config/arm/arm.c (cortexa9_extra_costs): Initialize loadv and
	 storev.
	(cortexa8_extra_costs): Likewise.
	(cortexa5_extra_costs): Likewise.
	(cortexa7_extra_costs): Likewise.
	(cortexa12_extra_costs): Likewise.
	(cortexa15_extra_costs): Likewise.
	(v7m_extra_costs): Likewise.

[-- Attachment #2: arm_rtx_cost.txt --]
[-- Type: text/plain, Size: 2863 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6826c78..d43239a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1027,7 +1027,9 @@ const struct cpu_cost_table cortexa9_extra_costs =
     2,			/* stm_regs_per_insn_subsequent.  */
     COSTS_N_INSNS (1),	/* storef.  */
     COSTS_N_INSNS (1),	/* stored.  */
-    COSTS_N_INSNS (1)	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */
@@ -1128,7 +1130,9 @@ const struct cpu_cost_table cortexa8_extra_costs =
     2,			/* stm_regs_per_insn_subsequent.  */
     COSTS_N_INSNS (1),	/* storef.  */
     COSTS_N_INSNS (1),	/* stored.  */
-    COSTS_N_INSNS (1)	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */
@@ -1230,7 +1234,9 @@ const struct cpu_cost_table cortexa5_extra_costs =
     2,			/* stm_regs_per_insn_subsequent.  */
     COSTS_N_INSNS (2),	/* storef.  */
     COSTS_N_INSNS (2),	/* stored.  */
-    COSTS_N_INSNS (1)	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */
@@ -1333,7 +1339,9 @@ const struct cpu_cost_table cortexa7_extra_costs =
     2,			/* stm_regs_per_insn_subsequent.  */
     COSTS_N_INSNS (2),	/* storef.  */
     COSTS_N_INSNS (2),	/* stored.  */
-    COSTS_N_INSNS (1)	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */
@@ -1434,7 +1442,9 @@ const struct cpu_cost_table cortexa12_extra_costs =
     2,			/* stm_regs_per_insn_subsequent.  */
     COSTS_N_INSNS (2),	/* storef.  */
     COSTS_N_INSNS (2),	/* stored.  */
-    0			/* store_unaligned.  */
+    0,			/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */
@@ -1535,7 +1545,9 @@ const struct cpu_cost_table cortexa15_extra_costs =
     2,			/* stm_regs_per_insn_subsequent.  */
     0,			/* storef.  */
     0,			/* stored.  */
-    0			/* store_unaligned.  */
+    0,			/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */
@@ -1636,7 +1648,9 @@ const struct cpu_cost_table v7m_extra_costs =
     1,			/* stm_regs_per_insn_subsequent.  */
     COSTS_N_INSNS (2),	/* storef.  */
     COSTS_N_INSNS (3),	/* stored.  */
-    COSTS_N_INSNS (1)  /* store_unaligned.  */
+    COSTS_N_INSNS (1),  /* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */


* Re: [ARM] Fix RTX cost for vector SET
  2015-04-24 23:30                             ` [ARM] " Kugan
@ 2015-04-27 11:02                               ` Kyrill Tkachov
  0 siblings, 0 replies; 27+ messages in thread
From: Kyrill Tkachov @ 2015-04-27 11:02 UTC (permalink / raw)
  To: Kugan, James Greenhalgh
  Cc: gcc-patches, Marcus Shawcroft, Richard Earnshaw, Jim Wilson,
	Ramana Radhakrishnan

Hi Kugan,

On 25/04/15 00:30, Kugan wrote:
>> Thanks for the review. I have updated the patch based on the comments
>> with some other minor changes. Bootstrapped and regression tested on
>> aarch64-none-linux-gnu with no-new regressions. Is this OK for trunk?
>>
>>
>> Thanks,
>> Kugan
>>
>>
>> gcc/ChangeLog:
>>
>> 2015-04-24  Kugan Vivekanandarajah  <kuganv@linaro.org>
>> 	    Jim Wilson  <jim.wilson@linaro.org>
>>
>> 	* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
>> 	new  fields loadv and storev.
>> 	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
>> 	Initialize loadv and storev.
>> 	* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
>> 	(cortexa53_extra_costs): Likewise.
>> 	(cortexa57_extra_costs): Likewise.
>> 	(xgene1_extra_costs): Likewise.
>> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
>> 	rtx_costs.
>>
> Due to the struct mem_cost_table update for aarch64, arm cost tables
> also need to be updated. Please find the patch that does this.
> Regression tested on arm-none-linux-gnu with no-new regressions. Is this
> OK for trunk?
>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2015-04-25  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
> 	* config/arm/arm.c (cortexa9_extra_costs): Initialize loadv and
> 	 storev.
> 	(cortexa8_extra_costs): Likewise.
> 	(cortexa5_extra_costs): Likewise.
> 	(cortexa7_extra_costs): Likewise.
> 	(cortexa12_extra_costs): Likewise.
> 	(cortexa15_extra_costs): Likewise.
> 	(v7m_extra_costs): Likewise.

This arm part looks ok to me FWIW (if the approach in the aarch64
patch is deemed ok by the aarch64 maintainers), it just syncs the
fields of the common cost struct between arm and aarch64.

Please only commit this if the aarch64 patch gets approved and
committed at the same time, otherwise we'll get a build failure.
Having a look at the arm rtx costs and seeing if we can improve them
in the same way as the aarch64 ones would be good as a follow-up at some
point too.

Thanks,
Kyrill




* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-04-24 23:26                           ` Kugan
  2015-04-24 23:30                             ` [ARM] " Kugan
@ 2015-05-05  6:17                             ` James Greenhalgh
  2015-05-06  2:12                               ` Kugan
  1 sibling, 1 reply; 27+ messages in thread
From: James Greenhalgh @ 2015-05-05  6:17 UTC (permalink / raw)
  To: Kugan
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

On Sat, Apr 25, 2015 at 12:26:16AM +0100, Kugan wrote:
> 
> Thanks for the review. I have updated the patch based on the comments
> with some other minor changes. Bootstrapped and regression tested on
> aarch64-none-linux-gnu with no-new regressions. Is this OK for trunk?
> 
> 
> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2015-04-24  Kugan Vivekanandarajah  <kuganv@linaro.org>
> 	    Jim Wilson  <jim.wilson@linaro.org>
> 
> 	* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
> 	new  fields loadv and storev.
> 	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
> 	Initialize loadv and storev.
> 	* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
> 	(cortexa53_extra_costs): Likewise.
> 	(cortexa57_extra_costs): Likewise.
> 	(xgene1_extra_costs): Likewise.
> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
> 	rtx_costs.

Hi Kugan,

Just a few style comments, regarding the placement of comments in single-line
if statements. I know the current code does not necessarily always follow the
comments below; I'll write a patch cleaning that up at some point when I'm back
at my desk.

Thanks,
James

> @@ -5667,6 +5668,14 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>      case NEG:
>        op0 = XEXP (x, 0);
>  
> +      if (VECTOR_MODE_P (mode))
> +	{
> +	  if (speed)
> +	    /* FNEG.  */
> +	    *cost += extra_cost->vect.alu;
> +	  return false;
> +	}
> +
>        if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
>         {
>            if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE

Personally, I find commented if statements without braces hard to
quickly parse. Something like this is much faster for me:

	  if (speed)
	    {
	      /* FNEG.  */
	      *cost += extra_cost->vect.alu;
	    }

> @@ -5844,7 +5872,10 @@ cost_minus:
>  
>  	if (speed)
>  	  {
> -	    if (GET_MODE_CLASS (mode) == MODE_INT)
> +	    if (VECTOR_MODE_P (mode))
> +	      /* Vector SUB.  */
> +	      *cost += extra_cost->vect.alu;
> +	    else if (GET_MODE_CLASS (mode) == MODE_INT)
>  	      /* SUB(S).  */
>  	      *cost += extra_cost->alu.arith;
>  	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)

As above.

> @@ -5888,7 +5919,6 @@ cost_plus:
>  	  {
>  	    if (speed)
>  	      *cost += extra_cost->alu.arith_shift;
> -
>  	    *cost += rtx_cost (XEXP (XEXP (op0, 0), 0),
>  			       (enum rtx_code) GET_CODE (op0),
>  			       0, speed);

Drop this whitespace change.

> @@ -5913,7 +5943,10 @@ cost_plus:
>  
>  	if (speed)
>  	  {
> -	    if (GET_MODE_CLASS (mode) == MODE_INT)
> +	    if (VECTOR_MODE_P (mode))
> +	      /* Vector ADD.  */
> +	      *cost += extra_cost->vect.alu;
> +	    else if (GET_MODE_CLASS (mode) == MODE_INT)
>  	      /* ADD.  */
>  	      *cost += extra_cost->alu.arith;
>  	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)

As above.

> @@ -6013,10 +6061,15 @@ cost_plus:
>        return false;
>  
>      case NOT:
> -      /* MVN.  */
>        if (speed)
> -	*cost += extra_cost->alu.logical;
> -
> +	{
> +	  /* Vector NOT.  */
> +	  if (VECTOR_MODE_P (mode))
> +	    *cost += extra_cost->vect.alu;
> +	  /* MVN.  */
> +	  else
> +	    *cost += extra_cost->alu.logical;
> +	}
>        /* The logical instruction could have the shifted register form,
>           but the cost is the same if the shift is processed as a separate
>           instruction, so we don't bother with it here.  */

As above.

> @@ -6055,10 +6108,15 @@ cost_plus:
>  	  return true;
>  	}
>  
> -      /* UXTB/UXTH.  */
>        if (speed)
> -	*cost += extra_cost->alu.extend;
> -
> +	{
> +	  if (VECTOR_MODE_P (mode))
> +	    /* UMOV.  */
> +	    *cost += extra_cost->vect.alu;
> +	  else
> +	    /* UXTB/UXTH.  */
> +	    *cost += extra_cost->alu.extend;
> +	}
>        return false;
>  
>      case SIGN_EXTEND:

And again :)

> @@ -6087,10 +6150,16 @@ cost_plus:
>  
>        if (CONST_INT_P (op1))
>          {
> -	  /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
> -	     aliases.  */
>  	  if (speed)
> -	    *cost += extra_cost->alu.shift;
> +	    {
> +	      /* Vector shift (immediate).  */
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.alu;
> +	      /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
> +		 aliases.  */
> +	      else
> +		*cost += extra_cost->alu.shift;
> +	    }
>  
>            /* We can incorporate zero/sign extend for free.  */
>            if (GET_CODE (op0) == ZERO_EXTEND

Again, the comment here makes it very difficult to spot the form of
the if/else statement.

> @@ -6102,10 +6171,15 @@ cost_plus:
>          }
>        else
>          {
> -	  /* LSLV.  */
>  	  if (speed)
> -	    *cost += extra_cost->alu.shift_reg;
> -
> +	    {
> +	      /* Vector shift (register).  */
> +	      if (VECTOR_MODE_P (mode))
> +		*cost += extra_cost->vect.alu;
> +	      /* LSLV.  */
> +	      else
> +		*cost += extra_cost->alu.shift_reg;
> +	    }
>  	  return false;  /* All arguments need to be in registers.  */
>          }
>  

Likewise here.



* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-05-05  6:17                             ` [AArch64][PR65375] " James Greenhalgh
@ 2015-05-06  2:12                               ` Kugan
  2015-05-07  7:24                                 ` James Greenhalgh
  0 siblings, 1 reply; 27+ messages in thread
From: Kugan @ 2015-05-06  2:12 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

[-- Attachment #1: Type: text/plain, Size: 5385 bytes --]



On 05/05/15 16:17, James Greenhalgh wrote:
> On Sat, Apr 25, 2015 at 12:26:16AM +0100, Kugan wrote:
>>
>> Thanks for the review. I have updated the patch based on the comments
>> with some other minor changes. Bootstrapped and regression tested on
>> aarch64-none-linux-gnu with no-new regressions. Is this OK for trunk?
>>
>>
>> Thanks,
>> Kugan
>>
>>
>> gcc/ChangeLog:
>>
>> 2015-04-24  Kugan Vivekanandarajah  <kuganv@linaro.org>
>> 	    Jim Wilson  <jim.wilson@linaro.org>
>>
>> 	* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
>> 	new  fields loadv and storev.
>> 	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
>> 	Initialize loadv and storev.
>> 	* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
>> 	(cortexa53_extra_costs): Likewise.
>> 	(cortexa57_extra_costs): Likewise.
>> 	(xgene1_extra_costs): Likewise.
>> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
>> 	rtx_costs.
> 
> Hi Kugan,
> 
> Just a few style comments, regarding the placement of comments in single-line
> if statements. I know the current code does not necessarily always follow the
> comments below; I'll write a patch cleaning that up at some point when I'm back
> at my desk.
> 
> Thanks,
> James
> 
>> @@ -5667,6 +5668,14 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>>      case NEG:
>>        op0 = XEXP (x, 0);
>>  
>> +      if (VECTOR_MODE_P (mode))
>> +	{
>> +	  if (speed)
>> +	    /* FNEG.  */
>> +	    *cost += extra_cost->vect.alu;
>> +	  return false;
>> +	}
>> +
>>        if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
>>         {
>>            if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
> 
> Personally, I find commented if statements without braces hard to
> quickly parse. Something like this is much faster for me:
> 
> 	  if (speed)
> 	    {
> 	      /* FNEG.  */
> 	      *cost += extra_cost->vect.alu;
> 	    }
> 
>> @@ -5844,7 +5872,10 @@ cost_minus:
>>  
>>  	if (speed)
>>  	  {
>> -	    if (GET_MODE_CLASS (mode) == MODE_INT)
>> +	    if (VECTOR_MODE_P (mode))
>> +	      /* Vector SUB.  */
>> +	      *cost += extra_cost->vect.alu;
>> +	    else if (GET_MODE_CLASS (mode) == MODE_INT)
>>  	      /* SUB(S).  */
>>  	      *cost += extra_cost->alu.arith;
>>  	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
> 
> As above.
> 
>> @@ -5888,7 +5919,6 @@ cost_plus:
>>  	  {
>>  	    if (speed)
>>  	      *cost += extra_cost->alu.arith_shift;
>> -
>>  	    *cost += rtx_cost (XEXP (XEXP (op0, 0), 0),
>>  			       (enum rtx_code) GET_CODE (op0),
>>  			       0, speed);
> 
> Drop this whitespace change.
> 
>> @@ -5913,7 +5943,10 @@ cost_plus:
>>  
>>  	if (speed)
>>  	  {
>> -	    if (GET_MODE_CLASS (mode) == MODE_INT)
>> +	    if (VECTOR_MODE_P (mode))
>> +	      /* Vector ADD.  */
>> +	      *cost += extra_cost->vect.alu;
>> +	    else if (GET_MODE_CLASS (mode) == MODE_INT)
>>  	      /* ADD.  */
>>  	      *cost += extra_cost->alu.arith;
>>  	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
> 
> As above.
> 
>> @@ -6013,10 +6061,15 @@ cost_plus:
>>        return false;
>>  
>>      case NOT:
>> -      /* MVN.  */
>>        if (speed)
>> -	*cost += extra_cost->alu.logical;
>> -
>> +	{
>> +	  /* Vector NOT.  */
>> +	  if (VECTOR_MODE_P (mode))
>> +	    *cost += extra_cost->vect.alu;
>> +	  /* MVN.  */
>> +	  else
>> +	    *cost += extra_cost->alu.logical;
>> +	}
>>        /* The logical instruction could have the shifted register form,
>>           but the cost is the same if the shift is processed as a separate
>>           instruction, so we don't bother with it here.  */
> 
> As above.
> 
>> @@ -6055,10 +6108,15 @@ cost_plus:
>>  	  return true;
>>  	}
>>  
>> -      /* UXTB/UXTH.  */
>>        if (speed)
>> -	*cost += extra_cost->alu.extend;
>> -
>> +	{
>> +	  if (VECTOR_MODE_P (mode))
>> +	    /* UMOV.  */
>> +	    *cost += extra_cost->vect.alu;
>> +	  else
>> +	    /* UXTB/UXTH.  */
>> +	    *cost += extra_cost->alu.extend;
>> +	}
>>        return false;
>>  
>>      case SIGN_EXTEND:
> 
> And again :)
> 
>> @@ -6087,10 +6150,16 @@ cost_plus:
>>  
>>        if (CONST_INT_P (op1))
>>          {
>> -	  /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
>> -	     aliases.  */
>>  	  if (speed)
>> -	    *cost += extra_cost->alu.shift;
>> +	    {
>> +	      /* Vector shift (immediate).  */
>> +	      if (VECTOR_MODE_P (mode))
>> +		*cost += extra_cost->vect.alu;
>> +	      /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
>> +		 aliases.  */
>> +	      else
>> +		*cost += extra_cost->alu.shift;
>> +	    }
>>  
>>            /* We can incorporate zero/sign extend for free.  */
>>            if (GET_CODE (op0) == ZERO_EXTEND
> 
> Again, the comment here makes it very difficult to spot the form of
> the if/else statement.
> 
>> @@ -6102,10 +6171,15 @@ cost_plus:
>>          }
>>        else
>>          {
>> -	  /* LSLV.  */
>>  	  if (speed)
>> -	    *cost += extra_cost->alu.shift_reg;
>> -
>> +	    {
>> +	      /* Vector shift (register).  */
>> +	      if (VECTOR_MODE_P (mode))
>> +		*cost += extra_cost->vect.alu;
>> +	      /* LSLV.  */
>> +	      else
>> +		*cost += extra_cost->alu.shift_reg;
>> +	    }
>>  	  return false;  /* All arguments need to be in registers.  */
>>          }
>>  
> 
> Likewise here.
> 
> 

Thanks James for the review. The attached patch makes these changes. Is this OK?


Thanks,
Kugan

[-- Attachment #2: cost3.txt --]
[-- Type: text/plain, Size: 15355 bytes --]

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
index ae2b547..939125c 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -83,7 +83,9 @@ const struct cpu_cost_table thunderx_extra_costs =
     0,			/* N/A: Stm_regs_per_insn_subsequent.  */
     0,			/* Storef.  */
     0,			/* Stored.  */
-    COSTS_N_INSNS (1)  /* Store_unaligned.  */
+    COSTS_N_INSNS (1),	/* Store_unaligned.  */
+    COSTS_N_INSNS (1),	/* Loadv.  */
+    COSTS_N_INSNS (1)	/* Storev.  */
   },
   {
     /* FP SFmode */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..586caaf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5499,16 +5499,6 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
      above this default.  */
   *cost = COSTS_N_INSNS (1);
 
-  /* TODO: The cost infrastructure currently does not handle
-     vector operations.  Assume that all vector operations
-     are equally expensive.  */
-  if (VECTOR_MODE_P (mode))
-    {
-      if (speed)
-	*cost += extra_cost->vect.alu;
-      return true;
-    }
-
   switch (code)
     {
     case SET:
@@ -5523,7 +5513,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	  if (speed)
 	    {
 	      rtx address = XEXP (op0, 0);
-	      if (GET_MODE_CLASS (mode) == MODE_INT)
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->ldst.storev;
+	      else if (GET_MODE_CLASS (mode) == MODE_INT)
 		*cost += extra_cost->ldst.store;
 	      else if (mode == SFmode)
 		*cost += extra_cost->ldst.storef;
@@ -5544,15 +5536,22 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
 	  /* Fall through.  */
 	case REG:
+	  /* The cost is one per vector-register copied.  */
+	  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+	    {
+	      int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+			      / GET_MODE_SIZE (V4SImode);
+	      *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	    }
 	  /* const0_rtx is in general free, but we will use an
 	     instruction to set a register to 0.  */
-          if (REG_P (op1) || op1 == const0_rtx)
-            {
-              /* The cost is 1 per register copied.  */
-              int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+	  else if (REG_P (op1) || op1 == const0_rtx)
+	    {
+	      /* The cost is 1 per register copied.  */
+	      int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
 			      / UNITS_PER_WORD;
-              *cost = COSTS_N_INSNS (n_minus_1 + 1);
-            }
+	      *cost = COSTS_N_INSNS (n_minus_1 + 1);
+	    }
           else
 	    /* Cost is just the cost of the RHS of the set.  */
 	    *cost += rtx_cost (op1, SET, 1, speed);
@@ -5650,7 +5649,9 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	     approximation for the additional cost of the addressing
 	     mode.  */
 	  rtx address = XEXP (x, 0);
-	  if (GET_MODE_CLASS (mode) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->ldst.loadv;
+	  else if (GET_MODE_CLASS (mode) == MODE_INT)
 	    *cost += extra_cost->ldst.load;
 	  else if (mode == SFmode)
 	    *cost += extra_cost->ldst.loadf;
@@ -5667,6 +5668,16 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
     case NEG:
       op0 = XEXP (x, 0);
 
+      if (VECTOR_MODE_P (mode))
+	{
+	  if (speed)
+	    {
+	      /* FNEG.  */
+	      *cost += extra_cost->vect.alu;
+	    }
+	  return false;
+	}
+
       if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
        {
           if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
@@ -5705,7 +5716,12 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
     case CLRSB:
     case CLZ:
       if (speed)
-        *cost += extra_cost->alu.clz;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.clz;
+	}
 
       return false;
 
@@ -5790,6 +5806,20 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
           return false;
         }
 
+      if (VECTOR_MODE_P (mode))
+	{
+	  /* Vector compare.  */
+	  if (speed)
+	    *cost += extra_cost->vect.alu;
+
+	  if (aarch64_float_const_zero_rtx_p (op1))
+	    {
+	      /* Vector cm (eq|ge|gt|lt|le) supports constant 0.0 for no extra
+		 cost.  */
+	      return true;
+	    }
+	  return false;
+	}
       return false;
 
     case MINUS:
@@ -5844,12 +5874,21 @@ cost_minus:
 
 	if (speed)
 	  {
-	    if (GET_MODE_CLASS (mode) == MODE_INT)
-	      /* SUB(S).  */
-	      *cost += extra_cost->alu.arith;
+	    if (VECTOR_MODE_P (mode))
+	      {
+		/* Vector SUB.  */
+		*cost += extra_cost->vect.alu;
+	      }
+	    else if (GET_MODE_CLASS (mode) == MODE_INT)
+	      {
+		/* SUB(S).  */
+		*cost += extra_cost->alu.arith;
+	      }
 	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
-	      /* FSUB.  */
-	      *cost += extra_cost->fp[mode == DFmode].addsub;
+	      {
+		/* FSUB.  */
+		*cost += extra_cost->fp[mode == DFmode].addsub;
+	      }
 	  }
 	return true;
       }
@@ -5913,12 +5952,21 @@ cost_plus:
 
 	if (speed)
 	  {
-	    if (GET_MODE_CLASS (mode) == MODE_INT)
-	      /* ADD.  */
-	      *cost += extra_cost->alu.arith;
+	    if (VECTOR_MODE_P (mode))
+	      {
+		/* Vector ADD.  */
+		*cost += extra_cost->vect.alu;
+	      }
+	    else if (GET_MODE_CLASS (mode) == MODE_INT)
+	      {
+		/* ADD.  */
+		*cost += extra_cost->alu.arith;
+	      }
 	    else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
-	      /* FADD.  */
-	      *cost += extra_cost->fp[mode == DFmode].addsub;
+	      {
+		/* FADD.  */
+		*cost += extra_cost->fp[mode == DFmode].addsub;
+	      }
 	  }
 	return true;
       }
@@ -5927,8 +5975,12 @@ cost_plus:
       *cost = COSTS_N_INSNS (1);
 
       if (speed)
-        *cost += extra_cost->alu.rev;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.rev;
+	}
       return false;
 
     case IOR:
@@ -5936,10 +5988,14 @@ cost_plus:
         {
           *cost = COSTS_N_INSNS (1);
 
-          if (speed)
-            *cost += extra_cost->alu.rev;
-
-          return true;
+	  if (speed)
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->alu.rev;
+	    }
+	  return true;
         }
     /* Fall through.  */
     case XOR:
@@ -5948,6 +6004,13 @@ cost_plus:
       op0 = XEXP (x, 0);
       op1 = XEXP (x, 1);
 
+      if (VECTOR_MODE_P (mode))
+	{
+	  if (speed)
+	    *cost += extra_cost->vect.alu;
+	  return true;
+	}
+
       if (code == AND
           && GET_CODE (op0) == MULT
           && CONST_INT_P (XEXP (op0, 1))
@@ -6013,10 +6076,19 @@ cost_plus:
       return false;
 
     case NOT:
-      /* MVN.  */
       if (speed)
-	*cost += extra_cost->alu.logical;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    {
+	      /* Vector NOT.  */
+	      *cost += extra_cost->vect.alu;
+	    }
+	  else
+	    {
+	      /* MVN.  */
+	      *cost += extra_cost->alu.logical;
+	    }
+	}
       /* The logical instruction could have the shifted register form,
          but the cost is the same if the shift is processed as a separate
          instruction, so we don't bother with it here.  */
@@ -6055,10 +6127,19 @@ cost_plus:
 	  return true;
 	}
 
-      /* UXTB/UXTH.  */
       if (speed)
-	*cost += extra_cost->alu.extend;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    {
+	      /* UMOV.  */
+	      *cost += extra_cost->vect.alu;
+	    }
+	  else
+	    {
+	      /* UXTB/UXTH.  */
+	      *cost += extra_cost->alu.extend;
+	    }
+	}
       return false;
 
     case SIGN_EXTEND:
@@ -6078,7 +6159,12 @@ cost_plus:
 	}
 
       if (speed)
-	*cost += extra_cost->alu.extend;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.extend;
+	}
       return false;
 
     case ASHIFT:
@@ -6087,10 +6173,20 @@ cost_plus:
 
       if (CONST_INT_P (op1))
         {
-	  /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
-	     aliases.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift;
+	    {
+	      if (VECTOR_MODE_P (mode))
+		{
+		  /* Vector shift (immediate).  */
+		  *cost += extra_cost->vect.alu;
+		}
+	      else
+		{
+		  /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
+		     aliases.  */
+		  *cost += extra_cost->alu.shift;
+		}
+	    }
 
           /* We can incorporate zero/sign extend for free.  */
           if (GET_CODE (op0) == ZERO_EXTEND
@@ -6102,10 +6198,19 @@ cost_plus:
         }
       else
         {
-	  /* LSLV.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift_reg;
-
+	    {
+	      if (VECTOR_MODE_P (mode))
+		{
+		  /* Vector shift (register).  */
+		  *cost += extra_cost->vect.alu;
+		}
+	      else
+		{
+		  /* LSLV.  */
+		  *cost += extra_cost->alu.shift_reg;
+		}
+	    }
 	  return false;  /* All arguments need to be in registers.  */
         }
 
@@ -6120,7 +6225,12 @@ cost_plus:
 	{
 	  /* ASR (immediate) and friends.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift;
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->alu.shift;
+	    }
 
 	  *cost += rtx_cost (op0, (enum rtx_code) code, 0, speed);
 	  return true;
@@ -6130,8 +6240,12 @@ cost_plus:
 
 	  /* ASR (register) and friends.  */
 	  if (speed)
-	    *cost += extra_cost->alu.shift_reg;
-
+	    {
+	      if (VECTOR_MODE_P (mode))
+		*cost += extra_cost->vect.alu;
+	      else
+		*cost += extra_cost->alu.shift_reg;
+	    }
 	  return false;  /* All arguments need to be in registers.  */
 	}
 
@@ -6179,7 +6293,12 @@ cost_plus:
     case SIGN_EXTRACT:
       /* UBFX/SBFX.  */
       if (speed)
-	*cost += extra_cost->alu.bfx;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->alu.bfx;
+	}
 
       /* We can trust that the immediates used will be correct (there
 	 are no by-register forms), so we need only cost op0.  */
@@ -6196,7 +6315,9 @@ cost_plus:
     case UMOD:
       if (speed)
 	{
-	  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
 	    *cost += (extra_cost->mult[GET_MODE (x) == DImode].add
 		      + extra_cost->mult[GET_MODE (x) == DImode].idiv);
 	  else if (GET_MODE (x) == DFmode)
@@ -6213,7 +6334,9 @@ cost_plus:
     case SQRT:
       if (speed)
 	{
-	  if (GET_MODE_CLASS (mode) == MODE_INT)
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else if (GET_MODE_CLASS (mode) == MODE_INT)
 	    /* There is no integer SQRT, so only DIV and UDIV can get
 	       here.  */
 	    *cost += extra_cost->mult[mode == DImode].idiv;
@@ -6245,7 +6368,12 @@ cost_plus:
       op2 = XEXP (x, 2);
 
       if (speed)
-	*cost += extra_cost->fp[mode == DFmode].fma;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].fma;
+	}
 
       /* FMSUB, FNMADD, and FNMSUB are free.  */
       if (GET_CODE (op0) == NEG)
@@ -6285,12 +6413,28 @@ cost_plus:
 
     case FLOAT_EXTEND:
       if (speed)
-	*cost += extra_cost->fp[mode == DFmode].widen;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    {
+	      /* Vector widen.  */
+	      *cost += extra_cost->vect.alu;
+	    }
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].widen;
+	}
       return false;
 
     case FLOAT_TRUNCATE:
       if (speed)
-	*cost += extra_cost->fp[mode == DFmode].narrow;
+	{
+	  if (VECTOR_MODE_P (mode))
+	    {
+	      /* Vector truncate.  */
+	      *cost += extra_cost->vect.alu;
+	    }
+	  else
+	    *cost += extra_cost->fp[mode == DFmode].narrow;
+	}
       return false;
 
     case FIX:
@@ -6311,13 +6455,23 @@ cost_plus:
         }
 
       if (speed)
-        *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
-
+	{
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
+	}
       *cost += rtx_cost (x, (enum rtx_code) code, 0, speed);
       return true;
 
     case ABS:
-      if (GET_MODE_CLASS (mode) == MODE_FLOAT)
+      if (VECTOR_MODE_P (mode))
+	{
+	  /* ABS (vector).  */
+	  if (speed)
+	    *cost += extra_cost->vect.alu;
+	}
+      else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
 	{
 	  /* FABS and FNEG are analogous.  */
 	  if (speed)
@@ -6338,10 +6492,15 @@ cost_plus:
     case SMIN:
       if (speed)
 	{
-	  /* FMAXNM/FMINNM/FMAX/FMIN.
-	     TODO: This may not be accurate for all implementations, but
-	     we do not model this in the cost tables.  */
-	  *cost += extra_cost->fp[mode == DFmode].addsub;
+	  if (VECTOR_MODE_P (mode))
+	    *cost += extra_cost->vect.alu;
+	  else
+	    {
+	      /* FMAXNM/FMINNM/FMAX/FMIN.
+	         TODO: This may not be accurate for all implementations, but
+	         we do not model this in the cost tables.  */
+	      *cost += extra_cost->fp[mode == DFmode].addsub;
+	    }
 	}
       return false;
 
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 3ee7ebf..29f7c99 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -102,6 +102,8 @@ struct mem_cost_table
   const int storef;		/* SFmode.  */
   const int stored;		/* DFmode.  */
   const int store_unaligned;	/* Extra for unaligned stores.  */
+  const int loadv;		/* Vector load.  */
+  const int storev;		/* Vector store.  */
 };
 
 struct fp_cost_table
diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
index 05e96a9..809feb8 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -81,7 +81,9 @@ const struct cpu_cost_table generic_extra_costs =
     1,			/* stm_regs_per_insn_subsequent.  */
     COSTS_N_INSNS (2),	/* storef.  */
     COSTS_N_INSNS (3),	/* stored.  */
-    COSTS_N_INSNS (1)  /* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* store_unaligned.  */
+    COSTS_N_INSNS (1),	/* loadv.  */
+    COSTS_N_INSNS (1)	/* storev.  */
   },
   {
     /* FP SFmode */
@@ -182,7 +184,9 @@ const struct cpu_cost_table cortexa53_extra_costs =
     2,				/* stm_regs_per_insn_subsequent.  */
     0,				/* storef.  */
     0,				/* stored.  */
-    COSTS_N_INSNS (1)		/* store_unaligned.  */
+    COSTS_N_INSNS (1),		/* store_unaligned.  */
+    COSTS_N_INSNS (1),		/* loadv.  */
+    COSTS_N_INSNS (1)		/* storev.  */
   },
   {
     /* FP SFmode */
@@ -283,7 +287,9 @@ const struct cpu_cost_table cortexa57_extra_costs =
     2,                         /* stm_regs_per_insn_subsequent.  */
     0,                         /* storef.  */
     0,                         /* stored.  */
-    COSTS_N_INSNS (1)          /* store_unaligned.  */
+    COSTS_N_INSNS (1),         /* store_unaligned.  */
+    COSTS_N_INSNS (1),         /* loadv.  */
+    COSTS_N_INSNS (1)          /* storev.  */
   },
   {
     /* FP SFmode */
@@ -385,6 +391,8 @@ const struct cpu_cost_table xgene1_extra_costs =
     0,                         /* storef.  */
     0,                         /* stored.  */
     0,                         /* store_unaligned.  */
+    COSTS_N_INSNS (1),         /* loadv.  */
+    COSTS_N_INSNS (1)          /* storev.  */
   },
   {
     /* FP SFmode */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-05-06  2:12                               ` Kugan
@ 2015-05-07  7:24                                 ` James Greenhalgh
  2015-05-20  3:32                                   ` Kugan
  0 siblings, 1 reply; 27+ messages in thread
From: James Greenhalgh @ 2015-05-07  7:24 UTC (permalink / raw)
  To: Kugan
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson

On Wed, May 06, 2015 at 03:12:33AM +0100, Kugan wrote:
> >> gcc/ChangeLog:
> >>
> >> 2015-04-24  Kugan Vivekanandarajah  <kuganv@linaro.org>
> >> 	    Jim Wilson  <jim.wilson@linaro.org>
> >>
> >> 	* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
> >> 	new  fields loadv and storev.
> >> 	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
> >> 	Initialize loadv and storev.
> >> 	* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
> >> 	(cortexa53_extra_costs): Likewise.
> >> 	(cortexa57_extra_costs): Likewise.
> >> 	(xgene1_extra_costs): Likewise.
> >> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
> >> 	rtx_costs.
> 
> Thanks James for the review. The attached patch changes this. Is this OK?

Hi Kugan,

Thanks for sticking with it through a long review; sorry that the replies
have been patchy, as I'm still travelling.

This patch is OK for trunk, with an updated ChangeLog and assuming no
regressions after a test run (and a quick check with some popular
benchmarks, if possible).

Thanks, and sorry again for the delay,
James

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [AArch64][PR65375] Fix RTX cost for vector SET
  2015-05-07  7:24                                 ` James Greenhalgh
@ 2015-05-20  3:32                                   ` Kugan
  0 siblings, 0 replies; 27+ messages in thread
From: Kugan @ 2015-05-20  3:32 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Kyrylo Tkachov, gcc-patches, Marcus Shawcroft, Richard Earnshaw,
	Jim Wilson



On 07/05/15 17:24, James Greenhalgh wrote:
> On Wed, May 06, 2015 at 03:12:33AM +0100, Kugan wrote:
>>>> gcc/ChangeLog:
>>>>
>>>> 2015-04-24  Kugan Vivekanandarajah  <kuganv@linaro.org>
>>>> 	    Jim Wilson  <jim.wilson@linaro.org>
>>>>
>>>> 	* config/arm/aarch-common-protos.h (struct mem_cost_table): Added
>>>> 	new  fields loadv and storev.
>>>> 	* config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
>>>> 	Initialize loadv and storev.
>>>> 	* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
>>>> 	(cortexa53_extra_costs): Likewise.
>>>> 	(cortexa57_extra_costs): Likewise.
>>>> 	(xgene1_extra_costs): Likewise.
>>>> 	* config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
>>>> 	rtx_costs.
>>
>> Thanks James for the review. The attached patch changes this. Is this OK?
> 
> Hi Kugan,
> 
> Thanks for sticking with it through a long review, sorry that the replies
> have been patchy, I'm still travelling.
> 
> This patch is OK for trunk, with an updated ChangeLog and assuming no
> regressions after a test run (and a quick check with some popular
> benchmarks, if possible).

Committed as r223432 after a fresh bootstrap and SPEC2000 benchmarking.

Thanks,
Kugan

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2015-05-20  3:08 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-16  5:36 [AArch64][PR65375] Fix RTX cost for vector SET Kugan
2015-03-16 10:02 ` Kyrill Tkachov
2015-03-16 12:33   ` Kugan
2015-03-16 13:15     ` Kugan
2015-03-16 16:42       ` Jim Wilson
2015-03-16 16:49       ` Kyrill Tkachov
2015-03-17  1:20         ` Kugan
2015-03-26  7:22           ` Kugan
2015-04-14 22:09             ` Kugan
2015-04-15  9:25               ` James Greenhalgh
2015-04-15 10:14                 ` Kyrill Tkachov
2015-04-15 11:05                   ` James Greenhalgh
2015-04-15 11:17                     ` Kyrill Tkachov
2015-04-15 10:45                 ` Kugan
2015-04-15 11:18                   ` James Greenhalgh
2015-04-15 11:33                     ` Kugan
2015-04-17 11:19                       ` Kugan
2015-04-17 11:25                         ` Kyrill Tkachov
2015-04-20 20:22                         ` James Greenhalgh
2015-04-24 23:26                           ` Kugan
2015-04-24 23:30                             ` [ARM] " Kugan
2015-04-27 11:02                               ` Kyrill Tkachov
2015-05-05  6:17                             ` [AArch64][PR65375] " James Greenhalgh
2015-05-06  2:12                               ` Kugan
2015-05-07  7:24                                 ` James Greenhalgh
2015-05-20  3:32                                   ` Kugan
2015-04-15 11:35                     ` Maxim Kuvyrkov
