* Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
[not found] <bd1be471-af66-45dc-e05f-7bd04966f0e7@arm.com>
@ 2019-12-06 14:05 ` Sudakshina Das
2019-12-07 17:44 ` Jeff Law
0 siblings, 1 reply; 5+ messages in thread
From: Sudakshina Das @ 2019-12-06 14:05 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford, Richard Biener
[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]
Hi
While looking at the vectorization for following example, we realized
that even though vectorizable_shift function was distinguishing vector
shifted by vector from vector shifted by scalar, while modeling the cost
it would always add the cost of building a vector constant despite not
needing it for vector shifted by scalar.
This patch fixes this by using scalar_shift_arg to determine whether we
need to build a vector for the second operand or not. This reduces
prologue cost as shown in the test.
Build and regression tests pass on aarch64-none-elf and
x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
Spec2017 for AArch64.
gcc/ChangeLog:
2019-xx-xx Sudakshina Das <sudi.das@arm.com>
Richard Sandiford <richard.sandiford@arm.com>
* tree-vect-stmt.c (vectorizable_shift): Condition ndts for
vect_model_simple_cost call on scalar_shift_arg.
gcc/testsuite/ChangeLog:
2019-xx-xx Sudakshina Das <sudi.das@arm.com>
* gcc.dg/vect/vect-shift-5.c: New test.
Is this ok for trunk?
Thanks
Sudi
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: vect-shift.diff --]
[-- Type: text/x-patch; name="vect-shift.diff", Size: 1420 bytes --]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
new file mode 100644
index 0000000..c1fd4f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_int } */
+
+typedef unsigned int uint32_t;
+typedef short unsigned int uint16_t;
+
+int foo (uint32_t arr[4][4])
+{
+ int sum = 0;
+ for(int i = 0; i < 4; i++)
+ {
+ sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53)
+ + ((arr[2][i] >> 12) * 7) + ((arr[3][i] >> 13) ^ 43);
+ }
+ return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
+}
+
+/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2cb6b15..396ff15 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5764,7 +5764,8 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
{
STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
DUMP_VECT_SCOPE ("vectorizable_shift");
- vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec);
+ vect_model_simple_cost (stmt_info, ncopies, dt,
+ scalar_shift_arg ? 1 : ndts, slp_node, cost_vec);
return true;
}
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
2019-12-06 14:05 ` Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts Sudakshina Das
@ 2019-12-07 17:44 ` Jeff Law
2019-12-09 10:23 ` Sudakshina Das
0 siblings, 1 reply; 5+ messages in thread
From: Jeff Law @ 2019-12-07 17:44 UTC (permalink / raw)
To: Sudakshina Das, gcc-patches; +Cc: Richard Sandiford, Richard Biener
On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
> Hi
>
> While looking at the vectorization for following example, we
> realized
> that even though vectorizable_shift function was distinguishing
> vector
> shifted by vector from vector shifted by scalar, while modeling the
> cost
> it would always add the cost of building a vector constant despite
> not
> needing it for vector shifted by scalar.
>
> This patch fixes this by using scalar_shift_arg to determine whether
> we
> need to build a vector for the second operand or not. This reduces
> prologue cost as shown in the test.
>
> Build and regression tests pass on aarch64-none-elf and
> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
> Spec2017 for AArch64.
>
> gcc/ChangeLog:
>
> 2019-xx-xx Sudakshina Das <sudi.das@arm.com>
> Richard Sandiford <richard.sandiford@arm.com>
>
> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
> vect_model_simple_cost call on scalar_shift_arg.
>
> gcc/testsuite/ChangeLog:
>
> 2019-xx-xx Sudakshina Das <sudi.das@arm.com>
>
> * gcc.dg/vect/vect-shift-5.c: New test.
It's a bit borderline, but it's really just twiddling a cost, so OK.
jeff
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
2019-12-07 17:44 ` Jeff Law
@ 2019-12-09 10:23 ` Sudakshina Das
2019-12-10 9:01 ` Christophe Lyon
0 siblings, 1 reply; 5+ messages in thread
From: Sudakshina Das @ 2019-12-09 10:23 UTC (permalink / raw)
To: law, gcc-patches; +Cc: Richard Sandiford, Richard Biener
Hi Jeff
On 07/12/2019 17:44, Jeff Law wrote:
> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
>> Hi
>>
>> While looking at the vectorization for following example, we
>> realized
>> that even though vectorizable_shift function was distinguishing
>> vector
>> shifted by vector from vector shifted by scalar, while modeling the
>> cost
>> it would always add the cost of building a vector constant despite
>> not
>> needing it for vector shifted by scalar.
>>
>> This patch fixes this by using scalar_shift_arg to determine whether
>> we
>> need to build a vector for the second operand or not. This reduces
>> prologue cost as shown in the test.
>>
>> Build and regression tests pass on aarch64-none-elf and
>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>> Spec2017 for AArch64.
>>
>> gcc/ChangeLog:
>>
>> 2019-xx-xx Sudakshina Das <sudi.das@arm.com>
>> Richard Sandiford <richard.sandiford@arm.com>
>>
>> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>> vect_model_simple_cost call on scalar_shift_arg.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-xx-xx Sudakshina Das <sudi.das@arm.com>
>>
>> * gcc.dg/vect/vect-shift-5.c: New test.
> It's a bit borderline, but it's really just twiddling a cost, so OK.
Thanks :) Committed as r279114.
Sudi
>
> jeff
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
2019-12-09 10:23 ` Sudakshina Das
@ 2019-12-10 9:01 ` Christophe Lyon
2019-12-10 10:43 ` Sudakshina Das
0 siblings, 1 reply; 5+ messages in thread
From: Christophe Lyon @ 2019-12-10 9:01 UTC (permalink / raw)
To: Sudakshina Das; +Cc: law, gcc-patches, Richard Sandiford, Richard Biener
Hi,
On Mon, 9 Dec 2019 at 11:23, Sudakshina Das <Sudi.Das@arm.com> wrote:
>
> Hi Jeff
>
> On 07/12/2019 17:44, Jeff Law wrote:
> > On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
> >> Hi
> >>
> >> While looking at the vectorization for following example, we
> >> realized
> >> that even though vectorizable_shift function was distinguishing
> >> vector
> >> shifted by vector from vector shifted by scalar, while modeling the
> >> cost
> >> it would always add the cost of building a vector constant despite
> >> not
> >> needing it for vector shifted by scalar.
> >>
> >> This patch fixes this by using scalar_shift_arg to determine whether
> >> we
> >> need to build a vector for the second operand or not. This reduces
> >> prologue cost as shown in the test.
> >>
> >> Build and regression tests pass on aarch64-none-elf and
> >> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
> >> Spec2017 for AArch64.
> >>
Looks like you didn't check on arm, where I can see that the new testcase fails:
FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
scan-tree-dump vect "vectorizable_shift
===[\\n\\r][^\\n]*prologue_cost = 0"
FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
"vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"
Seen on arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu neon-fp16
Christophe
> >> gcc/ChangeLog:
> >>
> >> 2019-xx-xx Sudakshina Das <sudi.das@arm.com>
> >> Richard Sandiford <richard.sandiford@arm.com>
> >>
> >> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
> >> vect_model_simple_cost call on scalar_shift_arg.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-xx-xx Sudakshina Das <sudi.das@arm.com>
> >>
> >> * gcc.dg/vect/vect-shift-5.c: New test.
> > It's a bit borderline, but it's really just twiddling a cost, so OK.
>
> Thanks :) Committed as r279114.
>
> Sudi
>
> >
> > jeff
> >
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
2019-12-10 9:01 ` Christophe Lyon
@ 2019-12-10 10:43 ` Sudakshina Das
0 siblings, 0 replies; 5+ messages in thread
From: Sudakshina Das @ 2019-12-10 10:43 UTC (permalink / raw)
To: Christophe Lyon; +Cc: law, gcc-patches, Richard Sandiford, Richard Biener
Hi Christophe
On 10/12/2019 09:01, Christophe Lyon wrote:
> Hi,
>
> On Mon, 9 Dec 2019 at 11:23, Sudakshina Das <Sudi.Das@arm.com> wrote:
>>
>> Hi Jeff
>>
>> On 07/12/2019 17:44, Jeff Law wrote:
>>> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
>>>> Hi
>>>>
>>>> While looking at the vectorization for following example, we
>>>> realized
>>>> that even though vectorizable_shift function was distinguishing
>>>> vector
>>>> shifted by vector from vector shifted by scalar, while modeling the
>>>> cost
>>>> it would always add the cost of building a vector constant despite
>>>> not
>>>> needing it for vector shifted by scalar.
>>>>
>>>> This patch fixes this by using scalar_shift_arg to determine whether
>>>> we
>>>> need to build a vector for the second operand or not. This reduces
>>>> prologue cost as shown in the test.
>>>>
>>>> Build and regression tests pass on aarch64-none-elf and
>>>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>>>> Spec2017 for AArch64.
>>>>
>
> Looks like you didn't check on arm, where I can see that the new testcase fails:
> FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
> scan-tree-dump vect "vectorizable_shift
> ===[\\n\\r][^\\n]*prologue_cost = 0"
> FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
> "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"
>
> Seen on arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16
>
> Christophe
Thanks for reporting this. There is already a bugzilla report PR92870
for powerpc that I am looking at. Apologies I couldn't find your email
address there to add you to the cc list.
Thanks
Sudi
>
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-xx-xx Sudakshina Das <sudi.das@arm.com>
>>>> Richard Sandiford <richard.sandiford@arm.com>
>>>>
>>>> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>>>> vect_model_simple_cost call on scalar_shift_arg.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-xx-xx Sudakshina Das <sudi.das@arm.com>
>>>>
>>>> * gcc.dg/vect/vect-shift-5.c: New test.
>>> It's a bit borderline, but it's really just twiddling a cost, so OK.
>>
>> Thanks :) Committed as r279114.
>>
>> Sudi
>>
>>>
>>> jeff
>>>
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-12-10 10:43 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bd1be471-af66-45dc-e05f-7bd04966f0e7@arm.com>
2019-12-06 14:05 ` Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts Sudakshina Das
2019-12-07 17:44 ` Jeff Law
2019-12-09 10:23 ` Sudakshina Das
2019-12-10 9:01 ` Christophe Lyon
2019-12-10 10:43 ` Sudakshina Das
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).