public inbox for gcc-bugs@sourceware.org
* [Bug target/65951] New: [AArch64] Will not vectorize multiplication by long constant
From: alalaw01 at gcc dot gnu.org @ 2015-04-30 17:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951
Bug ID: 65951
Summary: [AArch64] Will not vectorize multiplication by long
constant
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: alalaw01 at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
This loop:
void
foo (long *arr)
{
  for (int i = 0; i < 256; i++)
    arr[i] *= 19594L;
}
will not vectorize on AArch64, but does on x86. On AArch64,
-fdump-tree-vect-details reveals:
test.c:4:3: note: ==> examining statement: _9 = _8 * 19594;
test.c:4:3: note: vect_is_simple_use: operand _8
test.c:4:3: note: def_stmt: _8 = *_7;
test.c:4:3: note: type of def: 3.
test.c:4:3: note: vect_is_simple_use: operand 19594
test.c:4:3: note: op not supported by target.
test.c:4:3: note: not vectorized: relevant stmt not supported: _9 = _8 * 19594;
On x86, vectorization fails with vectorization_factor = 4 (V4DI), but succeeds
at V2DI.
We could vectorize this on AArch64 even if we have to perform a
multiple-instruction load of that constant (invariant!) before the
loop...right?
* [Bug target/65951] [AArch64] Will not vectorize multiplication by long constant
From: pinskia at gcc dot gnu.org @ 2015-04-30 18:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is a better example for AArch64, because under ILP32 long is only 32 bits:
void
foo (long long *arr)
{
  for (int i = 0; i < 256; i++)
    arr[i] *= 19594LL;
}
* [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication
From: pinskia at gcc dot gnu.org @ 2015-04-30 18:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-04-30
            Summary|[AArch64] Will not          |[AArch64] Will not
                   |vectorize multiplication by |vectorize 64bit integer
                   |long constant               |multiplication
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Another example, which shows that the problem is not related to the constant
but to the 64-bit multiply itself:
void
foo (long long * restrict arr, long long * restrict arr1)
{
  for (int i = 0; i < 256; i++)
    arr[i] *= arr1[i];
}
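
For context on why the plain multiply is rejected: as far as I know, the AArch64 Advanced SIMD MUL instruction has no 64-bit element form, so a V2DI multiply would have to be synthesized from narrower operations. Below is a minimal sketch of one well-known way to do that, building the low 64 bits of the product from 32x32->64 partial products. The helper name mul64_from_32 is made up for illustration, and this is not necessarily the sequence GCC would emit.

```c
#include <stdint.h>

/* Hypothetical helper (not GCC code): low 64 bits of a * b, computed
   only from products of 32-bit halves.  With a = a_hi*2^32 + a_lo and
   b = b_hi*2^32 + b_lo, the a_hi*b_hi term is a multiple of 2^64 and
   drops out of the 64-bit result.  */
static uint64_t
mul64_from_32 (uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;

  /* The cross terms may wrap, but only their low 32 bits survive the
     shift below, so wrapping is harmless.  */
  uint64_t cross = a_lo * b_hi + a_hi * b_lo;

  return a_lo * b_lo + (cross << 32);
}
```

Each step is a widening 32-bit multiply, an add, or a shift, which is the kind of operation a 2x64-bit vector unit without a native 64-bit multiply could still perform per lane.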
* [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication
From: rguenth at gcc dot gnu.org @ 2015-05-02 11:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
ISTR there was some pattern support for division by constant so maybe you can
add some for multiplication as well.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
* [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication
From: alalaw01 at gcc dot gnu.org @ 2015-05-05 9:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951
--- Comment #5 from alalaw01 at gcc dot gnu.org ---
I believe the definitive algorithm for converting multiply-by-constant into
adds+shifts(+etc.) lives in expmed.c. I don't at present have a plan for how to
reuse that, but if we could do so in some form, that would be ideal.
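
To illustrate the idea behind multiply-by-constant expansion, here is a minimal sketch that uses one shifted add per set bit of the constant. The helper name mul_by_const_shift_add is hypothetical, and this is not the expmed.c algorithm, which as far as I know is considerably more elaborate in how it chooses a cheap sequence. For the constant in this PR, 19594 = 2^14 + 2^11 + 2^10 + 2^7 + 2^3 + 2^1, so the naive form needs six shifts and five adds.

```c
#include <stdint.h>

/* Hypothetical helper (not GCC code): multiply x by the constant c
   using only shifts and adds, one shifted add per set bit of c.
   Arithmetic is modulo 2^64, matching unsigned 64-bit multiplication.  */
static uint64_t
mul_by_const_shift_add (uint64_t x, uint64_t c)
{
  uint64_t result = 0;

  for (unsigned bit = 0; bit < 64; bit++)
    if (c & (UINT64_C (1) << bit))
      result += x << bit;

  return result;
}
```

Because c is a compile-time constant in the loops discussed here, the bit loop would be fully unrolled into a fixed sequence of shifts and adds, all of which have 64-bit vector forms.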
* [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication
From: vekumar at gcc dot gnu.org @ 2015-07-09 11:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951
vekumar at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vekumar at gcc dot gnu.org

--- Comment #6 from vekumar at gcc dot gnu.org ---
I found a similar pattern in the SPEC2006 hmmer benchmark when comparing
x86_64 (-O3 -march=bdver3) against AArch64 (-O3 -mcpu=cortex-a57). x86_64 was
able to vectorize 5 additional loops. Two of the five are cost-model related:
AArch64 rejects them because the vector cost comes out too high.
The remaining three loops follow the pattern in this PR. One multiplies by the
constant 104. The other two multiply by 4, which could be converted to vector
shifts.
I made a simple test case and was going to open a new PR, but James pointed me
to this one, so I am posting it here as a comment.
unsigned long int __attribute__ ((aligned (64))) arr[100];
int i;

void test_vector_shifts()
{
  for (i = 0; i <= 99; i++)
    arr[i] = arr[i] << 2;
}

void test_vectorshift_via_mul()
{
  for (i = 0; i <= 99; i++)
    arr[i] = arr[i] * 4;
}

Assembly
------------
        .cpu cortex-a57+fp+simd+crc
        .file   "test.c"
        .text
        .align  2
        .p2align 4,,15
        .global test_vector_shifts
        .type   test_vector_shifts, %function
test_vector_shifts:
        adrp    x0, arr
        add     x0, x0, :lo12:arr
        adrp    x1, arr+800
        add     x1, x1, :lo12:arr+800
        .p2align 2
.L2:
        ldr     q0, [x0]
        shl     v0.2d, v0.2d, 2         <==vector shifts
        str     q0, [x0], 16
        cmp     x0, x1
        bne     .L2
        adrp    x0, i
        mov     w1, 100
        str     w1, [x0, #:lo12:i]
        ret
        .size   test_vector_shifts, .-test_vector_shifts
        .align  2
        .p2align 4,,15
        .global test_vectorshift_via_mul
        .type   test_vectorshift_via_mul, %function
test_vectorshift_via_mul:
        adrp    x0, arr
        add     x0, x0, :lo12:arr
        adrp    x2, arr+800
        add     x2, x2, :lo12:arr+800
        .p2align 2
.L6:
        ldr     x1, [x0]
        lsl     x1, x1, 2
        str     x1, [x0], 8             <==scalar shifts
        cmp     x0, x2
        bne     .L6
        adrp    x0, i
        mov     w1, 100
        str     w1, [x0, #:lo12:i]
        ret
        .size   test_vectorshift_via_mul, .-test_vectorshift_via_mul
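
No test case is given above for the loop that multiplies by 104. A minimal sketch of what the source-level equivalent could look like, using 104 = 64 + 32 + 8 so that every operation is a shift or an add, follows. The function name and signature are hypothetical (not taken from hmmer), and whether GCC vectorizes this form on AArch64 has not been checked here.

```c
#include <stddef.h>

/* Hypothetical test case: multiply each element by 104 without a
   multiply instruction, using 104 = (1 << 6) + (1 << 5) + (1 << 3).
   Unsigned arithmetic wraps, so this matches a[k] * 104 exactly.  */
void
test_mul104_via_shifts (unsigned long *a, size_t n)
{
  for (size_t k = 0; k < n; k++)
    a[k] = (a[k] << 6) + (a[k] << 5) + (a[k] << 3);
}
```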