public inbox for gcc-bugs@sourceware.org
* [Bug target/65951] New: [AArch64] Will not vectorize multiplication by long constant
@ 2015-04-30 17:26 alalaw01 at gcc dot gnu.org
  2015-04-30 18:55 ` [Bug target/65951] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: alalaw01 at gcc dot gnu.org @ 2015-04-30 17:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

            Bug ID: 65951
           Summary: [AArch64] Will not vectorize multiplication by long
                    constant
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alalaw01 at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

This loop:
void
foo (long *arr)
{
  for (int i = 0; i < 256; i++)
    arr[i] *= 19594L;
}

will not vectorize on AArch64, but does on x86. On AArch64,
-fdump-tree-vect-details reveals:
test.c:4:3: note: ==> examining statement: _9 = _8 * 19594;
test.c:4:3: note: vect_is_simple_use: operand _8
test.c:4:3: note: def_stmt: _8 = *_7;
test.c:4:3: note: type of def: 3.
test.c:4:3: note: vect_is_simple_use: operand 19594
test.c:4:3: note: op not supported by target.
test.c:4:3: note: not vectorized: relevant stmt not supported: _9 = _8 * 19594;

On x86, vectorization fails with vectorization_factor = 4 (V4DI), but succeeds
at V2DI.

We could vectorize this on AArch64 even if we have to perform a
multiple-instruction load of that constant (invariant!) before the
loop...right?
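
As a rough illustration of the shape being asked for, here is a hand-written
sketch using the GCC vector extensions. The name foo_by_hand and the v2di
typedef are made up for this example, and it uses long long so that the lanes
are 64 bits regardless of ABI. Whether "v *= k" becomes a real vector multiply
depends on the target; where no suitable instruction exists, GCC lowers the
operation to scalar code, so this shows the intent rather than a guaranteed
code sequence.

```c
#include <string.h>

/* Two 64-bit lanes.  GCC/Clang vector extension, not ISO C.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Hypothetical hand-vectorized equivalent of foo () from the report.  */
void
foo_by_hand (long long *arr)
{
  /* Loop-invariant: the constant vector is built once, before the loop.  */
  const v2di k = { 19594LL, 19594LL };

  for (int i = 0; i < 256; i += 2)
    {
      v2di v;
      memcpy (&v, &arr[i], sizeof v);   /* alignment-safe vector load */
      v *= k;                           /* lane-wise 64-bit multiply */
      memcpy (&arr[i], &v, sizeof v);   /* vector store */
    }
}
```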



* [Bug target/65951] [AArch64] Will not vectorize multiplication by long constant
  2015-04-30 17:26 [Bug target/65951] New: [AArch64] Will not vectorize multiplication by long constant alalaw01 at gcc dot gnu.org
@ 2015-04-30 18:55 ` pinskia at gcc dot gnu.org
  2015-04-30 18:55 ` [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-04-30 18:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is a better example for AArch64, because under ILP32 long is only 32 bits:
void
foo (long long *arr)
{
  for (int i = 0; i < 256; i++)
    arr[i] *= 19594LL;
}



* [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication
  2015-04-30 17:26 [Bug target/65951] New: [AArch64] Will not vectorize multiplication by long constant alalaw01 at gcc dot gnu.org
  2015-04-30 18:55 ` [Bug target/65951] " pinskia at gcc dot gnu.org
@ 2015-04-30 18:55 ` pinskia at gcc dot gnu.org
  2015-05-02 11:45 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2015-04-30 18:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-04-30
            Summary|[AArch64] Will not          |[AArch64] Will not
                   |vectorize multiplication by |vectorize 64bit integer
                   |long constant               |multiplication
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Another example which shows the problem is not related to the constant but just
the multiply:
void
foo (long long * restrict arr, long long * restrict arr1)
{
  for (int i = 0; i < 256; i++)
    arr[i] *= arr1[i];
}
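
If the target has no lane-wise 64-bit multiply, one conceivable fallback is to
build each 64-bit product from 32 x 32 -> 64 bit multiplies. The scalar sketch
below only demonstrates the arithmetic identity; mul64_from_32 is a
hypothetical helper, and nothing in this thread says GCC would lower the loop
this way.

```c
#include <stdint.h>

/* Low 64 bits of a * b, using only 32 x 32 -> 64 bit products.  */
uint64_t
mul64_from_32 (uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;

  /* a_hi * b_hi would land entirely above bit 63, so it is dropped.
     Only the low 32 bits of the cross terms survive the shift.  */
  uint64_t cross = a_lo * b_hi + a_hi * b_lo;
  return a_lo * b_lo + (cross << 32);
}
```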



* [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication
  2015-04-30 17:26 [Bug target/65951] New: [AArch64] Will not vectorize multiplication by long constant alalaw01 at gcc dot gnu.org
  2015-04-30 18:55 ` [Bug target/65951] " pinskia at gcc dot gnu.org
  2015-04-30 18:55 ` [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication pinskia at gcc dot gnu.org
@ 2015-05-02 11:45 ` rguenth at gcc dot gnu.org
  2015-05-05  9:47 ` alalaw01 at gcc dot gnu.org
  2015-07-09 11:22 ` vekumar at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-05-02 11:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
ISTR there was some pattern support for division by constant so maybe you can
add some for multiplication as well.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations



* [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication
  2015-04-30 17:26 [Bug target/65951] New: [AArch64] Will not vectorize multiplication by long constant alalaw01 at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2015-05-02 11:45 ` rguenth at gcc dot gnu.org
@ 2015-05-05  9:47 ` alalaw01 at gcc dot gnu.org
  2015-07-09 11:22 ` vekumar at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: alalaw01 at gcc dot gnu.org @ 2015-05-05  9:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

--- Comment #5 from alalaw01 at gcc dot gnu.org ---
I believe the definitive algorithm for converting multiply-by-constant into
adds+shifts(+etc.) lives in expmed.c. I don't at present have a plan for how to
reuse that, but if we could do so _in_some_form_, that would be ideal.
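
To make the adds+shifts idea concrete, here is a minimal sketch. It uses the
naive binary method (one shifted copy per set bit of the constant), which is
NOT the algorithm in expmed.c; that one is more sophisticated. Both function
names are invented for this example. For the constant in this report,
19594 = 2^14 + 2^11 + 2^10 + 2^7 + 2^3 + 2^1.

```c
#include <stdint.h>

/* Multiply x by the constant c using only shifts and adds:
   one shifted copy of x per set bit of c.  Naive binary method.  */
uint64_t
mul_by_const_shift_add (uint64_t x, uint64_t c)
{
  uint64_t result = 0;
  for (int bit = 0; bit < 64; bit++)
    if ((c >> bit) & 1)
      result += x << bit;
  return result;
}

/* The same idea unrolled by hand for 19594.  */
uint64_t
mul_19594 (uint64_t x)
{
  return (x << 14) + (x << 11) + (x << 10) + (x << 7) + (x << 3) + (x << 1);
}
```

Every operation here is a shift or an add on 64-bit values, so each one has an
obvious lane-wise counterpart if the multiply itself is unavailable.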



* [Bug target/65951] [AArch64] Will not vectorize 64bit integer multiplication
  2015-04-30 17:26 [Bug target/65951] New: [AArch64] Will not vectorize multiplication by long constant alalaw01 at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2015-05-05  9:47 ` alalaw01 at gcc dot gnu.org
@ 2015-07-09 11:22 ` vekumar at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: vekumar at gcc dot gnu.org @ 2015-07-09 11:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951

vekumar at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vekumar at gcc dot gnu.org

--- Comment #6 from vekumar at gcc dot gnu.org ---
I found a similar pattern in the SPEC2006 hmmer benchmark when comparing x86_64
(-O3 -march=bdver3) against AArch64 (-O3 -mcpu=cortex-a57). x86_64 was able to
vectorize 5 additional loops. Of those 5 loops, two were cost-model related:
AArch64 rejects them because the vector cost comes out too high.

The remaining three loops are of this pattern. One used the constant 104.
The other two used multiplication by 4, which could be converted to
vector shifts.

I made a simple test case and wanted to open a PR, but James pointed me to this
one, so I am posting it here as a comment instead.


unsigned long int __attribute__ ((aligned (64))) arr[100];
int i;

void test_vector_shifts()
{
        for (i = 0; i <= 99; i++)
                arr[i] = arr[i] << 2;
}


void test_vectorshift_via_mul()
{
        for (i = 0; i <= 99; i++)
                arr[i] = arr[i] * 4;
}

Assembly
------------
        .cpu cortex-a57+fp+simd+crc
        .file   "test.c"
        .text
        .align  2
        .p2align 4,,15
        .global test_vector_shifts
        .type   test_vector_shifts, %function
test_vector_shifts:
        adrp    x0, arr
        add     x0, x0, :lo12:arr
        adrp    x1, arr+800
        add     x1, x1, :lo12:arr+800
        .p2align 2
.L2:
        ldr     q0, [x0]
        shl     v0.2d, v0.2d, 2    // <== vector shift
        str     q0, [x0], 16
        cmp     x0, x1
        bne     .L2
        adrp    x0, i
        mov     w1, 100
        str     w1, [x0, #:lo12:i]
        ret
        .size   test_vector_shifts, .-test_vector_shifts
        .align  2
        .p2align 4,,15
        .global test_vectorshift_via_mul
        .type   test_vectorshift_via_mul, %function
test_vectorshift_via_mul:
        adrp    x0, arr
        add     x0, x0, :lo12:arr
        adrp    x2, arr+800
        add     x2, x2, :lo12:arr+800
        .p2align 2
.L6:
        ldr     x1, [x0]
        lsl     x1, x1, 2    // <== scalar shift
        str     x1, [x0], 8
        cmp     x0, x2
        bne     .L6
        adrp    x0, i
        mov     w1, 100
        str     w1, [x0, #:lo12:i]
        ret
        .size   test_vectorshift_via_mul, .-test_vectorshift_via_mul

