public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/67612] New: Unable to vectorize DOT_PROD_EXPR (PMADDWD)
@ 2015-09-17 14:58 dmalcolm at gcc dot gnu.org
  2015-09-18  8:14 ` [Bug tree-optimization/67612] Unable to vectorize DOT_PROD_EXPR (PMADDWD?) rguenth at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2015-09-17 14:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67612

            Bug ID: 67612
           Summary: Unable to vectorize DOT_PROD_EXPR (PMADDWD)
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
  Target Milestone: ---

Created attachment 36346
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36346&action=edit
Test case

The attached code is a reduced form of a loop that the user hoped would be
auto-vectorized to use PMADDWD, but no vectorization occurs (this came up
whilst investigating possible use of libgccjit for autovectorization).

With a recent gcc trunk (r227686), I get this for the reproducer at -O3:

0000000000000000 <test_muladd>:
   0:   31 c0                   xor    %eax,%eax
   2:   85 c9                   test   %ecx,%ecx
   4:   7e 37                   jle    3d <test_muladd+0x3d>
   6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   d:   00 00 00 
  10:   44 0f bf 04 82          movswl (%rdx,%rax,4),%r8d
  15:   44 0f bf 14 86          movswl (%rsi,%rax,4),%r10d
  1a:   44 0f bf 4c 82 02       movswl 0x2(%rdx,%rax,4),%r9d
  20:   45 0f af d0             imul   %r8d,%r10d
  24:   44 0f bf 44 86 02       movswl 0x2(%rsi,%rax,4),%r8d
  2a:   45 0f af c1             imul   %r9d,%r8d
  2e:   45 01 d0                add    %r10d,%r8d
  31:   44 89 04 87             mov    %r8d,(%rdi,%rax,4)
  35:   48 83 c0 01             add    $0x1,%rax
  39:   39 c1                   cmp    %eax,%ecx
  3b:   7f d3                   jg     10 <test_muladd+0x10>
  3d:   f3 c3                   repz retq 
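
For readers without access to the attachment, the disassembly above is
consistent with a loop of roughly the following shape. This is only a sketch
inferred from the generated code: the parameter names and exact types are
assumptions, and the real reproducer is attachment 36346.

```c
#include <stdint.h>

/* Hypothetical reconstruction from the objdump output; not the attached
   test case.  Argument order follows the registers used in the listing:
   %rdi = out, %rsi = a, %rdx = b, %ecx = n.  */
void test_muladd(int32_t *out, const int16_t *a, const int16_t *b, int n)
{
  for (int i = 0; i < n; i++)
    /* Each 32-bit result is the sum of two adjacent 16x16 products
       (the two movswl/imul pairs followed by the add).  */
    out[i] = a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1];
}
```

Each iteration reads two adjacent 16-bit elements from each input and writes
one 32-bit element, which is the per-lane shape PMADDWD computes.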

Building with -fdump-tree-vect-details to see why gcc -O3 fails to vectorize,
I see this in FILENAME.c.130t.vect:
  (snip)
  ../../src/vector_dot_prod.c:11:3: note: ==> examining pattern statement:
patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
  ../../src/vector_dot_prod.c:11:3: note: vect_is_simple_use: operand _14
  ../../src/vector_dot_prod.c:11:3: note: def_stmt: _14 = *_13;
  ../../src/vector_dot_prod.c:11:3: note: type of def: internal
  ../../src/vector_dot_prod.c:11:3: note: not vectorized: relevant stmt not
supported: patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
  ../../src/vector_dot_prod.c:11:3: note: bad operation or unsupported loop
bound.
  ../../src/vector_dot_prod.c:5:1: note: vectorized 0 loops in function.

Stepping through:
  gcc/tree-vect-stmts.c:vect_analyze_stmt
for stmt:
  patt_91 = DOT_PROD_EXPR <_14, _18, _29>;
I see that vectorizable_operation returns false here:

  4821    if (nunits_out != nunits_in)
  4910        return false;

  (gdb) p nunits_out
  $16 = 4
  (gdb) p nunits_in
  $17 = 8

Should this be a vectorizable_operation?
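
As a hedged illustration of why the unit counts differ, here is a scalar
model of a single 128-bit PMADDWD: it consumes 8 signed 16-bit elements per
operand and produces 4 signed 32-bit elements, which would match the
nunits_in = 8 and nunits_out = 4 values printed above. The function name is
hypothetical, and the model ignores the one wraparound corner case (all four
inputs of a lane equal to -32768).

```c
#include <stdint.h>

/* Scalar model of one 128-bit PMADDWD (illustrative only; the name
   pmaddwd_model is made up for this sketch).
   Inputs: 8 x int16_t per operand.  Output: 4 x int32_t.  */
void pmaddwd_model(int32_t out[4], const int16_t a[8], const int16_t b[8])
{
  for (int lane = 0; lane < 4; lane++)
    /* Multiply adjacent pairs at full 32-bit width, then add them.  */
    out[lane] = (int32_t) a[2 * lane] * b[2 * lane]
                + (int32_t) a[2 * lane + 1] * b[2 * lane + 1];
}
```

Because the output vector has half as many elements as each input vector, the
statement is not an element-wise operation, which appears to be what the
nunits check in vectorizable_operation rejects.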



* [Bug tree-optimization/67612] Unable to vectorize DOT_PROD_EXPR (PMADDWD?)
  2015-09-17 14:58 [Bug tree-optimization/67612] New: Unable to vectorize DOT_PROD_EXPR (PMADDWD) dmalcolm at gcc dot gnu.org
@ 2015-09-18  8:14 ` rguenth at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-09-18  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67612

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-09-18
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think DOT_PROD_EXPR is only supported as a reduction operation right now.  It
would need to be supported in vectorizable_conversion.  Disabling DOT_PROD
pattern detection makes the loop vectorized (but with awkward code...).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

