public inbox for gcc-bugs@sourceware.org
* [Bug ada/46006] New: vectorization outside of loops
@ 2010-10-13 14:24 jakub at gcc dot gnu.org
  2010-10-17 13:22 ` [Bug tree-optimization/46006] " irar at il dot ibm.com
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: jakub at gcc dot gnu.org @ 2010-10-13 14:24 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46006

           Summary: vectorization outside of loops
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: jakub@gcc.gnu.org
                CC: irar@gcc.gnu.org


Are there any plans to try to vectorize parts of code like:
struct A
{
  double x, y, z;
};

struct B
{
  struct A a, b;
};

struct C
{
  struct A c;
  double d;
};

__attribute__((noinline, noclone)) int
foo (const struct C *u, struct B v)
{
  double a, b, c, d;

  a = v.b.x * v.b.x + v.b.y * v.b.y + v.b.z * v.b.z;
  b = 2.0 * v.b.x * (v.a.x - u->c.x)
      + 2.0 * v.b.y * (v.a.y - u->c.y) + 2.0 * v.b.z * (v.a.z - u->c.z);
  c = u->c.x * u->c.x + u->c.y * u->c.y + u->c.z * u->c.z
      + v.a.x * v.a.x + v.a.y * v.a.y + v.a.z * v.a.z
      + 2.0 * (-u->c.x * v.a.x - u->c.y * v.a.y - u->c.z * v.a.z)
      - u->d * u->d;
  if ((d = b * b - 4.0 * a * c) < 0.0)
    return 0;
  return d;
}

int
main (void)
{
  int i, j;
  struct C c = { { 1.0, 1.0, 1.0 }, 1.0 };
  struct B b = { { 1.0, 1.0, 1.0 }, { 1.0, 1.0, 1.0 } };
  for (i = 0; i < 100000000; i++)
    {
      asm volatile ("" : : "r" (&c), "r" (&b) : "memory");
      j = foo (&c, b);
      asm volatile ("" : : "r" (j));
    }
  return 0;
}
(This is the hot spot from the c-ray benchmark. The real function is larger,
but at least according to callgrind the early return on d < 0.0 is taken in
most cases; because the function is large and called from multiple places, it
isn't inlined.)
I'd say (though I haven't tried coding it by hand with intrinsics) that doing
many of the multiplications and additions in parallel, especially with AVX,
could give significant speedups (-O3 -ffast-math).
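
For illustration only, here is a rough sketch of what a hand-vectorized version
of the early-return path might look like. It uses the GCC/Clang generic vector
extension rather than real AVX intrinsics, and the names v4df and foo_vec are
made up for this example. It pads each 3-component struct A into a 4-lane
vector, and summing the lanes separately reassociates the additions, which is
only acceptable under something like -ffast-math. Nothing here has been
benchmarked.

```c
/* Sketch only: v4df and foo_vec are hypothetical names, not from the report. */
typedef double v4df __attribute__ ((vector_size (32)));

struct A { double x, y, z; };
struct B { struct A a, b; };
struct C { struct A c; double d; };

int
foo_vec (const struct C *u, struct B v)
{
  const v4df two = { 2.0, 2.0, 2.0, 2.0 };

  /* Build vectors from the scalar loads; the fourth lane is padding.  */
  v4df va = { v.a.x, v.a.y, v.a.z, 0.0 };
  v4df vb = { v.b.x, v.b.y, v.b.z, 0.0 };
  v4df uc = { u->c.x, u->c.y, u->c.z, 0.0 };

  /* Lane-wise versions of the three scalar expressions.  */
  v4df ta = vb * vb;
  v4df tb = two * vb * (va - uc);
  v4df tc = uc * uc + va * va - two * uc * va;

  /* Extract the three live lanes and reduce them back to scalars.  */
  double a = ta[0] + ta[1] + ta[2];
  double b = tb[0] + tb[1] + tb[2];
  double c = tc[0] + tc[1] + tc[2] - u->d * u->d;

  double d = b * b - 4.0 * a * c;
  if (d < 0.0)
    return 0;
  return (int) d;
}
```

With the inputs used in main above (all components 1.0), this computes a = 3,
b = 0, c = -1 and so d = 12, the same as the scalar foo.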


* [Bug tree-optimization/46006] vectorization outside of loops
From: irar at il dot ibm.com @ 2010-10-17 13:22 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46006

Ira Rosen <irar at il dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm.com

--- Comment #1 from Ira Rosen <irar at il dot ibm.com> 2010-10-17 13:22:18 UTC ---
This code requires SLP to originate from loads, which seems a bit more
complicated than the currently implemented use-def scan (it would also need to
reduce/extract scalars from the vectors at the end of the vector computation).
I don't see any major obstacles to this, but I currently don't plan to work on
it.

Another required feature is support for groups bigger than the vectorization
factor, i.e., combining 2 statements in this example and leaving the 3rd one
scalar.

Ira
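
To make the "2 statements vectorized, 3rd left scalar" idea concrete, here is a
hand-written sketch of the b term from the testcase with a 2-lane double
vector. This is not what GCC generates; it only illustrates the shape of the
transformation described above. The names v2df and b_term_split are
hypothetical, and the code relies on the GCC/Clang vector extension.

```c
/* Sketch only: v2df and b_term_split are hypothetical names.  */
typedef double v2df __attribute__ ((vector_size (16)));

struct A { double x, y, z; };

/* Computes 2*b.x*(a.x - c.x) + 2*b.y*(a.y - c.y) + 2*b.z*(a.z - c.z).  */
double
b_term_split (const struct A *a, const struct A *b, const struct A *c)
{
  const v2df two = { 2.0, 2.0 };

  /* The SLP group starts at the loads: the x and y statements share
     one 2-lane vector.  */
  v2df va = { a->x, a->y };
  v2df vb = { b->x, b->y };
  v2df vc = { c->x, c->y };
  v2df t = two * vb * (va - vc);

  /* The third statement (z) does not fit and stays scalar.  */
  double tz = 2.0 * b->z * (a->z - c->z);

  /* Extract the lanes and reduce to a scalar at the end.  */
  return t[0] + t[1] + tz;
}
```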


* [Bug tree-optimization/46006] vectorization outside of loops
From: pinskia at gcc dot gnu.org @ 2012-03-13 23:16 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46006

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-03-13
     Ever Confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-03-13 22:59:24 UTC ---
Confirmed.


* [Bug tree-optimization/46006] vectorization outside of loops starting from loads
From: rguenth at gcc dot gnu.org @ 2023-06-21 13:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46006

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2021-08-24 00:00:00         |2023-6-21

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're almost there:

t2.c:22:5: note:   Starting SLP discovery for
t2.c:22:5: note:     powmult_4 = v$b$z_53 * v$b$z_53;
t2.c:22:5: note:     powmult_1 = v$b$x_51 * v$b$x_51;
t2.c:22:5: note:     powmult_2 = v$b$y_52 * v$b$y_52;

but:

t2.c:22:5: note:   vectype: vector(2) double
t2.c:22:5: note:   nunits = 2
t2.c:22:5: missed:   Build SLP failed: unrolling required in basic block SLP

and for reductions we do not try to split the group.
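
The arithmetic behind the "unrolling required" message can be sketched as
follows. This is a simplified model and not GCC's actual code; it assumes the
group has to fill whole vectors, and all function names are hypothetical. With
nunits = 2 and a group of 3 lanes, the least common multiple is 6, so two
copies of the group would be needed. A basic block has no loop to unroll, so
the only alternative is splitting the group into a vector part and a scalar
remainder.

```c
/* Sketch only: a simplified model, not GCC's implementation.  */
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* How many copies of the group are needed to fill whole vectors.  */
unsigned
slp_unroll_factor (unsigned nunits, unsigned group_size)
{
  unsigned lcm = nunits / gcd_u (nunits, group_size) * group_size;
  return lcm / group_size;
}

/* Nonzero if the group fills whole vectors without unrolling.  */
int
bb_slp_group_fits (unsigned nunits, unsigned group_size)
{
  return slp_unroll_factor (nunits, group_size) == 1;
}

/* Lanes left scalar if the group were split at a vector boundary.  */
unsigned
slp_split_remainder (unsigned nunits, unsigned group_size)
{
  return group_size % nunits;
}
```

For the dump above, slp_unroll_factor (2, 3) is 2, so the group does not fit,
and slp_split_remainder (2, 3) is 1: one vector(2) double statement plus one
scalar.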
