public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/63864] New: Missed optimization, related to SRA(??)
@ 2014-11-14 10:29 vermaelen.wouter at gmail dot com
  2014-11-14 11:22 ` [Bug tree-optimization/63864] Missed late memory CSE rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: vermaelen.wouter at gmail dot com @ 2014-11-14 10:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63864

            Bug ID: 63864
           Summary: Missed optimization, related to SRA(??)
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vermaelen.wouter at gmail dot com

Hi,

In my code I replaced some 'manual' vector/matrix calculations with
(inlined) function calls using vector/matrix types. When using clang
both approaches result in nearly identical generated code. But when
using gcc the code becomes much worse.

I don't know too much about compiler internals, but if I had to make a
guess I'd say that for some reason SRA doesn't work in this case.

See the code below: 'test_ok()' is the original function,
'test_slow()' is the rewritten version. I tried to simplify the code
as much as possible while not making it too simple (so that neither
compiler starts vectorizing the code).

Tested with:
  g++ (GCC) 5.0.0 20141114 (experimental)

Wouter

 - - - 8< - - - 8< - - - 8< - - - 8< - - - 8< - - - 8< - - - 8< - - -

// Original code with 'manual' matrix multiplication
float test_ok(float m[3][3], float x, float y, float z, float s, float b) {
    float p = x*s + b;
    float q = y*s + b;
    float r = z*s + b;

    float u = m[0][0]*p + m[1][0]*q + m[2][0]*r;
    float v = m[0][1]*p + m[1][1]*q + m[2][1]*r;
    float w = m[0][2]*p + m[1][2]*q + m[2][2]*r;

    return u + v + w;
}

// (Much simplified) vec3/mat3 types
struct vec3 {
    vec3() {}
    vec3(float x, float y, float z) { e[0] = x; e[1] = y; e[2] = z; }
    float  operator[](int i) const { return e[i]; }
    float& operator[](int i)       { return e[i]; }
private:
    float e[3];
};
struct mat3 { vec3 c[3]; };

inline vec3 operator+(const vec3& x, const vec3& y) {
    vec3 r;
    for (int i = 0; i < 3; ++i) r[i] = x[i] + y[i];
    return r;
}

inline vec3 operator*(const vec3& x, float y) {
    vec3 r;
    for (int i = 0; i < 3; ++i) r[i] = x[i] * y;
    return r;
}

inline vec3 operator*(const vec3& x, const vec3& y) {
    vec3 r;
    for (int i = 0; i < 3; ++i) r[i] = x[i] * y[i];
    return r;
}

inline vec3 operator*(const mat3& m, const vec3& v) {
    return m.c[0] * v[0] + m.c[1] * v[1] + m.c[2] * v[2];
}

// Rewritten version of the original function
float test_slow(mat3& m, float x, float y, float z, float s, float b) {
    vec3 t = m * (vec3(x,y,z) * s + vec3(b,b,b));
    return t[0] + t[1] + t[2];
}
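
A minimal harness, not part of the original report, for checking that the
two versions compute the same value before comparing their generated code.
The helper name check_versions_agree() and the input values are
assumptions chosen for illustration; the vec*vec operator is omitted
because test_slow() does not use it.

```cpp
#include <cassert>
#include <cmath>

// Original code with 'manual' matrix multiplication (from the report)
float test_ok(float m[3][3], float x, float y, float z, float s, float b) {
    float p = x*s + b;
    float q = y*s + b;
    float r = z*s + b;

    float u = m[0][0]*p + m[1][0]*q + m[2][0]*r;
    float v = m[0][1]*p + m[1][1]*q + m[2][1]*r;
    float w = m[0][2]*p + m[1][2]*q + m[2][2]*r;

    return u + v + w;
}

// (Much simplified) vec3/mat3 types (from the report)
struct vec3 {
    vec3() {}
    vec3(float x, float y, float z) { e[0] = x; e[1] = y; e[2] = z; }
    float  operator[](int i) const { return e[i]; }
    float& operator[](int i)       { return e[i]; }
private:
    float e[3];
};
struct mat3 { vec3 c[3]; };

inline vec3 operator+(const vec3& x, const vec3& y) {
    vec3 r;
    for (int i = 0; i < 3; ++i) r[i] = x[i] + y[i];
    return r;
}

inline vec3 operator*(const vec3& x, float y) {
    vec3 r;
    for (int i = 0; i < 3; ++i) r[i] = x[i] * y;
    return r;
}

inline vec3 operator*(const mat3& m, const vec3& v) {
    return m.c[0] * v[0] + m.c[1] * v[1] + m.c[2] * v[2];
}

// Rewritten version of the original function (from the report)
float test_slow(mat3& m, float x, float y, float z, float s, float b) {
    vec3 t = m * (vec3(x,y,z) * s + vec3(b,b,b));
    return t[0] + t[1] + t[2];
}

// Hypothetical helper: feeds identical inputs to both versions, asserts
// that the results agree within a small relative tolerance, and returns
// the test_ok() result.
float check_versions_agree() {
    float a[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    mat3 m;
    for (int j = 0; j < 3; ++j)
        for (int i = 0; i < 3; ++i)
            m.c[j][i] = a[j][i];   // same element layout as a[j][i]

    float ok   = test_ok(a, 1.0f, 2.0f, 3.0f, 0.5f, 0.25f);
    float slow = test_slow(m, 1.0f, 2.0f, 3.0f, 0.5f, 0.25f);
    assert(std::fabs(ok - slow) <= 1e-5f * std::fabs(ok));
    return ok;
}
```

Calling check_versions_agree() from main() and then compiling with
something like "g++ -O2 -S" is one way to compare the assembly of the two
functions side by side.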



* [Bug tree-optimization/63864] Missed late memory CSE
  2014-11-14 10:29 [Bug tree-optimization/63864] New: Missed optimization, related to SRA(??) vermaelen.wouter at gmail dot com
@ 2014-11-14 11:22 ` rguenth at gcc dot gnu.org
  2014-11-18 15:46 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-14 11:22 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63864

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2014-11-14
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
            Summary|Missed optimization,        |Missed late memory CSE
                   |related to SRA(??)          |
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed - mine.  Note that SRA cannot decompose arrays, but I see a lot
of missed CSE opportunities here.  That is because we completely unroll the
loops only very late, and the only memory CSE pass after that is DOM, which
is very limited here...

I'll try to improve that.  Related to some other PR.
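
As a rough source-level picture of the kind of memory CSE meant here (an
illustration only, not actual GIMPLE and not the DOM implementation): once
the three-iteration loop is completely unrolled, every load from the
temporary is preceded by a store to the same element, so each load could be
replaced by the stored value and the temporary would become dead. The type
v3 and the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for vec3 with directly accessible elements.
struct v3 { float e[3]; };

// Shape of the code after complete unrolling: each product is stored to
// the temporary 'r' and then loaded back from memory.
float scaled_sum_via_memory(const v3& x, float y) {
    v3 r;
    r.e[0] = x.e[0] * y;
    r.e[1] = x.e[1] * y;
    r.e[2] = x.e[2] * y;
    return r.e[0] + r.e[1] + r.e[2];   // three loads from 'r'
}

// Shape after memory CSE: the loads are replaced by the values that were
// stored, and 'r' is no longer needed.
float scaled_sum_forwarded(const v3& x, float y) {
    float r0 = x.e[0] * y;
    float r1 = x.e[1] * y;
    float r2 = x.e[2] * y;
    return r0 + r1 + r2;
}

// Both shapes perform the same arithmetic in the same order.
void check_scaled_sum() {
    v3 x = {{1.0f, 2.0f, 3.0f}};
    assert(scaled_sum_via_memory(x, 2.0f) == scaled_sum_forwarded(x, 2.0f));
}
```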



* [Bug tree-optimization/63864] Missed late memory CSE
  2014-11-14 10:29 [Bug tree-optimization/63864] New: Missed optimization, related to SRA(??) vermaelen.wouter at gmail dot com
  2014-11-14 11:22 ` [Bug tree-optimization/63864] Missed late memory CSE rguenth at gcc dot gnu.org
@ 2014-11-18 15:46 ` rguenth at gcc dot gnu.org
  2014-11-20  8:44 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-18 15:46 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63864

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 34025
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34025&action=edit
candidate patch for DOM

Ok, so I have a patch to teach DOM to do more memory CSE but for this testcase
what remains is stuff like

  MEM[(float &)&r].e[0] = _220;
  _228 = y_5(D) * s_8(D);
  MEM[(float &)&r].e[1] = _228;
  _21 = z_6(D) * s_8(D);
  MEM[(float &)&r].e[2] = _21;
  D.2621 = r;
  r ={v} {CLOBBER};
  D.2620 = D.2621;
  D.2459 = D.2620;
  _201 = D.2459.e[0];

Thus it isn't able to look through aggregate copies (handling those
wouldn't fit very well with how I implemented it).
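
Written back as source, the dump above has roughly the following shape.
This is a sketch under assumptions: v3 and the names d1, d2 and d3 are
hypothetical stand-ins for vec3 and the D.NNNN temporaries. Recovering the
stored value requires following the load back through three whole-struct
copies to the element store.

```cpp
#include <cassert>

// Hypothetical stand-in for vec3 with directly accessible elements.
struct v3 { float e[3]; };

// Element stores into 'r', a chain of aggregate copies, then an element
// load from the last copy.
float first_via_copies(float a, float b, float c) {
    v3 r;
    r.e[0] = a;        // MEM[(float &)&r].e[0] = _220
    r.e[1] = b;        // MEM[(float &)&r].e[1] = _228
    r.e[2] = c;        // MEM[(float &)&r].e[2] = _21
    v3 d1 = r;         // D.2621 = r
    v3 d2 = d1;        // D.2620 = D.2621
    v3 d3 = d2;        // D.2459 = D.2620
    return d3.e[0];    // _201 = D.2459.e[0]
}

// What looking through the copies would give: the load is simply the
// value that was stored to r.e[0].
float first_forwarded(float a, float /*b*/, float /*c*/) {
    return a;
}

void check_first() {
    assert(first_via_copies(1.5f, 2.5f, 3.5f) ==
           first_forwarded(1.5f, 2.5f, 3.5f));
}
```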



* [Bug tree-optimization/63864] Missed late memory CSE
  2014-11-14 10:29 [Bug tree-optimization/63864] New: Missed optimization, related to SRA(??) vermaelen.wouter at gmail dot com
  2014-11-14 11:22 ` [Bug tree-optimization/63864] Missed late memory CSE rguenth at gcc dot gnu.org
  2014-11-18 15:46 ` rguenth at gcc dot gnu.org
@ 2014-11-20  8:44 ` rguenth at gcc dot gnu.org
  2021-12-12 12:49 ` pinskia at gcc dot gnu.org
  2021-12-12 12:50 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-11-20  8:44 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63864

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
DOM is now improved but, as said, this testcase needs handling of aggregate
copies, which DOM doesn't handle (and I don't think we want to complicate it
with that).



* [Bug tree-optimization/63864] Missed late memory CSE
  2014-11-14 10:29 [Bug tree-optimization/63864] New: Missed optimization, related to SRA(??) vermaelen.wouter at gmail dot com
                   ` (2 preceding siblings ...)
  2014-11-20  8:44 ` rguenth at gcc dot gnu.org
@ 2021-12-12 12:49 ` pinskia at gcc dot gnu.org
  2021-12-12 12:50 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-12 12:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63864

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that at -O3 on trunk, SLP vectorization can happen for test_slow but
not for test_ok. Anyway, I think the original problem was fully fixed in GCC 6.


* [Bug tree-optimization/63864] Missed late memory CSE
  2014-11-14 10:29 [Bug tree-optimization/63864] New: Missed optimization, related to SRA(??) vermaelen.wouter at gmail dot com
                   ` (3 preceding siblings ...)
  2021-12-12 12:49 ` pinskia at gcc dot gnu.org
@ 2021-12-12 12:50 ` pinskia at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-12 12:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63864

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

