public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/98598] New: Missed opportunity to optimize dependent loads in loops
@ 2021-01-08  8:24 hliu at amperecomputing dot com
  2021-01-08  8:43 ` [Bug tree-optimization/98598] " rguenth at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: hliu at amperecomputing dot com @ 2021-01-08  8:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598

            Bug ID: 98598
           Summary: Missed opportunity to optimize dependent loads in
                    loops
           Product: gcc
           Version: tree-ssa
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hliu at amperecomputing dot com
  Target Milestone: ---

As we know, dependent loads are not friendly to cache. Especially when in
nested loops, dependent loads such as pa->pb->pc->val may be repeated many
times. For example:

typedef struct C { int val; } C;
typedef struct B { C *pc; } B;
typedef struct A { B *pb; } A;

int foo (int n, int m, A *pa) {
  int sum;

  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      sum += pa[j].pb->pc->val;  // each value is repeatedly loaded "n" times
      // ...
    }
    // ...
  }

  return sum;
}

Such access pattern can be found in real applications and benchmarks, and this
can be critical to performance.

Can we cache the loaded value and avoid repeated dependent loads? E.g.
transform above case into following (suppose there is no alias issue or other
clobber, and "n" is big enough):

int foo (int n, int m, A *pa) {
  int *cache = (int *) malloc(m * sizeof(int));
  for (int j = 0; j < m; j++) {
    cache[j] = pa[j].pb->pc->val;
  }

  int sum;

  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      sum += cache[j];   // pa[j].pb->pc->val;
      // ...
    }
    // ...
  }

  free(cache);
  return sum;
}

This should improve performance a lot.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-12-23  9:49 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-08  8:24 [Bug tree-optimization/98598] New: Missed opportunity to optimize dependent loads in loops hliu at amperecomputing dot com
2021-01-08  8:43 ` [Bug tree-optimization/98598] " rguenth at gcc dot gnu.org
2021-01-08  9:15 ` jiangning.liu at amperecomputing dot com
2021-01-08  9:55 ` rguenther at suse dot de
2021-01-08 11:57 ` amker at gcc dot gnu.org
2021-01-09  3:17 ` jiangning.liu at amperecomputing dot com
2021-01-09  9:48 ` rguenther at suse dot de
2021-01-09 10:42 ` jiangning.liu at amperecomputing dot com
2021-01-11  7:17 ` rguenther at suse dot de
2021-01-11  9:16 ` crazylht at gmail dot com
2021-01-11 11:41 ` jiangning.liu at amperecomputing dot com
2021-01-11 12:05 ` jiangning.liu at amperecomputing dot com
2021-01-14 22:53 ` jiangning.liu at amperecomputing dot com
2021-12-23  9:49 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).