public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/65363] New: trivial redundant code reordering makes code less optimal
@ 2015-03-09 16:41 vries at gcc dot gnu.org
  2015-03-10  9:11 ` [Bug tree-optimization/65363] " rguenth at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: vries at gcc dot gnu.org @ 2015-03-09 16:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65363

            Bug ID: 65363
           Summary: trivial redundant code reordering makes code less
                    optimal
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org

Consider this test-case test.c (based on PR65270 comment 27/28):
...
#define N 100000
struct a 
{
  int a[N];
};
typedef struct a misaligned_t __attribute__ ((aligned (8)));
typedef struct a aligned_t __attribute__ ((aligned (32)));

static void
__attribute__ ((noinline))
__attribute__ ((noclone))
__attribute__ ((used))
t (void *a, aligned_t *d)
{
  int v, v2;
  int i;
  for (i=0; i < N; i++)
    {
#if REORDER
      v2 = ((misaligned_t *)a)->a[i];
      v = ((aligned_t *)a)->a[i];
#else
      v = ((aligned_t *)a)->a[i];
      v2 = ((misaligned_t *)a)->a[i];
#endif
      d->a[i] += v + v2;
    }
}

aligned_t aa;
aligned_t d;

int
main (void)
{
  t (&aa, &d);
  return 0;
}
...

Changing the order of the loads in the loop body results in a different
instruction being selected (and I assume the unaligned load, movdqu, is more
expensive than the aligned one, movdqa):
...
$ n=0; gcc -O2 -ftree-vectorize test.c -DREORDER=$n -S -o $n
$ n=1; gcc -O2 -ftree-vectorize test.c -DREORDER=$n -S -o $n
$ diff -u 0 1
--- 0    2015-03-09 15:46:41.395919753 +0100
+++ 1    2015-03-09 15:46:43.747919840 +0100
@@ -19,7 +19,7 @@
     .p2align 4,,10
     .p2align 3
 .L4:
-    movdqa    (%rdi,%rax), %xmm0
+    movdqu    (%rdi,%rax), %xmm0
     paddd    %xmm0, %xmm0
     paddd    (%rsi,%rax), %xmm0
     movaps    %xmm0, (%rsi,%rax)
...

The two loads are redundant, and FRE is the pass that picks the first one and
eliminates the second. I'm not sure, though, whether you want to fix this
particular example in FRE; perhaps alignment should be propagated before FRE
runs.

OTOH, FRE does not take the cost of the value-producing statements into
account when determining which one to choose as the representative and which
to eliminate.



* [Bug tree-optimization/65363] trivial redundant code reordering makes code less optimal
  2015-03-09 16:41 [Bug tree-optimization/65363] New: trivial redundant code reordering makes code less optimal vries at gcc dot gnu.org
@ 2015-03-10  9:11 ` rguenth at gcc dot gnu.org
  2015-03-10  9:13 ` jakub at gcc dot gnu.org
  2015-03-10 11:51 ` vries at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-03-10  9:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65363

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-03-10
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
FRE can only eliminate the dominated one (obviously), so the first one is the
one prevailing.  Not removing the redundant load is obviously worse ;)

The only possibility I see is to somehow derive alignment information from
the always-executed second load and propagate that backwards.  For example
with VRP if you see

  tem_1 = MEM [ptr_2];

in a basic block, and the access tells you that ptr_2 is aligned (thus, for
example, (ptr_2 & 3) == 0), you put an assert for that _at the beginning_
of the basic block.  And you'd then make sure to adjust memory accesses
to "apply" alignment you computed for SSA names (which may be just temporarily
available during VRP but no longer after we removed the asserts).
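A GIMPLE-style sketch of that idea (hypothetical, not actual VRP output; the
SSA names follow the example above):

```
  # hypothetical: assert inserted at the start of the basic block,
  # derived from the always-executed access further down
  ptr_3 = ASSERT_EXPR <ptr_2, (ptr_2 & 3) == 0>;
  ...
  tem_1 = MEM [ptr_3];   # this access is what implies (ptr_2 & 3) == 0
```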

Not sure if that is an important optimization in practice.  And of course
FRE runs before VRP so we'd need to do this quite early on...

Also consider

  v2 = ((misaligned_t *)a)->a[i];
  if ((a & 15) == 0)
    v = ((aligned_t *)a)->a[i];

where we may not do this but FRE still will replace the aligned load.



* [Bug tree-optimization/65363] trivial redundant code reordering makes code less optimal
  2015-03-09 16:41 [Bug tree-optimization/65363] New: trivial redundant code reordering makes code less optimal vries at gcc dot gnu.org
  2015-03-10  9:11 ` [Bug tree-optimization/65363] " rguenth at gcc dot gnu.org
@ 2015-03-10  9:13 ` jakub at gcc dot gnu.org
  2015-03-10 11:51 ` vries at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-10  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65363

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Putting such an assert at the beginning of the basic block might still be
incorrect: consider a call in the middle of the basic block that might not
return, or might loop forever.



* [Bug tree-optimization/65363] trivial redundant code reordering makes code less optimal
  2015-03-09 16:41 [Bug tree-optimization/65363] New: trivial redundant code reordering makes code less optimal vries at gcc dot gnu.org
  2015-03-10  9:11 ` [Bug tree-optimization/65363] " rguenth at gcc dot gnu.org
  2015-03-10  9:13 ` jakub at gcc dot gnu.org
@ 2015-03-10 11:51 ` vries at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: vries at gcc dot gnu.org @ 2015-03-10 11:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65363

--- Comment #3 from vries at gcc dot gnu.org ---
(In reply to Richard Biener from comment #1)
> FRE can only eliminate the dominated one (obviously), so the first one is
> the one prevailing. 

I don't understand that.

Say we have load A (loading into tmp.a) and load B (loading into tmp.b).

If load A dominates load B, then we can replace the uses of tmp.b with uses of
tmp.a, since the domination relation guarantees that tmp.a is available at all
uses of tmp.b.

However, if load A dominates load B, but load B dominates the uses of tmp.a,
then we can replace the uses of tmp.a with uses of tmp.b.


