[Bug tree-optimization/102139] New: -O3 miscompile due to slp-vectorize on strict align target

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "wilson at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/102139] New: -O3 miscompile due to slp-vectorize on strict align target
Date: Tue, 31 Aug 2021 00:15:01 +0000	[thread overview]
Message-ID: <bug-102139-4@http.gcc.gnu.org/bugzilla/> (raw)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102139

            Bug ID: 102139
           Summary: -O3 miscompile due to slp-vectorize on strict align
                    target
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wilson at gcc dot gnu.org
  Target Milestone: ---

This was originally reported here.
https://github.com/riscv/riscv-gcc/issues/289

This testcase is miscompiled at -O3 for a riscv64 target, though this is not a
bug in the riscv64 port.  I think it will fail for any strict align target.

typedef unsigned short uint16_t;

void zero_two_uint16(uint16_t* ptr) {
  ptr[0] = 0;
  ptr[1] = 0;
}

void zero(uint16_t* ptr) {
  for (int i = 0; i < 16; ++i) {
    zero_two_uint16(ptr);
    ptr += 2;
  }
}

The output is
zero:
        sd      zero,0(a0)
        sd      zero,8(a0)
        sd      zero,16(a0)
        sd      zero,24(a0)
        sd      zero,32(a0)
        sd      zero,40(a0)
        sd      zero,48(a0)
        sd      zero,56(a0)
        ret
which fails due to unaligned accesses as a0 only has 2 byte alignment.

A git bisect tracked the problem down to this commit.

commit f5e18dd
Author: Kewen Lin linkw@gcc.gnu.org
Date: Tue Nov 3 02:51:47 2020 +0000

        pass: Run cleanup passes before SLP [PR96789]
        ...

I get correct code if I disable the fre4 pass, which is the fre pass inside
pre_slp_scalar_cleanup which was added by this patch.

The 169t.vectorize pass adds an address alignment check, and then emits a loop
with double-word stores if aligned, and a loop with half-word stores if
unaligned.  172t.cunroll fully unrolls both loops.  The 173t.fre4 pass deletes
a phi node before the half-word stores.  The 172t output has
  <bb 13> [local count: 12627204]:
  # ptr_3 = PHI <ptr_4(D)(2)>
  # ivtmp_15 = PHI <16(2)>
  *ptr_3 = 0;
and the 173t.fre4 output has
  <bb 13> [local count: 12627204]:
  *ptr_4(D) = 0;
In the 175t.slp1 pass, the block of half-word stores gets vectorized which is
wrong.  Then later 207t.dce7 notices duplicate code and deletes the second
block of stores.

Comparing the full slp1 dump with fre4 disabled versus the unmodified slp1
dump, I see that the first significant difference is when computing pointer
alignment.  With fre4 disabled, I get

tmp.c:4:10: note:  recording new base alignment for vectp_ptr.8_125
  alignment:    8
  misalignment: 0
  based on:     MEM <vector(4) short unsigned int> [(uint16_t
*)vectp_ptr.8_125] = { 0, 0, 0, 0 };
tmp.c:4:10: note:  recording new base alignment for ptr_3
  alignment:    2
  misalignment: 0
  based on:     *ptr_3 = 0;
tmp.c:4:10: note:   === vect_slp_analyze_instance_alignment ===
tmp.c:4:10: note:   vect_compute_data_ref_alignment:
tmp.c:4:10: note:   can't force alignment of ref: *ptr_3

It then refuses to vectorize.  With the unmodified compiler I get

tmp.c:4:10: note:  recording new base alignment for ptr_4(D)
  alignment:    8
  misalignment: 0
  based on:     MEM <vector(4) short unsigned int> [(uint16_t *)ptr_4(D)] = {
0, 0, 0, 0 };
tmp.c:4:10: note:   === vect_slp_analyze_instance_alignment ===
tmp.c:4:10: note:   vect_compute_data_ref_alignment:
tmp.c:4:10: missed:   misalign = 0 bytes of ref *ptr_4(D)

and then goes ahead and vectorizes which is wrong.

Maybe fre4 shouldn't optimize away a phi node when the pointers have different
alignment?

I noticed that before slp1 runs, the double-word store block has
  # ALIGN = 8, MISALIGN = 0
but the half-word store block does not.  After slp1 runs, both the double-word
store and the half-word store block have these notes.

next             reply	other threads:[~2021-08-31  0:15 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-31  0:15 wilson at gcc dot gnu.org [this message]
2021-08-31  0:35 ` [Bug tree-optimization/102139] " pinskia at gcc dot gnu.org
2021-08-31  0:38 ` pinskia at gcc dot gnu.org
2021-08-31  1:24 ` [Bug tree-optimization/102139] [11/12 Regression] " pinskia at gcc dot gnu.org
2021-08-31  1:35 ` pinskia at gcc dot gnu.org
2021-08-31  1:47 ` pinskia at gcc dot gnu.org
2021-08-31  7:38 ` rguenth at gcc dot gnu.org
2021-08-31 10:18 ` rguenth at gcc dot gnu.org
2021-08-31 10:22 ` rguenth at gcc dot gnu.org
2021-09-01 10:54 ` cvs-commit at gcc dot gnu.org
2021-09-01 10:55 ` [Bug tree-optimization/102139] [11 " rguenth at gcc dot gnu.org
2021-11-08 12:35 ` cvs-commit at gcc dot gnu.org
2021-11-08 12:37 ` rguenth at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-102139-4@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).