[Bug tree-optimization/111257] New: new signed overflow after vectorizer


* [Bug tree-optimization/111257] New: new signed overflow after vectorizer
@ 2023-08-31 11:48 kristerw at gcc dot gnu.org
  2023-08-31 13:22 ` [Bug tree-optimization/111257] " rguenth at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: kristerw at gcc dot gnu.org @ 2023-08-31 11:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111257

            Bug ID: 111257
           Summary: new signed overflow after vectorizer
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The vectorizer is not removing the original scalar calculations, and they may
overflow after vectorization.

This can be seen with

  int a[8];

  void foo(void)
  {
    for (int i = 0; i < 8; i++)
      a[i] = a[i] + 5;
  }

The IR for the loop before vectorization looks like

  <bb 3> [local count: 954449104]:
  # i_10 = PHI <i_7(5), 0(2)>
  # ivtmp_4 = PHI <ivtmp_3(5), 8(2)>
  _1 = a[i_10];
  _2 = _1 + 5;
  a[i_10] = _2;
  i_7 = i_10 + 1;
  ivtmp_3 = ivtmp_4 - 1;
  if (ivtmp_3 != 0)
    goto <bb 5>; [87.50%]
  else
    goto <bb 4>; [12.50%]

  <bb 5> [local count: 835156385]:
  goto <bb 3>; [100.00%]

and it is vectorized to

  <bb 3> [local count: 238585440]:
  # i_10 = PHI <i_7(5), 0(2)>
  # ivtmp_4 = PHI <ivtmp_3(5), 8(2)>
  # vectp_a.4_9 = PHI <vectp_a.4_8(5), &a(2)>
  # vectp_a.8_16 = PHI <vectp_a.8_17(5), &a(2)>
  # ivtmp_19 = PHI <ivtmp_20(5), 0(2)>
  vect__1.6_13 = MEM <vector(4) int> [(int *)vectp_a.4_9];
  _1 = a[i_10];
  vect__2.7_15 = vect__1.6_13 + { 5, 5, 5, 5 };
  _2 = _1 + 5;
  MEM <vector(4) int> [(int *)vectp_a.8_16] = vect__2.7_15;
  i_7 = i_10 + 1;
  ivtmp_3 = ivtmp_4 - 1;
  vectp_a.4_8 = vectp_a.4_9 + 16;
  vectp_a.8_17 = vectp_a.8_16 + 16;
  ivtmp_20 = ivtmp_19 + 1;
  if (ivtmp_20 < 2)
    goto <bb 5>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 5> [local count: 119292723]:
  goto <bb 3>; [100.00%]

This vectorized loop still reads _1 from a[i_10] and adds 5 to it, so the
second loop iteration (i_10 = 1) will add 5 to the value of a[1]. But the
vector store in the first iteration has already added 5 to a[1], so the scalar
statement now computes the original a[1] plus 10 rather than plus 5. That is a
different calculation from the original loop, and it can overflow even if the
original did not.


* [Bug tree-optimization/111257] new signed overflow after vectorizer
  2023-08-31 11:48 [Bug tree-optimization/111257] New: new signed overflow after vectorizer kristerw at gcc dot gnu.org
@ 2023-08-31 13:22 ` rguenth at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-08-31 13:22 UTC (permalink / raw)
  To: gcc-bugs

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
     Ever confirmed|0                             |1
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2023-08-31
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
           Keywords|                              |missed-optimization,
                   |                              |wrong-code

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yeah, we modify a copy of the scalar loop in-place, only removing original
stmts that have side-effects and leaving dead code elimination to the followup
DCE pass.  Note we run that immediately after loop vectorization, but for SLP
vectorization there are quite a few intermediate passes before we perform DCE.
For example, there's IVOPTS, which might re-compute the number of iterations,
and at least the max_iteration estimate also looks at undefined behavior.  OTOH
with SLP vectorization we don't change any loop iteration, which means the
original stmts only compute the very original values (or should, at least).

That means there might be an actual issue for those cases but for loop
vectorization the issue should be moot unless those stmts survive the
DCE pass after it.

I would suggest "blacklisting" analysis of the "vect" dump; the followup "dce"
dump should be fine.

For SLP I'm not sure.  I guess no actual problems should show up, but we
should maybe try to use simple_dce_from_worklist with the root stmts'
original SSA uses (and defs in some cases); possibly the code generation
for root stmt vectorization could gather the relevant defs.

So I think correctness-wise it should be a non-issue, but it's a bit
ugly, also because some fuzzers like to disable DCE, which would then
indeed create wrong-code issues.
