[Bug tree-optimization/114346] New: vectorizer generates the same IV twice

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/114346] New: vectorizer generates the same IV twice
@ 2024-03-15  4:21 tnfchris at gcc dot gnu.org
From: tnfchris at gcc dot gnu.org @ 2024-03-15  4:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114346

            Bug ID: 114346
           Summary: vectorizer generates the same IV twice
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following example:

---
double f(int n, double *data, double b) {
    double res = b;

    for (int i=0;i<n;i++) {
        res += data[i] * i;
    }

    return res;
}
---

generates this code at -Ofast -march=armv9-a:


        cntd    x5
        mov     z28.s, w5
        index   z30.d, #0, #1
.L4:
        incw    x2
        add     z1.s, z30.s, z28.s
        ld1d    z25.d, p7/z, [x3, #1, mul vl]
        mov     z26.d, z30.d
        ld1d    z2.d, p7/z, [x3]
        sxtw    z1.d, p7/m, z1.d
        sxtw    z26.d, p7/m, z26.d
        scvtf   z1.d, p7/m, z1.d
        scvtf   z26.d, p7/m, z26.d
        incb    x3, all, mul #2
        fmla    z29.d, p7/m, z25.d, z1.d
        incw    z30.s
        fmla    z31.d, p7/m, z2.d, z26.d
        cmp     w4, w2
        bcs     .L4

Note that the incw is calculating the vectorized IV of i (initialized by the index), and z28
is filled with the VL.

So the incw z30.s and the add z1.s, z30.s, z28.s are calculating the same
thing.

There are other issues with this codegen, but this ticket is about the duplicated
IVs.

The vectorizer generates:

  # vect_vec_iv_.7_45 = PHI <_49(6), { 0, 1, 2, ... }(15)>
  _48 = vect_vec_iv_.7_45 + { POLY_INT_CST [2, 2], ... };
  _71 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>(vect_vec_iv_.7_45);
  _72 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>({ POLY_INT_CST [4, 4], ... });
  _73 = _71 + _72;
  _49 = VIEW_CONVERT_EXPR<vector([2,2]) int>(_73);

So it looks like _48 and _49 are the same value, except that _48 is computed as a
32-bit IV while _49 is calculated as a 64-bit one and truncated to 32 bits?


* [Bug tree-optimization/114346] vectorizer generates the same IV twice
@ 2024-03-15  8:47 ` rguenth at gcc dot gnu.org
From: rguenth at gcc dot gnu.org @ 2024-03-15  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114346

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-03-15
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'll note the missing constant folding of

  _72 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>({ POLY_INT_CST [4, 4], ... });

(it's just a sign change)
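Because VIEW_CONVERT_EXPR only reinterprets bits, converting a uniform int constant to unsigned int gives the same uniform constant with a different sign, so it could be folded to a constant directly. The C sketch below compares the two forms; LANES is a hypothetical fixed lane count standing in for POLY_INT_CST [2, 2], and the function names are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed lane count; the real vector has POLY_INT_CST [2, 2] lanes. */
#define LANES 4

/* Unfolded form: reinterpret the bits of an int vector constant as
   unsigned int, as VIEW_CONVERT_EXPR does. */
static void view_convert(const int src[LANES], unsigned dst[LANES]) {
    /* Guaranteed by C; checked here because the memcpy relies on it. */
    assert(sizeof(int) == sizeof(unsigned));
    memcpy(dst, src, sizeof(int) * LANES);
}

/* Folded form: only the signedness changes, so the result is the same
   uniform constant converted to unsigned. */
static void folded_constant(int c, unsigned dst[LANES]) {
    for (int l = 0; l < LANES; l++)
        dst[l] = (unsigned)c;
}

/* Returns 1 when both forms produce identical lanes for the uniform constant c. */
int fold_matches(int c) {
    int src[LANES];
    unsigned via_bits[LANES], folded[LANES];
    for (int l = 0; l < LANES; l++)
        src[l] = c;
    view_convert(src, via_bits);
    folded_constant(c, folded);
    return memcmp(via_bits, folded, sizeof via_bits) == 0;
}
```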

Note the vectorizer generates

  <bb 15> [local count: 94607391]:
  vect_cst__44 = { POLY_INT_CST [4, 4], ... };
  vect_cst__47 = { POLY_INT_CST [2, 2], ... };
  _68 = niters.4_23 - POLY_INT_CST [4, 4];

  <bb 3> [local count: 860067200]:
  # res_17 = PHI <res_13(6), b_9(D)(15)>
  # i_19 = PHI <i_14(6), 0(15)>
  # vect_res_17.6_42 = PHI <vect_res_13.15_61(6), { 0.0, ... }(15)>
  # vect_res_17.6_43 = PHI <vect_res_13.15_62(6), { 0.0, ... }(15)>
  # vect_vec_iv_.7_45 = PHI <_49(6), { 0, 1, 2, ... }(15)>
  # vectp_data.8_50 = PHI <vectp_data.8_51(6), data_12(D)(15)>
  # ivtmp_69 = PHI <ivtmp_70(6), 0(15)>
  _46 = vect_vec_iv_.7_45 + vect_cst__44;
  _48 = vect_vec_iv_.7_45 + vect_cst__47;
  _49 = _48 + vect_cst__47;
  _1 = (long unsigned int) i_19;
  _2 = _1 * 8;
  _3 = data_12(D) + _2;
  vect__4.10_52 = MEM <vector([2,2]) double> [(double *)vectp_data.8_50];
  vectp_data.8_53 = vectp_data.8_50 + POLY_INT_CST [16, 16];
  vect__4.11_54 = MEM <vector([2,2]) double> [(double *)vectp_data.8_53];
  _4 = *_3;
  vect__5.13_55 = (vector([2,2]) signed long) vect_vec_iv_.7_45;
  vect__5.12_56 = (vector([2,2]) double) vect__5.13_55;
  vect__5.13_57 = (vector([2,2]) signed long) _48;
  vect__5.12_58 = (vector([2,2]) double) vect__5.13_57;
  _5 = (double) i_19;
  vect__6.14_59 = vect__4.10_52 * vect__5.12_56;
  vect__6.14_60 = vect__4.11_54 * vect__5.12_58;
  _6 = _4 * _5;
  vect_res_13.15_61 = vect__6.14_59 + vect_res_17.6_42;
  vect_res_13.15_62 = vect__6.14_60 + vect_res_17.6_43;
  res_13 = _6 + res_17;
  i_14 = i_19 + 1;
  vectp_data.8_51 = vectp_data.8_53 + POLY_INT_CST [16, 16];
  ivtmp_70 = ivtmp_69 + POLY_INT_CST [4, 4];
  if (ivtmp_70 <= _68)
    goto <bb 6>; [89.00%]

So there's just one IV here (the reduction needs two)
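To make the structure of the loop above concrete, here is a scalar C model of it: one vector IV, two reduction accumulators for the two unrolled copies, and a loop guard of the form ivtmp_70 <= _68. The parameter `lanes` is a hypothetical stand-in for the runtime value of POLY_INT_CST [2, 2] (that is, 2 + 2x, so always even and at least 2). The final reduction and the scalar epilogue are assumptions, since the dump does not show them.

```c
#include <assert.h>

/* Hypothetical cap so the per-lane arrays can live on the stack. */
#define MAX_LANES 64

/* The original scalar loop from the report, used as a reference. */
double f_scalar(int n, const double *data, double b) {
    double res = b;
    for (int i = 0; i < n; i++)
        res += data[i] * i;
    return res;
}

/* Scalar model of the vectorized loop. `lanes` plays the role of
   POLY_INT_CST [2, 2]; the body is unrolled twice, so each iteration
   consumes 2 * lanes elements, i.e. POLY_INT_CST [4, 4]. */
double f_vect_model(int n, const double *data, double b, int lanes) {
    assert(lanes >= 2 && lanes % 2 == 0 && lanes <= MAX_LANES);

    double acc0[MAX_LANES] = {0};   /* vect_res_17.6_42 */
    double acc1[MAX_LANES] = {0};   /* vect_res_17.6_43 */
    int iv[MAX_LANES];              /* vect_vec_iv_.7_45 */
    for (int l = 0; l < lanes; l++)
        iv[l] = l;                  /* { 0, 1, 2, ... } */

    int step = 2 * lanes;
    int i = 0;                      /* ivtmp_69 */
    /* Continue while the next block fits: ivtmp_70 <= _68, _68 = niters - step. */
    for (; i <= n - step; i += step) {
        for (int l = 0; l < lanes; l++) {
            int iv_hi = iv[l] + lanes;                       /* _48 */
            acc0[l] += data[i + l] * (double)iv[l];          /* vect__6.14_59 */
            acc1[l] += data[i + lanes + l] * (double)iv_hi;  /* vect__6.14_60 */
            iv[l] = iv_hi + lanes;                           /* _49 */
        }
    }

    /* Assumed final reduction and scalar epilogue (not shown in the dump). */
    double res = b;
    for (int l = 0; l < lanes; l++)
        res += acc0[l] + acc1[l];
    for (; i < n; i++)
        res += data[i] * i;
    return res;
}
```

With integer-valued data every partial sum is exact, so the model and the scalar loop agree bit for bit; with general data they can differ in rounding, which is the kind of reassociation -Ofast permits.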

  _46 = vect_vec_iv_.7_45 + vect_cst__44;
  _48 = vect_vec_iv_.7_45 + vect_cst__47;
  _49 = _48 + vect_cst__47;

looks somewhat redundant, but the result you quote comes from applying VN
and match.pd patterns.  In the original I can't see the promotion to
unsigned (possibly caused by some match.pd pattern):

Value numbering stmt = _49 = _48 + vect_cst__47;
Setting value number of _49 to _49 (changed)
Matching expression match.pd:163, gimple-match-10.cc:57
Matching expression match.pd:163, gimple-match-10.cc:57
Matching expression match.pd:163, gimple-match-10.cc:57
Applying pattern match.pd:3561, gimple-match-8.cc:746
gimple_simplified to _71 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>(vect_vec_iv_.7_45);
_72 = VIEW_CONVERT_EXPR<vector([2,2]) unsigned int>({ POLY_INT_CST [4, 4], ... });
_73 = _71 + _72;
_49 = VIEW_CONVERT_EXPR<vector([2,2]) int>(_73);
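Per lane, the simplified sequence above turns (x + C1) + C2 into x + (C1 + C2) evaluated in unsigned int and converted back to int. The C sketch below models that for scalars; the function names are hypothetical, and c1/c2 stand in for the lane values of the two POLY_INT_CST [2, 2] operands.

```c
#include <assert.h>

/* Original form: (x + c1) + c2 in int. Only well defined when neither
   signed addition overflows. */
int chain_signed(int x, int c1, int c2) {
    return (x + c1) + c2;
}

/* Reassociated form x + (c1 + c2), computed in unsigned int as in
   _71 + _72. Unsigned addition wraps, so no intermediate step is
   undefined; the conversion back to int is implementation-defined in
   ISO C and wraps modulo 2^N in GCC. */
int assoc_unsigned(int x, int c1, int c2) {
    unsigned combined = (unsigned)c1 + (unsigned)c2;  /* folded constant, e.g. [4, 4] */
    return (int)((unsigned)x + combined);             /* _73, then back to int */
}

/* Checks that both forms agree; the caller must pick an x for which the
   signed chain does not overflow. */
void check_same(int x, int c1, int c2) {
    assert(chain_signed(x, c1, c2) == assoc_unsigned(x, c1, c2));
}
```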

It seems we think that (x + POLY_INT_CST) + POLY_INT_CST cannot be
associated for signed types, and because of that we fail to value-number
both increments to the same value.  Also, _46 is dead, so the first thing
is to see where we code-generate those initial stmts.
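Ignoring overflow, _46 and _49 hold the same value in every lane: both add POLY_INT_CST [4, 4] to vect_vec_iv_.7_45, once directly and once as two steps of POLY_INT_CST [2, 2]. A per-lane C sketch, where `lanes` is a hypothetical stand-in for the runtime value of POLY_INT_CST [2, 2]:

```c
#include <assert.h>

/* One lane of the three statements _46, _48 and _49. */
struct iv_lane { int v46, v48, v49; };

/* `lanes` models POLY_INT_CST [2, 2] = 2 + 2x for a runtime x >= 0,
   so it is always even and at least 2. */
struct iv_lane model_increments(int iv, int lanes) {
    assert(lanes >= 2 && lanes % 2 == 0);
    struct iv_lane r;
    r.v46 = iv + 2 * lanes;  /* _46 = iv + vect_cst__44, i.e. + POLY_INT_CST [4, 4] */
    r.v48 = iv + lanes;      /* _48 = iv + vect_cst__47, i.e. + POLY_INT_CST [2, 2] */
    r.v49 = r.v48 + lanes;   /* _49 = _48 + vect_cst__47 */
    return r;
}
```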
