[Bug tree-optimization/113592] New: missed partial sum optimization in vectorizer

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/113592] New: missed partial sum optimization in vectorizer
@ 2024-01-25  3:48 liuhongt at gcc dot gnu.org
  2024-01-25  3:50 ` [Bug tree-optimization/113592] " liuhongt at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25  3:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592

            Bug ID: 113592
           Summary: missed partial sum optimization in vectorizer
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

double
foo (short* p, int n)
{
    double sum = 0;
    for (int i = 0; i != n; i++)
      sum += p[i] * (double)p[i];
    return sum;
}

With -ffast-math, the vectorizer generates:

  <bb 5> [local count: 860067200]:
  # vect_sum_16.8_44 = PHI <vect_sum_12.15_61(5), { 0.0, 0.0, 0.0, 0.0 }(4)>
  # ivtmp.35_152 = PHI <ivtmp.35_151(5), ivtmp.35_150(4)>
  # DEBUG BEGIN_STMT
  # DEBUG D#13 => D#14 * 2
  # DEBUG D#12 => p_11(D) + D#13
  _149 = (void *) ivtmp.35_152;
  vect__4.11_47 = MEM <vector(16) short int> [(short int *)_149];
  # DEBUG D#11 => *D#12
  vect__5.13_48 = [vec_unpack_lo_expr] vect__4.11_47;
  vect__5.13_49 = [vec_unpack_hi_expr] vect__4.11_47;
  vect__5.12_50 = [vec_unpack_float_lo_expr] vect__5.13_48;
  vect__5.12_51 = [vec_unpack_float_hi_expr] vect__5.13_48;
  vect__5.12_52 = [vec_unpack_float_lo_expr] vect__5.13_49;
  vect__5.12_53 = [vec_unpack_float_hi_expr] vect__5.13_49;
  # DEBUG D#10 => (double) D#11
  vect_powmult_6.14_55 = vect__5.12_51 * vect__5.12_51;
  _62 = .FMA (vect__5.12_50, vect__5.12_50, vect_powmult_6.14_55);
  vect_powmult_6.14_57 = vect__5.12_53 * vect__5.12_53;
  _45 = .FMA (vect__5.12_52, vect__5.12_52, vect_powmult_6.14_57);
  _46 = _45 + _62;
  # DEBUG D#9 => D#10 * D#10
  vect_sum_12.15_61 = vect_sum_16.8_44 + _46;
  # DEBUG sum => D#8
  # DEBUG BEGIN_STMT
  # DEBUG i => NULL
  # DEBUG sum => D#8
  # DEBUG BEGIN_STMT
  ivtmp.35_151 = ivtmp.35_152 + 32;
  if (_18 != ivtmp.35_151)
    goto <bb 5>; [89.00%]
  else
    goto <bb 8>; [11.00%]

But it could be better with:
....
  vect_powmult_6.14_55 = .FMA (vect__5.12_51, vect__5.12_51, 0);
  _62 = .FMA (vect__5.12_50, vect__5.12_50, 0);
  vect_powmult_6.14_57 = .FMA (vect__5.12_53, vect__5.12_53, 0);
  _45 = .FMA (vect__5.12_52, vect__5.12_52, 0);
  ivtmp.35_151 = ivtmp.35_152 + 32;
  if (_18 != ivtmp.35_151)
    goto <bb 5>; [89.00%]
  else
    goto <bb 8>; [11.00%]

<bb 8>
   _tmp1 = vect_powmult_6.14_55 + _62;
   _tmp2 = vect_powmult_6.14_57 + _45;
   _tmp3 = _tmp1 + _tmp2;
   _tmp4_scalar = .REDUCE_SUM (_tmp3);
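
As a rough scalar model of the loop shape proposed above (not GCC's actual
output), the following C sketch keeps four independent partial sums inside
the loop and combines them only afterwards. The function name
sum_squares_partial and the unroll factor of 4 are assumptions made for
illustration; each multiply-add statement stands in for one of the .FMA
calls, assuming its third operand is meant to be the running partial sum.

```c
#include <stddef.h>

/* Hypothetical scalar model: four independent accumulators, so no
   addition in the loop body depends on another accumulator's result.  */
double
sum_squares_partial (const short *p, int n)
{
  double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
  int i = 0;

  /* Main loop: one multiply-add per accumulator per iteration.  */
  for (; i + 4 <= n; i += 4)
    {
      acc0 += (double) p[i]     * (double) p[i];
      acc1 += (double) p[i + 1] * (double) p[i + 1];
      acc2 += (double) p[i + 2] * (double) p[i + 2];
      acc3 += (double) p[i + 3] * (double) p[i + 3];
    }

  /* Epilogue: combine the partial sums once, outside the loop.  */
  double sum = (acc0 + acc1) + (acc2 + acc3);

  /* Scalar tail for the remaining 0..3 elements.  */
  for (; i < n; i++)
    sum += (double) p[i] * (double) p[i];

  return sum;
}
```

Because the summation order differs from the original loop, this form is
only equivalent when floating-point reassociation is permitted, which is
why the report assumes fast-math.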


* [Bug tree-optimization/113592] missed partial sum optimization in vectorizer
  2024-01-25  3:48 [Bug tree-optimization/113592] New: missed partial sum optimization in vectorizer liuhongt at gcc dot gnu.org
@ 2024-01-25  3:50 ` liuhongt at gcc dot gnu.org
  2024-01-25  3:50 ` liuhongt at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25  3:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592

--- Comment #1 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
*** Bug 113593 has been marked as a duplicate of this bug. ***


* [Bug tree-optimization/113592] missed partial sum optimization in vectorizer
  2024-01-25  3:48 [Bug tree-optimization/113592] New: missed partial sum optimization in vectorizer liuhongt at gcc dot gnu.org
  2024-01-25  3:50 ` [Bug tree-optimization/113592] " liuhongt at gcc dot gnu.org
@ 2024-01-25  3:50 ` liuhongt at gcc dot gnu.org
  2024-01-25  4:46 ` liuhongt at gcc dot gnu.org
  2024-01-25  9:57 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25  3:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592

--- Comment #2 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
*** Bug 113594 has been marked as a duplicate of this bug. ***


* [Bug tree-optimization/113592] missed partial sum optimization in vectorizer
  2024-01-25  3:48 [Bug tree-optimization/113592] New: missed partial sum optimization in vectorizer liuhongt at gcc dot gnu.org
  2024-01-25  3:50 ` [Bug tree-optimization/113592] " liuhongt at gcc dot gnu.org
  2024-01-25  3:50 ` liuhongt at gcc dot gnu.org
@ 2024-01-25  4:46 ` liuhongt at gcc dot gnu.org
  2024-01-25  9:57 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-01-25  4:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592

--- Comment #3 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
This testcase is probably not a good example of a typical partial sum,
which relies on loop unrolling.

double
foo (double* p, int n)
{
    double sum = 0;
    for (int i = 0; i != n; i++)
      sum += p[i] * p[i];
    return sum;
}

This should be a better example; the vectorizer can do the unrolling and
partial sums itself.
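
To make "unroll + partial sum" concrete for this testcase, here is a
minimal hand-written sketch using GCC's generic vector extension. It is
only an illustration of the idea, not what the vectorizer emits: the
function name sum_squares_unrolled, the 4-lane vector type v4df, and the
unroll factor of 2 are all assumptions.

```c
#include <string.h>

/* GCC generic vector type holding four doubles (illustrative choice).  */
typedef double v4df __attribute__ ((vector_size (32)));

double
sum_squares_unrolled (const double *p, int n)
{
  /* Two independent vector accumulators, one per unrolled copy.  */
  v4df acc0 = { 0.0, 0.0, 0.0, 0.0 };
  v4df acc1 = { 0.0, 0.0, 0.0, 0.0 };
  int i = 0;

  /* Loop unrolled by 2: each copy updates its own partial sum.  */
  for (; i + 8 <= n; i += 8)
    {
      v4df a, b;
      memcpy (&a, p + i, sizeof a);      /* unaligned-safe vector load */
      memcpy (&b, p + i + 4, sizeof b);
      acc0 += a * a;
      acc1 += b * b;
    }

  /* Epilogue: merge the accumulators, then reduce across lanes.  */
  v4df acc = acc0 + acc1;
  double sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);

  /* Scalar tail for the remaining 0..7 elements.  */
  for (; i < n; i++)
    sum += p[i] * p[i];

  return sum;
}
```

The two accumulators have no dependency on each other inside the loop, so
the add latency of one can overlap with the other.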


* [Bug tree-optimization/113592] missed partial sum optimization in vectorizer
  2024-01-25  3:48 [Bug tree-optimization/113592] New: missed partial sum optimization in vectorizer liuhongt at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-01-25  4:46 ` liuhongt at gcc dot gnu.org
@ 2024-01-25  9:57 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-25  9:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer for the original testcase generates

  # vect_sum_20.8_49 = PHI <vect_sum_16.21_75(6), { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }(9)>
...
  vect__9.20_68 = vect__5.12_55 * vect__8.16_61;
  vect__9.20_69 = vect__5.12_56 * vect__8.17_63;
  vect__9.20_70 = vect__5.12_57 * vect__8.18_65;
  vect__9.20_71 = vect__5.12_58 * vect__8.19_67;
  _9 = _5 * _8;
  vect_sum_16.21_72 = vect__9.20_68 + vect_sum_20.8_49;
  vect_sum_16.21_73 = vect__9.20_69 + vect_sum_16.21_72;
  vect_sum_16.21_74 = vect__9.20_70 + vect_sum_16.21_73;
  vect_sum_16.21_75 = vect__9.20_71 + vect_sum_16.21_74;
  sum_16 = _9 + sum_20;

The adds come from the optimization that reduces the number of reduction
IVs (we could alternatively keep them independent with 4 IVs and do the
reduction in the epilogue).  This is to reduce register pressure.

But this also shows that, if the issue isn't the multiple IVs, this could
be handled by reassoc + FMA forming, given that the vectorizer itself
doesn't produce FMAs here.
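
The difference between the chained adds and a reassociated form can be
sketched in scalar C. Both functions below are hypothetical models (the
names sum_squares_chained and sum_squares_reassoc are made up here), and
the second is only one possible result of reassociation, not a statement
of what the reassoc pass produces.

```c
/* Models the generated code: one accumulator, and the four products
   are added to it one after another, forming a serial chain.  */
double
sum_squares_chained (const double *p, int n)
{
  double sum = 0.0;
  int i = 0;
  for (; i + 4 <= n; i += 4)
    {
      double m0 = p[i] * p[i];
      double m1 = p[i + 1] * p[i + 1];
      double m2 = p[i + 2] * p[i + 2];
      double m3 = p[i + 3] * p[i + 3];
      sum = m0 + sum;   /* each add waits for the previous one */
      sum = m1 + sum;
      sum = m2 + sum;
      sum = m3 + sum;
    }
  for (; i < n; i++)
    sum += p[i] * p[i];
  return sum;
}

/* Still one accumulator, but the products are summed as a tree first,
   so only a single add sits on the loop-carried dependency.  */
double
sum_squares_reassoc (const double *p, int n)
{
  double sum = 0.0;
  int i = 0;
  for (; i + 4 <= n; i += 4)
    {
      double m0 = p[i] * p[i];
      double m1 = p[i + 1] * p[i + 1];
      double m2 = p[i + 2] * p[i + 2];
      double m3 = p[i + 3] * p[i + 3];
      sum += (m0 + m1) + (m2 + m3);
    }
  for (; i < n; i++)
    sum += p[i] * p[i];
  return sum;
}
```

The reassociated version keeps the single reduction IV, so register
pressure stays low, while shortening the loop-carried chain from four adds
to one. Since it changes the order of floating-point additions, it is
valid only when reassociation is allowed.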

