public inbox for gcc-patches@gcc.gnu.org
* [RFC] test builtin ratio for loop distribution
@ 2021-01-27 12:40 Alexandre Oliva
  2021-01-27 15:12 ` Richard Biener
  2021-02-05  0:13 ` Jim Wilson
  0 siblings, 2 replies; 26+ messages in thread
From: Alexandre Oliva @ 2021-01-27 12:40 UTC
  To: gcc-patches; +Cc: Zdenek Dvorak


This patch attempts to fix a libgcc codegen regression introduced in
gcc-10, when -ftree-loop-distribute-patterns was enabled at -O2.


The ldist pass turns even very short loops into memset calls.  E.g.,
the TFmode emulation routines end with a loop of up to 3 iterations
that zeroes out trailing words, and the loop distribution pass turns
it into a call to the memset builtin.
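To make the pattern concrete, here is a hypothetical C sketch of such a
tail loop and of the memset call ldist effectively substitutes for it.
The four-word layout, word_t and the function names are assumptions
made for illustration; this is not the actual libgcc code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long word_t;  /* hypothetical word type */

/* Hypothetical tail of a TFmode emulation routine: a value held in
   four words, of which words K..3 must be cleared (K in 1..4, so the
   loop runs at most 3 times).  */
void
zero_tail_loop (word_t w[4], size_t k)
{
  assert (k >= 1 && k <= 4);
  for (size_t i = k; i < 4; i++)
    w[i] = 0;
}

/* What loop distribution effectively turns the loop into: a single
   memset whose byte count is only known at run time.  */
void
zero_tail_memset (word_t w[4], size_t k)
{
  assert (k >= 1 && k <= 4);
  memset (&w[k], 0, (4 - k) * sizeof (word_t));
}
```

Both functions leave the same words behind; at -O2 on gcc-10 and
later, GCC may well compile zero_tail_loop itself into the same memset
call, which is the behavior discussed here.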

Though short constant-length memsets are usually dealt with
efficiently, for non-constant-length ones the options are setmemM or
a function call.

RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.
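For contrast, here is a hand-written sketch of the kind of inline code
that keeps that information: word-sized stores into a word-aligned
destination, with the count known to be at most 3.  It reuses the
hypothetical four-word layout and names from the sketch above, and it
is only an illustration, not what GCC emits for any target.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long word_t;  /* hypothetical word type */

/* Clear words K..3 of W (K in 1..4) with straight-line word stores.
   Each case falls through to the next, so at most 3 stores run and
   no call is made.  */
void
zero_tail_inline (word_t w[4], size_t k)
{
  assert (k >= 1 && k <= 4);
  switch (k)
    {
    case 1:
      w[1] = 0;
      /* FALLTHROUGH */
    case 2:
      w[2] = 0;
      /* FALLTHROUGH */
    case 3:
      w[3] = 0;
      /* FALLTHROUGH */
    case 4:
      break;
    }
}
```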

This patch adds to the loop distribution pass some cost analysis based
on preexisting *_RATIO macros, so that we won't transform loops whose
trip counts are low enough, compared with those ratios, that we'd
rather expand them inline.
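The decision made by maybe_normalize_partition in the patch below can
be modeled in plain C roughly as follows.  The enum, struct and
function names are hypothetical stand-ins, the ratios are parameters
rather than any target's real CLEAR_RATIO, SET_RATIO or MOVE_RATIO,
and the model assumes an upper bound on latch executions is already
known (the patch leaves the partition alone when it isn't).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the partition kinds the patch inspects.  */
enum kind { K_NORMAL, K_PARTIAL_MEMSET, K_MEMSET, K_MEMCPY, K_MEMMOVE };

/* Placeholder stand-ins for the target's CLEAR_RATIO, SET_RATIO and
   MOVE_RATIO; callers supply the values.  */
struct ratios { unsigned clear, set, move; };

/* Return true if a partition of kind K should be turned back into a
   plain loop, given MAX_LATCH, an upper bound on latch executions.
   ZERO_FILL says whether a memset partition stores zero.  */
bool
keep_as_loop (enum kind k, bool zero_fill, const struct ratios *r,
              unsigned long long max_latch)
{
  unsigned ratio;

  switch (k)
    {
    case K_NORMAL:
    case K_PARTIAL_MEMSET:
      return false;                 /* nothing to normalize */
    case K_MEMSET:
      ratio = zero_fill ? r->clear : r->set;
      break;
    case K_MEMCPY:
    case K_MEMMOVE:
      ratio = r->move;
      break;
    default:
      assert (0 && "unexpected partition kind");
      return false;
    }

  /* Same test as wi::ltu_p (maxit, ratio) in the patch.  */
  return max_latch < ratio;
}
```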


This patch is not finished; it needs adjustments to the testsuite, to
make up for the behavior changes it brings about.  Specifically, on an
x86_64-linux-gnu regstrap, it regresses:

> FAIL: gcc.dg/pr53265.c  (test for warnings, line 40)
> FAIL: gcc.dg/pr53265.c  (test for warnings, line 42)
> FAIL: gcc.dg/tree-ssa/ldist-38.c scan-tree-dump ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++14  scan-tree-dump ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++17  scan-tree-dump ldist "split to 0 loops and 1 library calls"
> FAIL: g++.dg/tree-ssa/pr78847.C  -std=gnu++2a  scan-tree-dump ldist "split to 0 loops and 1 library calls"

I suppose just lengthening the loops will take care of ldist-38 and
pr78847, but the loss of the warnings in pr53265 is more concerning, and
will require investigation.

Nevertheless, I seek feedback on whether this is an acceptable approach,
or whether we should use alternate tuning parameters for ldist, or
something entirely different.  Thanks in advance,


for  gcc/ChangeLog

	* tree-loop-distribution.c (maybe_normalize_partition): New.
	(loop_distribution::distribute_loop): Call it.

[requires testsuite adjustments and investigation of a warning regression]
---
 gcc/tree-loop-distribution.c |   54 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index bb15fd3723fb6..b5198652817ee 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2848,6 +2848,52 @@ fuse_memset_builtins (vec<struct partition *> *partitions)
     }
 }
 
+/* Return false if it's profitable to turn the LOOP PARTITION into a builtin
+   call, and true if it isn't, changing the PARTITION to PKIND_NORMAL.  */
+
+static bool
+maybe_normalize_partition (class loop *loop, struct partition *partition)
+{
+  unsigned HOST_WIDE_INT ratio;
+
+  switch (partition->kind)
+    {
+    case PKIND_NORMAL:
+    case PKIND_PARTIAL_MEMSET:
+      return false;
+
+    case PKIND_MEMSET:
+      if (integer_zerop (gimple_assign_rhs1 (DR_STMT
+					     (partition->builtin->dst_dr))))
+	ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop));
+      else
+	ratio = SET_RATIO (optimize_loop_for_speed_p (loop));
+      break;
+
+    case PKIND_MEMCPY:
+    case PKIND_MEMMOVE:
+      ratio = MOVE_RATIO (optimize_loop_for_speed_p (loop));
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  tree niters = number_of_latch_executions (loop);
+  if (niters == NULL_TREE || niters == chrec_dont_know)
+    return false;
+
+  wide_int minit, maxit;
+  value_range_kind vrk = determine_value_range (niters, &minit, &maxit);
+  if (vrk == VR_RANGE && wi::ltu_p (maxit, ratio))
+    {
+      partition->kind = PKIND_NORMAL;
+      return true;
+    }
+
+  return false;
+}
+
 void
 loop_distribution::finalize_partitions (class loop *loop,
 					vec<struct partition *> *partitions,
@@ -3087,6 +3133,14 @@ loop_distribution::distribute_loop (class loop *loop, vec<gimple *> stmts,
     }
 
   finalize_partitions (loop, &partitions, &alias_ddrs);
+  {
+    bool any_changes_p = false;
+    for (i = 0; partitions.iterate (i, &partition); ++i)
+      if (maybe_normalize_partition (loop, partition))
+	any_changes_p = true;
+    if (any_changes_p)
+      finalize_partitions (loop, &partitions, &alias_ddrs);
+  }
 
   /* If there is a reduction in all partitions make sure the last one
      is not classified for builtin code generation.  */
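
A note on the units in the comparison in maybe_normalize_partition:
number_of_latch_executions counts executions of the loop latch, which
is one fewer than the number of times the loop body runs.  The plain C
model below (not GCC code; the names are hypothetical) counts both for
a bottom-tested loop.

```c
#include <assert.h>

/* Execution counts for one run of a loop.  */
struct exec_counts
{
  unsigned body;   /* times the loop body runs */
  unsigned latch;  /* times control goes back around the loop */
};

/* Run a loop whose body executes N times (N >= 1) and count body and
   latch executions.  The latch is only reached when the loop goes on
   to another iteration.  */
struct exec_counts
count_executions (unsigned n)
{
  struct exec_counts c = { 0, 0 };
  unsigned i = 0;

  assert (n >= 1);
  for (;;)
    {
      c.body++;
      if (++i >= n)
        break;        /* exit edge: the latch is skipped */
      c.latch++;      /* latch: taken to start the next iteration */
    }
  return c;
}
```

So a tail loop of up to 3 iterations has at most 2 latch executions,
and the wi::ltu_p (maxit, ratio) test keeps it as a loop whenever the
relevant ratio is 3 or more.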

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist         GNU Toolchain Engineer
        Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Thread overview: 26+ messages
2021-01-27 12:40 [RFC] test builtin ratio for loop distribution Alexandre Oliva
2021-01-27 15:12 ` Richard Biener
2021-01-28  5:28   ` Alexandre Oliva
2021-01-28  8:59     ` Richard Biener
2021-02-02 17:13       ` Alexandre Oliva
2021-02-03  8:59         ` Richard Biener
2021-02-03 15:11           ` Alexandre Oliva
2021-02-04  8:37             ` Richard Biener
2021-02-04 22:17               ` Alexandre Oliva
2021-02-05  8:02                 ` Richard Biener
2021-02-11 10:19                 ` Alexandre Oliva
2021-02-11 12:14                   ` Alexandre Oliva
2021-02-12 11:34                   ` Richard Biener
2021-02-16  4:56                     ` Alexandre Oliva
2021-02-16 10:47                       ` Alexandre Oliva
2021-02-16 12:11                         ` Richard Biener
2021-02-19  8:08                           ` [PR94092] " Alexandre Oliva
2021-02-22  9:53                             ` Richard Biener
2021-04-29  4:26                               ` Alexandre Oliva
2021-04-30 14:42                                 ` Jeff Law
2021-05-03  8:55                                   ` Richard Biener
2021-05-04  1:59                                     ` Alexandre Oliva
2021-05-04  5:49                                       ` Prathamesh Kulkarni
2021-05-04  6:09                                         ` Alexandre Oliva
2021-02-05  0:13 ` Jim Wilson
2021-02-11 10:11   ` Alexandre Oliva
