public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end
@ 2016-09-22 19:30 Pat Haugen
  2016-10-04 20:26 ` PING " Pat Haugen
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Pat Haugen @ 2016-09-22 19:30 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]

I noticed the loop unroller peels an extra copy of the loop before it enters the switch block code to round the iteration count to a multiple of the unroll factor. This peeled copy is only needed for the case where the exit test is at the beginning of the loop since in that case it inserts the test for zero peel iterations before that peeled copy.

This patch bumps the iteration count by 1 for loops with the exit at the end so that it represents the number of times the loop body is executed, and therefore removes the need to always execute that first peeled copy. With this change, when the number of executions of the loop is an even multiple of the unroll factor then the code will jump to the unrolled loop immediately instead of executing all the switch code and peeled copies of the loop and then falling into the unrolled loop. This change also reduces code size by removing a peeled copy of the loop. 

Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?



2016-09-22  Pat Haugen  <pthaugen@us.ibm.com>

	* loop-unroll.c (unroll_loop_runtime_iterations): Condition initial
	loop peel to loops with exit test at the beginning.



[-- Attachment #2: unroll-peel.diff --]
[-- Type: text/x-patch, Size: 2538 bytes --]

Index: gcc/loop-unroll.c
===================================================================
--- gcc/loop-unroll.c	(revision 240167)
+++ gcc/loop-unroll.c	(working copy)
@@ -857,7 +857,7 @@ unroll_loop_runtime_iterations (struct l
   rtx old_niter, niter, tmp;
   rtx_insn *init_code, *branch_code;
   unsigned i, j, p;
-  basic_block preheader, *body, swtch, ezc_swtch;
+  basic_block preheader, *body, swtch, ezc_swtch = NULL;
   int may_exit_copy;
   unsigned n_peel;
   edge e;
@@ -916,6 +916,16 @@ unroll_loop_runtime_iterations (struct l
   if (tmp != niter)
     emit_move_insn (niter, tmp);
 
+  /* For loops that exit at end, add one to niter to account for first pass
+     through loop body before reaching exit test. */
+  if (exit_at_end)
+    {
+      niter = expand_simple_binop (desc->mode, PLUS,
+				   niter, const1_rtx,
+				   NULL_RTX, 0, OPTAB_LIB_WIDEN);
+      old_niter = niter;
+    }
+
   /* Count modulo by ANDing it with max_unroll; we use the fact that
      the number of unrollings is a power of two, and thus this is correct
      even if there is overflow in the computation.  */
@@ -934,20 +944,21 @@ unroll_loop_runtime_iterations (struct l
 
   auto_sbitmap wont_exit (max_unroll + 2);
 
-  /* Peel the first copy of loop body (almost always we must leave exit test
-     here; the only exception is when we have extra zero check and the number
-     of iterations is reliable.  Also record the place of (possible) extra
-     zero check.  */
-  bitmap_clear (wont_exit);
-  if (extra_zero_check
-      && !desc->noloop_assumptions)
-    bitmap_set_bit (wont_exit, 1);
-  ezc_swtch = loop_preheader_edge (loop)->src;
-  ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
-				      1, wont_exit, desc->out_edge,
-				      &remove_edges,
-				      DLTHE_FLAG_UPDATE_FREQ);
-  gcc_assert (ok);
+  if (extra_zero_check)
+    {
+      /* Peel the first copy of loop body.  Leave the exit test if the number
+	 of iterations is not reliable.  Also record the place of the extra zero
+	 check.  */
+      bitmap_clear (wont_exit);
+      if (!desc->noloop_assumptions)
+	bitmap_set_bit (wont_exit, 1);
+      ezc_swtch = loop_preheader_edge (loop)->src;
+      ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
+					  1, wont_exit, desc->out_edge,
+					  &remove_edges,
+					  DLTHE_FLAG_UPDATE_FREQ);
+      gcc_assert (ok);
+    }
 
   /* Record the place where switch will be built for preconditioning.  */
   swtch = split_edge (loop_preheader_edge (loop));

^ permalink raw reply	[flat|nested] 6+ messages in thread

* PING Re: [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end
  2016-09-22 19:30 [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end Pat Haugen
@ 2016-10-04 20:26 ` Pat Haugen
  2016-10-13 18:36 ` Jeff Law
  2016-10-15  3:28 ` Andrew Pinski
  2 siblings, 0 replies; 6+ messages in thread
From: Pat Haugen @ 2016-10-04 20:26 UTC (permalink / raw)
  To: gcc-patches

Ping for the following patch https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01612.html

-Pat

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end
  2016-09-22 19:30 [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end Pat Haugen
  2016-10-04 20:26 ` PING " Pat Haugen
@ 2016-10-13 18:36 ` Jeff Law
  2016-10-15  3:28 ` Andrew Pinski
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff Law @ 2016-10-13 18:36 UTC (permalink / raw)
  To: Pat Haugen, GCC Patches

On 09/22/2016 01:10 PM, Pat Haugen wrote:
> I noticed the loop unroller peels an extra copy of the loop before it enters the switch block code to round the iteration count to a multiple of the unroll factor. This peeled copy is only needed for the case where the exit test is at the beginning of the loop since in that case it inserts the test for zero peel iterations before that peeled copy.
>
> This patch bumps the iteration count by 1 for loops with the exit at the end so that it represents the number of times the loop body is executed, and therefore removes the need to always execute that first peeled copy. With this change, when the number of executions of the loop is an even multiple of the unroll factor then the code will jump to the unrolled loop immediately instead of executing all the switch code and peeled copies of the loop and then falling into the unrolled loop. This change also reduces code size by removing a peeled copy of the loop.
>
> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?
>
>
>
> 2016-09-22  Pat Haugen  <pthaugen@us.ibm.com>
>
> 	* loop-unroll.c (unroll_loop_runtime_iterations): Condition initial
> 	loop peel to loops with exit test at the beginning.
>
>
OK.
jeff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end
  2016-09-22 19:30 [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end Pat Haugen
  2016-10-04 20:26 ` PING " Pat Haugen
  2016-10-13 18:36 ` Jeff Law
@ 2016-10-15  3:28 ` Andrew Pinski
  2016-10-15  3:29   ` Andrew Pinski
  2 siblings, 1 reply; 6+ messages in thread
From: Andrew Pinski @ 2016-10-15  3:28 UTC (permalink / raw)
  To: Pat Haugen; +Cc: GCC Patches

On Thu, Sep 22, 2016 at 12:10 PM, Pat Haugen
<pthaugen@linux.vnet.ibm.com> wrote:
> I noticed the loop unroller peels an extra copy of the loop before it enters the switch block code to round the iteration count to a multiple of the unroll factor. This peeled copy is only needed for the case where the exit test is at the beginning of the loop since in that case it inserts the test for zero peel iterations before that peeled copy.
>
> This patch bumps the iteration count by 1 for loops with the exit at the end so that it represents the number of times the loop body is executed, and therefore removes the need to always execute that first peeled copy. With this change, when the number of executions of the loop is an even multiple of the unroll factor then the code will jump to the unrolled loop immediately instead of executing all the switch code and peeled copies of the loop and then falling into the unrolled loop. This change also reduces code size by removing a peeled copy of the loop.
>
> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

This patch or
PR rtl-optimization/68212
* cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
frequency when computing scale factor for peeled copies.
* loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
values for switch/peel blocks/edges.

Caused a ~2.7-3.5% regression in coremarks with -funroll-all-loops.

Thanks,
Andrew

>
>
>
> 2016-09-22  Pat Haugen  <pthaugen@us.ibm.com>
>
>         * loop-unroll.c (unroll_loop_runtime_iterations): Condition initial
>         loop peel to loops with exit test at the beginning.
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end
  2016-10-15  3:28 ` Andrew Pinski
@ 2016-10-15  3:29   ` Andrew Pinski
  2016-10-17 14:03     ` Pat Haugen
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Pinski @ 2016-10-15  3:29 UTC (permalink / raw)
  To: Pat Haugen; +Cc: GCC Patches

On Fri, Oct 14, 2016 at 8:28 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Thu, Sep 22, 2016 at 12:10 PM, Pat Haugen
> <pthaugen@linux.vnet.ibm.com> wrote:
>> I noticed the loop unroller peels an extra copy of the loop before it enters the switch block code to round the iteration count to a multiple of the unroll factor. This peeled copy is only needed for the case where the exit test is at the beginning of the loop since in that case it inserts the test for zero peel iterations before that peeled copy.
>>
>> This patch bumps the iteration count by 1 for loops with the exit at the end so that it represents the number of times the loop body is executed, and therefore removes the need to always execute that first peeled copy. With this change, when the number of executions of the loop is an even multiple of the unroll factor then the code will jump to the unrolled loop immediately instead of executing all the switch code and peeled copies of the loop and then falling into the unrolled loop. This change also reduces code size by removing a peeled copy of the loop.
>>
>> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?
>
> This patch or
> PR rtl-optimization/68212
> * cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
> frequency when computing scale factor for peeled copies.
> * loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
> values for switch/peel blocks/edges.
>
> Caused a ~2.7-3.5% regression in coremarks with -funroll-all-loops.

I should say on ThunderX (aarch64-linux-gnu).

Thanks,
Andrew

>
> Thanks,
> Andrew
>
>>
>>
>>
>> 2016-09-22  Pat Haugen  <pthaugen@us.ibm.com>
>>
>>         * loop-unroll.c (unroll_loop_runtime_iterations): Condition initial
>>         loop peel to loops with exit test at the beginning.
>>
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end
  2016-10-15  3:29   ` Andrew Pinski
@ 2016-10-17 14:03     ` Pat Haugen
  0 siblings, 0 replies; 6+ messages in thread
From: Pat Haugen @ 2016-10-17 14:03 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: GCC Patches

On 10/14/2016 10:29 PM, Andrew Pinski wrote:
>>> >> This patch bumps the iteration count by 1 for loops with the exit at the end so that it represents the number of times the loop body is executed, and therefore removes the need to always execute that first peeled copy. With this change, when the number of executions of the loop is an even multiple of the unroll factor then the code will jump to the unrolled loop immediately instead of executing all the switch code and peeled copies of the loop and then falling into the unrolled loop. This change also reduces code size by removing a peeled copy of the loop.
>>> >>
>>> >> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?
>> >
>> > This patch or
>> > PR rtl-optimization/68212
>> > * cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
>> > frequency when computing scale factor for peeled copies.
>> > * loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
>> > values for switch/peel blocks/edges.
>> >
>> > Caused a ~2.7-3.5% regression in coremarks with -funroll-all-loops.
> I should say on ThunderX (aarch64-linux-gnu).

Sorry to hear about the degradation. Do you have more details on which patch and/or what specifically causes the degradation? This patch should only affect the execution path outside the unrolled loop (worst case is probably for loops that execute once). The pr68212 patch is just correcting some of the block frequency/count issues, so they're not as screwed up as what they were.

Thanks,
Pat

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-10-17 14:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-22 19:30 [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end Pat Haugen
2016-10-04 20:26 ` PING " Pat Haugen
2016-10-13 18:36 ` Jeff Law
2016-10-15  3:28 ` Andrew Pinski
2016-10-15  3:29   ` Andrew Pinski
2016-10-17 14:03     ` Pat Haugen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).