* [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
@ 2013-11-20 19:03 Gopalasubramanian, Ganesh
2013-11-20 19:08 ` H.J. Lu
2013-11-22 10:32 ` Uros Bizjak
0 siblings, 2 replies; 14+ messages in thread
From: Gopalasubramanian, Ganesh @ 2013-11-20 19:03 UTC (permalink / raw)
To: gcc-patches
Cc: Uros Bizjak (ubizjak@gmail.com),
Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com),
borntraeger, H.J. Lu (hjl.tools@gmail.com),
Jakub Jelinek (jakub@redhat.com)
[-- Attachment #1: Type: text/plain, Size: 512 bytes --]
Hi,
Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
This patch has some noise in SPEC 2006 results.
Bootstrapping passes.
I would like to know your comments before committing.
Regards
Ganesh
[-- Attachment #2: loop_unroll_bdver3.patch --]
[-- Type: application/octet-stream, Size: 2666 bytes --]
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7ae9f57..d02cbdd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3. If not see
#include "cgraph.h"
#include "gimple.h"
#include "gimplify.h"
+#include "cfgloop.h"
#include "dwarf2.h"
#include "df.h"
#include "tm-constrs.h"
@@ -43683,6 +43684,60 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
return val;
}
+/* This function gives out the number of memory references.
+ This value determines the unrolling factor for
+ bdver3 and bdver4 architectures. */
+
+static int
+ix86_loop_memcount (rtx *x, unsigned *mem_count)
+{
+ if (*x != NULL_RTX && MEM_P (*x))
+ {
+ enum machine_mode mode;
+ unsigned int n_words;
+
+ mode = GET_MODE (*x);
+ n_words = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+
+ if (n_words > 4)
+ (*mem_count)+=2;
+ else
+ (*mem_count)+=1;
+ }
+ return 0;
+}
+
+static unsigned
+ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
+{
+ basic_block *bbs;
+ rtx insn;
+ unsigned i;
+ unsigned mem_count = 0;
+
+ if (ix86_tune != PROCESSOR_BDVER3 && ix86_tune != PROCESSOR_BDVER4)
+ {
+ return nunroll;
+ }
+
+ /* Count the number of memory references within the loop body. */
+ bbs = get_loop_body (loop);
+ for (i = 0; i < loop->num_nodes; i++)
+ {
+ for (insn = BB_HEAD (bbs[i]); insn != BB_END (bbs[i]); insn = NEXT_INSN (insn))
+ if (INSN_P (insn) && INSN_CODE (insn) != -1)
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count);
+ }
+ free (bbs);
+
+ if ( (mem_count*nunroll) <= 32)
+ return nunroll;
+ else if (mem_count <=32)
+ return (32/mem_count);
+
+ return nunroll;
+}
+
/* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P. */
static bool
@@ -44168,6 +44223,9 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
#define TARGET_INIT_LIBFUNCS darwin_rename_builtins
#endif
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST ix86_loop_unroll_adjust
+
#undef TARGET_SPILL_CLASS
#define TARGET_SPILL_CLASS ix86_spill_class
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 1240f7c..961f124 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -667,6 +667,9 @@ decide_unroll_constant_iterations (struct loop *loop, int flags)
if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+ if (targetm.loop_unroll_adjust)
+ nunroll = targetm.loop_unroll_adjust (nunroll, loop);
+
/* Skip big loops. */
if (nunroll <= 1)
{
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-11-20 19:03 [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4 Gopalasubramanian, Ganesh
@ 2013-11-20 19:08 ` H.J. Lu
2013-11-21 9:59 ` Gopalasubramanian, Ganesh
2013-11-22 9:32 ` Gopalasubramanian, Ganesh
2013-11-22 10:32 ` Uros Bizjak
1 sibling, 2 replies; 14+ messages in thread
From: H.J. Lu @ 2013-11-20 19:08 UTC (permalink / raw)
To: Gopalasubramanian, Ganesh
Cc: gcc-patches, Uros Bizjak (ubizjak@gmail.com),
Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com),
borntraeger, Jakub Jelinek (jakub@redhat.com)
On Wed, Nov 20, 2013 at 10:26 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
> Hi,
>
> Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
>
> This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
>
> Bootstrapping passes.
>
> I would like to know your comments before committing.
>
I suggest you add this to x86-tune.def and enable it for
bdver3 and bdver4.
--
H.J.
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-11-20 19:08 ` H.J. Lu
@ 2013-11-21 9:59 ` Gopalasubramanian, Ganesh
2013-11-22 9:32 ` Gopalasubramanian, Ganesh
1 sibling, 0 replies; 14+ messages in thread
From: Gopalasubramanian, Ganesh @ 2013-11-21 9:59 UTC (permalink / raw)
To: H.J. Lu
Cc: gcc-patches, Uros Bizjak (ubizjak@gmail.com),
Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com),
borntraeger, Jakub Jelinek (jakub@redhat.com)
> I suggest you add this to x86-tune.def and enable it for
> bdver3 and bdver4.
The macro TARGET_LOOP_UNROLL_ADJUST is not new. It is already available and is used by target s390.
Since it is not an "x86 only" feature I didn't add that in x86-tune.def.
Regards
Ganesh
-----Original Message-----
From: H.J. Lu [mailto:hjl.tools@gmail.com]
Sent: Thursday, November 21, 2013 12:02 AM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Uros Bizjak (ubizjak@gmail.com); Richard Guenther <richard.guenther@gmail.com> (richard.guenther@gmail.com); borntraeger@de.ibm.com; Jakub Jelinek (jakub@redhat.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Nov 20, 2013 at 10:26 AM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
> Hi,
>
> Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
>
> This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
>
> Bootstrapping passes.
>
> I would like to know your comments before committing.
>
I suggest you add this to x86-tune.def and enable it for
bdver3 and bdver4.
--
H.J.
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-11-20 19:08 ` H.J. Lu
2013-11-21 9:59 ` Gopalasubramanian, Ganesh
@ 2013-11-22 9:32 ` Gopalasubramanian, Ganesh
1 sibling, 0 replies; 14+ messages in thread
From: Gopalasubramanian, Ganesh @ 2013-11-22 9:32 UTC (permalink / raw)
To: gcc-patches
Cc: Uros Bizjak (ubizjak@gmail.com),
Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com),
borntraeger, Jakub Jelinek (jakub@redhat.com),
H.J. Lu
Ping!
-----Original Message-----
From: Gopalasubramanian, Ganesh
Sent: Thursday, November 21, 2013 10:35 AM
To: 'H.J. Lu'
Cc: gcc-patches@gcc.gnu.org; Uros Bizjak (ubizjak@gmail.com); Richard Guenther <richard.guenther@gmail.com> (richard.guenther@gmail.com); borntraeger@de.ibm.com; Jakub Jelinek (jakub@redhat.com)
Subject: RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
> I suggest you add this to x86-tune.def and enable it for
> bdver3 and bdver4.
The macro TARGET_LOOP_UNROLL_ADJUST is not new. It is already available and is used by target s390.
Since it is not an "x86 only" feature I didn't add that in x86-tune.def.
Regards
Ganesh
-----Original Message-----
From: H.J. Lu [mailto:hjl.tools@gmail.com]
Sent: Thursday, November 21, 2013 12:02 AM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Uros Bizjak (ubizjak@gmail.com); Richard Guenther <richard.guenther@gmail.com> (richard.guenther@gmail.com); borntraeger@de.ibm.com; Jakub Jelinek (jakub@redhat.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Nov 20, 2013 at 10:26 AM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
> Hi,
>
> Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
>
> This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
>
> Bootstrapping passes.
>
> I would like to know your comments before committing.
>
I suggest you add this to x86-tune.def and enable it for
bdver3 and bdver4.
--
H.J.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-11-20 19:03 [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4 Gopalasubramanian, Ganesh
2013-11-20 19:08 ` H.J. Lu
@ 2013-11-22 10:32 ` Uros Bizjak
2013-11-28 14:07 ` Gopalasubramanian, Ganesh
2013-12-04 8:39 ` Gopalasubramanian, Ganesh
1 sibling, 2 replies; 14+ messages in thread
From: Uros Bizjak @ 2013-11-22 10:32 UTC (permalink / raw)
To: Gopalasubramanian, Ganesh
Cc: gcc-patches,
Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com),
borntraeger, H.J. Lu (hjl.tools@gmail.com),
Jakub Jelinek (jakub@redhat.com)
On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
> Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
>
> This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
>
> Bootstrapping passes.
>
> I would like to know your comments before committing.
Please split the patch to target-dependant and target-independant
part, and get target-idependant part reviewed first.
This part:
+ if (ix86_tune != PROCESSOR_BDVER3 && ix86_tune != PROCESSOR_BDVER4)
+ {
+ return nunroll;
+ }
is wrong. You should introduce tune variable (as H.J. suggested) and
check that variable here. Target dependant tuning options should be in
x86-tune.def, so everything regarding tuning can be found in one
place.
+ if (INSN_P (insn) && INSN_CODE (insn) != -1)
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
&mem_count);
if (NONDEBUG_INSN_P (insn))
for_each_rtx (&PATTERN(insn), ...);
otherwise your heuristics will depend on -g compile option.
+ if ( (mem_count*nunroll) <= 32)
Extra parenthesis.
+static int
+ix86_loop_memcount (rtx *x, unsigned *mem_count)
+{
+ if (*x != NULL_RTX && MEM_P (*x))
*x will never be null for active insns.
Uros.
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-11-22 10:32 ` Uros Bizjak
@ 2013-11-28 14:07 ` Gopalasubramanian, Ganesh
2013-11-28 16:31 ` Richard Biener
2013-12-04 8:39 ` Gopalasubramanian, Ganesh
1 sibling, 1 reply; 14+ messages in thread
From: Gopalasubramanian, Ganesh @ 2013-11-28 14:07 UTC (permalink / raw)
To: gcc-patches
Cc: Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com),
borntraeger, H.J. Lu (hjl.tools@gmail.com),
Jakub Jelinek (jakub@redhat.com),
Uros Bizjak
This patch adds influence of macro TARGET_LOOP_UNROLL_ADJUST during constant iterations (decide_unroll_constant_iterations).
The macro has been already checked for runtime iterations (decide_unroll_runtime_iterations), and for unroll stupid (decide_unroll_stupid).
Bootstrapping and test passes.
Would like to know your comments before committing.
Regards
Ganesh
2013-11-28 Ganesh Gopalasubramanian <Ganesh.Gopalasubramanian@amd.com>
* loop-unroll.c (decide_unroll_constant_iterations): Check macro
TARGET_LOOP_UNROLL_ADJUST while deciding unroll factor.
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 9c87167..557915f 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -664,6 +664,9 @@ decide_unroll_constant_iterations (struct loop *loop, int flags)
if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+ if (targetm.loop_unroll_adjust)
+ nunroll = targetm.loop_unroll_adjust (nunroll, loop);
+
/* Skip big loops. */
if (nunroll <= 1)
{
-----Original Message-----
From: Uros Bizjak [mailto:ubizjak@gmail.com]
Sent: Friday, November 22, 2013 1:46 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Richard Guenther <richard.guenther@gmail.com> (richard.guenther@gmail.com); borntraeger@de.ibm.com; H.J. Lu (hjl.tools@gmail.com); Jakub Jelinek (jakub@redhat.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
> Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
>
> This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
>
> Bootstrapping passes.
>
> I would like to know your comments before committing.
Please split the patch to target-dependant and target-independant part, and get target-idependant part reviewed first.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-11-28 14:07 ` Gopalasubramanian, Ganesh
@ 2013-11-28 16:31 ` Richard Biener
0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2013-11-28 16:31 UTC (permalink / raw)
To: Gopalasubramanian, Ganesh
Cc: gcc-patches, borntraeger, H.J. Lu (hjl.tools@gmail.com),
Jakub Jelinek (jakub@redhat.com),
Uros Bizjak
On Thu, Nov 28, 2013 at 12:23 PM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
> This patch adds influence of macro TARGET_LOOP_UNROLL_ADJUST during constant iterations (decide_unroll_constant_iterations).
> The macro has been already checked for runtime iterations (decide_unroll_runtime_iterations), and for unroll stupid (decide_unroll_stupid).
>
> Bootstrapping and test passes.
>
> Would like to know your comments before committing.
Quickly checked the only port using the hook (s390) and the patch
looks ok.
Thus, ok.
Thanks,
Richard.
> Regards
> Ganesh
>
> 2013-11-28 Ganesh Gopalasubramanian <Ganesh.Gopalasubramanian@amd.com>
>
> * loop-unroll.c (decide_unroll_constant_iterations): Check macro
> TARGET_LOOP_UNROLL_ADJUST while deciding unroll factor.
>
>
> diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
> index 9c87167..557915f 100644
> --- a/gcc/loop-unroll.c
> +++ b/gcc/loop-unroll.c
> @@ -664,6 +664,9 @@ decide_unroll_constant_iterations (struct loop *loop, int flags)
> if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
> nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
>
> + if (targetm.loop_unroll_adjust)
> + nunroll = targetm.loop_unroll_adjust (nunroll, loop);
> +
> /* Skip big loops. */
> if (nunroll <= 1)
> {
>
> -----Original Message-----
> From: Uros Bizjak [mailto:ubizjak@gmail.com]
> Sent: Friday, November 22, 2013 1:46 PM
> To: Gopalasubramanian, Ganesh
> Cc: gcc-patches@gcc.gnu.org; Richard Guenther <richard.guenther@gmail.com> (richard.guenther@gmail.com); borntraeger@de.ibm.com; H.J. Lu (hjl.tools@gmail.com); Jakub Jelinek (jakub@redhat.com)
> Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
>
> On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
>
>> Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
>> When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
>>
>> This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
>> This patch has some noise in SPEC 2006 results.
>>
>> Bootstrapping passes.
>>
>> I would like to know your comments before committing.
>
> Please split the patch to target-dependant and target-independant part, and get target-idependant part reviewed first.
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-11-22 10:32 ` Uros Bizjak
2013-11-28 14:07 ` Gopalasubramanian, Ganesh
@ 2013-12-04 8:39 ` Gopalasubramanian, Ganesh
2013-12-04 9:46 ` Uros Bizjak
1 sibling, 1 reply; 14+ messages in thread
From: Gopalasubramanian, Ganesh @ 2013-12-04 8:39 UTC (permalink / raw)
To: Uros Bizjak
Cc: gcc-patches,
Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com)
[-- Attachment #1: Type: text/plain, Size: 2408 bytes --]
Hi Uros!
Attached is the revised patch.
The target independent part has been already approved and added.
This revision of the patch adds a x86 tune definition and checks it while deciding the unroll factor.
Accommodated the comments given by you except one.
> *x will never be null for active insns.
Since every rtx in the insn is checked for memory references, the NULL_RTX check is required.
Regards
Ganesh
-----Original Message-----
From: Uros Bizjak [mailto:ubizjak@gmail.com]
Sent: Friday, November 22, 2013 1:46 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Richard Guenther <richard.guenther@gmail.com> (richard.guenther@gmail.com); borntraeger@de.ibm.com; H.J. Lu (hjl.tools@gmail.com); Jakub Jelinek (jakub@redhat.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
> Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
>
> This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
>
> Bootstrapping passes.
>
> I would like to know your comments before committing.
Please split the patch to target-dependant and target-independant part, and get target-idependant part reviewed first.
This part:
+ if (ix86_tune != PROCESSOR_BDVER3 && ix86_tune != PROCESSOR_BDVER4)
+ {
+ return nunroll;
+ }
is wrong. You should introduce tune variable (as H.J. suggested) and check that variable here. Target dependant tuning options should be in x86-tune.def, so everything regarding tuning can be found in one place.
+ if (INSN_P (insn) && INSN_CODE (insn) != -1)
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
&mem_count);
if (NONDEBUG_INSN_P (insn))
for_each_rtx (&PATTERN(insn), ...);
otherwise your heuristics will depend on -g compile option.
+ if ( (mem_count*nunroll) <= 32)
Extra parenthesis.
+static int
+ix86_loop_memcount (rtx *x, unsigned *mem_count) {
+ if (*x != NULL_RTX && MEM_P (*x))
*x will never be null for active insns.
Uros.
[-- Attachment #2: unroll-adjust.patch --]
[-- Type: application/octet-stream, Size: 2740 bytes --]
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c (revision 205654)
+++ gcc/config/i386/i386.c (working copy)
@@ -64,6 +64,7 @@
#include "is-a.h"
#include "gimple.h"
#include "gimplify.h"
+#include "cfgloop.h"
#include "dwarf2.h"
#include "df.h"
#include "tm-constrs.h"
@@ -43867,6 +43868,57 @@
}
}
+/* This function gives out the number of memory references.
+ This value determines the unrolling factor for
+ bdver3 and bdver4 architectures. */
+
+static int
+ix86_loop_memcount (rtx *x, unsigned *mem_count)
+{
+ if (*x != NULL_RTX && MEM_P (*x))
+ {
+ enum machine_mode mode;
+ unsigned int n_words;
+
+ mode = GET_MODE (*x);
+ n_words = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+
+ if (n_words > 4)
+ (*mem_count)+=2;
+ else
+ (*mem_count)+=1;
+ }
+ return 0;
+}
+
+static unsigned
+ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
+{
+ basic_block *bbs;
+ rtx insn;
+ unsigned i;
+ unsigned mem_count = 0;
+
+ if (!ix86_tune_features [X86_TUNE_ADJUST_UNROLL])
+ return nunroll;
+
+ /* Count the number of memory references within the loop body. */
+ bbs = get_loop_body (loop);
+ for (i = 0; i < loop->num_nodes; i++)
+ {
+ for (insn = BB_HEAD (bbs[i]); insn != BB_END (bbs[i]); insn = NEXT_INSN (insn))
+ if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1)
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count);
+ }
+ free (bbs);
+
+ if (mem_count <=32)
+ return 32/mem_count;
+
+ return nunroll;
+}
+
+
/* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P. */
static bool
@@ -44352,6 +44404,9 @@
#define TARGET_INIT_LIBFUNCS darwin_rename_builtins
#endif
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST ix86_loop_unroll_adjust
+
#undef TARGET_SPILL_CLASS
#define TARGET_SPILL_CLASS ix86_spill_class
Index: gcc/config/i386/x86-tune.def
===================================================================
--- gcc/config/i386/x86-tune.def (revision 205654)
+++ gcc/config/i386/x86-tune.def (working copy)
@@ -503,3 +503,9 @@
arithmetic to 32bit via PROMOTE_MODE macro. This code generation scheme
is usually used for RISC targets. */
DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0)
+
+/* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based
+ on hardware capabilities. Bdver3 hardware has a loop buffer which makes
+ unrolling small loop less important. For, such architectures we adjust
+ the unroll factor so that the unrolled loop fits the loop buffer. */
+DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-12-04 8:39 ` Gopalasubramanian, Ganesh
@ 2013-12-04 9:46 ` Uros Bizjak
2013-12-04 10:32 ` Gopalasubramanian, Ganesh
2013-12-11 10:27 ` Gopalasubramanian, Ganesh
0 siblings, 2 replies; 14+ messages in thread
From: Uros Bizjak @ 2013-12-04 9:46 UTC (permalink / raw)
To: Gopalasubramanian, Ganesh
Cc: gcc-patches,
Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com)
On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
> Attached is the revised patch.
> The target independent part has been already approved and added.
>
> This revision of the patch adds a x86 tune definition and checks it while deciding the unroll factor.
>
> Accommodated the comments given by you except one.
>
>> *x will never be null for active insns.
> Since every rtx in the insn is checked for memory references, the NULL_RTX check is required.
Yes you are correct. for_each_rtx also passes NULL_RTX, I was
distracted by "There are no sub-expressions." comment.
+ if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1)
Do you need to check for INSN_CODE here? IIRC, checking for
NONDEBUG_INSN_P is enough.
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
&mem_count);
+ }
+ free (bbs);
+
+ if (mem_count <=32)
+ return 32/mem_count;
Ouch... mem_count can be zero. Is there a reason to change this part
from previous patch?
Uros.
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-12-04 9:46 ` Uros Bizjak
@ 2013-12-04 10:32 ` Gopalasubramanian, Ganesh
2013-12-11 10:27 ` Gopalasubramanian, Ganesh
1 sibling, 0 replies; 14+ messages in thread
From: Gopalasubramanian, Ganesh @ 2013-12-04 10:32 UTC (permalink / raw)
To: Uros Bizjak
Cc: gcc-patches,
Richard Guenther <richard.guenther@gmail.com>
(richard.guenther@gmail.com)
> Ouch... mem_count can be zero. Is there a reason to change this part from previous patch?
Oops! You're right. I will correct this.
The idea is to count the memory references and decide on the unrolling factor.
Previous patch does that in two steps I thought of doing that in a single step. (I think I missed my step here ;) )
Regards
Ganesh
-----Original Message-----
From: Uros Bizjak [mailto:ubizjak@gmail.com]
Sent: Wednesday, December 04, 2013 3:17 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Richard Guenther <richard.guenther@gmail.com> (richard.guenther@gmail.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
> Attached is the revised patch.
> The target independent part has been already approved and added.
>
> This revision of the patch adds a x86 tune definition and checks it while deciding the unroll factor.
>
> Accommodated the comments given by you except one.
>
>> *x will never be null for active insns.
> Since every rtx in the insn is checked for memory references, the NULL_RTX check is required.
Yes you are correct. for_each_rtx also passes NULL_RTX, I was distracted by "There are no sub-expressions." comment.
+ if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1)
Do you need to check for INSN_CODE here? IIRC, checking for NONDEBUG_INSN_P is enough.
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
&mem_count);
+ }
+ free (bbs);
+
+ if (mem_count <=32)
+ return 32/mem_count;
Ouch... mem_count can be zero. Is there a reason to change this part from previous patch?
Uros.
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-12-04 9:46 ` Uros Bizjak
2013-12-04 10:32 ` Gopalasubramanian, Ganesh
@ 2013-12-11 10:27 ` Gopalasubramanian, Ganesh
2013-12-11 10:41 ` Uros Bizjak
1 sibling, 1 reply; 14+ messages in thread
From: Gopalasubramanian, Ganesh @ 2013-12-11 10:27 UTC (permalink / raw)
To: Uros Bizjak; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1509 bytes --]
Hi Uros!
Accommodated the changes that you mentioned.
Completed the bootstrap testing too.
Regards
Ganesh
-----Original Message-----
From: Uros Bizjak [mailto:ubizjak@gmail.com]
Sent: Wednesday, December 04, 2013 3:17 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Richard Guenther <richard.guenther@gmail.com> (richard.guenther@gmail.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
> Attached is the revised patch.
> The target independent part has been already approved and added.
>
> This revision of the patch adds a x86 tune definition and checks it while deciding the unroll factor.
>
> Accommodated the comments given by you except one.
>
>> *x will never be null for active insns.
> Since every rtx in the insn is checked for memory references, the NULL_RTX check is required.
Yes you are correct. for_each_rtx also passes NULL_RTX, I was distracted by "There are no sub-expressions." comment.
+ if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1)
Do you need to check for INSN_CODE here? IIRC, checking for NONDEBUG_INSN_P is enough.
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
&mem_count);
+ }
+ free (bbs);
+
+ if (mem_count <=32)
+ return 32/mem_count;
Ouch... mem_count can be zero. Is there a reason to change this part from previous patch?
Uros.
[-- Attachment #2: unroll-adjust.patch --]
[-- Type: application/octet-stream, Size: 2727 bytes --]
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c (revision 205654)
+++ gcc/config/i386/i386.c (working copy)
@@ -64,6 +64,7 @@
#include "is-a.h"
#include "gimple.h"
#include "gimplify.h"
+#include "cfgloop.h"
#include "dwarf2.h"
#include "df.h"
#include "tm-constrs.h"
@@ -43867,6 +43868,57 @@
}
}
+/* This function gives out the number of memory references.
+ This value determines the unrolling factor for
+ bdver3 and bdver4 architectures. */
+
+static int
+ix86_loop_memcount (rtx *x, unsigned *mem_count)
+{
+ if (*x != NULL_RTX && MEM_P (*x))
+ {
+ enum machine_mode mode;
+ unsigned int n_words;
+
+ mode = GET_MODE (*x);
+ n_words = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+
+ if (n_words > 4)
+ (*mem_count)+=2;
+ else
+ (*mem_count)+=1;
+ }
+ return 0;
+}
+
+static unsigned
+ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
+{
+ basic_block *bbs;
+ rtx insn;
+ unsigned i;
+ unsigned mem_count = 0;
+
+ if (!ix86_tune_features [X86_TUNE_ADJUST_UNROLL])
+ return nunroll;
+
+ /* Count the number of memory references within the loop body. */
+ bbs = get_loop_body (loop);
+ for (i = 0; i < loop->num_nodes; i++)
+ {
+ for (insn = BB_HEAD (bbs[i]); insn != BB_END (bbs[i]); insn = NEXT_INSN (insn))
+ if (NONDEBUG_INSN_P (insn))
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count);
+ }
+ free (bbs);
+
+ if (mem_count && mem_count <=32)
+ return 32/mem_count;
+
+ return nunroll;
+}
+
+
/* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P. */
static bool
@@ -44352,6 +44404,9 @@
#define TARGET_INIT_LIBFUNCS darwin_rename_builtins
#endif
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST ix86_loop_unroll_adjust
+
#undef TARGET_SPILL_CLASS
#define TARGET_SPILL_CLASS ix86_spill_class
Index: gcc/config/i386/x86-tune.def
===================================================================
--- gcc/config/i386/x86-tune.def (revision 205654)
+++ gcc/config/i386/x86-tune.def (working copy)
@@ -503,3 +503,9 @@
arithmetic to 32bit via PROMOTE_MODE macro. This code generation scheme
is usually used for RISC targets. */
DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0)
+
+/* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based
+ on hardware capabilities. Bdver3 hardware has a loop buffer which makes
+ unrolling small loop less important. For, such architectures we adjust
+ the unroll factor so that the unrolled loop fits the loop buffer. */
+DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-12-11 10:27 ` Gopalasubramanian, Ganesh
@ 2013-12-11 10:41 ` Uros Bizjak
2013-12-19 10:01 ` Gopalasubramanian, Ganesh
0 siblings, 1 reply; 14+ messages in thread
From: Uros Bizjak @ 2013-12-11 10:41 UTC (permalink / raw)
To: Gopalasubramanian, Ganesh; +Cc: gcc-patches
On Wed, Dec 11, 2013 at 11:27 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
> Accommodated the changes that you mentioned.
> Completed the bootstrap testing too.
Please provide updated ChangeLog.
The function comment is missing. Maybe you should also describe magic
number 32 here?
+static unsigned
+ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
+ if (!ix86_tune_features [X86_TUNE_ADJUST_UNROLL])
+ return nunroll;
Please add
#define TARGET_ADJUST_UNROLL \
ix86_tune_features[X86_TUNE_ADJUST_UNROLL]
to i386.h and use definition here.
Otherwise, the patch looks OK.
BTW: Please avoid top-posting, see e.g. [1] for reasons...
[1] http://gcc.gnu.org/ml/gcc/2008-08/msg00133.html
Uros.
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-12-11 10:41 ` Uros Bizjak
@ 2013-12-19 10:01 ` Gopalasubramanian, Ganesh
2013-12-19 10:15 ` Uros Bizjak
0 siblings, 1 reply; 14+ messages in thread
From: Gopalasubramanian, Ganesh @ 2013-12-19 10:01 UTC (permalink / raw)
To: Uros Bizjak; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 834 bytes --]
> Please provide updated ChangeLog.
--- gcc/ChangeLog (revision 206106)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,14 @@
+2013-12-19 Ganesh Gopalasubramanian <Ganesh.Gopalasubramanian@amd.com>
+
+ * config/i386/i386.c: Include cfgloop.h.
+ (ix86_loop_memcount): New function.
+ (ix86_loop_unroll_adjust): New function.
+ (TARGET_LOOP_UNROLL_ADJUST): Define.
+ * config/i386/i386.h
+ (TARGET_ADJUST_UNROLL): Define.
+ * config/i386/x86-tune.def
+ (X86_TUNE_ADJUST_UNROLL): Define.
+
> The function comment is missing. Maybe you should also describe magic number 32 here?
Added the function comment.
> Otherwise, the patch looks OK.
Thanks. Bootstrapping passes. Is it OK for upstream?
> BTW: Please avoid top-posting, see e.g. [1] for reasons...
Sorry for the lapse. Will comply.
Regards
Ganesh
[-- Attachment #2: unroll-adjust.patch --]
[-- Type: application/octet-stream, Size: 4213 bytes --]
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 206106)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,14 @@
+2013-12-19 Ganesh Gopalasubramanian <Ganesh.Gopalasubramanian@amd.com>
+
+ * config/i386/i386.c: Include cfgloop.h.
+ (ix86_loop_memcount): New function.
+ (ix86_loop_unroll_adjust): New function.
+ (TARGET_LOOP_UNROLL_ADJUST): Define.
+ * config/i386/i386.h
+ (TARGET_ADJUST_UNROLL): Define.
+ * config/i386/x86-tune.def
+ (X86_TUNE_ADJUST_UNROLL): Define.
+
2013-12-19 Monk Chiang <sh.chiang04@gmail.com>
* common/config/nds32/nds32-common.c (TARGET_DEFAULT_TARGET_FLAGS):
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c (revision 206106)
+++ gcc/config/i386/i386.c (working copy)
@@ -64,6 +64,7 @@
#include "is-a.h"
#include "gimple.h"
#include "gimplify.h"
+#include "cfgloop.h"
#include "dwarf2.h"
#include "df.h"
#include "tm-constrs.h"
@@ -44020,6 +44021,64 @@
}
}
+/* This function gives out the number of memory references.
+ This value determines the unrolling factor for
+ bdver3 and bdver4 architectures. */
+
+static int
+ix86_loop_memcount (rtx *x, unsigned *mem_count)
+{
+ if (*x != NULL_RTX && MEM_P (*x))
+ {
+ enum machine_mode mode;
+ unsigned int n_words;
+
+ mode = GET_MODE (*x);
+ n_words = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+
+ if (n_words > 4)
+ (*mem_count)+=2;
+ else
+ (*mem_count)+=1;
+ }
+ return 0;
+}
+
+/* This function adjusts the unroll factor based on
+ the hardware capabilities. For ex, bdver3 has
+ a loop buffer which makes unrolling of smaller
+ loops less important. This function decides the
+ unroll factor using number of memory references
+ (value 32 is used) as a heuristic. */
+
+static unsigned
+ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
+{
+ basic_block *bbs;
+ rtx insn;
+ unsigned i;
+ unsigned mem_count = 0;
+
+ if (!TARGET_ADJUST_UNROLL)
+ return nunroll;
+
+ /* Count the number of memory references within the loop body. */
+ bbs = get_loop_body (loop);
+ for (i = 0; i < loop->num_nodes; i++)
+ {
+ for (insn = BB_HEAD (bbs[i]); insn != BB_END (bbs[i]); insn = NEXT_INSN (insn))
+ if (NONDEBUG_INSN_P (insn))
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count);
+ }
+ free (bbs);
+
+ if (mem_count && mem_count <=32)
+ return 32/mem_count;
+
+ return nunroll;
+}
+
+
/* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P. */
static bool
@@ -44505,6 +44564,9 @@
#define TARGET_INIT_LIBFUNCS darwin_rename_builtins
#endif
+#undef TARGET_LOOP_UNROLL_ADJUST
+#define TARGET_LOOP_UNROLL_ADJUST ix86_loop_unroll_adjust
+
#undef TARGET_SPILL_CLASS
#define TARGET_SPILL_CLASS ix86_spill_class
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h (revision 206106)
+++ gcc/config/i386/i386.h (working copy)
@@ -443,6 +443,8 @@
ix86_tune_features[X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE]
#define TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS \
ix86_tune_features[X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS]
+#define TARGET_ADJUST_UNROLL \
+ ix86_tune_features[X86_TUNE_ADJUST_UNROLL]
/* Feature tests against the various architecture variations. */
enum ix86_arch_indices {
Index: gcc/config/i386/x86-tune.def
===================================================================
--- gcc/config/i386/x86-tune.def (revision 206106)
+++ gcc/config/i386/x86-tune.def (working copy)
@@ -503,3 +503,9 @@
arithmetic to 32bit via PROMOTE_MODE macro. This code generation scheme
is usually used for RISC targets. */
DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0)
+
+/* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based
+ on hardware capabilities. Bdver3 hardware has a loop buffer which makes
+ unrolling small loop less important. For, such architectures we adjust
+ the unroll factor so that the unrolled loop fits the loop buffer. */
+DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
2013-12-19 10:01 ` Gopalasubramanian, Ganesh
@ 2013-12-19 10:15 ` Uros Bizjak
0 siblings, 0 replies; 14+ messages in thread
From: Uros Bizjak @ 2013-12-19 10:15 UTC (permalink / raw)
To: Gopalasubramanian, Ganesh; +Cc: gcc-patches
On Thu, Dec 19, 2013 at 11:01 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
>> Please provide updated ChangeLog.
>
> --- gcc/ChangeLog (revision 206106)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,14 @@
> +2013-12-19 Ganesh Gopalasubramanian <Ganesh.Gopalasubramanian@amd.com>
> +
> + * config/i386/i386.c: Include cfgloop.h.
> + (ix86_loop_memcount): New function.
> + (ix86_loop_unroll_adjust): New function.
> + (TARGET_LOOP_UNROLL_ADJUST): Define.
> + * config/i386/i386.h
> + (TARGET_ADJUST_UNROLL): Define.
> + * config/i386/x86-tune.def
> + (X86_TUNE_ADJUST_UNROLL): Define.
> +
>
>> The function comment is missing. Maybe you should also describe magic number 32 here?
> Added the function comment.
>
>> Otherwise, the patch looks OK.
> Thanks. Bootstrapping passes. Is it OK for upstream?
OK.
Thanks,
Uros.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-12-19 10:15 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-20 19:03 [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4 Gopalasubramanian, Ganesh
2013-11-20 19:08 ` H.J. Lu
2013-11-21 9:59 ` Gopalasubramanian, Ganesh
2013-11-22 9:32 ` Gopalasubramanian, Ganesh
2013-11-22 10:32 ` Uros Bizjak
2013-11-28 14:07 ` Gopalasubramanian, Ganesh
2013-11-28 16:31 ` Richard Biener
2013-12-04 8:39 ` Gopalasubramanian, Ganesh
2013-12-04 9:46 ` Uros Bizjak
2013-12-04 10:32 ` Gopalasubramanian, Ganesh
2013-12-11 10:27 ` Gopalasubramanian, Ganesh
2013-12-11 10:41 ` Uros Bizjak
2013-12-19 10:01 ` Gopalasubramanian, Ganesh
2013-12-19 10:15 ` Uros Bizjak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).