* [Patch, avr] Provide correct memory move costs
@ 2015-12-16 7:09 Senthil Kumar Selvaraj
2015-12-17 16:56 ` Denis Chertykov
0 siblings, 1 reply; 2+ messages in thread
From: Senthil Kumar Selvaraj @ 2015-12-16 7:09 UTC (permalink / raw)
To: gcc-patches; +Cc: chertykov, avr
Hi,
When analyzing code size regressions for AVR for top-of-trunk, I
found a few cases where aggresive inlining (by the middle-end)
of functions containing calls to memcpy was bloating up the code.
Turns out that the AVR backend has MOVE_MAX set to 4 (unchanged from the
original commit), when it really should be 1, as the AVRs can only
move a single byte between reg and memory in a single instruction.
Setting it to 4 causes the middle end to underestimate the
cost of memcopys with a compile time constant length parameter, as it
thinks a 4 byte copy's cost is only a single instruction.
Just setting MOVE_MAX to 1 makes the middle end too conservative
though, and causes a bunch of regression tests to fail, as lots of
optimizations fail to pass the code size increase threshold check,
even when not optimizing for size.
Instead, the below patch sets MOVE_MAX_PIECES to 2, and implements a
target hook that tells the middle-end to use load/store insns for
memory moves upto two bytes. Also, the patch sets MOVE_RATIO to 3 when
optimizing for speed, so that moves upto 4 bytes will occur through
load/store sequences, like it does now.
With this, only a couple of regression tests fail. uninit-19.c fails
because it thinks only non-pic code won't inline a function, but the
cost computation prevents inlining for AVRs. The test passes if
the optimization level is increased to -O3.
strlenopt-8.c has an XPASS and a FAIL because a previous pass issued
a builtin_memcpy instead of a MEM assignment. Execution still passes.
I'll continue running more tests to see if there are other performance
related consequences.
Is this ok? If ok, could someone commit please? I don't have commit
access.
Regards
Senthil
gcc/ChangeLog
2015-12-16 Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com>
* config/avr/avr.h (MOVE_MAX): Set value to 1.
(MOVE_MAX_PIECES): Define.
(MOVE_RATIO): Define.
* config/avr/avr.c (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P):
Provide target hook.
(avr_use_by_pieces_infrastructure_p): New function.
diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index 609a42b..9cc95db 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -2431,6 +2431,27 @@ avr_print_operand (FILE *file, rtx x, int code)
}
+/* Implement TARGET_USE_BY_PIECES_INFRASTRUCTURE_P. */
+
+/* Prefer sequence of loads/stores for moves of size upto
+ two - two pairs of load/store instructions are always better
+ than the 5 instruction sequence for a loop (1 instruction
+ for loop counter setup, and 4 for the body of the loop). */
+
+static bool
+avr_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
+ unsigned int align ATTRIBUTE_UNUSED,
+ enum by_pieces_operation op,
+ bool speed_p)
+{
+
+ if (op != MOVE_BY_PIECES || (speed_p && (size > (MOVE_MAX_PIECES))))
+ return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
+
+ return size <= (MOVE_MAX_PIECES);
+}
+
+
/* Worker function for `NOTICE_UPDATE_CC'. */
/* Update the condition code in the INSN. */
@@ -13763,6 +13784,10 @@ avr_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *arg,
#undef TARGET_PRINT_OPERAND_PUNCT_VALID_P
#define TARGET_PRINT_OPERAND_PUNCT_VALID_P avr_print_operand_punct_valid_p
+#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
+#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
+ avr_use_by_pieces_infrastructure_p
+
struct gcc_target targetm = TARGET_INITIALIZER;
\f
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 7439964..ebfb8ed 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -453,7 +453,22 @@ typedef struct avr_args
#undef WORD_REGISTER_OPERATIONS
-#define MOVE_MAX 4
+/* Can move only a single byte from memory to reg in a
+ single instruction. */
+
+#define MOVE_MAX 1
+
+/* Allow upto two bytes moves to occur using by_pieces
+ infrastructure */
+
+#define MOVE_MAX_PIECES 2
+
+/* Set MOVE_RATIO to 3 to allow memory moves upto 4 bytes to happen
+ by pieces when optimizing for speed, like it did when MOVE_MAX_PIECES
+ was 4. When optimizing for size, allow memory moves upto 2 bytes.
+ Also see avr_use_by_pieces_infrastructure_p. */
+
+#define MOVE_RATIO(speed) ((speed) ? 3 : 2)
#define TRULY_NOOP_TRUNCATION(OUTPREC, INPREC) 1
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [Patch, avr] Provide correct memory move costs
2015-12-16 7:09 [Patch, avr] Provide correct memory move costs Senthil Kumar Selvaraj
@ 2015-12-17 16:56 ` Denis Chertykov
0 siblings, 0 replies; 2+ messages in thread
From: Denis Chertykov @ 2015-12-17 16:56 UTC (permalink / raw)
To: Senthil Kumar Selvaraj; +Cc: gcc-patches, Georg Johann Lay
2015-12-16 10:08 GMT+03:00 Senthil Kumar Selvaraj
<senthil_kumar.selvaraj@atmel.com>:
> Hi,
>
> When analyzing code size regressions for AVR for top-of-trunk, I
> found a few cases where aggresive inlining (by the middle-end)
> of functions containing calls to memcpy was bloating up the code.
>
> Turns out that the AVR backend has MOVE_MAX set to 4 (unchanged from the
> original commit), when it really should be 1, as the AVRs can only
> move a single byte between reg and memory in a single instruction.
> Setting it to 4 causes the middle end to underestimate the
> cost of memcopys with a compile time constant length parameter, as it
> thinks a 4 byte copy's cost is only a single instruction.
>
> Just setting MOVE_MAX to 1 makes the middle end too conservative
> though, and causes a bunch of regression tests to fail, as lots of
> optimizations fail to pass the code size increase threshold check,
> even when not optimizing for size.
>
> Instead, the below patch sets MOVE_MAX_PIECES to 2, and implements a
> target hook that tells the middle-end to use load/store insns for
> memory moves upto two bytes. Also, the patch sets MOVE_RATIO to 3 when
> optimizing for speed, so that moves upto 4 bytes will occur through
> load/store sequences, like it does now.
>
> With this, only a couple of regression tests fail. uninit-19.c fails
> because it thinks only non-pic code won't inline a function, but the
> cost computation prevents inlining for AVRs. The test passes if
> the optimization level is increased to -O3.
>
> strlenopt-8.c has an XPASS and a FAIL because a previous pass issued
> a builtin_memcpy instead of a MEM assignment. Execution still passes.
>
> I'll continue running more tests to see if there are other performance
> related consequences.
>
> Is this ok? If ok, could someone commit please? I don't have commit
> access.
>
> Regards
> Senthil
>
> gcc/ChangeLog
>
> 2015-12-16 Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com>
>
> * config/avr/avr.h (MOVE_MAX): Set value to 1.
> (MOVE_MAX_PIECES): Define.
> (MOVE_RATIO): Define.
> * config/avr/avr.c (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P):
> Provide target hook.
> (avr_use_by_pieces_infrastructure_p): New function.
Committed.
Denis.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2015-12-17 16:56 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-16 7:09 [Patch, avr] Provide correct memory move costs Senthil Kumar Selvaraj
2015-12-17 16:56 ` Denis Chertykov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).