* [PATCH, RS6000] improve builtin expansion of memcmp for p7
@ 2016-10-06 21:12 Aaron Sawdey
From: Aaron Sawdey @ 2016-10-06 21:12 UTC (permalink / raw)
To: gcc-patches List; +Cc: segher
[-- Attachment #1: Type: text/plain, Size: 732 bytes --]
I've improved the builtin memcmp expansion so it avoids a couple of
things that p7 and previous processors don't like. Performance on
p7 is now never worse than glibc memcmp(). Bootstrap/regtest in progress
on power7 ppc64 BE.
OK for trunk if testing passes?
gcc/ChangeLog:
2016-10-06 Aaron Sawdey <acsawdey@linux.vnet.ibm.com>
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
Add macro to say we can efficiently handle overlapping unaligned
loads.
* config/rs6000/rs6000.c (expand_block_compare): Avoid generating
poor code for processors older than p8.
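For readers who want the new bail-out test in isolation, here is a minimal sketch of the predicate the rs6000.c hunk adds. The function name and the boolean parameter are hypothetical stand-ins: in GCC the decision is made inside expand_block_compare and TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is a target macro, not an argument. It models only this one check, not the other conditions the real function tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the early-exit test in the patch.
   EFFICIENT stands in for TARGET_EFFICIENT_OVERLAPPING_UNALIGNED.
   BASE_ALIGN is the known alignment in bytes, BYTES the compare length.
   Returns true if inline expansion should go ahead, false if the
   expander should give up and leave the call to the library memcmp.  */
static bool
inline_memcmp_worthwhile (bool efficient, int base_align, long bytes)
{
  assert (base_align >= 1);

  /* Nothing to compare: the patch context returns true here.  */
  if (bytes <= 0)
    return true;

  /* On p7 and older, small alignment combined with a longer length
     is where the inline code does not beat glibc, so bail out.  */
  if (!efficient
      && ((base_align == 1 && bytes > 16)
          || (base_align == 2 && bytes > 32)))
    return false;

  return true;
}
```

With the flag off, a byte-aligned compare of 16 bytes is still expanded inline but 17 bytes is not; for 2-byte alignment the cutoff moves to 32 bytes. With the flag on, the check never rejects anything.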
--
Aaron Sawdey, Ph.D. acsawdey@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
[-- Attachment #2: memcmp_p7.patch3 --]
[-- Type: text/x-patch, Size: 2648 bytes --]
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 240816)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -18687,6 +18687,14 @@
if (bytes <= 0)
return true;
+ /* The code generated for p7 and older is not faster than glibc
+ memcmp if alignment is small and length is not short, so bail
+ out to avoid those conditions. */
+ if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+ && ((base_align == 1 && bytes > 16)
+ || (base_align == 2 && bytes > 32)))
+ return false;
+
rtx tmp_reg_src1 = gen_reg_rtx (word_mode);
rtx tmp_reg_src2 = gen_reg_rtx (word_mode);
@@ -18736,13 +18744,18 @@
while (bytes > 0)
{
int align = compute_current_alignment (base_align, offset);
- load_mode = select_block_compare_mode(offset, bytes, align, word_mode_ok);
+ if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+ load_mode = select_block_compare_mode(offset, bytes, align,
+ word_mode_ok);
+ else
+ load_mode = select_block_compare_mode(0, bytes, align, word_mode_ok);
load_mode_size = GET_MODE_SIZE (load_mode);
if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
- else
+ else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
{
- /* Move this load back so it doesn't go past the end. */
+ /* Move this load back so it doesn't go past the end.
+ P8/P9 can do this efficiently. */
int extra_bytes = load_mode_size - bytes;
cmp_bytes = bytes;
if (extra_bytes < offset)
@@ -18752,7 +18765,12 @@
bytes = cmp_bytes;
}
}
-
+ else
+ /* P7 and earlier can't do the overlapping load trick fast,
+ so this forces a non-overlapping load and a shift to get
+ rid of the extra bytes. */
+ cmp_bytes = bytes;
+
src1 = adjust_address (orig_src1, load_mode, offset);
src2 = adjust_address (orig_src2, load_mode, offset);
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h (revision 240816)
+++ gcc/config/rs6000/rs6000.h (working copy)
@@ -603,6 +603,9 @@
&& TARGET_POWERPC64)
#define TARGET_VEXTRACTUB (TARGET_P9_VECTOR && TARGET_DIRECT_MOVE \
&& TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
+/* This wants to be set for p8 and newer. On p7, overlapping unaligned
+ loads are slow. */
+#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
/* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
in power7, so conditionalize them on p8 features. TImode syncs need quad
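The two ways of handling the final partial chunk that the patch comments describe can be illustrated at the byte level. This is only a portable C sketch of the idea, not the RTL that expand_block_compare emits; all function names are hypothetical, and the words are assembled big-endian so that an unsigned compare orders them the way memcmp would. The overlapping variant assumes the bytes before the tail have already compared equal, as they would have by the time the loop reaches the last chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes into a big-endian 64-bit value.  */
static uint64_t
load_be64 (const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}

/* Three-way compare of two words: -1, 0 or 1.  */
static int
cmp_u64 (uint64_t x, uint64_t y)
{
  return (x > y) - (x < y);
}

/* Overlapping load (the p8/p9 path): move the load back so it ends
   exactly at LEN.  It re-reads bytes that were already compared, which
   is harmless because those bytes are known to be equal.  */
static int
tail_cmp_overlap (const unsigned char *a, const unsigned char *b,
                  size_t len)
{
  assert (len >= 8);
  return cmp_u64 (load_be64 (a + len - 8), load_be64 (b + len - 8));
}

/* Non-overlapping load plus shift (the p7 path): load a full word at
   the current offset and shift the unwanted trailing bytes out.
   A and B must each point at 8 readable bytes, of which only the
   first N (1..8) belong to the block being compared.  */
static int
tail_cmp_shift (const unsigned char *a, const unsigned char *b, size_t n)
{
  assert (n >= 1 && n <= 8);
  unsigned shift = 8 * (8 - (unsigned) n);
  return cmp_u64 (load_be64 (a) >> shift, load_be64 (b) >> shift);
}
```

For an 11-byte compare, the tail after the first doubleword is 3 bytes at offset 8. The overlapping variant loads bytes 3..10; the shift variant loads from offset 8 and discards the 5 trailing bytes. Both give the same answer, and the patch chooses between them based on which one the processor handles faster.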
* Re: [PATCH, RS6000] improve builtin expansion of memcmp for p7
From: Segher Boessenkool @ 2016-10-06 21:40 UTC (permalink / raw)
To: Aaron Sawdey; +Cc: gcc-patches List
Hi Aaron,
On Thu, Oct 06, 2016 at 04:12:31PM -0500, Aaron Sawdey wrote:
> I've improved the builtin memcmp expansion so it avoids a couple of
> things that p7 and previous processors don't like. Performance on
> p7 is now never worse than glibc memcmp(). Bootstrap/regtest in progress
> on power7 ppc64 BE.
>
> OK for trunk if testing passes?
Okay, thanks. Just a few formatting nits...
> 2016-10-06 Aaron Sawdey <acsawdey@linux.vnet.ibm.com>
>
> * config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
Needs a colon at the end of line here.
> Add macro to say we can efficiently handle overlapping unaligned
> loads.
> @@ -18736,13 +18744,18 @@
> while (bytes > 0)
> {
> int align = compute_current_alignment (base_align, offset);
> - load_mode = select_block_compare_mode(offset, bytes, align, word_mode_ok);
> + if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
> + load_mode = select_block_compare_mode(offset, bytes, align,
> + word_mode_ok);
Space before paren.
Thanks,
Segher