public inbox for libc-ports@sourceware.org
From: Maxim Kuvyrkov <maxim@codesourcery.com>
To: Steve Ellcey <sellcey@mips.com>
Cc: Andrew T Pinski <pinskia@gmail.com>,
	"Joseph S. Myers"	<joseph@codesourcery.com>,
	<libc-ports@sourceware.org>
Subject: Re: [PATCH] Optimize MIPS memcpy
Date: Mon, 08 Oct 2012 22:31:00 -0000
Message-ID: <954E9625-0D5C-4295-9229-C16A3F5C200D@codesourcery.com>
In-Reply-To: <1349715796.30194.131.camel@ubuntu-sellcey>

On 9/10/2012, at 6:03 AM, Steve Ellcey wrote:

> On Sat, 2012-10-06 at 17:43 +1300, Maxim Kuvyrkov wrote:
> 
>> Steve and I have debugged these failures and they now seem to be resolved.  I'll let Steve follow up with analysis and a new patch.
>> 
>> Meanwhile, I've benchmarked Steve's patch against mine.  On the benchmark that I use, both implementations perform equally for the N64 ABI, but for the N32 ABI Steve's patch is only half as fast.  This is probably because it uses 4-byte operations instead of 8-byte operations for the N32 ABI:
>> 
>> #if _MIPS_SIM == _ABI64
>> #define USE_DOUBLE
>> #endif
>> 
>> It should be easy to improve Steve's patch for N32 ABI.  Steve, will you look into that?
>> 
>> I would also appreciate it if you would look into making your version of memcpy memmove-safe, if it is not already.
>> 
>> Thank you,
>> 
>> --
>> Maxim Kuvyrkov
>> CodeSourcery / Mentor Graphics
> 
> Maxim, do you know if your test is doing a memcpy on overlapping memory?
> While our analysis showed that the problem was due to the use of the
> 'prepare to store' prefetch hint, the code I sent earlier should have
> worked fine for any code that was not doing an overlapping memcpy.

The test does not use overlapping memcpy.

> 
> For anyone who may be interested, the 'prepare for store' prefetch hint
> is different than other 'safe' prefetches, which can be executed or
> ignored without affecting the results of the code being executed.
> 
> Instead of bringing a chunk of memory into the cache, it simply
> allocates a line of cache for use and zeros it out.  If you write to
> every byte of that line of cache, you are OK.  But if you use the
> 'prepare to store' cache hint and do not write to the entire cache line
> then the bytes you don't write to get written back to memory as zeros,
> overwriting whatever was there before.  The code in my memcpy routine
> accounts for this, by checking the length of the buffer before doing the
> 'prepare to store' prefetches and only using them when it knows that it
> is going to write to the entire cache line.

Could there be a bug in the logic that decides whether a prepare-for-store prefetch is safe?

I've checked the documentation for XLP (the target I'm using for testing), and it specifies a 32-byte prefetch.
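
To make the question concrete, here is a minimal sketch in C of the kind of check under discussion.  It is an illustration only and is not taken from either patch: the names PREF_LINE_SIZE and pref_store_is_safe are hypothetical, and the 32-byte granule is simply the XLP figure mentioned above.  The idea is that a line may be prepared for store only if the copy is going to overwrite every byte of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed prefetch granule; 32 bytes is the XLP figure quoted above.
   If the hardware granule is larger than this value, the check below
   is too permissive.  */
#define PREF_LINE_SIZE 32u

/* Hypothetical helper: return 1 if the cache line containing ADDR lies
   entirely inside the destination buffer [dst, dst + len), 0 otherwise.  */
static int
pref_store_is_safe (const void *dst, size_t len, const void *addr)
{
  /* The masking below only works for a power-of-two line size.  */
  assert ((PREF_LINE_SIZE & (PREF_LINE_SIZE - 1)) == 0);

  uintptr_t start = (uintptr_t) dst;
  uintptr_t end = start + len;
  uintptr_t line = (uintptr_t) addr & ~(uintptr_t) (PREF_LINE_SIZE - 1);

  return line >= start && line + PREF_LINE_SIZE <= end;
}
```

With a check of this shape, the first and last partial lines of an unaligned destination are never prepared for store.  A mismatch between the assumed and the actual line size is one way such logic could go wrong.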

> 
> The other issue, though, is overlap: if the source and destination of
> the memcpy overlap and you use the prepare to store prefetch on a memory
> address that is also part of the source, you will get incorrect
> results.  That means that if we want to have memcpy be 'memmove-safe'
> we cannot use the 'prepare to store' hint.

I don't think this is a concern.  Memmove will use memcpy only if the memory locations don't overlap.  And for the record, I'm testing without the memcpy-in-memmove patch.
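
For reference, that dispatch can be sketched in C as follows.  This is an illustration under my own assumptions, not glibc's memmove and not the memcpy-in-memmove patch; regions_overlap and memmove_sketch are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return 1 if [dst, dst + n) and [src, src + n)
   share at least one byte, 0 otherwise.  */
static int
regions_overlap (const void *dst, const void *src, size_t n)
{
  uintptr_t d = (uintptr_t) dst;
  uintptr_t s = (uintptr_t) src;

  return n != 0 && (d < s ? s - d < n : d - s < n);
}

/* Illustrative memmove: hand off to memcpy only when the regions are
   disjoint, so memcpy itself never sees overlapping buffers.  */
static void *
memmove_sketch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if (!regions_overlap (dst, src, n))
    return memcpy (dst, src, n);

  if (d < s)
    for (size_t i = 0; i < n; i++)      /* Forward copy.  */
      d[i] = s[i];
  else
    for (size_t i = n; i-- > 0; )       /* Backward copy.  */
      d[i] = s[i];
  return dst;
}
```

Under this arrangement a prepare-for-store prefetch inside memcpy can never zero a line that is still waiting to be read as source data, because the overlapping case never reaches memcpy.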

> 
> I will fix the code to use double loads and stores with the N32 ABI
> and add comments about the 'prepare to store' hint.  I hate to give up
> on using the 'prepare for store' prefetch hint, since it does result in
> the best performance, but given the various issues maybe it is not the
> best idea for glibc.

I too want to keep prepare-for-store prefetches if possible.  For debugging purposes you could amend the prepare-for-store prefetch macros to trigger a loop that would unconditionally clobber the memory locations that prepare-for-store is expected to zero out.  Or add some other assertions to help with debugging.
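
A rough sketch of that debugging aid, written in C for readability even though the real memcpy is assembly.  Everything here is hypothetical: DEBUG_PREF_STORE, PREF_LINE_SIZE, and pref_prepare_for_store are illustrative names and do not come from the patch.  In a debug build the hook zeroes the whole line that the hint is expected to zero, so a missed store shows up on every run instead of only when the hardware happens to act on the hint.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREF_LINE_SIZE 32u    /* Assumed prefetch granule (XLP figure).  */
#define DEBUG_PREF_STORE 1    /* Hypothetical debug switch.  */

/* Stand-in for the prepare-for-store prefetch on the line holding ADDR.  */
static void
pref_prepare_for_store (void *addr)
{
#if DEBUG_PREF_STORE
  /* Unconditionally zero the enclosing line, mimicking the worst-case
     effect of the hint.  */
  uintptr_t line = (uintptr_t) addr & ~(uintptr_t) (PREF_LINE_SIZE - 1);
  memset ((void *) line, 0, PREF_LINE_SIZE);
#else
  (void) addr;                /* Real code would issue the prefetch here.  */
#endif
}
```

Running the usual memcpy tests with the switch enabled should then turn any line that is prepared but not fully written into a deterministic failure.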

Thanks,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


