From: "Ondřej Bílka" <neleai@seznam.cz>
To: "Ryan S. Arnold" <ryan.arnold@gmail.com>
Cc: Siddhesh Poyarekar <siddhesh@redhat.com>,
Carlos O'Donell <carlos@redhat.com>,
Will Newton <will.newton@linaro.org>,
"libc-ports@sourceware.org" <libc-ports@sourceware.org>,
Patch Tracking <patches@linaro.org>
Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
Date: Thu, 05 Sep 2013 11:07:00 -0000
Message-ID: <20130905110657.GB5401@domone.kolej.mff.cuni.cz>
In-Reply-To: <CAAKybw87cyx67bpX=qjedrfjKxDmtgOfi_zCiaCfHGgx328Bsw@mail.gmail.com>
On Wed, Sep 04, 2013 at 12:35:46PM -0500, Ryan S. Arnold wrote:
> On Wed, Sep 4, 2013 at 2:30 AM, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> > 3. Provide acceptable performance for unaligned sizes without
> > penalizing the aligned case
>
> There are cases where the user can't control the alignment of the data
> being fed into string functions, and we shouldn't penalize them for
> these situations if possible, but in reality if a string routine shows
> up hot in a profile this is a likely culprit and there's not much that
> can be done once the unaligned case is made as stream-lined as
> possible.
>
> Simply testing for alignment (not presuming aligned data) itself slows
> down the processing of aligned-data, but that's an unavoidable
> reality.
How expensive are unaligned loads on powerpc? On x86-64 the penalty for
using them is smaller than that of the alternatives (increased branch
misprediction...).
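A minimal sketch of the two approaches being compared, in GNU C. The
function names load64_any and load64_branchy are hypothetical and not taken
from glibc; which variant is faster on a given CPU is exactly the open
question above and has to be measured.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 bytes from any address.  A fixed-size
   memcpy is typically compiled to one plain load on x86-64, with no
   alignment test.  */
static inline uint64_t
load64_any (const void *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Hypothetical alternative: test alignment first and assemble the
   value byte by byte when the pointer is misaligned.  The extra
   branch is the part that can mispredict on mixed inputs.  */
static inline uint64_t
load64_branchy (const void *p)
{
  uint64_t v;
  if (((uintptr_t) p) % sizeof v == 0)
    memcpy (&v, __builtin_assume_aligned (p, 8), sizeof v);
  else
    {
      const unsigned char *s = p;
      unsigned char *d = (unsigned char *) &v;
      for (size_t i = 0; i < sizeof v; i++)
        d[i] = s[i];
    }
  return v;
}
```

Both functions return the same value for every offset; only the code path
differs.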
> I've chatted with some compiler folks about the possibility
> of branching directly to aligned case labels in string routines if the
> compiler is able to detect aligned data.. but was informed that this
> suggestion might get me burned at the stake.
>
You would need to improve gcc's detection of alignment first. Currently gcc
misses most of the opportunities; even in the following code gcc emits
redundant alignment checks:
#include <stdint.h>
char *foo(long *x){
  if (((uintptr_t)x)%16)
    return (char *)(x+4);
  else {
    /* x is known to be 16-byte aligned on this path. */
    __builtin_memset(x,0,512);
    return (char *)x;
  }
}
If the gcc developers fix that then we do not have to ask them for anything.
We could just change the headers to recognize the aligned case, like
#define strchr(x,c) ({ const char *__x=(x);\
  (__builtin_constant_p(((uintptr_t)__x)%16) && ((uintptr_t)__x)%16==0)\
  ? strchr_aligned(__x,c)\
  : strchr(__x,c);\
})
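The header idea above can be sketched as a small self-contained fragment.
This is only an illustration under stated assumptions: strchr_aligned does
not exist in glibc and is stubbed here by forwarding to strchr, and the
macro is named find_char (a hypothetical name) so that the sketch does not
redefine a standard function. It relies on the GNU C statement-expression
extension and on __builtin_constant_p.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical aligned entry point.  A real one would skip the
   alignment prologue; this stub just forwards to strchr.  */
static inline char *
strchr_aligned (const char *s, int c)
{
  return strchr (s, c);
}

/* Pick the aligned entry point only when the compiler can prove at
   compile time that the pointer is 16-byte aligned; otherwise fall
   back to the generic routine.  No run-time check is emitted either
   way, because __builtin_constant_p folds to 0 or 1.  */
#define find_char(x, c) ({ const char *s_ = (x);                   \
  (__builtin_constant_p (((uintptr_t) s_) % 16)                   \
   && ((uintptr_t) s_) % 16 == 0)                                 \
  ? strchr_aligned (s_, (c))                                      \
  : strchr (s_, (c)); })
```

With today's gcc the constant test rarely succeeds, so nearly every call
takes the generic branch; the result is the same on both branches.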
> > 4. Measure the effect of dcache pressure on function performance
> > 5. Measure effect of icache pressure on function performance.
> >
> > Depending on the actual cost of cache misses on different processors,
> > the icache/dcache miss cost would either have higher or lower weight
> > but for 1-3, I'd go in that order of priorities with little concern
> > for unaligned cases.
>
> I know that icache and dcache miss penalty/costs are known for most
> architectures but not whether they're "published". I suppose we can,
> at least, encourage developers for the CPU manufacturers to indicate
> in the documentation of preconditions which is more expensive,
> relative to the other if they're unable to indicate the exact costs of
> these misses.
>
These costs are relatively difficult to describe; take strlen on data in
main memory as an example.
http://kam.mff.cuni.cz/~ondra/benchmark_string/i7_ivy_bridge/strlen_profile/results_rand_nocache/result.html
Here we see the hardware prefetcher in action. Time grows linearly with
size up to 512 bytes, then remains constant up to 4096 bytes (switch to the
block view), where it starts increasing again at a slower rate.
For core2 the shape is similar, except that the plateau starts at 256 bytes
and ends at 1024 bytes.
http://kam.mff.cuni.cz/~ondra/benchmark_string/core2/strlen_profile/results_rand_nocache/result.html
AMD processors are different: on phenomII the time is linear in size, and
for fx10 there is even a region where time decreases with size.
http://kam.mff.cuni.cz/~ondra/benchmark_string/phenomII/strlen_profile/results_rand_nocache/result.html
http://kam.mff.cuni.cz/~ondra/benchmark_string/fx10/strlen_profile/results_rand_nocache/result.html