public inbox for glibc-bugs@sourceware.org
* [Bug libc/16830] New: memset performance regression
From: schwab@linux-m68k.org @ 2014-04-10 16:04 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
Bug ID: 16830
Summary: memset performance regression
Product: glibc
Version: 2.18
Status: NEW
Severity: normal
Priority: P2
Component: libc
Assignee: unassigned at sourceware dot org
Reporter: schwab@linux-m68k.org
CC: drepper.fsp at gmail dot com, neleai at seznam dot cz
Host: x86_64-*-*
https://java.net/projects/libmicro
                      usecs/call            usecs/call
                      2.11                  2.18
Intel Harpertown Socket 771
Unit memset_10m       3743.0016 ( 0.00%)    5447.0912 (-45.53%)
Unit memsetP2_10m     7541.9904 ( 0.00%)   11094.2976 (-47.10%)
Xeon(R) CPU E5504 Gainestown
Unit memset_10m       1147.8016 ( 0.00%)    1702.6816 (-48.34%)
Unit memsetP2_10m     2058.1120 ( 0.00%)    2864.3072 (-39.17%)
Opteron 270
Unit memset_10m       2112.1584 ( 0.00%)    4910.7742 (-132.50%)
Unit memsetP2_10m     2145.8658 ( 0.00%)    4957.2592 (-131.01%)
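For readers checking the percentages in the table above: they appear to follow the convention (baseline - new) / baseline, so a slower 2.18 result shows up as a negative number. The helper below is only an illustrative sketch of that arithmetic; the function name is hypothetical and is not part of libmicro or glibc.

```c
#include <assert.h>

/* Relative change in the sign convention of the table:
   0% for the baseline, negative when the new version is slower.
   The name relative_change_pct is made up for this sketch. */
double relative_change_pct(double baseline_usecs, double new_usecs)
{
    assert(baseline_usecs > 0.0);
    return (baseline_usecs - new_usecs) / baseline_usecs * 100.0;
}
```

Called with the Harpertown memset_10m pair (3743.0016, 5447.0912) it yields roughly -45.5, and with the Opteron 270 pair (2112.1584, 4910.7742) roughly -132.5, matching the reported figures.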
--
You are receiving this mail because:
You are on the CC list for the bug.
* [Bug libc/16830] memset performance regression
From: neleai at seznam dot cz @ 2014-04-28 8:12 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
--- Comment #1 from Ondrej Bilka <neleai at seznam dot cz> ---
That's because I did not add non-temporal loops for large sizes yet. I will
send a patch.
* [Bug libc/16830] memset performance regression
From: matz at suse dot de @ 2014-05-21 12:55 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
Michael Matz <matz at suse dot de> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |matz at suse dot de
--- Comment #2 from Michael Matz <matz at suse dot de> ---
(In reply to Ondrej Bilka from comment #1)
> That's because I did not add non-temporal loops for large sizes yet. I will
> send a patch.
Any progress already?
* [Bug libc/16830] memset performance regression
From: matz at suse dot de @ 2014-05-21 14:21 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
--- Comment #3 from Michael Matz <matz at suse dot de> ---
Created attachment 7611
--> https://sourceware.org/bugzilla/attachment.cgi?id=7611&action=edit
Patch adding non-temporal stores
This patch adds non-temporal stores for large block sizes. On my machine
(an Opteron), the libmicro benchmark "memset -s 10m" gives:
glibc 2.17:
         prc thr   usecs/call   samples   errors cnt/samp     size
memset     1   1   2424.64635        97        0       20 10485760
glibc 2.19:
         prc thr   usecs/call   samples   errors cnt/samp     size
memset     1   1   3539.25120        97        0       20 10485760
glibc 2.19 with patch:
         prc thr   usecs/call   samples   errors cnt/samp     size
memset     1   1   2524.34610       102        0       20 10485760
So it is indeed the non-temporal stores that improve performance. The patched
version is still a bit slower than 2.17, but much more reasonable. The old
code filled 128 bytes per loop iteration and the new code fills only 64, which
might explain the remaining difference. So the real patch should also use a
128-byte loop.
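A minimal C sketch of the technique discussed here, assuming an x86_64 target with SSE2. It is not the attached patch, whose contents are not shown in this thread. The function name memset_nt_sketch and the 1 MiB cutoff NT_THRESHOLD are hypothetical choices made for illustration. Large fills use `_mm_stream_si128` (which compiles to movntdq) in a loop that writes 128 bytes per iteration; small fills fall back to the ordinary memset.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_epi8, _mm_stream_si128, _mm_sfence */
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff; the threshold used by the real patch is not stated here. */
#define NT_THRESHOLD ((size_t)1 << 20)

void *memset_nt_sketch(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    /* Small fills stay in cache, so use the regular memset. */
    if (n < NT_THRESHOLD)
        return memset(dst, c, n);

    /* Fill the unaligned head with ordinary stores up to a 16-byte boundary. */
    size_t head = (size_t)(-(uintptr_t)p & 15);
    memset(p, c, head);
    p += head;
    n -= head;
    assert(((uintptr_t)p & 15) == 0);

    __m128i v = _mm_set1_epi8((char)c);

    /* Main loop: 128 bytes per iteration as eight 16-byte non-temporal stores. */
    while (n >= 128) {
        for (int i = 0; i < 8; i++)
            _mm_stream_si128((__m128i *)(p + 16 * i), v);
        p += 128;
        n -= 128;
    }

    /* Make the streaming stores globally visible before later stores. */
    _mm_sfence();

    /* Fill the remaining tail (fewer than 128 bytes) with ordinary stores. */
    memset(p, c, n);
    return dst;
}
```

Non-temporal stores bypass the cache hierarchy, which helps when the filled region is far larger than the cache and will not be read back soon. For small buffers they tend to hurt, hence the size cutoff.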
* [Bug libc/16830] memset performance regression
From: fweimer at redhat dot com @ 2014-06-12 19:43 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags| |security-
* [Bug string/16830] memset performance regression
From: jsm28 at gcc dot gnu.org @ 2015-08-27 22:21 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
Joseph Myers <jsm28 at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|libc |string
* [Bug string/16830] memset performance regression
From: cfester at tmriusa dot com @ 2015-10-16 3:55 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
Chris Fester <cfester at tmriusa dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |cfester at tmriusa dot com
--- Comment #4 from Chris Fester <cfester at tmriusa dot com> ---
Created attachment 8725
--> https://sourceware.org/bugzilla/attachment.cgi?id=8725&action=edit
test illustrating differences in speed between movdqa and movntdq
* [Bug string/16830] memset performance regression
From: cfester at tmriusa dot com @ 2015-10-16 4:39 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
--- Comment #5 from Chris Fester <cfester at tmriusa dot com> ---
We also have seen a performance regression in memset. Our previous
system software used eglibc 2.15. When we were evaluating Yocto 1.6, we
started using eglibc 2.19 and saw the problem. We continued to see it with
glibc 2.21 when using Yocto 1.8.
Our system software frequently uses memset to initialize large areas of
memory. We break up the memory area into chunks and dedicate all 16 cores
to memset-ing a chunk. This likely thrashes the caches quite a bit,
although I'll admit I haven't looked at any performance counters.
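The chunk-per-core scheme described above might look roughly like the following pthreads sketch. It is not the attached test program; the names parallel_memset, fill_chunk, struct chunk and the MAX_THREADS cap are all hypothetical, and no cycle counting is included.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_THREADS 64  /* arbitrary cap chosen for this sketch */

struct chunk {
    unsigned char *start;
    size_t len;
    int value;
};

/* Thread body: fill one chunk with the ordinary memset. */
static void *fill_chunk(void *arg)
{
    struct chunk *c = arg;
    memset(c->start, c->value, c->len);
    return NULL;
}

/* Split buf into nthreads contiguous chunks and fill each on its own thread.
   Returns 0 on success, or the pthread_create error code on failure. */
int parallel_memset(void *buf, int value, size_t len, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    struct chunk chunks[MAX_THREADS];
    size_t per = len / (size_t)nthreads;
    int started = 0;
    int err = 0;

    for (int i = 0; i < nthreads; i++) {
        chunks[i].start = (unsigned char *)buf + per * (size_t)i;
        /* The last chunk also takes the remainder of the division. */
        chunks[i].len = (i == nthreads - 1) ? len - per * (size_t)i : per;
        chunks[i].value = value;
        err = pthread_create(&tid[i], NULL, fill_chunk, &chunks[i]);
        if (err != 0)
            break;
        started++;
    }

    /* Wait for every thread that was actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return err;
}
```

With 16 threads each writing a large chunk through the cache, the cores compete for shared cache and memory bandwidth, which is the situation where a non-temporal store path would be expected to matter most.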
For Yocto 1.6 we patched eglibc to revert to the unrolled loop version of
memset.S. For Yocto 1.8, we actually produced a patch strikingly similar
to Michael Matz's patch to fix the regression.
Some hardware details:
CPU - 2 Sandy Bridge based Xeon CPUs, 16 cores total
Memory - 32 GB on each node
I attached a C source file that does cycle counting for memset with
multiple threads. If I have time tomorrow I will attach an ODS spreadsheet
with a graph illustrating the data we collected, as well as the patch we're
currently using to work around the problem.
Please let us know how we can help with this issue.
Thanks,
Chris Fester
* [Bug string/16830] memset performance regression
From: hjl.tools at gmail dot com @ 2024-05-09 19:16 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=16830
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hjl.tools at gmail dot com,
| |skpgkp2 at gmail dot com