public inbox for gcc-patches@gcc.gnu.org
* [PATCH] reduce size penalty for including C++11 <algorithm> on x86 systems
@ 2015-10-13 17:45 Nathan Froyd
  2015-10-13 18:27 ` Jonathan Wakely
From: Nathan Froyd @ 2015-10-13 17:45 UTC
  To: gcc-patches, libstdc++

From: Nathan Froyd <froydnj@gmail.com>

Including <algorithm> in C++11 mode (typically done for
std::{min,max,swap}) includes <random>, for
std::uniform_int_distribution.  On x86 platforms, <random> manages to
drag in <x86intrin.h> through x86's opt_random.h header, and
<x86intrin.h> has gotten rather large recently with the addition of AVX
intrinsics.  The comparison between C++03 mode and C++11 mode is not
quite exact, but it gives an idea of the penalty we're talking about
here:

froydnj@thor:~/src$ echo '#include <algorithm>' | g++ -x c++ - -o - -E -std=c++11 | wc
  53460  127553 1401268
froydnj@thor:~/src$ echo '#include <algorithm>' | g++ -x c++ - -o - -E -std=c++03 | wc
   9202   18933  218189

That's roughly a 6-7x penalty in C++11 mode (granted, C++11 mode pulls
in more than just <x86intrin.h>) with GCC 4.9.2 on a Debian system;
current mainline is worse still, at nearly 10x by byte count:

froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++11 | wc
  84851  210475 2369616
froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++03 | wc
   9383   19402  239676

<x86intrin.h> itself clocks in at 1.3MB+ of preprocessed text.

This patch aims to reduce that size penalty by recognizing that neither
of the places that #include <x86intrin.h> needs the full set of x86
intrinsics; each can get by with a smaller, more focused header.
<ext/random> needs only <emmintrin.h> (SSE2) to declare __m128i, while
x86's opt_random.h must include <pmmintrin.h> (SSE3, now only when
__SSE3__ is defined) for declarations of various intrinsic functions.
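As an illustration only (this is not code from the patch), the narrow-include pattern looks roughly like the sketch below: each intrinsics header is included only under the ISA macro that makes its declarations usable, and each function uses nothing beyond what that header declares. The helpers add_lanes0 and sum2 are hypothetical; <emmintrin.h> supplies __m128i and the SSE2 integer operations, and <pmmintrin.h> supplies the SSE3 horizontal add _mm_hadd_pd.

```cpp
// Sketch: include only the intrinsics header each use needs, instead of
// the catch-all <x86intrin.h>.  Helper names are hypothetical.
#include <cassert>      // assert() for quick self-checks

#ifdef __SSE2__
# include <emmintrin.h>  // SSE2: __m128i, _mm_add_epi32, _mm_cvtsd_f64, ...
#endif

#ifdef __SSE3__
# include <pmmintrin.h>  // SSE3: _mm_hadd_pd (it also includes <emmintrin.h>)
#endif

// Adds a and b in lane 0 of an SSE2 integer vector when SSE2 is enabled;
// falls back to plain scalar code otherwise.
int add_lanes0(int a, int b)
{
#ifdef __SSE2__
  __m128i x = _mm_set1_epi32(a);
  __m128i y = _mm_set1_epi32(b);
  return _mm_cvtsi128_si32(_mm_add_epi32(x, y));
#else
  return a + b;
#endif
}

// Sums two doubles with an SSE3 horizontal add when SSE3 is enabled
// (e.g. with -msse3 or -march=native on a recent x86 chip).
double sum2(double a, double b)
{
#ifdef __SSE3__
  __m128d v = _mm_set_pd(b, a);             // lanes: {a, b}
  return _mm_cvtsd_f64(_mm_hadd_pd(v, v));  // lane 0 holds a + b
#else
  return a + b;
#endif
}
```

Whichever branch is compiled, add_lanes0(2, 40) yields 42 and sum2(1.0, 2.0) yields 3.0; only the set of declarations each translation unit has to parse changes.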

With the patch applied, mainline's preprocessed C++11 <algorithm> shrinks
by more than half (from about 2.37 MB to about 1.02 MB):

froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++11 | wc
  39174   88538 1015281

which seems like a win.
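The three wc columns above are lines, words, and bytes. A small worked check of the byte ratios, using only the figures quoted in this message (the WcCounts struct and the byte_ratio and report helpers are illustrative names, not anything in libstdc++):

```cpp
// Worked check of the wc figures quoted above (lines, words, bytes).
#include <cassert>
#include <cstdio>

struct WcCounts {
  long lines;
  long words;
  long bytes;
};

// How many times larger a's preprocessed output is than b's, by bytes.
double byte_ratio(const WcCounts& a, const WcCounts& b)
{
  return static_cast<double>(a.bytes) / static_cast<double>(b.bytes);
}

// Numbers copied from the measurements in this message.
const WcCounts gcc492_cxx11   = {53460, 127553, 1401268};
const WcCounts gcc492_cxx03   = {9202, 18933, 218189};
const WcCounts mainline_cxx11 = {84851, 210475, 2369616};
const WcCounts mainline_cxx03 = {9383, 19402, 239676};
const WcCounts patched_cxx11  = {39174, 88538, 1015281};

void report()
{
  std::printf("GCC 4.9.2 C++11/C++03: %.1fx\n",
              byte_ratio(gcc492_cxx11, gcc492_cxx03));     // 6.4x
  std::printf("mainline  C++11/C++03: %.1fx\n",
              byte_ratio(mainline_cxx11, mainline_cxx03)); // 9.9x
  std::printf("patched   C++11/C++03: %.1fx\n",
              byte_ratio(patched_cxx11, mainline_cxx03));  // 4.2x
  std::printf("bytes removed by patch: %.0f%%\n",
              100.0 * (1.0 - byte_ratio(patched_cxx11, mainline_cxx11))); // 57%
}
```

By words the GCC 4.9.2 ratio is closer to 6.7x, which is why the penalty is better described as a range than as a single figure.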

Bootstrapped on x86_64-pc-linux-gnu with --enable-languages=c,c++,
tested with check-target-libstdc++-v3, no regressions.  Also verified
that <algorithm> and <ext/random> pass -fsyntax-only with
-march=native (on a recent Haswell chip); if an -march=native bootstrap
is necessary, I am happy to do one if somebody can tell me how to set
everything up properly.
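A minimal sketch of a translation unit for repeating that check (illustrative only; the file name t.cc and the helper clamp_digit are hypothetical, and __GLIBCXX__ is used so that <ext/random> is only included under libstdc++). On x86_64, compiling it once with the default -march and once with -march=native should exercise both sides of the __SSE3__ guard in opt_random.h, which <random> pulls in.

```cpp
// Hypothetical test TU for the -fsyntax-only check, e.g.:
//   g++ -std=c++11 -fsyntax-only t.cc
//   g++ -std=c++11 -fsyntax-only -march=native t.cc
#include <algorithm>
#include <cassert>

#if defined(__GLIBCXX__)
# include <ext/random>  // libstdc++ extension header changed by the patch;
                        // it must still see __m128i when __SSE2__ is set
#endif

// Record which ISA macros the selected -march enabled for this TU.
constexpr bool have_sse2 =
#ifdef __SSE2__
    true;
#else
    false;
#endif

constexpr bool have_sse3 =
#ifdef __SSE3__
    true;
#else
    false;
#endif

// -msse3 implies -msse2, so a TU that sees __SSE3__ also sees __SSE2__.
static_assert(!have_sse3 || have_sse2, "SSE3 enabled without SSE2");

// Touch <algorithm>, the header whose preprocessed size was measured.
int clamp_digit(int v)
{
  return std::min(std::max(v, 0), 9);
}
```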

OK?

-Nathan

	* config/cpu/i486/opt/bits/opt_random.h: Include pmmintrin.h instead
	of x86intrin.h, and only do so when __SSE3__ is defined.
	* include/ext/random: Include emmintrin.h instead of x86intrin.h.
---
 libstdc++-v3/ChangeLog                             | 6 ++++++
 libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h | 4 +++-
 libstdc++-v3/include/ext/random                    | 2 +-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index e3061ef..ff0b048 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,9 @@
+2015-10-13  Nathan Froyd  <froydnj@gcc.gnu.org>
+
+	* config/cpu/i486/opt/bits/opt_random.h: Include pmmintrin.h instead
+	of x86intrin.h, and only do so when __SSE3__ is defined.
+	* include/ext/random: Include emmintrin.h instead of x86intrin.h.
+
 2015-10-11  Joseph Myers  <joseph@codesourcery.com>
 
 	* crossconfig.m4 (GLIBCXX_CROSSCONFIG) <*-linux* | *-uclinux* |
diff --git a/libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h b/libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h
index 4495569..a9f6c13 100644
--- a/libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h
+++ b/libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h
@@ -30,7 +30,9 @@
 #ifndef _BITS_OPT_RANDOM_H
 #define _BITS_OPT_RANDOM_H 1
 
-#include <x86intrin.h>
+#ifdef __SSE3__
+#include <pmmintrin.h>
+#endif
 
 
 #pragma GCC system_header
diff --git a/libstdc++-v3/include/ext/random b/libstdc++-v3/include/ext/random
index 0bcfa4a..ba363ce 100644
--- a/libstdc++-v3/include/ext/random
+++ b/libstdc++-v3/include/ext/random
@@ -40,7 +40,7 @@
 #include <array>
 #include <ext/cmath>
 #ifdef __SSE2__
-# include <x86intrin.h>
+# include <emmintrin.h>
 #endif
 
 #if defined(_GLIBCXX_USE_C99_STDINT_TR1) && defined(UINT32_C)
-- 
2.1.4


* Re: [PATCH] reduce size penalty for including C++11 <algorithm> on x86 systems
From: Jonathan Wakely @ 2015-10-13 18:27 UTC
  To: Nathan Froyd; +Cc: gcc-patches, libstdc++

On 13/10/15 21:44 -0400, Nathan Froyd wrote:
>Including <algorithm> in C++11 mode (typically done for
>std::{min,max,swap}) includes <random>, for
>std::uniform_int_distribution.  On x86 platforms, <random> manages to
>drag in <x86intrin.h> through x86's opt_random.h header, and
><x86intrin.h> has gotten rather large recently with the addition of AVX
>intrinsics.  The comparison between C++03 mode and C++11 mode is not
>quite exact, but it gives an idea of the penalty we're talking about
>here:
>
>froydnj@thor:~/src$ echo '#include <algorithm>' | g++ -x c++ - -o - -E -std=c++11 | wc
>  53460  127553 1401268
>froydnj@thor:~/src$ echo '#include <algorithm>' | g++ -x c++ - -o - -E -std=c++03 | wc
>   9202   18933  218189
>
>That's approximately a 7x penalty in C++11 mode (granted, C++11 includes
>more than just <x86intrin.h>) with GCC 4.9.2 on a Debian system; current
>mainline is somewhat worse:
>
>froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++11 | wc
>  84851  210475 2369616
>froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++03 | wc
>   9383   19402  239676
>
><x86intrin.h> itself clocks in at 1.3MB+ of preprocessed text.

Yep, that's been bothering me for a while.

>This patch aims to reduce that size penalty by recognizing that both of
>the places that #include <x86intrin.h> do not need the full set of x86
>intrinsics, but can get by with a smaller, more focused header in each
>case.  <ext/random> needs only <emmintrin.h> to declare __m128i, while
>x86's opt_random.h must include <pmmintrin.h> for declarations of
>various intrinsic functions.
>
>The net result is that the size of mainline's <algorithm> is significantly reduced:
>
>froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++11 | wc
>  39174   88538 1015281
>
>which seems like a win.

Indeed!

>Bootstrapped on x86_64-pc-linux-gnu with --enable-languages=c,c++,
>tested with check-target-libstdc++-v3, no regressions.  Also verified
>that <algorithm> and <ext/random> pass -fsyntax-only with
>-march=native (on a recent Haswell chip); if an -march=native bootstrap
>is necessary, I am happy to do that if somebody instructs me in getting
>everything properly set up.
>
>OK?

OK, thanks.

