public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Miller <davem@davemloft.net>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Speed up LEX line cleaning a bit...
Date: Sun, 14 Mar 2010 11:48:00 -0000	[thread overview]
Message-ID: <20100314.002434.98876488.davem@davemloft.net> (raw)


I was watching a symbol profile of a gcc bootstrap using 'perf' and
noticed the usual libcpp symbols hanging around the top of the profile
during certain phases.

I noticed that the _cpp_clean_line fast path works a byte at a time,
comparing that byte against four character values, and it loops until
one of them matches.

This can be vectorized to work a word at a time by using the classic
Alan Mycroft 'find zero byte in word' scheme, after xor'ing each word
we read with these comparison characters replicated into the full
word.  This is similar to how memchr() and strchr() are implemented in
many C libraries.

We can work with any alignment by rewinding the string pointer
to be aligned, and computing a mask which we force into the
alignment bytes so that the comparison fails on them.  This only
needs to be done on the first word.

So it's like a parallel strchr() against 4 different character values
instead of just one.

I just tossed this together and made sure it works, I haven't done any
real substantial performance analysis.  And of course one would need
to do this before even beginning to consider committing something like
this :-)

In particular I am a bit concerned that 4 byte or shorter lines might
make up a substantial portion of the lines CPP sees overall.  The
benefit of the vectorized loop only kicks in for longer lines.

I also noticed that the line cleaner unconditionally stores a '\n' at
the end of the line, even if that was the character which was already
there.  That is the common case (except on DOS I guess), and I wonder
how much excessive cpu store buffer and dirty cache line activity we
generate because of this?

I'll try to find some time to conditionalize that "*d = '\n';" store
and see if it helps any.

libcpp/

2010-03-13  David S. Miller  <davem@davemloft.net>

	* configure.ac: Add AC_C_BIGENDIAN.
	* configure: Rebuild.
	* config.in: Likewise.
	* lex.c (REPLICATE): Define.
	(clean_xor_1, clean_xor_2, clean_xor_3, clean_xor_4): New
	constants.
	(clean_line_fast): New function.  Fast scanning using Alan
	Mycroft's strlen word at a time comparisons.
	(_cpp_clean_line): Call it.

diff --git a/libcpp/config.in b/libcpp/config.in
index 4c71ec3..7dd980b 100644
--- a/libcpp/config.in
+++ b/libcpp/config.in
@@ -266,6 +266,10 @@
 /* Define to 1 if your <sys/time.h> declares `struct tm'. */
 #undef TM_IN_SYS_TIME
 
+/* Define WORDS_BIGENDIAN to 1 if your processor stores words with the most
+   significant byte first (like Motorola and SPARC, unlike Intel). */
+#undef WORDS_BIGENDIAN
+
 /* Define to empty if `const' does not conform to ANSI C. */
 #undef const
 
diff --git a/libcpp/configure b/libcpp/configure
index 00cbd7a..fa40e48 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -1846,6 +1846,48 @@ fi
 
 } # ac_fn_cxx_check_header_mongrel
 
+# ac_fn_cxx_try_run LINENO
+# ------------------------
+# Try to link conftest.$ac_ext, and return whether this succeeded. Assumes
+# that executables *can* be run.
+ac_fn_cxx_try_run ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { { ac_try="$ac_link"
+case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_link") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; } && { ac_try='./conftest$ac_exeext'
+  { { case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_try") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then :
+  ac_retval=0
+else
+  $as_echo "$as_me: program exited with status $ac_status" >&5
+       $as_echo "$as_me: failed program was:" >&5
+sed 's/^/| /' conftest.$ac_ext >&5
+
+       ac_retval=$ac_status
+fi
+  rm -rf conftest.dSYM conftest_ipa8_conftest.oo
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+  return $ac_retval
+
+} # ac_fn_cxx_try_run
+
 # ac_fn_cxx_try_link LINENO
 # -------------------------
 # Try to link conftest.$ac_ext, and return whether this succeeded.
@@ -1946,48 +1988,6 @@ $as_echo "$ac_res" >&6; }
 
 } # ac_fn_cxx_check_type
 
-# ac_fn_cxx_try_run LINENO
-# ------------------------
-# Try to link conftest.$ac_ext, and return whether this succeeded. Assumes
-# that executables *can* be run.
-ac_fn_cxx_try_run ()
-{
-  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
-  if { { ac_try="$ac_link"
-case "(($ac_try" in
-  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
-  *) ac_try_echo=$ac_try;;
-esac
-eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
-$as_echo "$ac_try_echo"; } >&5
-  (eval "$ac_link") 2>&5
-  ac_status=$?
-  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; } && { ac_try='./conftest$ac_exeext'
-  { { case "(($ac_try" in
-  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
-  *) ac_try_echo=$ac_try;;
-esac
-eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
-$as_echo "$ac_try_echo"; } >&5
-  (eval "$ac_try") 2>&5
-  ac_status=$?
-  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }; }; then :
-  ac_retval=0
-else
-  $as_echo "$as_me: program exited with status $ac_status" >&5
-       $as_echo "$as_me: failed program was:" >&5
-sed 's/^/| /' conftest.$ac_ext >&5
-
-       ac_retval=$ac_status
-fi
-  rm -rf conftest.dSYM conftest_ipa8_conftest.oo
-  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
-  return $ac_retval
-
-} # ac_fn_cxx_try_run
-
 # ac_fn_cxx_compute_int LINENO EXPR VAR INCLUDES
 # ----------------------------------------------
 # Tries to find the compile-time value of EXPR in a program that includes
@@ -5144,6 +5144,230 @@ done
 fi
 
 # Checks for typedefs, structures, and compiler characteristics.
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether byte ordering is bigendian" >&5
+$as_echo_n "checking whether byte ordering is bigendian... " >&6; }
+if test "${ac_cv_c_bigendian+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_cv_c_bigendian=unknown
+    # See if we're dealing with a universal compiler.
+    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifndef __APPLE_CC__
+	       not a universal capable compiler
+	     #endif
+	     typedef int dummy;
+
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+
+	# Check for potential -arch flags.  It is not universal unless
+	# there are at least two -arch flags with different values.
+	ac_arch=
+	ac_prev=
+	for ac_word in $CC $CFLAGS $CPPFLAGS $LDFLAGS; do
+	 if test -n "$ac_prev"; then
+	   case $ac_word in
+	     i?86 | x86_64 | ppc | ppc64)
+	       if test -z "$ac_arch" || test "$ac_arch" = "$ac_word"; then
+		 ac_arch=$ac_word
+	       else
+		 ac_cv_c_bigendian=universal
+		 break
+	       fi
+	       ;;
+	   esac
+	   ac_prev=
+	 elif test "x$ac_word" = "x-arch"; then
+	   ac_prev=arch
+	 fi
+       done
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+    if test $ac_cv_c_bigendian = unknown; then
+      # See if sys/param.h defines the BYTE_ORDER macro.
+      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/types.h>
+	     #include <sys/param.h>
+
+int
+main ()
+{
+#if ! (defined BYTE_ORDER && defined BIG_ENDIAN \
+		     && defined LITTLE_ENDIAN && BYTE_ORDER && BIG_ENDIAN \
+		     && LITTLE_ENDIAN)
+	      bogus endian macros
+	     #endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  # It does; now see whether it defined to BIG_ENDIAN or not.
+	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/types.h>
+		#include <sys/param.h>
+
+int
+main ()
+{
+#if BYTE_ORDER != BIG_ENDIAN
+		 not big endian
+		#endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  ac_cv_c_bigendian=yes
+else
+  ac_cv_c_bigendian=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+    fi
+    if test $ac_cv_c_bigendian = unknown; then
+      # See if <limits.h> defines _LITTLE_ENDIAN or _BIG_ENDIAN (e.g., Solaris).
+      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <limits.h>
+
+int
+main ()
+{
+#if ! (defined _LITTLE_ENDIAN || defined _BIG_ENDIAN)
+	      bogus endian macros
+	     #endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  # It does; now see whether it defined to _BIG_ENDIAN or not.
+	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <limits.h>
+
+int
+main ()
+{
+#ifndef _BIG_ENDIAN
+		 not big endian
+		#endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  ac_cv_c_bigendian=yes
+else
+  ac_cv_c_bigendian=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+    fi
+    if test $ac_cv_c_bigendian = unknown; then
+      # Compile a test program.
+      if test "$cross_compiling" = yes; then :
+  # Try to guess by grepping values from an object file.
+	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+short int ascii_mm[] =
+		  { 0x4249, 0x4765, 0x6E44, 0x6961, 0x6E53, 0x7953, 0 };
+		short int ascii_ii[] =
+		  { 0x694C, 0x5454, 0x656C, 0x6E45, 0x6944, 0x6E61, 0 };
+		int use_ascii (int i) {
+		  return ascii_mm[i] + ascii_ii[i];
+		}
+		short int ebcdic_ii[] =
+		  { 0x89D3, 0xE3E3, 0x8593, 0x95C5, 0x89C4, 0x9581, 0 };
+		short int ebcdic_mm[] =
+		  { 0xC2C9, 0xC785, 0x95C4, 0x8981, 0x95E2, 0xA8E2, 0 };
+		int use_ebcdic (int i) {
+		  return ebcdic_mm[i] + ebcdic_ii[i];
+		}
+		extern int foo;
+
+int
+main ()
+{
+return use_ascii (foo) == use_ebcdic (foo);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  if grep BIGenDianSyS conftest.$ac_objext >/dev/null; then
+	      ac_cv_c_bigendian=yes
+	    fi
+	    if grep LiTTleEnDian conftest.$ac_objext >/dev/null ; then
+	      if test "$ac_cv_c_bigendian" = unknown; then
+		ac_cv_c_bigendian=no
+	      else
+		# finding both strings is unlikely to happen, but who knows?
+		ac_cv_c_bigendian=unknown
+	      fi
+	    fi
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$ac_includes_default
+int
+main ()
+{
+
+	     /* Are we little or big endian?  From Harbison&Steele.  */
+	     union
+	     {
+	       long int l;
+	       char c[sizeof (long int)];
+	     } u;
+	     u.l = 1;
+	     return u.c[sizeof (long int) - 1] == 1;
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_run "$LINENO"; then :
+  ac_cv_c_bigendian=no
+else
+  ac_cv_c_bigendian=yes
+fi
+rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
+fi
+
+    fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_bigendian" >&5
+$as_echo "$ac_cv_c_bigendian" >&6; }
+ case $ac_cv_c_bigendian in #(
+   yes)
+     $as_echo "#define WORDS_BIGENDIAN 1" >>confdefs.h
+;; #(
+   no)
+      ;; #(
+   universal)
+
+$as_echo "#define AC_APPLE_UNIVERSAL_BUILD 1" >>confdefs.h
+
+     ;; #(
+   *)
+     as_fn_error "unknown endianness
+ presetting ac_cv_c_bigendian=no (or yes) will help" "$LINENO" 5 ;;
+ esac
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for an ANSI C-conforming const" >&5
 $as_echo_n "checking for an ANSI C-conforming const... " >&6; }
 if test "${ac_cv_c_const+set}" = set; then :
@@ -7013,6 +7237,7 @@ LTLIBOBJS=$ac_ltlibobjs
 
 
 
+
 : ${CONFIG_STATUS=./config.status}
 ac_write_fail=0
 ac_clean_files_save=$ac_clean_files
diff --git a/libcpp/configure.ac b/libcpp/configure.ac
index d520e93..c7e2724 100644
--- a/libcpp/configure.ac
+++ b/libcpp/configure.ac
@@ -66,6 +66,7 @@ else
 fi
 
 # Checks for typedefs, structures, and compiler characteristics.
+AC_C_BIGENDIAN
 AC_C_CONST
 AC_C_INLINE
 AC_FUNC_OBSTACK
diff --git a/libcpp/lex.c b/libcpp/lex.c
index ac28f92..509ba1a 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -96,6 +96,81 @@ add_line_note (cpp_buffer *buffer, const uchar *pos, unsigned int type)
   buffer->notes_used++;
 }
 
+#define REPLICATE(x) \
+	(((x) << 24) | ((x) << 16) | ((x) << 8) | (x))
+
+const unsigned int clean_xor_1 = REPLICATE((unsigned int)'\n');
+const unsigned int clean_xor_2 = REPLICATE((unsigned int)'\r');
+const unsigned int clean_xor_3 = REPLICATE((unsigned int)'\\');
+const unsigned int clean_xor_4 = REPLICATE((unsigned int)'?');
+const unsigned int mycroft_magic_1 = 0x80808080;
+const unsigned int mycroft_magic_2 = 0x01010101;
+const unsigned int first_word_magic = 0x7f7f7f7f;
+
+static const uchar *
+clean_line_fast (const uchar *_s)
+{
+  unsigned int first_word_mask = first_word_magic;
+  unsigned long align_bits;
+  unsigned int *p, val, t;
+  const uchar *ret, *tp;
+
+  align_bits = (unsigned long) _s;
+  p = (unsigned int *) (align_bits & ~(unsigned long)3);
+  val = *p++;
+  align_bits &= 3;
+  if (align_bits)
+    {
+      if (WORDS_BIGENDIAN)
+	first_word_mask <<= ((4 - align_bits) * 8);
+      else
+	first_word_mask >>= ((4 - align_bits) * 8);
+    }
+  else
+    first_word_mask = 0;
+
+  val |= first_word_mask;
+  t = ((val ^ clean_xor_1) - mycroft_magic_2) & mycroft_magic_1;
+  t |= ((val ^ clean_xor_2) - mycroft_magic_2) & mycroft_magic_1;
+  t |= ((val ^ clean_xor_3) - mycroft_magic_2) & mycroft_magic_1;
+  t |= ((val ^ clean_xor_4) - mycroft_magic_2) & mycroft_magic_1;
+  while (__builtin_expect (!t, true)) {
+    val = *p++;
+    t = ((val ^ clean_xor_1) - mycroft_magic_2) & mycroft_magic_1;
+    t |= ((val ^ clean_xor_2) - mycroft_magic_2) & mycroft_magic_1;
+    t |= ((val ^ clean_xor_3) - mycroft_magic_2) & mycroft_magic_1;
+    t |= ((val ^ clean_xor_4) - mycroft_magic_2) & mycroft_magic_1;
+  }
+
+  ret = tp = ((const uchar *) (p - 1)) + 3;
+  if (WORDS_BIGENDIAN)
+    {
+      t = (val >> 8) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 1;
+      t = (val >> 16) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 2;
+      t = (val >> 24) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 3;
+    }
+  else
+    {
+      t = (val >> 16) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 1;
+      t = (val >> 8) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 2;
+      t = val & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 3;
+    }
+    
+  return ret;
+}
+
 /* Returns with a logical line that contains no escaped newlines or
    trigraphs.  This is a time-critical inner loop.  */
 void
@@ -115,6 +190,8 @@ _cpp_clean_line (cpp_reader *pfile)
     {
       const uchar *pbackslash = NULL;
 
+      s = clean_line_fast (s + 1) - 1;
+
       /* Short circuit for the common case of an un-escaped line with
 	 no trigraphs.  The primary win here is by not writing any
 	 data back to memory until we have to.  */

             reply	other threads:[~2010-03-14  8:24 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-14 11:48 David Miller [this message]
2010-03-14 16:28 ` Joseph S. Myers
2010-03-14 23:16   ` David Miller
2010-03-14 23:16     ` Joseph S. Myers
2010-03-14 23:25       ` David Miller
2010-03-14 23:34         ` Joseph S. Myers
2010-03-15  0:15           ` David Miller
2010-03-15  0:13         ` Chris Lattner
2010-03-15  0:14           ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100314.002434.98876488.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).