[PATCH] Speed up LEX line cleaning a bit...

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Speed up LEX line cleaning a bit...
@ 2010-03-14 11:48 David Miller
  2010-03-14 16:28 ` Joseph S. Myers
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2010-03-14 11:48 UTC (permalink / raw)
  To: gcc-patches


I was watching a symbol profile of a gcc bootstrap using 'perf' and
noticed the usual libcpp symbols hanging around the top of the profile
during certain phases.

I noticed that the _cpp_clean_line fast path works a byte at a time,
comparing that byte against four character values, and it loops until
one of them matches.

This can be vectorized to work a word at a time by using the classic
Alan Mycroft 'find zero byte in word' scheme, after xor'ing each word
we read with these comparison characters replicated into the full
word.  This is similar to how memchr() and strchr() are implemented in
many C libraries.

We can work with any alignment by rewinding the string pointer
to be aligned, and computing a mask which we force into the
alignment bytes so that the comparison fails on them.  This only
needs to be done on the first word.

So it's like a parallel strchr() against 4 different character values
instead of just one.

I just tossed this together and made sure it works, I haven't done any
real substantial performance analysis.  And of course one would need
to do this before even beginning to consider committing something like
this :-)

In particular I am a bit concerned that 4 byte or shorter lines might
make up a substantial portion of the lines CPP sees overall.  The
benefit of the vectorized loop only kicks in for longer lines.

I also noticed that the line cleaner unconditionally stores a '\n' at
the end of the line, even if that was the character which was already
there.  That is the common case (except on DOS I guess), and I wonder
how much excessive cpu store buffer and dirty cache line activity we
generate because of this?

I'll try to find some time to conditionalize that "*d = '\n';" store
and see if it helps any.

libcpp/

2010-03-13  David S. Miller  <davem@davemloft.net>

	* configure.ac: Add AC_C_BIGENDIAN.
	* configure: Rebuild.
	* config.in: Likewise.
	* lex.c (REPLICATE): Define.
	(clean_xor_1, clean_xor_2, clean_xor_3, clean_xor_4): New
	constants.
	(clean_line_fast): New function.  Fast scanning using Alan
	Mycroft's strlen word at a time comparisons.
	(_cpp_clean_line): Call it.

diff --git a/libcpp/config.in b/libcpp/config.in
index 4c71ec3..7dd980b 100644
--- a/libcpp/config.in
+++ b/libcpp/config.in
@@ -266,6 +266,10 @@
 /* Define to 1 if your <sys/time.h> declares `struct tm'. */
 #undef TM_IN_SYS_TIME
 
+/* Define WORDS_BIGENDIAN to 1 if your processor stores words with the most
+   significant byte first (like Motorola and SPARC, unlike Intel). */
+#undef WORDS_BIGENDIAN
+
 /* Define to empty if `const' does not conform to ANSI C. */
 #undef const
 
diff --git a/libcpp/configure b/libcpp/configure
index 00cbd7a..fa40e48 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -1846,6 +1846,48 @@ fi
 
 } # ac_fn_cxx_check_header_mongrel
 
+# ac_fn_cxx_try_run LINENO
+# ------------------------
+# Try to link conftest.$ac_ext, and return whether this succeeded. Assumes
+# that executables *can* be run.
+ac_fn_cxx_try_run ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { { ac_try="$ac_link"
+case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_link") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; } && { ac_try='./conftest$ac_exeext'
+  { { case "(($ac_try" in
+  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+  *) ac_try_echo=$ac_try;;
+esac
+eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
+$as_echo "$ac_try_echo"; } >&5
+  (eval "$ac_try") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then :
+  ac_retval=0
+else
+  $as_echo "$as_me: program exited with status $ac_status" >&5
+       $as_echo "$as_me: failed program was:" >&5
+sed 's/^/| /' conftest.$ac_ext >&5
+
+       ac_retval=$ac_status
+fi
+  rm -rf conftest.dSYM conftest_ipa8_conftest.oo
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+  return $ac_retval
+
+} # ac_fn_cxx_try_run
+
 # ac_fn_cxx_try_link LINENO
 # -------------------------
 # Try to link conftest.$ac_ext, and return whether this succeeded.
@@ -1946,48 +1988,6 @@ $as_echo "$ac_res" >&6; }
 
 } # ac_fn_cxx_check_type
 
-# ac_fn_cxx_try_run LINENO
-# ------------------------
-# Try to link conftest.$ac_ext, and return whether this succeeded. Assumes
-# that executables *can* be run.
-ac_fn_cxx_try_run ()
-{
-  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
-  if { { ac_try="$ac_link"
-case "(($ac_try" in
-  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
-  *) ac_try_echo=$ac_try;;
-esac
-eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
-$as_echo "$ac_try_echo"; } >&5
-  (eval "$ac_link") 2>&5
-  ac_status=$?
-  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; } && { ac_try='./conftest$ac_exeext'
-  { { case "(($ac_try" in
-  *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
-  *) ac_try_echo=$ac_try;;
-esac
-eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\""
-$as_echo "$ac_try_echo"; } >&5
-  (eval "$ac_try") 2>&5
-  ac_status=$?
-  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }; }; then :
-  ac_retval=0
-else
-  $as_echo "$as_me: program exited with status $ac_status" >&5
-       $as_echo "$as_me: failed program was:" >&5
-sed 's/^/| /' conftest.$ac_ext >&5
-
-       ac_retval=$ac_status
-fi
-  rm -rf conftest.dSYM conftest_ipa8_conftest.oo
-  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
-  return $ac_retval
-
-} # ac_fn_cxx_try_run
-
 # ac_fn_cxx_compute_int LINENO EXPR VAR INCLUDES
 # ----------------------------------------------
 # Tries to find the compile-time value of EXPR in a program that includes
@@ -5144,6 +5144,230 @@ done
 fi
 
 # Checks for typedefs, structures, and compiler characteristics.
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether byte ordering is bigendian" >&5
+$as_echo_n "checking whether byte ordering is bigendian... " >&6; }
+if test "${ac_cv_c_bigendian+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_cv_c_bigendian=unknown
+    # See if we're dealing with a universal compiler.
+    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#ifndef __APPLE_CC__
+	       not a universal capable compiler
+	     #endif
+	     typedef int dummy;
+
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+
+	# Check for potential -arch flags.  It is not universal unless
+	# there are at least two -arch flags with different values.
+	ac_arch=
+	ac_prev=
+	for ac_word in $CC $CFLAGS $CPPFLAGS $LDFLAGS; do
+	 if test -n "$ac_prev"; then
+	   case $ac_word in
+	     i?86 | x86_64 | ppc | ppc64)
+	       if test -z "$ac_arch" || test "$ac_arch" = "$ac_word"; then
+		 ac_arch=$ac_word
+	       else
+		 ac_cv_c_bigendian=universal
+		 break
+	       fi
+	       ;;
+	   esac
+	   ac_prev=
+	 elif test "x$ac_word" = "x-arch"; then
+	   ac_prev=arch
+	 fi
+       done
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+    if test $ac_cv_c_bigendian = unknown; then
+      # See if sys/param.h defines the BYTE_ORDER macro.
+      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/types.h>
+	     #include <sys/param.h>
+
+int
+main ()
+{
+#if ! (defined BYTE_ORDER && defined BIG_ENDIAN \
+		     && defined LITTLE_ENDIAN && BYTE_ORDER && BIG_ENDIAN \
+		     && LITTLE_ENDIAN)
+	      bogus endian macros
+	     #endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  # It does; now see whether it defined to BIG_ENDIAN or not.
+	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <sys/types.h>
+		#include <sys/param.h>
+
+int
+main ()
+{
+#if BYTE_ORDER != BIG_ENDIAN
+		 not big endian
+		#endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  ac_cv_c_bigendian=yes
+else
+  ac_cv_c_bigendian=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+    fi
+    if test $ac_cv_c_bigendian = unknown; then
+      # See if <limits.h> defines _LITTLE_ENDIAN or _BIG_ENDIAN (e.g., Solaris).
+      cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <limits.h>
+
+int
+main ()
+{
+#if ! (defined _LITTLE_ENDIAN || defined _BIG_ENDIAN)
+	      bogus endian macros
+	     #endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  # It does; now see whether it defined to _BIG_ENDIAN or not.
+	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <limits.h>
+
+int
+main ()
+{
+#ifndef _BIG_ENDIAN
+		 not big endian
+		#endif
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  ac_cv_c_bigendian=yes
+else
+  ac_cv_c_bigendian=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+    fi
+    if test $ac_cv_c_bigendian = unknown; then
+      # Compile a test program.
+      if test "$cross_compiling" = yes; then :
+  # Try to guess by grepping values from an object file.
+	 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+short int ascii_mm[] =
+		  { 0x4249, 0x4765, 0x6E44, 0x6961, 0x6E53, 0x7953, 0 };
+		short int ascii_ii[] =
+		  { 0x694C, 0x5454, 0x656C, 0x6E45, 0x6944, 0x6E61, 0 };
+		int use_ascii (int i) {
+		  return ascii_mm[i] + ascii_ii[i];
+		}
+		short int ebcdic_ii[] =
+		  { 0x89D3, 0xE3E3, 0x8593, 0x95C5, 0x89C4, 0x9581, 0 };
+		short int ebcdic_mm[] =
+		  { 0xC2C9, 0xC785, 0x95C4, 0x8981, 0x95E2, 0xA8E2, 0 };
+		int use_ebcdic (int i) {
+		  return ebcdic_mm[i] + ebcdic_ii[i];
+		}
+		extern int foo;
+
+int
+main ()
+{
+return use_ascii (foo) == use_ebcdic (foo);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  if grep BIGenDianSyS conftest.$ac_objext >/dev/null; then
+	      ac_cv_c_bigendian=yes
+	    fi
+	    if grep LiTTleEnDian conftest.$ac_objext >/dev/null ; then
+	      if test "$ac_cv_c_bigendian" = unknown; then
+		ac_cv_c_bigendian=no
+	      else
+		# finding both strings is unlikely to happen, but who knows?
+		ac_cv_c_bigendian=unknown
+	      fi
+	    fi
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$ac_includes_default
+int
+main ()
+{
+
+	     /* Are we little or big endian?  From Harbison&Steele.  */
+	     union
+	     {
+	       long int l;
+	       char c[sizeof (long int)];
+	     } u;
+	     u.l = 1;
+	     return u.c[sizeof (long int) - 1] == 1;
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_run "$LINENO"; then :
+  ac_cv_c_bigendian=no
+else
+  ac_cv_c_bigendian=yes
+fi
+rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+  conftest.$ac_objext conftest.beam conftest.$ac_ext
+fi
+
+    fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_bigendian" >&5
+$as_echo "$ac_cv_c_bigendian" >&6; }
+ case $ac_cv_c_bigendian in #(
+   yes)
+     $as_echo "#define WORDS_BIGENDIAN 1" >>confdefs.h
+;; #(
+   no)
+      ;; #(
+   universal)
+
+$as_echo "#define AC_APPLE_UNIVERSAL_BUILD 1" >>confdefs.h
+
+     ;; #(
+   *)
+     as_fn_error "unknown endianness
+ presetting ac_cv_c_bigendian=no (or yes) will help" "$LINENO" 5 ;;
+ esac
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for an ANSI C-conforming const" >&5
 $as_echo_n "checking for an ANSI C-conforming const... " >&6; }
 if test "${ac_cv_c_const+set}" = set; then :
@@ -7013,6 +7237,7 @@ LTLIBOBJS=$ac_ltlibobjs
 
 
 
+
 : ${CONFIG_STATUS=./config.status}
 ac_write_fail=0
 ac_clean_files_save=$ac_clean_files
diff --git a/libcpp/configure.ac b/libcpp/configure.ac
index d520e93..c7e2724 100644
--- a/libcpp/configure.ac
+++ b/libcpp/configure.ac
@@ -66,6 +66,7 @@ else
 fi
 
 # Checks for typedefs, structures, and compiler characteristics.
+AC_C_BIGENDIAN
 AC_C_CONST
 AC_C_INLINE
 AC_FUNC_OBSTACK
diff --git a/libcpp/lex.c b/libcpp/lex.c
index ac28f92..509ba1a 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -96,6 +96,81 @@ add_line_note (cpp_buffer *buffer, const uchar *pos, unsigned int type)
   buffer->notes_used++;
 }
 
+#define REPLICATE(x) \
+	(((x) << 24) | ((x) << 16) | ((x) << 8) | (x))
+
+const unsigned int clean_xor_1 = REPLICATE((unsigned int)'\n');
+const unsigned int clean_xor_2 = REPLICATE((unsigned int)'\r');
+const unsigned int clean_xor_3 = REPLICATE((unsigned int)'\\');
+const unsigned int clean_xor_4 = REPLICATE((unsigned int)'?');
+const unsigned int mycroft_magic_1 = 0x80808080;
+const unsigned int mycroft_magic_2 = 0x01010101;
+const unsigned int first_word_magic = 0x7f7f7f7f;
+
+static const uchar *
+clean_line_fast (const uchar *_s)
+{
+  unsigned int first_word_mask = first_word_magic;
+  unsigned long align_bits;
+  unsigned int *p, val, t;
+  const uchar *ret, *tp;
+
+  align_bits = (unsigned long) _s;
+  p = (unsigned int *) (align_bits & ~(unsigned long)3);
+  val = *p++;
+  align_bits &= 3;
+  if (align_bits)
+    {
+      if (WORDS_BIGENDIAN)
+	first_word_mask <<= ((4 - align_bits) * 8);
+      else
+	first_word_mask >>= ((4 - align_bits) * 8);
+    }
+  else
+    first_word_mask = 0;
+
+  val |= first_word_mask;
+  t = ((val ^ clean_xor_1) - mycroft_magic_2) & mycroft_magic_1;
+  t |= ((val ^ clean_xor_2) - mycroft_magic_2) & mycroft_magic_1;
+  t |= ((val ^ clean_xor_3) - mycroft_magic_2) & mycroft_magic_1;
+  t |= ((val ^ clean_xor_4) - mycroft_magic_2) & mycroft_magic_1;
+  while (__builtin_expect (!t, true)) {
+    val = *p++;
+    t = ((val ^ clean_xor_1) - mycroft_magic_2) & mycroft_magic_1;
+    t |= ((val ^ clean_xor_2) - mycroft_magic_2) & mycroft_magic_1;
+    t |= ((val ^ clean_xor_3) - mycroft_magic_2) & mycroft_magic_1;
+    t |= ((val ^ clean_xor_4) - mycroft_magic_2) & mycroft_magic_1;
+  }
+
+  ret = tp = ((const uchar *) (p - 1)) + 3;
+  if (WORDS_BIGENDIAN)
+    {
+      t = (val >> 8) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 1;
+      t = (val >> 16) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 2;
+      t = (val >> 24) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 3;
+    }
+  else
+    {
+      t = (val >> 16) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 1;
+      t = (val >> 8) & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 2;
+      t = val & 0xff;
+      if (t == '\n' || t == '\r' || t == '\\' || t == '?')
+	ret = tp - 3;
+    }
+    
+  return ret;
+}
+
 /* Returns with a logical line that contains no escaped newlines or
    trigraphs.  This is a time-critical inner loop.  */
 void
@@ -115,6 +190,8 @@ _cpp_clean_line (cpp_reader *pfile)
     {
       const uchar *pbackslash = NULL;
 
+      s = clean_line_fast (s + 1) - 1;
+
       /* Short circuit for the common case of an un-escaped line with
 	 no trigraphs.  The primary win here is by not writing any
 	 data back to memory until we have to.  */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Speed up LEX line cleaning a bit...
  2010-03-14 11:48 [PATCH] Speed up LEX line cleaning a bit David Miller
@ 2010-03-14 16:28 ` Joseph S. Myers
  2010-03-14 23:16   ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Joseph S. Myers @ 2010-03-14 16:28 UTC (permalink / raw)
  To: David Miller; +Cc: gcc-patches

On Sun, 14 Mar 2010, David Miller wrote:

> I just tossed this together and made sure it works, I haven't done any
> real substantial performance analysis.  And of course one would need
> to do this before even beginning to consider committing something like
> this :-)
> 
> In particular I am a bit concerned that 4 byte or shorter lines might
> make up a substantial portion of the lines CPP sees overall.  The
> benefit of the vectorized loop only kicks in for longer lines.

Depending on line length, it might or might not also be beneficial to work 
8 bytes at a time on 64-bit hosts.  With vector instructions it may be 
possible to check 16 bytes at a time on some hosts.  If empty lines are 
very common it may be useful to check for initial newline before trying 
the vectorized loop.

See also Zack's ideas on speeding up _cpp_clean_line that I posted in 
<http://gcc.gnu.org/ml/gcc/2007-05/msg00741.html>.  It's not clear if they 
could be effectively combined with vectorization, or what would help 
performance more, or whether several simple passes or one more complicated 
combined pass would actually be better.

You shouldn't really need to check for backslash for every character; if 
you find a newline you could then check if the line was nonempty and what 
came before was a backslash.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Speed up LEX line cleaning a bit...
  2010-03-14 23:16   ` David Miller
@ 2010-03-14 23:16     ` Joseph S. Myers
  2010-03-14 23:25       ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Joseph S. Myers @ 2010-03-14 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: gcc-patches

On Sun, 14 Mar 2010, David Miller wrote:

> > See also Zack's ideas on speeding up _cpp_clean_line that I posted in 
> > <http://gcc.gnu.org/ml/gcc/2007-05/msg00741.html>.  It's not clear if they 
> > could be effectively combined with vectorization, or what would help 
> > performance more, or whether several simple passes or one more complicated 
> > combined pass would actually be better.
> 
> I think the state machine would prevent being able to use
> a vectorization optimization like that being described here.

I imagine it could be used for comment skipping or finding the end of a 
string, where you can skip all characters except a limited set.  But it's 
not at all clear which phases should be combined, and losing 
vectorization opportunities may be a disadvantage of combining too many.

> > you find a newline you could then check if the line was nonempty and what 
> > came before was a backslash.
> 
> Hmmm, the code seems to backtrack over any number of whitespace
> characters:
> 
> 	p = d;
> 	while (is_nvspace (p[-1]))
> 	  --p;
> 	if (p - 1 != pbackslash)
> 	  goto done;
> 
> so I don't think checking just one character behind the found newline
> would work.

I thought there was agreement to get rid of the special handling of 
whitespace between backslash and newline (see 
<http://gcc.gnu.org/ml/gcc-patches/2009-05/msg01502.html>).  But you'd 
still need to check to give warnings, so that wouldn't actually help here.  
There are lots of places where the standard preprocessing could be done 
quicker than preprocessing with warnings, tracking line and column 
numbers, etc., can be done.

> And this would incur more loads decreasing the effectiveness of the
> vectorization, which aims to minimize the number of loads.

The idea would be to reduce the number of operations done on each word 
while processing a line, so reducing the total number of instructions in 
the inner loop.  (Additionally, backslashes are common in strings, so it 
would be good not to have to leave the vectorized loop for them.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Speed up LEX line cleaning a bit...
  2010-03-14 16:28 ` Joseph S. Myers
@ 2010-03-14 23:16   ` David Miller
  2010-03-14 23:16     ` Joseph S. Myers
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2010-03-14 23:16 UTC (permalink / raw)
  To: joseph; +Cc: gcc-patches

From: "Joseph S. Myers" <joseph@codesourcery.com>
Date: Sun, 14 Mar 2010 16:23:39 +0000 (UTC)

> If empty lines are very common it may be useful to check for initial
> newline before trying the vectorized loop.

I'm not too hot on this idea.  Part of the win of the vectorized code
is that we have a bit of integer setup code with which to buffer the
load latency for that critical first load.

So it's probably essentially free.

But yes, something to investigate for sure.

> See also Zack's ideas on speeding up _cpp_clean_line that I posted in 
> <http://gcc.gnu.org/ml/gcc/2007-05/msg00741.html>.  It's not clear if they 
> could be effectively combined with vectorization, or what would help 
> performance more, or whether several simple passes or one more complicated 
> combined pass would actually be better.

I think the state machine would prevent being able to use
a vectorization optimization like that being described here.

Unless you want to use a state machine that can transition
on four character at a time, which I don't think can fit in
main memory all at once :-)

> You shouldn't really need to check for backslash for every character; if 
> you find a newline you could then check if the line was nonempty and what 
> came before was a backslash.

Hmmm, the code seems to backtrack over any number of whitespace
characters:

	p = d;
	while (is_nvspace (p[-1]))
	  --p;
	if (p - 1 != pbackslash)
	  goto done;

so I don't think checking just one character behind the found newline
would work.

And this would incur more loads decreasing the effectiveness of the
vectorization, which aims to minimize the number of loads.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Speed up LEX line cleaning a bit...
  2010-03-14 23:16     ` Joseph S. Myers
@ 2010-03-14 23:25       ` David Miller
  2010-03-14 23:34         ` Joseph S. Myers
  2010-03-15  0:13         ` Chris Lattner
  0 siblings, 2 replies; 9+ messages in thread
From: David Miller @ 2010-03-14 23:25 UTC (permalink / raw)
  To: joseph; +Cc: gcc-patches

From: "Joseph S. Myers" <joseph@codesourcery.com>
Date: Sun, 14 Mar 2010 23:16:21 +0000 (UTC)

> The idea would be to reduce the number of operations done on each word 
> while processing a line, so reducing the total number of instructions in 
> the inner loop.  (Additionally, backslashes are common in strings, so it 
> would be good not to have to leave the vectorized loop for them.)

Understood, but it's really not possible here since as you note we
either need to continue supporting spaces amidst backslash escaped
newlines, or warn about it if support is removed.

I was actually surprised that this construct was supported,
to be honest.  Who will really notice if we remove it? :-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Speed up LEX line cleaning a bit...
  2010-03-14 23:25       ` David Miller
@ 2010-03-14 23:34         ` Joseph S. Myers
  2010-03-15  0:15           ` David Miller
  2010-03-15  0:13         ` Chris Lattner
  1 sibling, 1 reply; 9+ messages in thread
From: Joseph S. Myers @ 2010-03-14 23:34 UTC (permalink / raw)
  To: David Miller; +Cc: gcc-patches

On Sun, 14 Mar 2010, David Miller wrote:

> From: "Joseph S. Myers" <joseph@codesourcery.com>
> Date: Sun, 14 Mar 2010 23:16:21 +0000 (UTC)
> 
> > The idea would be to reduce the number of operations done on each word 
> > while processing a line, so reducing the total number of instructions in 
> > the inner loop.  (Additionally, backslashes are common in strings, so it 
> > would be good not to have to leave the vectorized loop for them.)
> 
> Understood, but it's really not possible here since as you note we
> either need to continue supporting spaces amidst backslash escaped
> newlines, or warn about it if support is removed.

It's possible by searching backwards for a backslash or for non-whitespace 
if the vectorized loop finds \r or \n (the vectorized loop would then 
effectively return both a pointer to \r, \n or ?, and a pointer to the 
previous backslash if one found before newline).  Whether this is better 
or worse in practice than checking for backslash in the vectorized loop 
would require benchmarking to tell (just as benchmarking will be needed to 
see if the vectorization actually helps in practice on typical compiler 
input).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Speed up LEX line cleaning a bit...
  2010-03-14 23:25       ` David Miller
  2010-03-14 23:34         ` Joseph S. Myers
@ 2010-03-15  0:13         ` Chris Lattner
  2010-03-15  0:14           ` David Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Lattner @ 2010-03-15  0:13 UTC (permalink / raw)
  To: David Miller; +Cc: joseph, gcc-patches


On Mar 14, 2010, at 4:20 PM, David Miller wrote:

> From: "Joseph S. Myers" <joseph@codesourcery.com>
> Date: Sun, 14 Mar 2010 23:16:21 +0000 (UTC)
> 
>> The idea would be to reduce the number of operations done on each word 
>> while processing a line, so reducing the total number of instructions in 
>> the inner loop.  (Additionally, backslashes are common in strings, so it 
>> would be good not to have to leave the vectorized loop for them.)
> 
> Understood, but it's really not possible here since as you note we
> either need to continue supporting spaces amidst backslash escaped
> newlines, or warn about it if support is removed.
> 
> I was actually surprised that this construct was supported,
> to be honest.  Who will really notice if we remove it? :-)

Unfortunately, a lot of real code depends on it, as horrifying as that is.

-Chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Speed up LEX line cleaning a bit...
  2010-03-15  0:13         ` Chris Lattner
@ 2010-03-15  0:14           ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-03-15  0:14 UTC (permalink / raw)
  To: clattner; +Cc: joseph, gcc-patches

From: Chris Lattner <clattner@apple.com>
Date: Sun, 14 Mar 2010 16:34:50 -0700

> On Mar 14, 2010, at 4:20 PM, David Miller wrote:
> 
>> Understood, but it's really not possible here since as you note we
>> either need to continue supporting spaces amidst backslash escaped
>> newlines, or warn about it if support is removed.
>> 
>> I was actually surprised that this construct was supported,
>> to be honest.  Who will really notice if we remove it? :-)
> 
> Unfortunately, a lot of real code depends on it, as horrifying as that is.

Ok, fair enough, we'll have to live with this restriction.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Speed up LEX line cleaning a bit...
  2010-03-14 23:34         ` Joseph S. Myers
@ 2010-03-15  0:15           ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2010-03-15  0:15 UTC (permalink / raw)
  To: joseph; +Cc: gcc-patches

From: "Joseph S. Myers" <joseph@codesourcery.com>
Date: Sun, 14 Mar 2010 23:25:28 +0000 (UTC)

> Whether this is better or worse in practice than checking for
> backslash in the vectorized loop would require benchmarking to tell
> (just as benchmarking will be needed to see if the vectorization
> actually helps in practice on typical compiler input).

Indeed.

I'll try to do some benchmarking and analysis if I find some time.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-03-15  0:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-14 11:48 [PATCH] Speed up LEX line cleaning a bit David Miller
2010-03-14 16:28 ` Joseph S. Myers
2010-03-14 23:16   ` David Miller
2010-03-14 23:16     ` Joseph S. Myers
2010-03-14 23:25       ` David Miller
2010-03-14 23:34         ` Joseph S. Myers
2010-03-15  0:15           ` David Miller
2010-03-15  0:13         ` Chris Lattner
2010-03-15  0:14           ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).