* [JAVA PATCH] Enable more array bounds check elimination
@ 2016-02-22 18:10 roger
From: roger @ 2016-02-22 18:10 UTC (permalink / raw)
  To: java-patches, gcc-patches



It has been a while since my last contribution.  The following patch allows GCC's optimizers
to more aggressively eliminate and optimize Java array bounds checks.  The results are
quite impressive: for example, it produces a 26% performance improvement on the sieve.java
benchmark given at http://keithlea.com/javabench/ on x86_64-pc-linux-gnu, reducing the
runtime for 1 million iterations from 31.5 seconds on trunk to 25.0s with this patch, in fact
eliminating all array bounds checks.  This is close to the 22.3s of an equivalent C/C++
implementation, and significantly closes the gap to the Java HotSpot(TM) JIT at 23.0 seconds.
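
For reference, the kernel of that benchmark is a sieve over a boolean array.  The sketch
below is not the actual javabench source (the class name and the constant SIZE are my own
placeholders), but it shows the kind of array-heavy inner loop whose bounds checks the
optimizers can now eliminate once the allocation length is a known constant:

    // Sketch only: approximates the benchmark's array-heavy inner loop.
    class SieveSketch {
      static final int SIZE = 8192;   // placeholder constant length

      static int countPrimes() {
        boolean[] flags = new boolean[SIZE + 1];
        java.util.Arrays.fill(flags, true);
        int count = 0;
        for (int i = 2; i <= SIZE; i++) {
          if (flags[i]) {                          // bounds check candidate
            for (int k = i + i; k <= SIZE; k += i)
              flags[k] = false;                    // bounds check candidate
            count++;
          }
        }
        return count;
      }
    }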

The approach is to provide sufficient information in the gimple generated by the gcj front-end
to allow the optimizers to do their thing.  For array allocations of constant length, I propose
generating an additional (cheap) write to the length field of the array returned from
_Jv_NewPrimArray, which is then sufficient to allow this value to propagate throughout the
optimizers.

This is probably best explained by a simple example.  Consider the array initializer below:

    private static int mk1[] = { 71,85,95 };

which gets compiled into the java byte code sequence below:

       0: iconst_3
       1: newarray       int
       3: dup
       4: iconst_0
       5: bipush        71
       7: iastore
       8: dup
       9: iconst_1
      10: bipush        85
      12: iastore
      13: dup
      14: iconst_2
      15: bipush        95
      17: iastore
      18: putstatic     #3                  // Field mk1:[I
      21: return

Currently, the .004t.gimple generated by gcj for the array allocation is the cryptic:

    #slot#0#0 = 3;
    #ref#0#2 = _Jv_NewPrimArray (&_Jv_intClass, #slot#0#0);
    #ref#1#4 = #ref#0#2;
    _ref_1_4.6 = #ref#1#4;

which unfortunately doesn't provide many clues for the middle-end, so we end up
generating the following .210t.optimized:

  void * _3 = _Jv_NewPrimArray (&_Jv_intClass, 3);
  int _4 = MEM[(struct int[] *)_3].length;
  unsigned int _5 = (unsigned int) _4;
  if (_4 == 0)
    goto <bb 3>;
  else
    goto <bb 4>;
  <bb 3>:
  _Jv_ThrowBadArrayIndex (0);
  <bb 4>:
  MEM[(int *)_3 + 12B] = 71;
  if (_5 == 1)
    goto <bb 5>;
  else
    goto <bb 6>;
  <bb 5>:
  _Jv_ThrowBadArrayIndex (1);
  <bb 6>:
  MEM[(int *)_3 + 16B] = 85;
  if (_5 == 2)
    goto <bb 7>;
  else
    goto <bb 8>;
  <bb 7>:
  _Jv_ThrowBadArrayIndex (2);
  <bb 8>:
  MEM[(int *)_3 + 20B] = 95;
  mk1 = _3;
  return;

which obviously contains three run-time array bounds checks.  These same checks appear
in the x86_64 assembly language:

	subq	$8, %rsp
	xorl	%eax, %eax
	movl	$3, %esi
	movl	$_Jv_intClass, %edi
	call	_Jv_NewPrimArray
	movl	8(%rax), %edx
	testl	%edx, %edx
	je	.L13
	cmpl	$1, %edx
	movl	$71, 12(%rax)
	je	.L14
	cmpl	$2, %edx
	movl	$85, 16(%rax)
	je	.L15
	movl	$95, 20(%rax)
	movq	%rax, _ZN9TestArray3mk1E(%rip)
	addq	$8, %rsp
	ret
.L13:	xorl	%edi, %edi
	xorl	%eax, %eax
	call	_Jv_ThrowBadArrayIndex
.L15:	movl	$2, %edi
	xorl	%eax, %eax
	call	_Jv_ThrowBadArrayIndex
.L14:	movl	$1, %edi
	xorl	%eax, %eax
	call	_Jv_ThrowBadArrayIndex


With the patch below, we now generate the much more informative .004t.gimple for this:

    D.926 = _Jv_NewPrimArray (&_Jv_intClass, 3);
    D.926->length = 3;

This additional write to the array length field of the newly allocated array enables much
more simplification.  The resulting .210t.optimized for our array initialization now becomes:

  struct int[3] * _3;
  _3 = _Jv_NewPrimArray (&_Jv_intClass, 3);
  MEM[(int *)_3 + 8B] = { 3, 71, 85, 95 };
  mk1 = _3;
  return;

And the x86_64 assembly code is also much prettier:

	subq	$8, %rsp
	movl	$3, %esi
	movl	$_Jv_intClass, %edi
	xorl	%eax, %eax
	call	_Jv_NewPrimArray
	movdqa	.LC0(%rip), %xmm0
	movq	%rax, _ZN9TestArray3mk1E(%rip)
	movups	%xmm0, 8(%rax)
	addq	$8, %rsp
	ret
.LC0:	.long 3
	.long 71
	.long 85
	.long 95


Achieving this result required two minor tweaks.  The first is to allow the array length constant
to reach the newarray call, by allowing constants to remain on the quick stack.  This allows the
call to _Jv_NewPrimArray to have a constant integer argument instead of the opaque #slot#0#0.
The second, in the code that constructs the call to _Jv_NewPrimArray, is to wrap the call in a
COMPOUND_EXPR, allowing us to insert the superfluous, but helpful, write to the length field.

Whilst working on this improvement I also noticed that the array bounds checks we were
initially generating could themselves be improved.  Currently, an array bounds check in
.004t.gimple looks like:

    D.925 = MEM[(struct int[] *)_ref_1_4.6].length;
    D.926 = (unsigned int) D.925;
    if (_slot_2_5.9 >= D.926) goto <D.927>; else goto <D.921>;
    <D.927>:
    _Jv_ThrowBadArrayIndex (_slot_2_5.8);
    if (0 != 0) goto <D.928>; else goto <D.921>;
    <D.928>:
    iftmp.7 = 1;
    goto <D.922>;
    <D.921>:
    iftmp.7 = 0;
    <D.922>:

Notice the unnecessary "0 != 0" test and the dead assignments to iftmp.7, which is never used.

With the patch below, we now not only avoid this conditional but also use __builtin_expect
to inform the compiler that throwing a BadArrayIndex exception is typically unlikely, i.e.:

    D.930 = MEM[(struct int[] *)_ref_1_4.4].length;
    D.931 = D.930 <= 1;
    D.932 = __builtin_expect (D.931, 0);
    if (D.932 != 0) goto <D.933>; else goto <D.934>;
    <D.933>:
    _Jv_ThrowBadArrayIndex (0);
    <D.934>:


The following patch has been tested on x86_64-pc-linux-gnu with a full make bootstrap
and make check, with no new failures/regressions.

Please let me know what you think (for stage 1, once it reopens).

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, Cambridge CB4 0EY

2016-02-21  Roger Sayle  <roger@nextmovesoftware.com>

	* expr.c (push_value): Only call flush_quick_stack for non-constant
	arguments.
	(build_java_throw_out_of_bounds_exception): No longer wrap calls
	to _Jv_ThrowBadArrayIndex in a COMPOUND_EXPR, as it is no longer
	needed.
	(java_check_reference): Annotate COND_EXPR with __builtin_expect
	to indicate that calling _Jv_ThrowNullPointerException is unlikely.
	(build_java_arrayaccess): Construct an unlikely COND_EXPR instead
	of a TRUTH_ANDIF_EXPR in a COMPOUND_EXPR.  Only generate array
	index MULT_EXPR when size_exp is not unity.
	(build_array_length_annotation): When optimizing, generate a write
	to the allocated array's length field to expose constant lengths
	to GCC's optimizers.
	(build_newarray): Call new build_array_length_annotation.
	(build_anewarray): Likewise.


[-- Attachment #2: patch7a.txt --]
[-- Type: text/plain, Size: 7574 bytes --]

Index: expr.c
===================================================================
--- expr.c	(revision 233190)
+++ expr.c	(working copy)
@@ -37,6 +37,7 @@
 #include "jcf.h"
 #include "parse.h"
 #include "tree-iterator.h"
+#include "tree-eh.h"
 
 static void flush_quick_stack (void);
 static void push_value (tree);
@@ -54,6 +55,7 @@
 static void expand_java_pushc (int, tree);
 static void expand_java_return (tree);
 static void expand_load_internal (int, tree, int);
+static void expand_store_internal (tree, int, int);
 static void expand_java_NEW (tree);
 static void expand_java_INSTANCEOF (tree);
 static void expand_java_CHECKCAST (tree);
@@ -273,10 +275,12 @@
   /* If the value has a side effect, then we need to evaluate it
      whether or not the result is used.  If the value ends up on the
      quick stack and is then popped, this won't happen -- so we flush
-     the quick stack.  It is safest to simply always flush, though,
-     since TREE_SIDE_EFFECTS doesn't capture COMPONENT_REF, and for
-     the latter we may need to strip conversions.  */
-  flush_quick_stack ();
+     the quick stack.  It is safest to always flush non-constant
+     operands.  */
+  if (! TREE_CONSTANT (value)
+      || TREE_SIDE_EFFECTS (value)
+      || tree_could_trap_p (value))
+    flush_quick_stack ();
 }
 
 /* Pop a type from the type stack.
@@ -778,19 +782,13 @@
 {
   tree node;
 
-  /* We need to build a COMPOUND_EXPR because _Jv_ThrowBadArrayIndex()
-     has void return type.  We cannot just set the type of the CALL_EXPR below
-     to int_type_node because we would lose it during gimplification.  */
+  /* _Jv_ThrowBadArrayIndex() has void return type.  */
   gcc_assert (VOID_TYPE_P (TREE_TYPE (TREE_TYPE (soft_badarrayindex_node))));
   node = build_call_nary (void_type_node,
-			       build_address_of (soft_badarrayindex_node),
-			       1, index);
+			  build_address_of (soft_badarrayindex_node),
+			  1, index);
   TREE_SIDE_EFFECTS (node) = 1;
-
-  node = build2 (COMPOUND_EXPR, int_type_node, node, integer_zero_node);
-  TREE_SIDE_EFFECTS (node) = 1;	/* Allows expansion within ANDIF */
-
-  return (node);
+  return node;
 }
 
 /* Return the length of an array. Doesn't perform any checking on the nature
@@ -833,10 +831,12 @@
 {
   if (!flag_syntax_only && check)
     {
+      tree test;
       expr = save_expr (expr);
-      expr = build3 (COND_EXPR, TREE_TYPE (expr),
-		     build2 (EQ_EXPR, boolean_type_node,
-			     expr, null_pointer_node),
+      test = build2 (EQ_EXPR, boolean_type_node, expr, null_pointer_node);
+      test = build_call_expr (builtin_decl_implicit (BUILT_IN_EXPECT), 2,
+			      test, boolean_false_node);
+      expr = build3 (COND_EXPR, TREE_TYPE (expr), test,
 		     build_call_nary (void_type_node, 
 				      build_address_of (soft_nullpointer_node),
 				      0),
@@ -865,7 +865,7 @@
 tree
 build_java_arrayaccess (tree array, tree type, tree index)
 {
-  tree node, throw_expr = NULL_TREE;
+  tree node;
   tree data_field;
   tree ref;
   tree array_type = TREE_TYPE (TREE_TYPE (array));
@@ -882,9 +882,9 @@
     {
       /* Generate:
        * (unsigned jint) INDEX >= (unsigned jint) LEN
-       *    && throw ArrayIndexOutOfBoundsException.
+       *    ? throw ArrayIndexOutOfBoundsException : INDEX.
        * Note this is equivalent to and more efficient than:
-       * INDEX < 0 || INDEX >= LEN && throw ... */
+       * INDEX < 0 || INDEX >= LEN ? throw ... : INDEX. */
       tree test;
       tree len = convert (unsigned_int_type_node,
 			  build_java_array_length_access (array));
@@ -893,19 +893,14 @@
 			  len);
       if (! integer_zerop (test))
 	{
-	  throw_expr
-	    = build2 (TRUTH_ANDIF_EXPR, int_type_node, test,
-		      build_java_throw_out_of_bounds_exception (index));
-	  /* allows expansion within COMPOUND */
-	  TREE_SIDE_EFFECTS( throw_expr ) = 1;
+	  test = build_call_expr (builtin_decl_implicit (BUILT_IN_EXPECT), 2,
+				  test, boolean_false_node);
+          index = build3(COND_EXPR, int_type_node, test,
+			 build_java_throw_out_of_bounds_exception (index),
+                         index);
 	}
     }
 
-  /* If checking bounds, wrap the index expr with a COMPOUND_EXPR in order
-     to have the bounds check evaluated first. */
-  if (throw_expr != NULL_TREE)
-    index = build2 (COMPOUND_EXPR, int_type_node, throw_expr, index);
-
   data_field = lookup_field (&array_type, get_identifier ("data"));
 
   ref = build3 (COMPONENT_REF, TREE_TYPE (data_field),    
@@ -919,9 +914,11 @@
 
   /* Multiply the index by the size of an element to obtain a byte
      offset.  Convert the result to a pointer to the element type.  */
-  index = build2 (MULT_EXPR, sizetype, 
-		  fold_convert (sizetype, index), 
-		  size_exp);
+  index = fold_convert (sizetype, index);
+  if (! integer_onep (size_exp))
+    {
+      index = build2 (MULT_EXPR, sizetype, index, size_exp);
+    }
 
   /* Sum the byte offset and the address of the data field.  */
   node = fold_build_pointer_plus (node, index);
@@ -1026,6 +1023,34 @@
   return indexed_type;
 }
 
+/* When optimizing, wrap calls to array allocation functions taking
+   constant length arguments, in a COMPOUND_EXPR, containing an
+   explicit assignment of the .length field, for GCC's optimizers.  */
+
+static tree
+build_array_length_annotation (tree call, tree length)
+{
+  if (optimize
+      && TREE_CONSTANT (length)
+      && is_array_type_p (TREE_TYPE (call)))
+    {
+      tree type, note;
+      type = TREE_TYPE (call);
+      call = save_expr(call);
+      note = build3 (COMPONENT_REF, int_type_node,
+		     build1 (INDIRECT_REF, TREE_TYPE (type), call),
+		     lookup_field (&TREE_TYPE (type),
+				   get_identifier ("length")),
+		     NULL_TREE);
+      note = build2 (MODIFY_EXPR, int_type_node, note, length);
+      TREE_SIDE_EFFECTS (note) = 1;
+      call = build2 (COMPOUND_EXPR, TREE_TYPE (call), note, call);
+      TREE_SIDE_EFFECTS (call) = 1;
+    }
+  return call;
+}
+
+
 /* newarray triggers a call to _Jv_NewPrimArray. This function should be 
    called with an integer code (the type of array to create), and the length
    of the array to create.  */
@@ -1033,7 +1058,7 @@
 tree
 build_newarray (int atype_value, tree length)
 {
-  tree type_arg;
+  tree type_arg, call;
 
   tree prim_type = decode_newarray_type (atype_value);
   tree type
@@ -1045,9 +1070,10 @@
      some work.  */
   type_arg = build_class_ref (prim_type);
 
-  return build_call_nary (promote_type (type),
+  call = build_call_nary (promote_type (type),
 			  build_address_of (soft_newarray_node),
 			  2, type_arg, length);
+  return build_array_length_annotation (call, length);
 }
 
 /* Generates anewarray from a given CLASS_TYPE. Gets from the stack the size
@@ -1061,12 +1087,14 @@
 			     tree_fits_shwi_p (length)
 			     ? tree_to_shwi (length) : -1);
 
-  return build_call_nary (promote_type (type),
-			  build_address_of (soft_anewarray_node),
-			  3,
-			  length,
-			  build_class_ref (class_type),
-			  null_pointer_node);
+  tree call = build_call_nary (promote_type (type),
+			       build_address_of (soft_anewarray_node),
+			       3,
+			       length,
+			       build_class_ref (class_type),
+			       null_pointer_node);
+
+  return build_array_length_annotation (call, length);
 }
 
 /* Return a node the evaluates 'new TYPE[LENGTH]'. */

* Re: [JAVA PATCH] Enable more array bounds check elimination
@ 2016-02-22 23:02 Roger Sayle
From: Roger Sayle @ 2016-02-22 23:02 UTC (permalink / raw)
  To: Andrew Haley; +Cc: java-patches


Hi Andrew,

On 02/22/2016, Andrew Haley wrote:
> I take it that this only really works with new arrays?

Alas yes, this patch only addresses the case where the array allocation is visible in the
same method (after inlining) as the array bounds check, which fortunately is fairly
frequent, especially for initializers.

I'd assumed that the middle-end optimizers were already doing a reasonable
job for the harder cases, but inspired by your question above, I've done some
more investigating and noticed other areas where improvements are possible.

For example:

	int len = arr.length;
	for (int i=0; i<arr.length; i++)
	  arr[i] = arr[i]+1;

eliminates all bounds checks when arr is double[], but disappointingly not when arr is
int[].  I suspect that the middle-end optimizers assume the worst, perhaps that arr.data[-1]
potentially aliases arr.length, or similar.  Obviously, Java doesn't allow negative array
indices, but I suspect the gcj front-end's gimple doesn't manage to convey this.
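
For concreteness, here is a self-contained version of that fragment (the class and method
names are mine, purely so the comparison between double[] and int[] is easy to reproduce):

	// My naming, for reproduction only: compile once with int[] and once
	// with double[], then compare the bounds checks in the optimized dumps.
	class IncrementAll {
	  static void bump(int[] arr) {
	    int len = arr.length;              // kept from the fragment above
	    for (int i = 0; i < arr.length; i++)
	      arr[i] = arr[i] + 1;             // bounds check we hope to eliminate
	  }
	}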

I'm personally impressed that GCC recognizes that although

	for (int i=0; i<3; i++)
	  ... arr[i] ...

typically needs to perform index checking on every iteration,
the reversed loop

	for (int i=2; i>=0; i--)
	  ... arr[i] ...

only needs to perform array bounds checking on the first iteration,
i.e. it optimizes away the later checks [in unrolled loops].  This might
seem obvious, but there's a lot of analysis required to recover this
from Java bytecode.
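
For completeness, a compilable pair of loops along those lines (again my own naming, just
to make the two traversal orders easy to compare in the generated code):

	// Forward vs. reversed traversal of a short array: in the reversed
	// loop, once arr[2] is known to be in range, arr[1] and arr[0] must
	// be too, so the later checks can be optimized away after unrolling.
	class TraversalOrder {
	  static int forward(int[] arr) {
	    int sum = 0;
	    for (int i = 0; i < 3; i++)
	      sum += arr[i];                   // potentially checked every iteration
	    return sum;
	  }

	  static int reversed(int[] arr) {
	    int sum = 0;
	    for (int i = 2; i >= 0; i--)
	      sum += arr[i];                   // one check can suffice
	    return sum;
	  }
	}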


Please point me towards any relevant postings (of yours) on the subject of
gcj bounds check elimination, as I'd love to catch up on current thinking.

Cheers,

Roger
--


Thread overview: 12+ messages
2016-02-22 18:10 [JAVA PATCH] Enable more array bounds check elimination roger
2016-02-22 18:13 ` Andrew Haley
2016-02-22 18:18 ` Jeff Law
2016-07-13 20:07 ` Jeff Law
2016-07-14 17:12   ` Andrew Hughes
2016-07-14 17:43     ` Andrew Haley
2016-02-22 23:02 Roger Sayle
2016-02-23  9:56 ` Andrew Haley
2016-02-23 17:45   ` Andrew Hughes
2016-02-24  1:11     ` Roger Sayle
2016-02-24  3:23       ` Andrew Hughes
2016-02-24  9:53       ` Andrew Haley
