From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 19960 invoked by alias); 22 Feb 2016 18:10:12 -0000 Mailing-List: contact java-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: java-patches-owner@gcc.gnu.org Received: (qmail 19928 invoked by uid 89); 22 Feb 2016 18:10:11 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: Yes, score=5.6 required=5.0 tests=AWL,BAYES_50,RP_MATCHES_RCVD,UNSUBSCRIBE_BODY autolearn=no version=3.3.2 spammy=Sum, 85, 95, HTo:U*java-patches X-Spam-User: qpsmtpd, 2 recipients X-HELO: server.nextmovesoftware.com Received: from server.nextmovesoftware.com (HELO server.nextmovesoftware.com) (162.254.253.69) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Mon, 22 Feb 2016 18:10:08 +0000 Received: from host109-147-72-25.range109-147.btcentralplus.com ([109.147.72.25]:61159 helo=macbookpro.home) by server.nextmovesoftware.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.85) (envelope-from ) id 1aXuvs-0005x8-Ox; Mon, 22 Feb 2016 13:10:05 -0500 From: "roger@nextmovesoftware.com" Content-Type: multipart/mixed; boundary="Apple-Mail=_A61AA292-C4E8-4571-BED0-6D206FB078DC" Subject: [JAVA PATCH] Enable more array bounds check elimination Message-Id: <4F362C11-3DAA-4A9A-AEAB-089C20B3590C@nextmovesoftware.com> Date: Mon, 22 Feb 2016 18:10:00 -0000 To: java-patches@gcc.gnu.org, gcc-patches@gcc.gnu.org Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-SW-Source: 2016-q1/txt/msg00014.txt.bz2 --Apple-Mail=_A61AA292-C4E8-4571-BED0-6D206FB078DC Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Content-length: 6799 It has been a while since my last contribution. The following patch allows= GCC's optimizers to more aggressively eliminate and optimize java array bounds checks. The = results are quite impressive, for example producing a 26% performance improvement on th= e sieve.java benchmark given at http://keithlea.com/javabench/ on x86_64-pc-linux-gnu, r= educing the runtime for 1 million iterations from 31.5 seconds on trunk, to 25.0s with = this patch, in fact eliminating all array bounds checks. This is close to the 22.3s of an equi= valent C/C++ implementation, and significantly closes the gap to Java HotSpot(TM) JIT at= 23.0 seconds. The approach is to provide sufficient information in the gimple generated b= y the gcj front-end to allow the optimizers to do their thing. For array allocations of consta= nt length, I propose generating an additional (cheap) write to the array length field returned f= rom _Jv_NewPrimArray, which is then sufficient to allow this value to propagate throughout the op= timizers. This is probably best explained by a simple example. Consider the array in= itializer below: private static int mk1[] =3D { 71,85,95 }; which gets compiled into the java byte code sequence below: 0: iconst_3 1: newarray int 3: dup 4: iconst_0 5: bipush 71 7: iastore 8: dup 9: iconst_1 10: bipush 85 12: iastore 13: dup 14: iconst_2 15: bipush 95 17: iastore 18: putstatic #3 // Field mk1:[I 21: return Currently, the .004t.gimple generated by gcj for the array allocation is th= e cryptic: #slot#0#0 =3D 3; #ref#0#2 =3D _Jv_NewPrimArray (&_Jv_intClass, #slot#0#0); #ref#1#4 =3D #ref#0#2; _ref_1_4.6 =3D #ref#1#4; which unfortunately doesn't provide many clues for the middle-end, so we en= d up generating the following .210t.optimized: void * _3 =3D _Jv_NewPrimArray (&_Jv_intClass, 3); int _4 =3D MEM[(struct int[] *)_3].length; unsigned int _5 =3D (unsigned int) _4; if (_4 =3D=3D 0) goto ; else goto ; : _Jv_ThrowBadArrayIndex (0); : MEM[(int *)_3 + 12B] =3D 71; if (_5 =3D=3D 1) goto ; else goto ; : _Jv_ThrowBadArrayIndex (1); : MEM[(int *)_3 + 16B] =3D 85; if (_5 =3D=3D 2) goto ; else goto ; : _Jv_ThrowBadArrayIndex (2); : MEM[(int *)_3 + 20B] =3D 95; mk1 =3D _3; return; which obviously contains three run-time array bounds checks. These same ch= ecks appear in the x86_64 assembly language: subq $8, %rsp xorl %eax, %eax movl $3, %esi movl $_Jv_intClass, %edi call _Jv_NewPrimArray movl 8(%rax), %edx testl %edx, %edx je .L13 cmpl $1, %edx movl $71, 12(%rax) je .L14 cmpl $2, %edx movl $85, 16(%rax) je .L15 movl $95, 20(%rax) movq %rax, _ZN9TestArray3mk1E(%rip) addq $8, %rsp ret .L13: xorl %edi, %edi xorl %eax, %eax call _Jv_ThrowBadArrayIndex .L15: movl $2, %edi xorl %eax, %eax call _Jv_ThrowBadArrayIndex .L14: movl $1, %edi xorl %eax, %eax call _Jv_ThrowBadArrayIndex With the patch below, we now generate the much more informative .004t.gimpl= e for this: D.926 =3D _Jv_NewPrimArray (&_Jv_intClass, 3); D.926->length =3D 3; This additional write to the array length field of the newly allocated arra= y enables much more simplification. The resulting .210t.optimized for our array initializ= ation now becomes: struct int[3] * _3; _3 =3D _Jv_NewPrimArray (&_Jv_intClass, 3); MEM[(int *)_3 + 8B] =3D { 3, 71, 85, 95 }; mk1 =3D _3; return; And the x86_64 assembly code is also much prettier: subq $8, %rsp movl $3, %esi movl $_Jv_intClass, %edi xorl %eax, %eax call _Jv_NewPrimArray movdqa .LC0(%rip), %xmm0 movq %rax, _ZN9TestArray3mk1E(%rip) movups %xmm0, 8(%rax) addq $8, %rsp ret .LC0: .long 3 .long 71 .long 85 .long 95 Achieving this result required two minor tweaks. The first is to allow th= e array length constant to reach the newarray call, by allowing constants to remain on the quicksta= ck. This allows the call to _Jv_NewPrimArray to have a constant integer argument instead of the= opaque #slot#0#0. Then in the code that constructs the call to _Jv_NewPrimArray we wrap it in= a COMPOUND_EXPR allowing us to insert the superfluous, but helpful, write to the length fie= ld. Whilst working on this improvement I also noticed that the array bounds che= cks we were initially generating could also be improved. Currently, an array bound che= ck in 004t.gimple looks like: D.925 =3D MEM[(struct int[] *)_ref_1_4.6].length; D.926 =3D (unsigned int) D.925; if (_slot_2_5.9 >=3D D.926) goto ; else goto ; : _Jv_ThrowBadArrayIndex (_slot_2_5.8); if (0 !=3D 0) goto ; else goto ; : iftmp.7 =3D 1; goto ; : iftmp.7 =3D 0; : Notice the unnecessary "0 !=3D 0" and the dead assignments to iftmp.7 (whic= h is unused). With the patch below, we now not only avoid this conditional but also use _= _builtin_expect to inform the compiler that throwing an BadArrayIndex exception is typicall= y unlikely. i.e. D.930 =3D MEM[(struct int[] *)_ref_1_4.4].length; D.931 =3D D.930 <=3D 1; D.932 =3D __builtin_expect (D.931, 0); if (D.932 !=3D 0) goto ; else goto ; : _Jv_ThrowBadArrayIndex (0); : The following patch has been tested on x86_64-pc-linux-gnu with a full make= bootstrap and make check, with no new failures/regressions. Please let me know what you think (for stage 1 once it reopens)? Roger -- Roger Sayle, Ph.D. CEO and founder NextMove Software Limited Registered in England No. 07588305 Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, Cam= bridge CB4 0EY 2016-02-21 Roger Sayle * expr.c (push_value): Only call flush_quick_stack for non-constant arguments. (build_java_throw_out_of_bounds_exception): No longer wrap calls to _Jv_ThowBadArrayIndex in a COMPOUND_EXPR as no longer needed. (java_check_reference): Annotate COND_EXPR with __builtin_expect to indicate that calling _Jv_ThrowNullPointerException is unlikely. (build_java_arrayaccess): Construct an unlikely COND_EXPR instead of a TRUTH_ANDIF_EXPR in a COMPOUND_EXPR. Only generate array index MULT_EXPR when size_exp is not unity. (build_array_length_annotation): When optimizing, generate a write to the allocated array's length field to expose constant lengths to GCC's optimizers. (build_newarray): Call new build_array_length_annotation. (build_anewarray): Likewise. --Apple-Mail=_A61AA292-C4E8-4571-BED0-6D206FB078DC Content-Disposition: attachment; filename=patch7a.txt Content-Type: text/plain; x-unix-mode=0644; name="patch7a.txt" Content-Transfer-Encoding: quoted-printable Content-length: 7629 Index: expr.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- expr.c (revision 233190) +++ expr.c (working copy) @@ -37,6 +37,7 @@ #include "jcf.h" #include "parse.h" #include "tree-iterator.h" +#include "tree-eh.h" =20 static void flush_quick_stack (void); static void push_value (tree); @@ -54,6 +55,7 @@ static void expand_java_pushc (int, tree); static void expand_java_return (tree); static void expand_load_internal (int, tree, int); +static void expand_store_internal (tree, int, int); static void expand_java_NEW (tree); static void expand_java_INSTANCEOF (tree); static void expand_java_CHECKCAST (tree); @@ -273,10 +275,12 @@ /* If the value has a side effect, then we need to evaluate it whether or not the result is used. If the value ends up on the quick stack and is then popped, this won't happen -- so we flush - the quick stack. It is safest to simply always flush, though, - since TREE_SIDE_EFFECTS doesn't capture COMPONENT_REF, and for - the latter we may need to strip conversions. */ - flush_quick_stack (); + the quick stack. It is safest to always flush non-constant + operands. */ + if (! TREE_CONSTANT (value) + || TREE_SIDE_EFFECTS (value) + || tree_could_trap_p (value)) + flush_quick_stack (); } =20 /* Pop a type from the type stack. @@ -778,19 +782,13 @@ { tree node; =20 - /* We need to build a COMPOUND_EXPR because _Jv_ThrowBadArrayIndex() - has void return type. We cannot just set the type of the CALL_EXPR b= elow - to int_type_node because we would lose it during gimplification. */ + /* _Jv_ThrowBadArrayIndex() has void return type. */ gcc_assert (VOID_TYPE_P (TREE_TYPE (TREE_TYPE (soft_badarrayindex_node))= )); node =3D build_call_nary (void_type_node, - build_address_of (soft_badarrayindex_node), - 1, index); + build_address_of (soft_badarrayindex_node), + 1, index); TREE_SIDE_EFFECTS (node) =3D 1; - - node =3D build2 (COMPOUND_EXPR, int_type_node, node, integer_zero_node); - TREE_SIDE_EFFECTS (node) =3D 1; /* Allows expansion within ANDIF */ - - return (node); + return node; } =20 /* Return the length of an array. Doesn't perform any checking on the natu= re @@ -833,10 +831,12 @@ { if (!flag_syntax_only && check) { + tree test; expr =3D save_expr (expr); - expr =3D build3 (COND_EXPR, TREE_TYPE (expr), - build2 (EQ_EXPR, boolean_type_node, - expr, null_pointer_node), + test =3D build2 (EQ_EXPR, boolean_type_node, expr, null_pointer_node= ); + test =3D build_call_expr (builtin_decl_implicit (BUILT_IN_EXPECT), 2, + test, boolean_false_node); + expr =3D build3 (COND_EXPR, TREE_TYPE (expr), test, build_call_nary (void_type_node,=20 build_address_of (soft_nullpointer_node), 0), @@ -865,7 +865,7 @@ tree build_java_arrayaccess (tree array, tree type, tree index) { - tree node, throw_expr =3D NULL_TREE; + tree node; tree data_field; tree ref; tree array_type =3D TREE_TYPE (TREE_TYPE (array)); @@ -882,9 +882,9 @@ { /* Generate: * (unsigned jint) INDEX >=3D (unsigned jint) LEN - * && throw ArrayIndexOutOfBoundsException. + * ? throw ArrayIndexOutOfBoundsException : INDEX. * Note this is equivalent to and more efficient than: - * INDEX < 0 || INDEX >=3D LEN && throw ... */ + * INDEX < 0 || INDEX >=3D LEN ? throw ... : INDEX. */ tree test; tree len =3D convert (unsigned_int_type_node, build_java_array_length_access (array)); @@ -893,19 +893,14 @@ len); if (! integer_zerop (test)) { - throw_expr - =3D build2 (TRUTH_ANDIF_EXPR, int_type_node, test, - build_java_throw_out_of_bounds_exception (index)); - /* allows expansion within COMPOUND */ - TREE_SIDE_EFFECTS( throw_expr ) =3D 1; + test =3D build_call_expr (builtin_decl_implicit (BUILT_IN_EXPECT), 2, + test, boolean_false_node); + index =3D build3(COND_EXPR, int_type_node, test, + build_java_throw_out_of_bounds_exception (index), + index); } } =20 - /* If checking bounds, wrap the index expr with a COMPOUND_EXPR in order - to have the bounds check evaluated first. */ - if (throw_expr !=3D NULL_TREE) - index =3D build2 (COMPOUND_EXPR, int_type_node, throw_expr, index); - data_field =3D lookup_field (&array_type, get_identifier ("data")); =20 ref =3D build3 (COMPONENT_REF, TREE_TYPE (data_field),=20=20=20=20 @@ -919,9 +914,11 @@ =20 /* Multiply the index by the size of an element to obtain a byte offset. Convert the result to a pointer to the element type. */ - index =3D build2 (MULT_EXPR, sizetype,=20 - fold_convert (sizetype, index),=20 - size_exp); + index =3D fold_convert (sizetype, index); + if (! integer_onep (size_exp)) + { + index =3D build2 (MULT_EXPR, sizetype, index, size_exp); + } =20 /* Sum the byte offset and the address of the data field. */ node =3D fold_build_pointer_plus (node, index); @@ -1026,6 +1023,34 @@ return indexed_type; } =20 +/* When optimizing, wrap calls to array allocation functions taking + constant length arguments, in a COMPOUND_EXPR, containing an + explict assignment of the .length field, for GCC's optimizers. */ + +static tree +build_array_length_annotation (tree call, tree length) +{ + if (optimize + && TREE_CONSTANT (length) + && is_array_type_p (TREE_TYPE (call))) + { + tree type, note; + type =3D TREE_TYPE (call); + call =3D save_expr(call); + note =3D build3 (COMPONENT_REF, int_type_node, + build1 (INDIRECT_REF, TREE_TYPE (type), call), + lookup_field (&TREE_TYPE (type), + get_identifier ("length")), + NULL_TREE); + note =3D build2 (MODIFY_EXPR, int_type_node, note, length); + TREE_SIDE_EFFECTS (note) =3D 1; + call =3D build2 (COMPOUND_EXPR, TREE_TYPE (call), note, call); + TREE_SIDE_EFFECTS (call) =3D 1; + } + return call; +} + + /* newarray triggers a call to _Jv_NewPrimArray. This function should be=20 called with an integer code (the type of array to create), and the leng= th of the array to create. */ @@ -1033,7 +1058,7 @@ tree build_newarray (int atype_value, tree length) { - tree type_arg; + tree type_arg, call; =20 tree prim_type =3D decode_newarray_type (atype_value); tree type @@ -1045,9 +1070,10 @@ some work. */ type_arg =3D build_class_ref (prim_type); =20 - return build_call_nary (promote_type (type), + call =3D build_call_nary (promote_type (type), build_address_of (soft_newarray_node), 2, type_arg, length); + return build_array_length_annotation (call, length); } =20 /* Generates anewarray from a given CLASS_TYPE. Gets from the stack the si= ze @@ -1061,12 +1087,14 @@ tree_fits_shwi_p (length) ? tree_to_shwi (length) : -1); =20 - return build_call_nary (promote_type (type), - build_address_of (soft_anewarray_node), - 3, - length, - build_class_ref (class_type), - null_pointer_node); + tree call =3D build_call_nary (promote_type (type), + build_address_of (soft_anewarray_node), + 3, + length, + build_class_ref (class_type), + null_pointer_node); + + return build_array_length_annotation (call, length); } =20 /* Return a node the evaluates 'new TYPE[LENGTH]'. */ --Apple-Mail=_A61AA292-C4E8-4571-BED0-6D206FB078DC Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Content-length: 2 --Apple-Mail=_A61AA292-C4E8-4571-BED0-6D206FB078DC--