From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 15757 invoked by alias); 8 Jan 2007 04:15:41 -0000 Received: (qmail 15740 invoked by uid 22791); 8 Jan 2007 04:15:39 -0000 X-Spam-Check-By: sourceware.org Received: from eyesopen.com (HELO mail.eyesopen.com) (208.41.78.163) by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 08 Jan 2007 04:15:32 +0000 Received: from mail.eyesopen.com (localhost.localdomain [127.0.0.1]) by mail.eyesopen.com (Postfix) with ESMTP id D70CBD; Sun, 7 Jan 2007 21:15:27 -0700 (MST) Received: from 68.35.10.103 (SquirrelMail authenticated user roger) by mail.eyesopen.com with HTTP; Sun, 7 Jan 2007 21:15:27 -0700 (MST) Message-ID: <12690.68.35.10.103.1168229727.squirrel@mail.eyesopen.com> Date: Mon, 08 Jan 2007 04:15:00 -0000 Subject: [FORTRAN PATCH] Improved scalarization of constant array constructors From: "Roger Sayle" To: fortran@gcc.gnu.org, gcc-patches@gcc.gnu.org User-Agent: SquirrelMail/1.4.8-2.el4.centos4 MIME-Version: 1.0 Content-Type: multipart/mixed;boundary="----=_20070107211527_59630" Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2007-01/txt/msg00538.txt.bz2 ------=_20070107211527_59630 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit Content-length: 4373 This patch improves the way that gfortran expands constant array constructors to GIMPLE. It's probably best explained via a simple example; consider the following code: program v integer :: x(4) x(:) = (/ 3, 1, 4, 1 /) end program Currently, this is expanded by the gfortran front-end as: MAIN__ () { int4 x[4]; _gfortran_set_std (70, 127, 0); { int4 A.1[4]; struct array1_int4 atmp.0; atmp.0.dtype = 265; atmp.0.dim[0].stride = 1; atmp.0.dim[0].lbound = 0; atmp.0.dim[0].ubound = 3; atmp.0.data = (void *) &A.1; atmp.0.offset = 0; { static int4 data.3[4] = {3, 1, 4, 1}; __builtin_memcpy (&(*(int4[0:] *) atmp.0.data)[0], &data.3, 16); } { int8 S.4; S.4 = 0; while (1) { if (S.4 > 3) goto L.1; x[NON_LVALUE_EXPR ] = (*(int4[0:] *) atmp.0.data)[NON_LVALUE_EXPR ]; S.4 = S.4 + 1; } L.1:; } } } Notice that we create a temporary array A, that requires an array descriptor, that we then populate via a memcpy call, and then use this in the scalarization of the assignment. Hence we require three arrays; "A", "data" and "x", and two copies. With the patch below, we now generate the much simpler: MAIN__ () { int4 x[4]; _gfortran_set_std (70, 127, 0); { static int4 A.0[4] = {3, 1, 4, 1}; { int8 S.1; S.1 = 1; while (1) { if (S.1 > 4) goto L.1; x[S.1 + -1] = A.0[S.1 + -1]; S.1 = S.1 + 1; } L.1:; } } } where the array A, is initialized as a "static const", and then used directly during the scalarization. This requires only two arrays and a single copy. When the array constructor is large, this can result in significant time and space savings, for the common case. The optimization is that when the array constructor consists entirely of constant elements, that specify it's entire size, it's possible to pre-initialize "A" if we use gfortran's notion of a non-descriptor array type. Some additional comments. [1] The code is structured so that follow-up patches can re-use this code to lower EXPR_ARRAY expressions outside of the scalarizer, for example allowing us to use __builtin_memcpy to move "A" into "x" above. [2] The patch also contains a small, but unrelated clean-up, to gfc_trans_constructor, that tidies up the scope/use of the variable CONST_STRING. Investigation discovered that we set const_string even when the assignment was dead, and reorganizing things slightly makes the role of this variable slightly clearer. [3] This patch also contains another NON_LVALUE_EXPR tweak, which explains the elimination/simplificiation of the "NON_LVALUE_EXPR " subexpressions in the example code above. [4] This change actually allows us to do slightly better on the testcase gfortran.dg/vect/vect-5.f90. Previously, we expected to find two unaligned array accesses, though now, because we scalarize references as "A" rather than via "atmp.0.data", the vectorizer can tell that they are suitably aligned, and hence we only find one unaligned array access. The following patch has been tested against mainline with a full "make bootstrap", including gfortran, and regression tested with a top-level "make -k check" with no new failures. Ok for mainline? 2007-01-07 Roger Sayle * trans-array.c (constant_array_constructor_p): New function to determine whether an array constructor consists only of constant elements, and if so return its size. (gfc_build_constant_array_constructor): Construct a statically initialized gfortran array for a given EXPR_ARRAY. (gfc_trans_constant_array_constructor): Efficiently scalarize a constant array constructor. (gfc_trans_array_constructor): Tidy up use of CONST_STRING. Special case scalarization of constant array constructors, all of whose elements are specified, using constant_array_constructor_p and gfc_trans_constant_array_constructor. (gfc_conv_scalarized_array_ref): Check whetger info->offset is zero before adding it to index, to avoid creating a NON_LVALUE_EXPR. * gfortran.dg/array_constructor_14.f90: New test case. * gfortran.dg/vect/vect-5.f90: Update test for improved alignment. Roger -- ------=_20070107211527_59630 Content-Type: text/plain; name="patchf.txt" Content-Transfer-Encoding: 8bit Content-Disposition: attachment; filename="patchf.txt" Content-length: 6661 Index: trans-array.c =================================================================== *** trans-array.c (revision 120503) --- trans-array.c (working copy) *************** get_array_ctor_strlen (gfc_constructor * *** 1463,1468 **** --- 1463,1581 ---- return is_const; } + /* Check whether the array constructor C consists entirely of constant + elements, and if so returns the number of those elements, otherwise + return zero. Note, an empty or NULL array constructor returns zero. */ + + static unsigned HOST_WIDE_INT + constant_array_constructor_p (gfc_constructor * c) + { + unsigned HOST_WIDE_INT nelem = 0; + + while (c) + { + if (c->iterator + || c->expr->rank > 0 + || c->expr->expr_type != EXPR_CONSTANT) + return 0; + c = c->next; + nelem++; + } + return nelem; + } + + + /* Given EXPR, the constant array constructor specified by an EXPR_ARRAY, + and the tree type of it's elements, TYPE, return a static constant + variable that is compile-time initialized. */ + + static tree + gfc_build_constant_array_constructor (gfc_expr * expr, tree type) + { + tree tmptype, list, init, tmp; + HOST_WIDE_INT nelem; + gfc_constructor *c; + gfc_array_spec as; + gfc_se se; + + + /* First traverse the constructor list, converting the constants + to tree to build an initializer. */ + nelem = 0; + list = NULL_TREE; + c = expr->value.constructor; + while (c) + { + gfc_init_se (&se, NULL); + gfc_conv_constant (&se, c->expr); + if (c->expr->ts.type == BT_CHARACTER + && POINTER_TYPE_P (type)) + se.expr = gfc_build_addr_expr (pchar_type_node, se.expr); + list = tree_cons (NULL_TREE, se.expr, list); + c = c->next; + nelem++; + } + + /* Next detemine the tree type for the array. We use the gfortran + front-end's gfc_get_nodesc_array_type in order to create a suitable + GFC_ARRAY_TYPE_P that may be used by the scalarizer. */ + + memset (&as, 0, sizeof (gfc_array_spec)); + + as.rank = 1; + as.type = AS_EXPLICIT; + as.lower[0] = gfc_int_expr (0); + as.upper[0] = gfc_int_expr (nelem - 1); + tmptype = gfc_get_nodesc_array_type (type, &as, 3); + + init = build_constructor_from_list (tmptype, nreverse (list)); + + TREE_CONSTANT (init) = 1; + TREE_INVARIANT (init) = 1; + TREE_STATIC (init) = 1; + + tmp = gfc_create_var (tmptype, "A"); + TREE_STATIC (tmp) = 1; + TREE_CONSTANT (tmp) = 1; + TREE_INVARIANT (tmp) = 1; + TREE_READONLY (tmp) = 1; + DECL_INITIAL (tmp) = init; + + return tmp; + } + + + /* Translate a constant EXPR_ARRAY array constructor for the scalarizer. + This mostly initializes the scalarizer state info structure with the + appropriate values to directly use the array created by the function + gfc_build_constant_array_constructor. */ + + static void + gfc_trans_constant_array_constructor (gfc_loopinfo * loop, + gfc_ss * ss, tree type) + { + gfc_ss_info *info; + tree tmp; + + tmp = gfc_build_constant_array_constructor (ss->expr, type); + + info = &ss->data.info; + + info->descriptor = tmp; + info->data = build_fold_addr_expr (tmp); + info->offset = fold_build1 (NEGATE_EXPR, gfc_array_index_type, + loop->from[0]); + + info->delta[0] = gfc_index_zero_node; + info->start[0] = gfc_index_zero_node; + info->end[0] = gfc_index_zero_node; + info->stride[0] = gfc_index_one_node; + info->dim[0] = 0; + + if (info->dimen > loop->temp_dim) + loop->temp_dim = info->dimen; + } + /* Array constructors are handled by constructing a temporary, then using that within the scalarization loop. This is not optimal, but seems by far the *************** gfc_trans_array_constructor (gfc_loopinf *** 1476,1482 **** tree offsetvar; tree desc; tree type; - bool const_string; bool dynamic; ss->data.info.dimen = loop->dimen; --- 1589,1594 ---- *************** gfc_trans_array_constructor (gfc_loopinf *** 1484,1490 **** c = ss->expr->value.constructor; if (ss->expr->ts.type == BT_CHARACTER) { ! const_string = get_array_ctor_strlen (c, &ss->string_length); if (!ss->string_length) gfc_todo_error ("complex character array constructors"); --- 1596,1602 ---- c = ss->expr->value.constructor; if (ss->expr->ts.type == BT_CHARACTER) { ! bool const_string = get_array_ctor_strlen (c, &ss->string_length); if (!ss->string_length) gfc_todo_error ("complex character array constructors"); *************** gfc_trans_array_constructor (gfc_loopinf *** 1493,1502 **** type = build_pointer_type (type); } else ! { ! const_string = TRUE; ! type = gfc_typenode_for_spec (&ss->expr->ts); ! } /* See if the constructor determines the loop bounds. */ dynamic = false; --- 1605,1611 ---- type = build_pointer_type (type); } else ! type = gfc_typenode_for_spec (&ss->expr->ts); /* See if the constructor determines the loop bounds. */ dynamic = false; *************** gfc_trans_array_constructor (gfc_loopinf *** 1518,1523 **** --- 1627,1651 ---- mpz_clear (size); } + /* Special case constant array constructors. */ + if (!dynamic + && loop->dimen == 1 + && INTEGER_CST_P (loop->from[0]) + && INTEGER_CST_P (loop->to[0])) + { + unsigned HOST_WIDE_INT nelem = constant_array_constructor_p (c); + if (nelem > 0) + { + tree diff = fold_build2 (MINUS_EXPR, gfc_array_index_type, + loop->to[0], loop->from[0]); + if (compare_tree_int (diff, nelem - 1) == 0) + { + gfc_trans_constant_array_constructor (loop, ss, type); + return; + } + } + } + gfc_trans_create_temp_array (&loop->pre, &loop->post, loop, &ss->data.info, type, dynamic, true, false, false); *************** gfc_conv_scalarized_array_ref (gfc_se * *** 2045,2051 **** info->stride0); /* Add the offset for this dimension to the stored offset for all other dimensions. */ ! index = fold_build2 (PLUS_EXPR, gfc_array_index_type, index, info->offset); tmp = build_fold_indirect_ref (info->data); se->expr = gfc_build_array_ref (tmp, index); --- 2173,2180 ---- info->stride0); /* Add the offset for this dimension to the stored offset for all other dimensions. */ ! if (!integer_zerop (info->offset)) ! index = fold_build2 (PLUS_EXPR, gfc_array_index_type, index, info->offset); tmp = build_fold_indirect_ref (info->data); se->expr = gfc_build_array_ref (tmp, index); ------=_20070107211527_59630 Content-Type: text/plain; name="array_constructor_14.f90" Content-Transfer-Encoding: 8bit Content-Disposition: attachment; filename="array_constructor_14.f90" Content-length: 325 ! { dg-do compile } ! { dg-options "-O2 -fdump-tree-original" } subroutine foo(x) integer :: x(4) x(:) = (/ 3, 1, 4, 1 /) end subroutine subroutine bar(x) integer :: x(4) x = (/ 3, 1, 4, 1 /) end subroutine ! { dg-final { scan-tree-dump-times "data" 0 "original" } } ! { dg-final { cleanup-tree-dump "original" } } ------=_20070107211527_59630 Content-Type: text/plain; name="testf.txt" Content-Transfer-Encoding: 8bit Content-Disposition: attachment; filename="testf.txt" Content-length: 1289 Index: gfortran.dg/vect/vect-5.f90 =================================================================== *** gfortran.dg/vect/vect-5.f90 (revision 120503) --- gfortran.dg/vect/vect-5.f90 (working copy) *************** *** 37,43 **** ! { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } ! { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail { vect_no_align } } } } ! ! { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align } } } } ! { dg-final { scan-tree-dump-times "Alignment of access forced using versioning." 3 "vect" { target { ilp32 && vect_no_align } } } } ! We also expect to vectorize one loop for lp64 targets that support --- 37,43 ---- ! { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } ! { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail { vect_no_align } } } } ! ! { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { xfail { vect_no_align } } } } ! { dg-final { scan-tree-dump-times "Alignment of access forced using versioning." 3 "vect" { target { ilp32 && vect_no_align } } } } ! We also expect to vectorize one loop for lp64 targets that support ------=_20070107211527_59630--