[FORTRAN PATCH] Implement a(:) = b(:) using memcpy when possible

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [FORTRAN PATCH] Implement a(:) = b(:) using memcpy when possible
@ 2007-01-04 23:25 Roger Sayle
  2007-01-05  6:06 ` Steve Kargl
  2007-01-05  7:12 ` Paul Thomas
  0 siblings, 2 replies; 5+ messages in thread
From: Roger Sayle @ 2007-01-04 23:25 UTC (permalink / raw)
  To: fortran; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2527 bytes --]

The patch below builds upon the recent gfortran enhancement that implements
"a(:) = 0" using __builtin_memset, by now implementing suitable full array
assignments of the form "a(:) = b(:)" using __builtin_memcpy.  The use of
GCC's internal builtin allows the backend to utilize the most efficient
method, perhaps inlined intrinsic, to perform the move.  Whilst at some
point in the future, tree-ssa optimizers may be able to recognize these
idioms, and are required for source codes using DO ... CONTINUE loops,
it's much more efficient to lower F90's array operations intelligently
than to do pattern matching on trees or RTL.

As with the previous patch, this optimization only affects assignments
that involve entire arrays that don't require a descriptor (i.e. whose
bounds are known at compile-time).  Additionally, I've discovered that
gfortran supports "arrays of derived types containing allocatable
components" which can't be directly copied using a block move, but require
what the C++ front-end might call a copy constructor.  Hence the
optimization below applies only to arrays of numeric/logical types.

Included in this patch is a minor reorganization of gfc_trans_assignment
which splits out the scalarization code into it's own helper function.
I intend to reuse this in later patches, where it can now be used to
create conditionals of the form "if (condition) optimized_code(); else
original_runtime_scalarized_loops();".


The following patch has been tested on x86_64-unknown-linux-gnu, including
the middle-end patch committed as revision 120451, with a full "make
bootstrap", all default languages including gfortran, and regression
tested with a top-level "make -k check" with no new failures.

Ok for mainline?


2007-01-04  Roger Sayle  <roger@eyesopen.com>

        * trans-expr.c (gfc_trans_assignment_1): New subroutine to scalarize
        array assignments split out from gfc_trans_assignment.
        (gfc_trans_array_copy): New function to implement array to array
        copies via calls to __builtin_memcpy.
        (copyable_array_p): New helper function to identify an array of
        simple/POD types, that may be copied/assigned using memcpy.
        (gfc_trans_assignment): Use gfc_trans_array_copy to handle simple
        whole array assignments considered suitable by copyable_array_p.
        Invoke gfc_trans_assignment_1 to perform the fallback scalarization.

        * gfortran.dg/array_memcpy_1.f90: New test case.
        * gfortran.dg/array_memcpy_2.f90: Likewise.

Roger
--

[-- Attachment #2: patchf3.txt --]
[-- Type: text/plain, Size: 5956 bytes --]

Index: trans-expr.c
===================================================================
*** trans-expr.c	(revision 120354)
--- trans-expr.c	(working copy)
*************** gfc_trans_zero_assign (gfc_expr * expr)
*** 3579,3589 ****
    return fold_convert (void_type_node, tmp);
  }
  
! /* Translate an assignment.  Most of the code is concerned with
!    setting up the scalarizer.  */
  
! tree
! gfc_trans_assignment (gfc_expr * expr1, gfc_expr * expr2, bool init_flag)
  {
    gfc_se lse;
    gfc_se rse;
--- 3579,3654 ----
    return fold_convert (void_type_node, tmp);
  }
  
! /* Try to efficiently translate dst(:) = src(:).  Return NULL if this
!    can't be done.  EXPR1 is the destination/lhs and EXPR2 is the
!    source/rhs, both are gfc_full_array_ref_p which have checked for
!    dependencies.  */
  
! static tree
! gfc_trans_array_copy (gfc_expr * expr1, gfc_expr * expr2)
! {
!   tree dst, dlen, dtype;
!   tree src, slen, stype;
!   tree tmp, args;
! 
!   dst = gfc_get_symbol_decl (expr1->symtree->n.sym);
!   src = gfc_get_symbol_decl (expr2->symtree->n.sym);
! 
!   dtype = TREE_TYPE (dst);
!   if (POINTER_TYPE_P (dtype))
!     dtype = TREE_TYPE (dtype);
!   stype = TREE_TYPE (src);
!   if (POINTER_TYPE_P (stype))
!     stype = TREE_TYPE (stype);
! 
!   if (!GFC_ARRAY_TYPE_P (dtype) || !GFC_ARRAY_TYPE_P (stype))
!     return NULL_TREE;
! 
!   /* Determine the lengths of the arrays.  */
!   dlen = GFC_TYPE_ARRAY_SIZE (dtype);
!   if (!dlen || TREE_CODE (dlen) != INTEGER_CST)
!     return NULL_TREE;
!   dlen = fold_build2 (MULT_EXPR, gfc_array_index_type, dlen,
! 		      TYPE_SIZE_UNIT (gfc_get_element_type (dtype)));
! 
!   slen = GFC_TYPE_ARRAY_SIZE (stype);
!   if (!slen || TREE_CODE (slen) != INTEGER_CST)
!     return NULL_TREE;
!   slen = fold_build2 (MULT_EXPR, gfc_array_index_type, slen,
! 		      TYPE_SIZE_UNIT (gfc_get_element_type (stype)));
! 
!   /* Sanity check that they are the same.  This should always be
!      the case, as we should already have checked for conformance.  */
!   if (!tree_int_cst_equal (slen, dlen))
!     return NULL_TREE;
! 
!   /* Convert arguments to the correct types.  */
!   if (!POINTER_TYPE_P (TREE_TYPE (dst)))
!     dst = gfc_build_addr_expr (pvoid_type_node, dst);
!   else
!     dst = fold_convert (pvoid_type_node, dst);
! 
!   if (!POINTER_TYPE_P (TREE_TYPE (src)))
!     src = gfc_build_addr_expr (pvoid_type_node, src);
!   else
!     src = fold_convert (pvoid_type_node, src);
! 
!   dlen = fold_convert (size_type_node, dlen);
! 
!   /* Construct call to __builtin_memcpy.  */
!   args = build_tree_list (NULL_TREE, dlen);
!   args = tree_cons (NULL_TREE, src, args);
!   args = tree_cons (NULL_TREE, dst, args);
!   tmp = build_function_call_expr (built_in_decls[BUILT_IN_MEMCPY], args);
!   return fold_convert (void_type_node, tmp);
! }
! 
! 
! /* Subroutine of gfc_trans_assignment that actually scalarizes the
!    assignment.  EXPR1 is the destination/RHS and EXPR is the source/LHS.  */
! 
! static tree
! gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag)
  {
    gfc_se lse;
    gfc_se rse;
*************** gfc_trans_assignment (gfc_expr * expr1, 
*** 3596,3621 ****
    stmtblock_t body;
    bool l_is_temp;
  
-   /* Special case a single function returning an array.  */
-   if (expr2->expr_type == EXPR_FUNCTION && expr2->rank > 0)
-     {
-       tmp = gfc_trans_arrayfunc_assign (expr1, expr2);
-       if (tmp)
- 	return tmp;
-     }
- 
-   /* Special case assigning an array to zero.  */
-   if (expr1->expr_type == EXPR_VARIABLE
-       && expr1->rank > 0
-       && expr1->ref
-       && gfc_full_array_ref_p (expr1->ref)
-       && is_zero_initializer_p (expr2))
-     {
-       tmp = gfc_trans_zero_assign (expr1);
-       if (tmp)
-         return tmp;
-     }
- 
    /* Assignment of the form lhs = rhs.  */
    gfc_start_block (&block);
  
--- 3661,3666 ----
*************** gfc_trans_assignment (gfc_expr * expr1, 
*** 3751,3756 ****
--- 3796,3873 ----
    return gfc_finish_block (&block);
  }
  
+ 
+ /* Check whether EXPR, which is an EXPR_VARIABLE, is a copyable array.  */
+ 
+ static bool
+ copyable_array_p (gfc_expr * expr)
+ {
+   /* First check it's an array.  */
+   if (expr->rank < 1 || !expr->ref)
+     return false;
+ 
+   /* Next check that it's of a simple enough type.  */
+   switch (expr->ts.type)
+     {
+     case BT_INTEGER:
+     case BT_REAL:
+     case BT_COMPLEX:
+     case BT_LOGICAL:
+       return true;
+ 
+     default:
+       break;
+     }
+ 
+   return false;
+ }
+ 
+ /* Translate an assignment.  */
+ 
+ tree
+ gfc_trans_assignment (gfc_expr * expr1, gfc_expr * expr2, bool init_flag)
+ {
+   tree tmp;
+ 
+   /* Special case a single function returning an array.  */
+   if (expr2->expr_type == EXPR_FUNCTION && expr2->rank > 0)
+     {
+       tmp = gfc_trans_arrayfunc_assign (expr1, expr2);
+       if (tmp)
+ 	return tmp;
+     }
+ 
+   /* Special case assigning an array to zero.  */
+   if (expr1->expr_type == EXPR_VARIABLE
+       && expr1->rank > 0
+       && expr1->ref
+       && gfc_full_array_ref_p (expr1->ref)
+       && is_zero_initializer_p (expr2))
+     {
+       tmp = gfc_trans_zero_assign (expr1);
+       if (tmp)
+         return tmp;
+     }
+ 
+   /* Special case copying one array to another.  */
+   if (expr1->expr_type == EXPR_VARIABLE
+       && copyable_array_p (expr1)
+       && gfc_full_array_ref_p (expr1->ref)
+       && expr2->expr_type == EXPR_VARIABLE
+       && copyable_array_p (expr2)
+       && gfc_full_array_ref_p (expr2->ref)
+       && gfc_compare_types (&expr1->ts, &expr2->ts)
+       && !gfc_check_dependency (expr1, expr2, 0))
+     {
+       tmp = gfc_trans_array_copy (expr1, expr2);
+       if (tmp)
+         return tmp;
+     }
+ 
+   /* Fallback to the scalarizer to generate explicit loops.  */
+   return gfc_trans_assignment_1 (expr1, expr2, init_flag);
+ }
+ 
  tree
  gfc_trans_init_assign (gfc_code * code)
  {

[-- Attachment #3: array_memcpy_1.f90 --]
[-- Type: text/plain, Size: 507 bytes --]

! { dg-do compile }
! { dg-options "-O2 -fdump-tree-original" }
subroutine testi(a,b)
  integer :: a(20)
  integer :: b(20)
  a = b;
end subroutine

subroutine testr(a,b)
  real :: a(20)
  real :: b(20)
  a = b;
end subroutine

subroutine testz(a,b)
  complex :: a(20)
  complex :: b(20)
  a = b;
end subroutine

subroutine testl(a,b)
  logical :: a(20)
  logical :: b(20)
  a = b;
end subroutine

! { dg-final { scan-tree-dump-times "memcpy" 4 "original" } }
! { dg-final { cleanup-tree-dump "original" } }

[-- Attachment #4: array_memcpy_2.f90 --]
[-- Type: text/plain, Size: 558 bytes --]

! This checks that the "z = y" assignment is not considered copyable, as the
! array is of a derived type containing allocatable components.  Hence, we
! we should expand the scalarized loop, which contains *two* memcpy calls.
! { dg-do compile }
! { dg-options "-O2 -fdump-tree-original" }

  type :: a
    integer, allocatable :: i(:)
  end type a

  type :: b
    type (a), allocatable :: at(:)
  end type b

  type(b) :: y(2), z(2)

  z = y
end
! { dg-final { scan-tree-dump-times "memcpy" 2 "original" } }
! { dg-final { cleanup-tree-dump "original" } }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [FORTRAN PATCH] Implement a(:) = b(:) using memcpy when possible
  2007-01-04 23:25 [FORTRAN PATCH] Implement a(:) = b(:) using memcpy when possible Roger Sayle
@ 2007-01-05  6:06 ` Steve Kargl
  2007-01-05  7:12 ` Paul Thomas
  1 sibling, 0 replies; 5+ messages in thread
From: Steve Kargl @ 2007-01-05  6:06 UTC (permalink / raw)
  To: Roger Sayle; +Cc: fortran, gcc-patches

On Thu, Jan 04, 2007 at 04:25:10PM -0700, Roger Sayle wrote:
> bounds are known at compile-time).  Additionally, I've discovered that
> gfortran supports "arrays of derived types containing allocatable
> components" which can't be directly copied using a block move, but require
> what the C++ front-end might call a copy constructor.  Hence the

This is the allocatable component feature that is commonly 
referred to TR 15581, ie, an addendum to the F95 standard
and part the F2003 standard.


> 2007-01-04  Roger Sayle  <roger@eyesopen.com>
> 
>         * trans-expr.c (gfc_trans_assignment_1): New subroutine to scalarize
>         array assignments split out from gfc_trans_assignment.
>         (gfc_trans_array_copy): New function to implement array to array
>         copies via calls to __builtin_memcpy.
>         (copyable_array_p): New helper function to identify an array of
>         simple/POD types, that may be copied/assigned using memcpy.
>         (gfc_trans_assignment): Use gfc_trans_array_copy to handle simple
>         whole array assignments considered suitable by copyable_array_p.
>         Invoke gfc_trans_assignment_1 to perform the fallback scalarization.
> 
>         * gfortran.dg/array_memcpy_1.f90: New test case.
>         * gfortran.dg/array_memcpy_2.f90: Likewise.


OK with two spelling corrections in comments

>   
> ! /* Try to efficiently translate dst(:) = src(:).  Return NULL if this
> !    can't be done.  EXPR1 is the destination/lhs and EXPR2 is the
> !    source/rhs, both are gfc_full_array_ref_p which have checked for

s/have/have been/

> !    dependencies.  */
>   
> ! 
> ! /* Subroutine of gfc_trans_assignment that actually scalarizes the
> !    assignment.  EXPR1 is the destination/RHS and EXPR is the source/LHS.  */

The 2nd EXPR should be EXPR2.

-- 
Steve

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [FORTRAN PATCH] Implement a(:) = b(:) using memcpy when possible
  2007-01-04 23:25 [FORTRAN PATCH] Implement a(:) = b(:) using memcpy when possible Roger Sayle
  2007-01-05  6:06 ` Steve Kargl
@ 2007-01-05  7:12 ` Paul Thomas
  2007-01-05 17:29   ` Roger Sayle
  1 sibling, 1 reply; 5+ messages in thread
From: Paul Thomas @ 2007-01-05  7:12 UTC (permalink / raw)
  To: Roger Sayle; +Cc: fortran, gcc-patches

Roger,
>
> As with the previous patch, this optimization only affects assignments
> that involve entire arrays that don't require a descriptor (i.e. whose
> bounds are known at compile-time).  Additionally, I've discovered that
> gfortran supports "arrays of derived types containing allocatable
> components" which can't be directly copied using a block move, but require
> what the C++ front-end might call a copy constructor.  Hence the
> optimization below applies only to arrays of numeric/logical types.
>
>   
I would have thought that the derived types with allocatable components 
would make perfect candidates for memcpy.  The library memcpy is already 
used to copy allocated components, since they malloc'ed.  It would be a 
shame to leave derived types out of your scheme IMHO.

Best regards

Paul

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [FORTRAN PATCH] Implement a(:) = b(:) using memcpy when       possible
  2007-01-05  7:12 ` Paul Thomas
@ 2007-01-05 17:29   ` Roger Sayle
  2007-01-05 18:07     ` Paul Thomas
  0 siblings, 1 reply; 5+ messages in thread
From: Roger Sayle @ 2007-01-05 17:29 UTC (permalink / raw)
  To: Paul Thomas; +Cc: fortran, gcc-patches

Hi Paul,

On Fri, January 5, 2007 12:11 am, Paul Thomas wrote:
>> As with the previous patch, this optimization only affects assignments
>> that involve entire arrays that don't require a descriptor (i.e. whose
>> bounds are known at compile-time).  Additionally, I've discovered that
>> gfortran supports "arrays of derived types containing allocatable
>> components" which can't be directly copied using a block move, but
>> require what the C++ front-end might call a copy constructor.  Hence the
>> optimization below applies only to arrays of numeric/logical types.
>>
> I would have thought that the derived types with allocatable components
> would make perfect candidates for memcpy.  The library memcpy is already
> used to copy allocated components, since they malloc'ed.  It would be a
> shame to leave derived types out of your scheme IMHO.

Agreed.  The issue is purely my limited understanding of F2003's type
system, and how this is represented in gfortran.  I believe that all that
might be required is to add a case to handle BT_DERIVED to the new
copyable_array_p function, that checks the fields are all copyable.
Does the Fortran spec have the equivalent of C/C++'s POD or primitive
types?  I could find a gfc_numeric_ts, but this didn't include LOGICAL.
And I believe that both ALLOCATABLE types and CHARACTER types can't be
memcpy copied.  So I guess that a type is copyable if its POD type, or a
derived type consisting only of copyable types?

Any help from folks that can actually program in FORTRAN would be much
appreciated :-).  I'm not sure I speak the language well enough to even
compose a testcase, let alone the exceptions that I'm unaware of.

Roger
--

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [FORTRAN PATCH] Implement a(:) = b(:) using memcpy when       possible
  2007-01-05 17:29   ` Roger Sayle
@ 2007-01-05 18:07     ` Paul Thomas
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Thomas @ 2007-01-05 18:07 UTC (permalink / raw)
  To: Roger Sayle; +Cc: fortran, gcc-patches

Roger,
> Agreed.  The issue is purely my limited understanding of F2003's type
> system, and how this is represented in gfortran.  I believe that all that
> might be required is to add a case to handle BT_DERIVED to the new
> copyable_array_p function, that checks the fields are all copyable.
> Does the Fortran spec have the equivalent of C/C++'s POD or primitive
> types?  I could find a gfc_numeric_ts, but this didn't include LOGICAL.
>   
PODs? primitive types? You are talking to the C++lexic here.... well 
sort of.
> And I believe that both ALLOCATABLE types and CHARACTER types can't be
> memcpy copied.  So I guess that a type is copyable if its POD type, or a
> derived type consisting only of copyable types?
>   
but the array of structures can; within the structures, the allocatable 
types consist of the descriptor and the character string is there in its 
entirity.  Thus, you need to take the size of the structure from the 
descriptor, multiply by the number of array elements and Bob's your uncle...
> Any help from folks that can actually program in FORTRAN would be much
> appreciated :-).  I'm not sure I speak the language well enough to even
> compose a testcase, let alone the exceptions that I'm unaware of.
>   
I believe that could be arranged :-)   There might already be suitable 
tests in gfortran.dg/alloc_comp*.f90

Paul

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-01-05 18:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-04 23:25 [FORTRAN PATCH] Implement a(:) = b(:) using memcpy when possible Roger Sayle
2007-01-05  6:06 ` Steve Kargl
2007-01-05  7:12 ` Paul Thomas
2007-01-05 17:29   ` Roger Sayle
2007-01-05 18:07     ` Paul Thomas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).