public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: Implement stack arrays even for unknown sizes
@ 2011-04-09 10:08 Dominique Dhumieres
  2011-04-09 12:17 ` Paul Richard Thomas
  0 siblings, 1 reply; 40+ messages in thread
From: Dominique Dhumieres @ 2011-04-09 10:08 UTC (permalink / raw)
  To: fortran; +Cc: gcc-patches, matz

Michael,

I have applied your patch on top of revision 172217 on x86_64-apple-darwin10.7.0.
So far I have only limited tests on the polyhedron test suite.
The test nf.f90 (containing an automatic array) executes in less than 20s, compares
to ~28s without the patch. However capacita.f90 is miscompiled (Segmentation fault
at -O) and fatigue.f90 with -fwhole-program executes in ~7.5s instead of ~6.3s
(with -Ofast).

Cheers,

Dominique

^ permalink raw reply	[flat|nested] 40+ messages in thread
* Implement stack arrays even for unknown sizes
@ 2011-04-08 21:29 Michael Matz
  2011-04-09  8:21 ` N.M. Maclaren
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Matz @ 2011-04-08 21:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: fortran

Hello,

I developed this patch during stage3 of 4.6, so it wasn't appropriate 
then, but now should be a good time.

It adds a new option -fstack-arrays which makes the frontend put 
all local arrays on stack memory.  Even those of non-constant size, by 
using alloca.  Or rather not by explicitely using alloca, but by 
leveraging the middle-ends possibilities of having variable length arrays 
(which that then implements via alloca/stack_save/stack_restore).

Via that we'll get automatic allocation and free (the latter being the 
important part for stack memory) when entering or leaving the scope of the 
so declared arrays.  That means temporary arrays generated for looping 
constructs only need stack memory when the loop is active, not during the 
whole function.

By doing that we save one malloc/free pair per such loop iteration, and 
some benchmarks _heavily_ churn in the libc allocator (tonto for 
instance).  Also in some cases the data layout will be much better 
(e.g. bwaves).  This patch bootstraps fine on x86_64-linux.  Even if I'm 
forcing the use of stack arrays by default the testsuite runs without 
regressions, except for a few cases where regexps need to be changed to 
accept the changed output (e.g. explicitely matching for builtin_free).
The patch as is (i.e. without using stack arrays by default) doesn't 
regress at all.

I haven't rechecked performance now, but four months ago this was the 
result for the fortran benchmarks in cpu2006:

(base is -O3 -g -ffast-math -funroll-loops -fpeel-loops
 peak is + -fstack-arrays)

410.bwaves      13590   1340         10.2   *   13590    813         16.7   *
416.gamess      19580   1400         14.0   *   19580   1410         13.9   *
434.zeusmp       9100    798         11.4   *    9100    799         11.4   *
435.gromacs      7140    622         11.5   *    7140    621         11.5   *
436.cactusADM   11950   1280          9.36  *   11950   1300          9.18 *
437.leslie3d     9400    765         12.3   *    9400    765         12.3   *
454.calculix     8250    734         11.2   *    8250    735         11.2   *
459.GemsFDTD    10610   1070          9.95  *   10610   1060         10.0   *
465.tonto        9840   1090          9.00  *    9840    952         10.3   *
481.wrf         11170    899         12.4   *   11170    794         14.1 *

That is tonto and wrf show nice speedups, and bwaves increases performance 
by 80% (that's data layout, bwaves doesn't heavily call malloc/free).

It should be noted that e.g. ICC does this placement of local arrays on 
stack by default, and that doing it for GCC has similar requirements.  For 
cpu2006 one needs to increase the stack ulimit quite much (around 1G) to 
not segfault.

I would consider enabling stack-arrays for -Ofast, but that would be a 
separate patch.

So, what do you think?


Ciao,
Michael.

	* trans-array.c (toplevel): Include gimple.h.
	(gfc_trans_allocate_array_storage): Check flag_stack_arrays,
	properly expand variable length arrays.
	(gfc_trans_auto_array_allocation): If flag_stack_arrays create
	variable length decls and associate them with their scope.
	* gfortran.h (gfc_option_t): Add flag_stack_arrays member.
	* options.c (gfc_init_options): Handle -fstack_arrays option.
	* lang.opt (fstack-arrays): Add option.
	* invoke.texi (Code Gen Options): Document it.
	* Make-lang.in (trans-array.o): Depend on GIMPLE_H.

Index: trans-array.c
===================================================================
*** trans-array.c	(Revision 172206)
--- trans-array.c	(Arbeitskopie)
*************** along with GCC; see the file COPYING3.
*** 81,86 ****
--- 81,87 ----
  #include "system.h"
  #include "coretypes.h"
  #include "tree.h"
+ #include "gimple.h"
  #include "diagnostic-core.h"	/* For internal_error/fatal_error.  */
  #include "flags.h"
  #include "gfortran.h"
*************** gfc_trans_allocate_array_storage (stmtbl
*** 630,647 ****
      {
        /* Allocate the temporary.  */
        onstack = !dynamic && initial == NULL_TREE
! 			 && gfc_can_put_var_on_stack (size);
  
        if (onstack)
  	{
  	  /* Make a temporary variable to hold the data.  */
  	  tmp = fold_build2_loc (input_location, MINUS_EXPR, TREE_TYPE (nelem),
  				 nelem, gfc_index_one_node);
  	  tmp = build_range_type (gfc_array_index_type, gfc_index_zero_node,
  				  tmp);
  	  tmp = build_array_type (gfc_get_element_type (TREE_TYPE (desc)),
  				  tmp);
  	  tmp = gfc_create_var (tmp, "A");
  	  tmp = gfc_build_addr_expr (NULL_TREE, tmp);
  	  gfc_conv_descriptor_data_set (pre, desc, tmp);
  	}
--- 631,654 ----
      {
        /* Allocate the temporary.  */
        onstack = !dynamic && initial == NULL_TREE
! 			 && (gfc_option.flag_stack_arrays
! 			     || gfc_can_put_var_on_stack (size));
  
        if (onstack)
  	{
  	  /* Make a temporary variable to hold the data.  */
  	  tmp = fold_build2_loc (input_location, MINUS_EXPR, TREE_TYPE (nelem),
  				 nelem, gfc_index_one_node);
+ 	  tmp = gfc_evaluate_now (tmp, pre);
  	  tmp = build_range_type (gfc_array_index_type, gfc_index_zero_node,
  				  tmp);
  	  tmp = build_array_type (gfc_get_element_type (TREE_TYPE (desc)),
  				  tmp);
  	  tmp = gfc_create_var (tmp, "A");
+ 	  gfc_add_expr_to_block (pre,
+ 				 fold_build1_loc (input_location,
+ 						  DECL_EXPR, TREE_TYPE (tmp),
+ 						  tmp));
  	  tmp = gfc_build_addr_expr (NULL_TREE, tmp);
  	  gfc_conv_descriptor_data_set (pre, desc, tmp);
  	}
*************** gfc_trans_auto_array_allocation (tree de
*** 4744,4749 ****
--- 4751,4758 ----
    tree tmp;
    tree size;
    tree offset;
+   tree space;
+   tree inittree;
    bool onstack;
  
    gcc_assert (!(sym->attr.pointer || sym->attr.allocatable));
*************** gfc_trans_auto_array_allocation (tree de
*** 4800,4814 ****
        return;
      }
  
!   /* The size is the number of elements in the array, so multiply by the
!      size of an element to get the total size.  */
!   tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
!   size = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
! 			  size, fold_convert (gfc_array_index_type, tmp));
  
!   /* Allocate memory to hold the data.  */
!   tmp = gfc_call_malloc (&init, TREE_TYPE (decl), size);
!   gfc_add_modify (&init, decl, tmp);
  
    /* Set offset of the array.  */
    if (TREE_CODE (GFC_TYPE_ARRAY_OFFSET (type)) == VAR_DECL)
--- 4809,4846 ----
        return;
      }
  
!   if (gfc_option.flag_stack_arrays)
!     {
!       tree addr;
!       gcc_assert (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE);
!       space = build_decl (sym->declared_at.lb->location,
! 			  VAR_DECL, create_tmp_var_name ("A"),
! 			  TREE_TYPE (TREE_TYPE (decl)));
!       gfc_trans_vla_type_sizes (sym, &init);
!       tmp = fold_build1_loc (input_location, DECL_EXPR,
! 			     TREE_TYPE (space), space);
!       gfc_add_expr_to_block (&init, tmp);
!       addr = fold_build1_loc (sym->declared_at.lb->location,
! 			      ADDR_EXPR, TREE_TYPE (decl), space);
!       gfc_add_modify (&init, decl, addr);
!       tmp = NULL_TREE;
!     }
!   else
!     {
!       /* The size is the number of elements in the array, so multiply by the
! 	 size of an element to get the total size.  */
!       tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
!       size = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
! 			      size, fold_convert (gfc_array_index_type, tmp));
! 
!       /* Allocate memory to hold the data.  */
!       tmp = gfc_call_malloc (&init, TREE_TYPE (decl), size);
!       gfc_add_modify (&init, decl, tmp);
  
!       /* Free the temporary.  */
!       tmp = gfc_call_free (convert (pvoid_type_node, decl));
!       space = NULL_TREE;
!     }
  
    /* Set offset of the array.  */
    if (TREE_CODE (GFC_TYPE_ARRAY_OFFSET (type)) == VAR_DECL)
*************** gfc_trans_auto_array_allocation (tree de
*** 4817,4826 ****
    /* Automatic arrays should not have initializers.  */
    gcc_assert (!sym->value);
  
!   /* Free the temporary.  */
!   tmp = gfc_call_free (convert (pvoid_type_node, decl));
! 
!   gfc_add_init_cleanup (block, gfc_finish_block (&init), tmp);
  }
  
  
--- 4849,4858 ----
    /* Automatic arrays should not have initializers.  */
    gcc_assert (!sym->value);
  
!   inittree = gfc_finish_block (&init);
!   if (space)
!     pushdecl (space);
!   gfc_add_init_cleanup (block, inittree, tmp);
  }
  
  
Index: Make-lang.in
===================================================================
*** Make-lang.in	(Revision 172206)
--- Make-lang.in	(Arbeitskopie)
*************** fortran/trans-stmt.o: $(GFORTRAN_TRANS_D
*** 353,359 ****
  fortran/trans-openmp.o: $(GFORTRAN_TRANS_DEPS)
  fortran/trans-io.o: $(GFORTRAN_TRANS_DEPS) gt-fortran-trans-io.h \
    fortran/ioparm.def
! fortran/trans-array.o: $(GFORTRAN_TRANS_DEPS)
  fortran/trans-intrinsic.o: $(GFORTRAN_TRANS_DEPS) fortran/mathbuiltins.def \
    gt-fortran-trans-intrinsic.h
  fortran/dependency.o: $(GFORTRAN_TRANS_DEPS) fortran/dependency.h
--- 353,359 ----
  fortran/trans-openmp.o: $(GFORTRAN_TRANS_DEPS)
  fortran/trans-io.o: $(GFORTRAN_TRANS_DEPS) gt-fortran-trans-io.h \
    fortran/ioparm.def
! fortran/trans-array.o: $(GFORTRAN_TRANS_DEPS) $(GIMPLE_H)
  fortran/trans-intrinsic.o: $(GFORTRAN_TRANS_DEPS) fortran/mathbuiltins.def \
    gt-fortran-trans-intrinsic.h
  fortran/dependency.o: $(GFORTRAN_TRANS_DEPS) fortran/dependency.h
Index: gfortran.h
===================================================================
*** gfortran.h	(Revision 172206)
--- gfortran.h	(Arbeitskopie)
*************** typedef struct
*** 2220,2225 ****
--- 2220,2226 ----
    int flag_d_lines;
    int gfc_flag_openmp;
    int flag_sign_zero;
+   int flag_stack_arrays;
    int flag_module_private;
    int flag_recursive;
    int flag_init_local_zero;
Index: lang.opt
===================================================================
*** lang.opt	(Revision 172206)
--- lang.opt	(Arbeitskopie)
*************** fmax-stack-var-size=
*** 454,459 ****
--- 454,463 ----
  Fortran RejectNegative Joined UInteger
  -fmax-stack-var-size=<n>	Size in bytes of the largest array that will be put on the stack
  
+ fstack-arrays
+ Fortran
+ Put all local arrays on stack.
+ 
  fmodule-private
  Fortran
  Set default accessibility of module entities to PRIVATE.
Index: invoke.texi
===================================================================
*** invoke.texi	(Revision 172206)
--- invoke.texi	(Arbeitskopie)
*************** and warnings}.
*** 167,172 ****
--- 167,173 ----
  -fbounds-check -fcheck-array-temporaries  -fmax-array-constructor =@var{n} @gol
  -fcheck=@var{<all|array-temps|bounds|do|mem|pointer|recursion>} @gol
  -fcoarray=@var{<none|single|lib>} -fmax-stack-var-size=@var{n} @gol
+ -fstack-arrays @gol
  -fpack-derived  -frepack-arrays  -fshort-enums  -fexternal-blas @gol
  -fblas-matmul-limit=@var{n} -frecursive -finit-local-zero @gol
  -finit-integer=@var{n} -finit-real=@var{<zero|inf|-inf|nan|snan>} @gol
*************** Future versions of GNU Fortran may impro
*** 1361,1366 ****
--- 1362,1374 ----
  
  The default value for @var{n} is 32768.
  
+ @item -fstack-arrays
+ @opindex @code{fstack-arrays}
+ Adding this option will make the fortran compiler put all local arrays,
+ even those of unknown size onto stack memory.  If your program uses very
+ large local arrays it's possible that you'll have to extend your runtime
+ limits for stack memory on some operating systems.
+ 
  @item -fpack-derived
  @opindex @code{fpack-derived}
  @cindex structure packing
Index: options.c
===================================================================
*** options.c	(Revision 172206)
--- options.c	(Arbeitskopie)
*************** gfc_init_options (unsigned int decoded_o
*** 123,128 ****
--- 123,129 ----
  
    /* Default value of flag_max_stack_var_size is set in gfc_post_options.  */
    gfc_option.flag_max_stack_var_size = -2;
+   gfc_option.flag_stack_arrays = 0;
  
    gfc_option.flag_range_check = 1;
    gfc_option.flag_pack_derived = 0;
*************** gfc_handle_option (size_t scode, const c
*** 783,788 ****
--- 784,793 ----
        gfc_option.flag_max_stack_var_size = value;
        break;
  
+     case OPT_fstack_arrays:
+       gfc_option.flag_stack_arrays = value;
+       break;
+ 
      case OPT_fmodule_private:
        gfc_option.flag_module_private = value;
        break;

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2011-04-15 21:49 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-09 10:08 Implement stack arrays even for unknown sizes Dominique Dhumieres
2011-04-09 12:17 ` Paul Richard Thomas
2011-04-10 13:29   ` Dominique Dhumieres
2011-04-11 11:49     ` Michael Matz
2011-04-11 11:58       ` Axel Freyn
2011-04-11 13:35       ` Eric Botcazou
2011-04-11 14:06         ` Dominique Dhumieres
2011-04-11 14:59         ` Michael Matz
2011-04-11 16:04   ` Michael Matz
2011-04-11 16:45     ` Steven Bosscher
2011-04-12 11:54       ` Michael Matz
2011-04-12 12:42         ` N.M. Maclaren
2011-04-11 16:46     ` Dominique Dhumieres
2011-04-11 16:59     ` H.J. Lu
2011-04-11 17:06       ` N.M. Maclaren
2011-04-11 18:28     ` Tobias Burnus
2011-04-11 21:18     ` Dominique Dhumieres
2011-04-12  6:23     ` Paul Richard Thomas
2011-04-12  6:35       ` Dominique Dhumieres
2011-04-13  9:42         ` Paul Richard Thomas
2011-04-13 10:38           ` Richard Guenther
2011-04-13 11:13             ` Richard Guenther
2011-04-14 13:32               ` Dominique Dhumieres
2011-04-13 13:46             ` Paul Richard Thomas
2011-04-14 13:59         ` Michael Matz
2011-04-14 15:28           ` Michael Matz
2011-04-14 19:29             ` Dominique Dhumieres
2011-04-15  2:01             ` Dominique Dhumieres
2011-04-15 12:38               ` Michael Matz
2011-04-15 14:06                 ` Dominique Dhumieres
2011-04-15 15:19                   ` Michael Matz
2011-04-15 15:26                     ` Jerry DeLisle
2011-04-15 22:41                       ` Michael Matz
2011-04-14 20:23     ` Tobias Burnus
2011-04-14 20:30       ` Tobias Burnus
  -- strict thread matches above, loose matches on Subject: below --
2011-04-08 21:29 Michael Matz
2011-04-09  8:21 ` N.M. Maclaren
2011-04-09  8:51   ` Magnus Fromreide
2011-04-09  9:02   ` Eric Botcazou
2011-04-09  9:49     ` N.M. Maclaren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).