public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch 0/4] Prefetch patches for s390, debug and the heuristic V2
@ 2010-05-07 14:05 Christian Borntraeger
  2010-05-07 14:05 ` [patch 4/4] Allow loop prefetch code to speculatively prefetch non constant steps Christian Borntraeger
                   ` (3 more replies)
  0 siblings, 4 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 14:05 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Guenther, Zdenek Dvorak, Changpeng Fang, Andreas Krebbel

This is the second version of my patches agains the prefetch code.
I adopted the comments from Richard, Zdenek and Changpeng. Thank you
for your review and comments.

patch1 does some initial settings for s390x.
patch2 adds some more debugging if prefetches are omitted.
patch3 tries to improve the prefetching for large prefetch_before values
patch4 tries to implement prefetching for loops with non-constant but
loop invariant step.

All patches are bootstrapped on s390x.
Ok to apply?

Christian



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 3/4] Prefetch: Ignore large values of prefetch_before
  2010-05-07 14:05 [patch 0/4] Prefetch patches for s390, debug and the heuristic V2 Christian Borntraeger
  2010-05-07 14:05 ` [patch 4/4] Allow loop prefetch code to speculatively prefetch non constant steps Christian Borntraeger
  2010-05-07 14:05 ` [patch 1/4] Set sane prefetch values for s390x Christian Borntraeger
@ 2010-05-07 14:05 ` Christian Borntraeger
  2010-05-07 14:51   ` Zdenek Dvorak
  2010-05-07 14:05 ` [patch 2/4] Add debug information if prefetches are not issued Christian Borntraeger
  3 siblings, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 14:05 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Guenther, Zdenek Dvorak, Changpeng Fang, Andreas Krebbel

[-- Attachment #1: prefetch_avoid_large_before.patch --]
[-- Type: text/plain, Size: 2590 bytes --]

There is a heuristic in the prefetch code to detect multiple accesses
that will "meet" somewhen in the future. The prefetch code assumes
that at this point one access is already "prefetched" by the other one.
The prefetch code, therefore, sets prefetch_before stating that only
the first prefetch_before iterations need a prefetch.

The problem is that the current prefetch code does not issue a prefetch
if prefetch_before is set.
Most of the time this is not a problem, but with large prefetch_before
settings we might have exceeded the cache before the reuse really
happens or prefetch_before might be in the same order of magnitude as
the amount of loop iterations.

This patch tries to address the first problem. We simply reset
prefetch_before to PREFETCH_ALL if prefetch_before is too large.
The amount of L2 cache together with the step size or cache line
size is used as a breaking point.

I did not check in should_issue_prefetch_p, because I think we should
distinguish (step > PREFETCH_BLOCK) and (step < PREFETCH_BLOCK).

On s390 this patch results in a 6.6% win for lbm. All other tests are
within the noise ratio with small losses and wins. 

Bootstrapped and tested on s390x-ibm-linux-gnu.

Ok to apply?

Christian.

2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Reset
	prefetch_before to PREFETCH_ALL if to accesses "meet" beyond
	cache size.

Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** prune_ref_by_group_reuse (struct mem_ref
*** 704,709 ****
--- 704,712 ----
        hit_from = ddown (delta_b, PREFETCH_BLOCK) * PREFETCH_BLOCK;
        prefetch_before = (hit_from - delta_r + step - 1) / step;
  
+       /* Do not reduce prefetch_before if we meet beyond cache size */
+       if (prefetch_before > abs(L2_CACHE_SIZE_BYTES / step))
+         prefetch_before = PREFETCH_ALL;
        if (prefetch_before < ref->prefetch_before)
  	ref->prefetch_before = prefetch_before;
  
*************** prune_ref_by_group_reuse (struct mem_ref
*** 734,739 ****
--- 737,745 ----
  				reduced_prefetch_block, align_unit);
    if (miss_rate <= ACCEPTABLE_MISS_RATE)
      {
+       /* Do not reduce prefetch_before if we meet beyond cache size */
+       if (prefetch_before > abs(L2_CACHE_SIZE_BYTES / PREFETCH_BLOCK))
+         prefetch_before = PREFETCH_ALL;
        if (prefetch_before < ref->prefetch_before)
  	ref->prefetch_before = prefetch_before;
  

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 1/4] Set sane prefetch values for s390x
  2010-05-07 14:05 [patch 0/4] Prefetch patches for s390, debug and the heuristic V2 Christian Borntraeger
  2010-05-07 14:05 ` [patch 4/4] Allow loop prefetch code to speculatively prefetch non constant steps Christian Borntraeger
@ 2010-05-07 14:05 ` Christian Borntraeger
  2010-05-07 14:40   ` Zdenek Dvorak
  2010-05-07 14:46   ` Jakub Jelinek
  2010-05-07 14:05 ` [patch 3/4] Prefetch: Ignore large values of prefetch_before Christian Borntraeger
  2010-05-07 14:05 ` [patch 2/4] Add debug information if prefetches are not issued Christian Borntraeger
  3 siblings, 2 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 14:05 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Guenther, Zdenek Dvorak, Changpeng Fang, Andreas Krebbel

[-- Attachment #1: prefetch_hw_settings-2.patch --]
[-- Type: text/plain, Size: 1949 bytes --]

This patch sets some sane values for -fprefetch-loop-arrays on s390.
Some of these settings also help to improve the performance of the
2 other performance patches.

Bootstrapped and tested on s390x-ibm-linux-gnu.

Ok to apply?

Christian.

2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>

        * config/s390/s390.c: Define sane prefetch settings.
        * config/s390/s390.h: Declare that read can use write prefetch.

Index: gcc/config/s390/s390.c
===================================================================
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*************** override_options (void)
*** 1661,1666 ****
--- 1661,1679 ----
      }
  
    set_param_value ("max-pending-list-length", 256);
+   /* values for loop prefetching */
+   set_param_value ("l1-cache-line-size", 256);
+   if (!PARAM_SET_P (PARAM_L1_CACHE_SIZE))
+     set_param_value ("l1-cache-size",128);
+   /* s390 has more than 2 levels and the size is much larger. Since
+    * we are always running virtualized assume that we only get a small
+    * part of the caches above l1 */
+   if (!PARAM_SET_P (PARAM_L2_CACHE_SIZE))
+     set_param_value ("l2-cache-size",1500);
+   if (!PARAM_SET_P (PARAM_PREFETCH_MIN_INSN_TO_MEM_RATIO))
+     set_param_value ("prefetch-min-insn-to-mem-ratio", 2);
+   if (!PARAM_SET_P (PARAM_SIMULTANEOUS_PREFETCHES))
+     set_param_value ("simultaneous-prefetches", 6);
  }
  
  /* Map for smallest class containing reg regno.  */
Index: gcc/config/s390/s390.h
===================================================================
*** gcc/config/s390/s390.h.orig
--- gcc/config/s390/s390.h
*************** do {									\
*** 989,992 ****
--- 989,994 ----
    (TARGET_LONG_DISPLACEMENT? ((d) >= -524288 && (d) <= 524287) \
                             : ((d) >= 0 && (d) <= 4095))
  
+ /* reads can reuse write prefetches, used by tree-ssa-prefetch-loops.c */
+ #define READ_CAN_USE_WRITE_PREFETCH 1
  #endif

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 4/4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 14:05 [patch 0/4] Prefetch patches for s390, debug and the heuristic V2 Christian Borntraeger
@ 2010-05-07 14:05 ` Christian Borntraeger
  2010-05-07 15:00   ` Zdenek Dvorak
  2010-05-07 14:05 ` [patch 1/4] Set sane prefetch values for s390x Christian Borntraeger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 14:05 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Guenther, Zdenek Dvorak, Changpeng Fang, Andreas Krebbel

[-- Attachment #1: prefetch_invariant-2.patch --]
[-- Type: text/plain, Size: 14992 bytes --]

This patch changes the prefetch code to accept non-constant step sizes
for memory references in loops. If the step size is a constant integer
the patch should not change the current behaviour. 

This patch uses a heuristic to speculatively activate prefetching for
non-constant steps by prefetching "ahead" iterations in advance.

This patch improves some testcases where the old prefetch code did not
improve the performance (e.g. mcf).

Bootstrapped and tested on s390x-ibm-linux-gnu.

Christian.

2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c (mem_ref_group): Change step to tree.
        * tree-ssa-loop-prefetch.c (ar_data): Change step to tree.
        * tree-ssa-loop-prefetch.c (dump_mem_ref): Adopt debug code to 
        handle a tree as step. this also checks for a constant int vs.
        non-constant but loop-invariant steps.
        * tree-ssa-loop-prefetch.c (find_or_create_group): Change the sort 
        algorithm to only consider steps that are constant ints.
        * tree-ssa-loop-prefetch.c (idx_analyze_ref): Adopt code to handle
        a tree instead of a HOST_WIDE_INT for step.
        * tree-ssa-loop-prefetch.c (gather_memory_references_ref): Handle
        tree instead of int and be prepared to see a NULL_TREE.
        * tree-ssa-loop-prefetch.c (prune_ref_by_self_reuse): Do not prune
        prefetches if the step cannot be calculated at compile time.
        * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Do not prune
        prefetches if the step cannot be calculated at compile time.
        * tree-ssa-loop-prefetch.c (issue_prefetch_ref): Issue prefetches
        for non-constant but loop-invariant steps.
 
Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** along with GCC; see the file COPYING3.  
*** 204,210 ****
  struct mem_ref_group
  {
    tree base;			/* Base of the reference.  */
!   HOST_WIDE_INT step;		/* Step of the reference.  */
    struct mem_ref *refs;		/* References in the group.  */
    struct mem_ref_group *next;	/* Next group of references.  */
  };
--- 204,210 ----
  struct mem_ref_group
  {
    tree base;			/* Base of the reference.  */
!   tree step;			/* Step of the reference.  */
    struct mem_ref *refs;		/* References in the group.  */
    struct mem_ref_group *next;	/* Next group of references.  */
  };
*************** dump_mem_ref (FILE *file, struct mem_ref
*** 248,254 ****
    fprintf (file, "  group %p (base ", (void *) ref->group);
    print_generic_expr (file, ref->group->base, TDF_SLIM);
    fprintf (file, ", step ");
!   fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->group->step);
    fprintf (file, ")\n");
  
    fprintf (file, "  delta ");
--- 248,257 ----
    fprintf (file, "  group %p (base ", (void *) ref->group);
    print_generic_expr (file, ref->group->base, TDF_SLIM);
    fprintf (file, ", step ");
!   if (cst_and_fits_in_hwi(ref->group->step))
!     fprintf (file, HOST_WIDE_INT_PRINT_DEC, int_cst_value(ref->group->step));
!   else
!     fprintf (file, "is non-constant but loop-invariant");
    fprintf (file, ")\n");
  
    fprintf (file, "  delta ");
*************** dump_mem_ref (FILE *file, struct mem_ref
*** 264,282 ****
     exist.  */
  
  static struct mem_ref_group *
! find_or_create_group (struct mem_ref_group **groups, tree base,
! 		      HOST_WIDE_INT step)
  {
    struct mem_ref_group *group;
  
    for (; *groups; groups = &(*groups)->next)
      {
!       if ((*groups)->step == step
  	  && operand_equal_p ((*groups)->base, base, 0))
  	return *groups;
  
!       /* Keep the list of groups sorted by decreasing step.  */
!       if ((*groups)->step < step)
  	break;
      }
  
--- 267,286 ----
     exist.  */
  
  static struct mem_ref_group *
! find_or_create_group (struct mem_ref_group **groups, tree base, tree step)
  {
    struct mem_ref_group *group;
  
    for (; *groups; groups = &(*groups)->next)
      {
!       if (operand_equal_p ((*groups)->step, step, 0)
  	  && operand_equal_p ((*groups)->base, base, 0))
  	return *groups;
  
!       /* If step is an integer constant, keep the list of groups sorted
!          by decreasing step.  */
!         if (cst_and_fits_in_hwi((*groups)->step) && cst_and_fits_in_hwi(step)
!             && (int_cst_value((*groups)->step) < int_cst_value(step)))
  	break;
      }
  
*************** struct ar_data
*** 361,367 ****
  {
    struct loop *loop;			/* Loop of the reference.  */
    gimple stmt;				/* Statement of the reference.  */
!   HOST_WIDE_INT *step;			/* Step of the memory reference.  */
    HOST_WIDE_INT *delta;			/* Offset of the memory reference.  */
  };
  
--- 365,371 ----
  {
    struct loop *loop;			/* Loop of the reference.  */
    gimple stmt;				/* Statement of the reference.  */
!   tree *step;				/* Step of the memory reference.  */
    HOST_WIDE_INT *delta;			/* Offset of the memory reference.  */
  };
  
*************** idx_analyze_ref (tree base, tree *index,
*** 373,379 ****
  {
    struct ar_data *ar_data = (struct ar_data *) data;
    tree ibase, step, stepsize;
!   HOST_WIDE_INT istep, idelta = 0, imult = 1;
    affine_iv iv;
  
    if (TREE_CODE (base) == MISALIGNED_INDIRECT_REF
--- 377,383 ----
  {
    struct ar_data *ar_data = (struct ar_data *) data;
    tree ibase, step, stepsize;
!   HOST_WIDE_INT idelta = 0, imult = 1;
    affine_iv iv;
  
    if (TREE_CODE (base) == MISALIGNED_INDIRECT_REF
*************** idx_analyze_ref (tree base, tree *index,
*** 381,395 ****
      return false;
  
    if (!simple_iv (ar_data->loop, loop_containing_stmt (ar_data->stmt),
! 		  *index, &iv, false))
      return false;
    ibase = iv.base;
    step = iv.step;
  
-   if (!cst_and_fits_in_hwi (step))
-     return false;
-   istep = int_cst_value (step);
- 
    if (TREE_CODE (ibase) == POINTER_PLUS_EXPR
        && cst_and_fits_in_hwi (TREE_OPERAND (ibase, 1)))
      {
--- 385,395 ----
      return false;
  
    if (!simple_iv (ar_data->loop, loop_containing_stmt (ar_data->stmt),
! 		  *index, &iv, true))
      return false;
    ibase = iv.base;
    step = iv.step;
  
    if (TREE_CODE (ibase) == POINTER_PLUS_EXPR
        && cst_and_fits_in_hwi (TREE_OPERAND (ibase, 1)))
      {
*************** idx_analyze_ref (tree base, tree *index,
*** 402,407 ****
--- 402,413 ----
        ibase = build_int_cst (TREE_TYPE (ibase), 0);
      }
  
+   if (*ar_data->step == NULL_TREE)
+     *ar_data->step = step;
+   else
+     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
+ 				  fold_convert(sizetype, *ar_data->step),
+ 				  fold_convert(sizetype, step));
    if (TREE_CODE (base) == ARRAY_REF)
      {
        stepsize = array_ref_element_size (base);
*************** idx_analyze_ref (tree base, tree *index,
*** 409,419 ****
  	return false;
        imult = int_cst_value (stepsize);
  
!       istep *= imult;
        idelta *= imult;
      }
  
-   *ar_data->step += istep;
    *ar_data->delta += idelta;
    *index = ibase;
  
--- 415,426 ----
  	return false;
        imult = int_cst_value (stepsize);
  
!       *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
! 				    fold_convert(sizetype, *ar_data->step),
! 				    fold_convert(sizetype, step));
        idelta *= imult;
      }
  
    *ar_data->delta += idelta;
    *index = ibase;
  
*************** idx_analyze_ref (tree base, tree *index,
*** 427,433 ****
  
  static bool
  analyze_ref (struct loop *loop, tree *ref_p, tree *base,
! 	     HOST_WIDE_INT *step, HOST_WIDE_INT *delta,
  	     gimple stmt)
  {
    struct ar_data ar_data;
--- 434,440 ----
  
  static bool
  analyze_ref (struct loop *loop, tree *ref_p, tree *base,
! 	     tree *step, HOST_WIDE_INT *delta,
  	     gimple stmt)
  {
    struct ar_data ar_data;
*************** analyze_ref (struct loop *loop, tree *re
*** 435,441 ****
    HOST_WIDE_INT bit_offset;
    tree ref = *ref_p;
  
!   *step = 0;
    *delta = 0;
  
    /* First strip off the component references.  Ignore bitfields.  */
--- 442,448 ----
    HOST_WIDE_INT bit_offset;
    tree ref = *ref_p;
  
!   *step = NULL_TREE;
    *delta = 0;
  
    /* First strip off the component references.  Ignore bitfields.  */
*************** static bool
*** 470,477 ****
  gather_memory_references_ref (struct loop *loop, struct mem_ref_group **refs,
  			      tree ref, bool write_p, gimple stmt)
  {
!   tree base;
!   HOST_WIDE_INT step, delta;
    struct mem_ref_group *agrp;
  
    if (get_base_address (ref) == NULL)
--- 477,484 ----
  gather_memory_references_ref (struct loop *loop, struct mem_ref_group **refs,
  			      tree ref, bool write_p, gimple stmt)
  {
!   tree base, step;
!   HOST_WIDE_INT delta;
    struct mem_ref_group *agrp;
  
    if (get_base_address (ref) == NULL)
*************** gather_memory_references_ref (struct loo
*** 479,484 ****
--- 486,494 ----
  
    if (!analyze_ref (loop, &ref, &base, &step, &delta, stmt))
      return false;
+   /* if analyze_ref fails the default is a NULL_TREE. we can stop here */
+   if (step == NULL_TREE)
+     return false;
  
    /* Now we know that REF = &BASE + STEP * iter + DELTA, where DELTA and STEP
       are integer constants.  */
*************** gather_memory_references (struct loop *l
*** 553,560 ****
  static void
  prune_ref_by_self_reuse (struct mem_ref *ref)
  {
!   HOST_WIDE_INT step = ref->group->step;
!   bool backward = step < 0;
  
    if (step == 0)
      {
--- 563,578 ----
  static void
  prune_ref_by_self_reuse (struct mem_ref *ref)
  {
!   HOST_WIDE_INT step;
!   bool backward;
! 
!   /* if the step size is non constant, we cannot calculate prefetch_mod */
!   if (!cst_and_fits_in_hwi (ref->group->step))
!     return;
! 
!   step = int_cst_value (ref->group->step);
! 
!   backward = step < 0;
  
    if (step == 0)
      {
*************** static void
*** 638,645 ****
  prune_ref_by_group_reuse (struct mem_ref *ref, struct mem_ref *by,
  			  bool by_is_before)
  {
!   HOST_WIDE_INT step = ref->group->step;
!   bool backward = step < 0;
    HOST_WIDE_INT delta_r = ref->delta, delta_b = by->delta;
    HOST_WIDE_INT delta = delta_b - delta_r;
    HOST_WIDE_INT hit_from;
--- 656,663 ----
  prune_ref_by_group_reuse (struct mem_ref *ref, struct mem_ref *by,
  			  bool by_is_before)
  {
!   HOST_WIDE_INT step;
!   bool backward;
    HOST_WIDE_INT delta_r = ref->delta, delta_b = by->delta;
    HOST_WIDE_INT delta = delta_b - delta_r;
    HOST_WIDE_INT hit_from;
*************** prune_ref_by_group_reuse (struct mem_ref
*** 650,655 ****
--- 668,683 ----
    tree ref_type;
    int align_unit;
  
+   /* if the step is non constant we cannot calculate prefetch_before */
+   if (!cst_and_fits_in_hwi (ref->group->step)) {
+     return;
+   }
+ 
+   step = int_cst_value (ref->group->step);
+ 
+   backward = step < 0;
+ 
+ 
    if (delta == 0)
      {
        /* If the references has the same address, only prefetch the
*************** prune_ref_by_group_reuse (struct mem_ref
*** 705,711 ****
        prefetch_before = (hit_from - delta_r + step - 1) / step;
  
        /* Do not reduce prefetch_before if we meet beyond cache size */
!       if (prefetch_before > abs(L2_CACHE_SIZE_BYTES / step))
          prefetch_before = PREFETCH_ALL;
        if (prefetch_before < ref->prefetch_before)
  	ref->prefetch_before = prefetch_before;
--- 733,739 ----
        prefetch_before = (hit_from - delta_r + step - 1) / step;
  
        /* Do not reduce prefetch_before if we meet beyond cache size */
!       if (prefetch_before * abs(step) > L2_CACHE_SIZE_BYTES)
          prefetch_before = PREFETCH_ALL;
        if (prefetch_before < ref->prefetch_before)
  	ref->prefetch_before = prefetch_before;
*************** prune_ref_by_group_reuse (struct mem_ref
*** 738,744 ****
    if (miss_rate <= ACCEPTABLE_MISS_RATE)
      {
        /* Do not reduce prefetch_before if we meet beyond cache size */
!       if (prefetch_before > abs(L2_CACHE_SIZE_BYTES / PREFETCH_BLOCK))
          prefetch_before = PREFETCH_ALL;
        if (prefetch_before < ref->prefetch_before)
  	ref->prefetch_before = prefetch_before;
--- 766,772 ----
    if (miss_rate <= ACCEPTABLE_MISS_RATE)
      {
        /* Do not reduce prefetch_before if we meet beyond cache size */
!       if (prefetch_before * PREFETCH_BLOCK > L2_CACHE_SIZE_BYTES)
          prefetch_before = PREFETCH_ALL;
        if (prefetch_before < ref->prefetch_before)
  	ref->prefetch_before = prefetch_before;
*************** static void
*** 957,963 ****
  issue_prefetch_ref (struct mem_ref *ref, unsigned unroll_factor, unsigned ahead)
  {
    HOST_WIDE_INT delta;
!   tree addr, addr_base, write_p, local;
    gimple prefetch;
    gimple_stmt_iterator bsi;
    unsigned n_prefetches, ap;
--- 985,991 ----
  issue_prefetch_ref (struct mem_ref *ref, unsigned unroll_factor, unsigned ahead)
  {
    HOST_WIDE_INT delta;
!   tree addr, addr_base, write_p, local, forward;
    gimple prefetch;
    gimple_stmt_iterator bsi;
    unsigned n_prefetches, ap;
*************** issue_prefetch_ref (struct mem_ref *ref,
*** 980,992 ****
  
    for (ap = 0; ap < n_prefetches; ap++)
      {
!       /* Determine the address to prefetch.  */
!       delta = (ahead + ap * ref->prefetch_mod) * ref->group->step;
!       addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node,
! 			  addr_base, size_int (delta));
!       addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true, NULL,
! 				       true, GSI_SAME_STMT);
! 
        /* Create the prefetch instruction.  */
        prefetch = gimple_build_call (built_in_decls[BUILT_IN_PREFETCH],
  				    3, addr, write_p, local);
--- 1008,1035 ----
  
    for (ap = 0; ap < n_prefetches; ap++)
      {
!       if (cst_and_fits_in_hwi (ref->group->step))
!         {
!           /* Determine the address to prefetch.  */
!           delta = (ahead + ap * ref->prefetch_mod) *
! 		   int_cst_value (ref->group->step);
!           addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node,
!                               addr_base, size_int (delta));
!           addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true, NULL,
!                                            true, GSI_SAME_STMT);
!         }
!       else
!         {
!           /* the step size is non-constant but loop-invariant. We use the
!              heuristic to simply prefetch ahead iterations ahead. */
!           forward = fold_build2 (MULT_EXPR, sizetype,
!                                  fold_convert(sizetype, ref->group->step),
!                                  fold_convert(sizetype, size_int(ahead)));
!           addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node, addr_base,
! 			      forward);
!           addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true,
! 					   NULL, true, GSI_SAME_STMT);
!       }
        /* Create the prefetch instruction.  */
        prefetch = gimple_build_call (built_in_decls[BUILT_IN_PREFETCH],
  				    3, addr, write_p, local);

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 2/4] Add debug information if prefetches are not issued
  2010-05-07 14:05 [patch 0/4] Prefetch patches for s390, debug and the heuristic V2 Christian Borntraeger
                   ` (2 preceding siblings ...)
  2010-05-07 14:05 ` [patch 3/4] Prefetch: Ignore large values of prefetch_before Christian Borntraeger
@ 2010-05-07 14:05 ` Christian Borntraeger
  2010-05-11  7:40   ` Andreas Krebbel
  3 siblings, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 14:05 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Guenther, Zdenek Dvorak, Changpeng Fang, Andreas Krebbel

[-- Attachment #1: prefetch_debug.patch --]
[-- Type: text/plain, Size: 1434 bytes --]

These debug statements helped me to understand why a prefetch was not
issued. 

Bootstrapped and tested on s390x-ibm-linux-gnu.
Richard said this patch is ok for 4.6.

Christian.

2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c: Add debug for dropped prefetches.

Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** should_issue_prefetch_p (struct mem_ref 
*** 848,858 ****
    /* For now do not issue prefetches for only first few of the
       iterations.  */
    if (ref->prefetch_before != PREFETCH_ALL)
!     return false;
  
    /* Do not prefetch nontemporal stores.  */
    if (ref->storent_p)
!     return false;
  
    return true;
  }
--- 848,867 ----
    /* For now do not issue prefetches for only first few of the
       iterations.  */
    if (ref->prefetch_before != PREFETCH_ALL)
!     {
!       if (dump_file && (dump_flags & TDF_DETAILS))
!         fprintf (dump_file, "Ignoring %p due to prefetch_before\n",
! 		 (void *) ref);
!       return false;
!     }
  
    /* Do not prefetch nontemporal stores.  */
    if (ref->storent_p)
!     {
!       if (dump_file && (dump_flags & TDF_DETAILS))
!         fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);
!       return false;
!     }
  
    return true;
  }

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/4] Set sane prefetch values for s390x
  2010-05-07 14:05 ` [patch 1/4] Set sane prefetch values for s390x Christian Borntraeger
@ 2010-05-07 14:40   ` Zdenek Dvorak
  2010-05-07 14:46   ` Jakub Jelinek
  1 sibling, 0 replies; 45+ messages in thread
From: Zdenek Dvorak @ 2010-05-07 14:40 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: gcc-patches, Richard Guenther, Changpeng Fang, Andreas Krebbel

Hi,

> This patch sets some sane values for -fprefetch-loop-arrays on s390.
> Some of these settings also help to improve the performance of the
> 2 other performance patches.
> 
> Bootstrapped and tested on s390x-ibm-linux-gnu.
> 
> Ok to apply?

while this patch seems OK to me, I probably do not have the right to approve it,
so someone else will have to review it,

Zdenek

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/4] Set sane prefetch values for s390x
  2010-05-07 14:05 ` [patch 1/4] Set sane prefetch values for s390x Christian Borntraeger
  2010-05-07 14:40   ` Zdenek Dvorak
@ 2010-05-07 14:46   ` Jakub Jelinek
  2010-05-07 15:25     ` [patch 1/4 v3] " Christian Borntraeger
  1 sibling, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2010-05-07 14:46 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: gcc-patches, Richard Guenther, Zdenek Dvorak, Changpeng Fang,
	Andreas Krebbel

Hi!

On Fri, May 07, 2010 at 03:59:19PM +0200, Christian Borntraeger wrote:

This patch (and other patches too) have formatting nits.

> *** gcc/config/s390/s390.c.orig
> --- gcc/config/s390/s390.c
> *************** override_options (void)
> *** 1661,1666 ****
> --- 1661,1679 ----
>       }
>   
>     set_param_value ("max-pending-list-length", 256);
> +   /* values for loop prefetching */
> +   set_param_value ("l1-cache-line-size", 256);
> +   if (!PARAM_SET_P (PARAM_L1_CACHE_SIZE))
> +     set_param_value ("l1-cache-size",128);

Missing space after ,

> +   /* s390 has more than 2 levels and the size is much larger. Since
> +    * we are always running virtualized assume that we only get a small
> +    * part of the caches above l1 */

This is not the standard comment style.  No leading * on continuation lines,
two spaces instead of just one after . and missing dot (and one extra space)
at the end of the comment.

Other patch(es) miss a space before (.

> + /* reads can reuse write prefetches, used by tree-ssa-prefetch-loops.c */

Make this a sentence (Uppercase R, dot and extra space at the end of the
comment).

> + #define READ_CAN_USE_WRITE_PREFETCH 1
>   #endif

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 3/4] Prefetch: Ignore large values of prefetch_before
  2010-05-07 14:05 ` [patch 3/4] Prefetch: Ignore large values of prefetch_before Christian Borntraeger
@ 2010-05-07 14:51   ` Zdenek Dvorak
  2010-05-07 15:37     ` [patch 3/4 v3] " Christian Borntraeger
  0 siblings, 1 reply; 45+ messages in thread
From: Zdenek Dvorak @ 2010-05-07 14:51 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: gcc-patches, Richard Guenther, Changpeng Fang, Andreas Krebbel

Hi,

> This patch tries to address the first problem. We simply reset
> prefetch_before to PREFETCH_ALL if prefetch_before is too large.
> The amount of L2 cache together with the step size or cache line
> size is used as a breaking point.

OK, but:

> +       /* Do not reduce prefetch_before if we meet beyond cache size */
> +       if (prefetch_before > abs(L2_CACHE_SIZE_BYTES / step))
> +         prefetch_before = PREFETCH_ALL;

Missing space after "abs".

> +       /* Do not reduce prefetch_before if we meet beyond cache size */
> +       if (prefetch_before > abs(L2_CACHE_SIZE_BYTES / PREFETCH_BLOCK))

No need for "abs" here.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 14:05 ` [patch 4/4] Allow loop prefetch code to speculatively prefetch non constant steps Christian Borntraeger
@ 2010-05-07 15:00   ` Zdenek Dvorak
  2010-05-07 15:17     ` Christian Borntraeger
  2010-05-07 16:20     ` [patch 4/4 v3] " Christian Borntraeger
  0 siblings, 2 replies; 45+ messages in thread
From: Zdenek Dvorak @ 2010-05-07 15:00 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: gcc-patches, Richard Guenther, Changpeng Fang, Andreas Krebbel

Hi,

> This patch changes the prefetch code to accept non-constant step sizes
> for memory references in loops. If the step size is a constant integer
> the patch should not change the current behaviour. 
> 
> This patch uses a heuristic to speculatively activate prefetching for
> non-constant steps by prefetching "ahead" iterations in advance.
> 
> This patch improves some testcases where the old prefetch code did not
> improve the performance (e.g. mcf).

OK, modulo coding style fixes:

>         * tree-ssa-loop-prefetch.c (dump_mem_ref): Adopt debug code to 
>         handle a tree as step. this also checks for a constant int vs.
>         non-constant but loop-invariant steps.

capital T in "this".

> !   if (cst_and_fits_in_hwi(ref->group->step))
> !     fprintf (file, HOST_WIDE_INT_PRINT_DEC, int_cst_value(ref->group->step));

Missing space before (.  Repeats many times throughout the patch.

> !   else
> !     fprintf (file, "is non-constant but loop-invariant");

It might be useful to dump the step in this case, anyway.

>     if (!analyze_ref (loop, &ref, &base, &step, &delta, stmt))
>       return false;
> +   /* if analyze_ref fails the default is a NULL_TREE. we can stop here */

Capital letters at the starts of sentences.  Dot at the end.

> !   /* if the step size is non constant, we cannot calculate prefetch_mod */

Ditto.

> !           /* the step size is non-constant but loop-invariant. We use the
> !              heuristic to simply prefetch ahead iterations ahead. */

Ditto.

Zdenek

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 15:00   ` Zdenek Dvorak
@ 2010-05-07 15:17     ` Christian Borntraeger
  2010-05-07 16:20     ` [patch 4/4 v3] " Christian Borntraeger
  1 sibling, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 15:17 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: gcc-patches, Richard Guenther, Changpeng Fang, Andreas Krebbel

Am Freitag 07 Mai 2010 17:00:10 schrieb Zdenek Dvorak:
> > !   else
> > !     fprintf (file, "is non-constant but loop-invariant");
> 
> It might be useful to dump the step in this case, anyway.

Ok to all coding style issues, but how should I dump the tree that
forms the step?  Do you want a pointer or do you want the tree parsed?h

Christian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 1/4 v3] Set sane prefetch values for s390x
  2010-05-07 14:46   ` Jakub Jelinek
@ 2010-05-07 15:25     ` Christian Borntraeger
  2010-05-11  7:19       ` Andreas Krebbel
  0 siblings, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 15:25 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: gcc-patches, Richard Guenther, Zdenek Dvorak, Changpeng Fang,
	Andreas Krebbel

Am Freitag 07 Mai 2010 16:53:54 schrieb Jakub Jelinek:
> This patch (and other patches too) have formatting nits. 
Ok. Updated Patch1:

Set sane prefetch values for s390x

This patch sets some sane values for -fprefetch-loop-arrays on s390.
Some of these settings also help to improve the performance of the
2 other performance patches.

Bootstrapped and tested on s390x-ibm-linux-gnu.

Ok to apply?

Christian.

2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>

        * config/s390/s390.c: Define sane prefetch settings.
        * config/s390/s390.h: Declare that read can use write prefetch.

Index: gcc/config/s390/s390.c
===================================================================
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*************** override_options (void)
*** 1661,1666 ****
--- 1661,1679 ----
      }
  
    set_param_value ("max-pending-list-length", 256);
+   /* values for loop prefetching */
+   set_param_value ("l1-cache-line-size", 256);
+   if (!PARAM_SET_P (PARAM_L1_CACHE_SIZE))
+     set_param_value ("l1-cache-size", 128);
+   /* s390 has more than 2 levels and the size is much larger.  Since
+      we are always running virtualized assume that we only get a small
+      part of the caches above l1.  */
+   if (!PARAM_SET_P (PARAM_L2_CACHE_SIZE))
+     set_param_value ("l2-cache-size", 1500);
+   if (!PARAM_SET_P (PARAM_PREFETCH_MIN_INSN_TO_MEM_RATIO))
+     set_param_value ("prefetch-min-insn-to-mem-ratio", 2);
+   if (!PARAM_SET_P (PARAM_SIMULTANEOUS_PREFETCHES))
+     set_param_value ("simultaneous-prefetches", 6);
  }
  
  /* Map for smallest class containing reg regno.  */
Index: gcc/config/s390/s390.h
===================================================================
*** gcc/config/s390/s390.h.orig
--- gcc/config/s390/s390.h
*************** do {									\
*** 989,992 ****
--- 989,994 ----
    (TARGET_LONG_DISPLACEMENT? ((d) >= -524288 && (d) <= 524287) \
                             : ((d) >= 0 && (d) <= 4095))
  
+ /* Reads can reuse write prefetches, used by tree-ssa-prefetch-loops.c.  */
+ #define READ_CAN_USE_WRITE_PREFETCH 1
  #endif

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 3/4 v3] Prefetch: Ignore large values of prefetch_before
  2010-05-07 14:51   ` Zdenek Dvorak
@ 2010-05-07 15:37     ` Christian Borntraeger
  2010-05-11  7:41       ` Andreas Krebbel
  0 siblings, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 15:37 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: gcc-patches, Richard Guenther, Changpeng Fang, Andreas Krebbel

Am Freitag 07 Mai 2010 16:51:41 schrieb Zdenek Dvorak:
> Missing space after "abs".
[...]
> No need for "abs" here.

Right. I also added a dot space at the end of the comments.



Prefetch: Ignore large values of prefetch_before

There is a heuristic in the prefetch code to detect multiple accesses
that will "meet" somewhen in the future. The prefetch code assumes
that at this point one access is already "prefetched" by the other one.
The prefetch code, therefore, sets prefetch_before stating that only
the first prefetch_before iterations need a prefetch.

The problem is that the current prefetch code does not issue a prefetch
if prefetch_before is set.
Most of the time this is not a problem, but with large prefetch_before
settings we might have exceeded the cache before the reuse really
happens or prefetch_before might be in the same order of magnitude as
the amount of loop iterations.

This patch tries to address the first problem. We simply reset
prefetch_before to PREFETCH_ALL if prefetch_before is too large.
The amount of L2 cache together with the step size or cache line
size is used as a breaking point.

I did not check in should_issue_prefetch_p, because I think we should
distinguish (step > PREFETCH_BLOCK) and (step < PREFETCH_BLOCK).

On s390 this patch results in a 6.6% win for lbm. All other tests are
within the noise ratio with small losses and wins. 

Bootstrapped and tested on s390x-ibm-linux-gnu.

Ok to apply?

Christian.

2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Reset
	prefetch_before to PREFETCH_ALL if to accesses "meet" beyond
	cache size.

Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** prune_ref_by_group_reuse (struct mem_ref
*** 704,709 ****
--- 704,712 ----
        hit_from = ddown (delta_b, PREFETCH_BLOCK) * PREFETCH_BLOCK;
        prefetch_before = (hit_from - delta_r + step - 1) / step;
  
+       /* Do not reduce prefetch_before if we meet beyond cache size.  */
+       if (prefetch_before > abs (L2_CACHE_SIZE_BYTES / step))
+         prefetch_before = PREFETCH_ALL;
        if (prefetch_before < ref->prefetch_before)
  	ref->prefetch_before = prefetch_before;
  
*************** prune_ref_by_group_reuse (struct mem_ref
*** 734,739 ****
--- 737,745 ----
  				reduced_prefetch_block, align_unit);
    if (miss_rate <= ACCEPTABLE_MISS_RATE)
      {
+       /* Do not reduce prefetch_before if we meet beyond cache size.  */
+       if (prefetch_before > L2_CACHE_SIZE_BYTES / PREFETCH_BLOCK)
+         prefetch_before = PREFETCH_ALL;
        if (prefetch_before < ref->prefetch_before)
  	ref->prefetch_before = prefetch_before;
  

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 4/4 v3] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 15:00   ` Zdenek Dvorak
  2010-05-07 15:17     ` Christian Borntraeger
@ 2010-05-07 16:20     ` Christian Borntraeger
  2010-05-07 17:02       ` Zdenek Dvorak
  1 sibling, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 16:20 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: gcc-patches, Richard Guenther, Changpeng Fang, Andreas Krebbel

Am Freitag 07 Mai 2010 17:00:10 schrieb Zdenek Dvorak:
> OK, modulo coding style fixes:
Lets hope that I found everything.

> [...]
> It might be useful to dump the step in this case, anyway.
Ok, I now use print_node_brief and the output looks like:
  group 0x80affad0 (base *arc_45, step is non-constant and loop-invariant: <nop_expr 0x20000802150>)
> [...]




Allow loop prefetch code to speculatively prefetch non constant steps

This patch changes the prefetch code to accept non-constant step sizes
for memory references in loops. If the step size is a constant integer
the patch should not change the current behaviour. 

This patch uses a heuristic to speculatively activate prefetching for
non-constant steps by prefetching "ahead" iterations in advance.

This patch improves some testcases where the old prefetch code did not
improve the performance (e.g. mcf).

Bootstrapped and tested on s390x-ibm-linux-gnu.

Christian.

2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c (mem_ref_group): Change step to tree.
        * tree-ssa-loop-prefetch.c (ar_data): Change step to tree.
        * tree-ssa-loop-prefetch.c (dump_mem_ref): Adopt debug code to 
        handle a tree as step.  This also checks for a constant int vs.
        non-constant but loop-invariant steps.
        * tree-ssa-loop-prefetch.c (find_or_create_group): Change the sort 
        algorithm to only consider steps that are constant ints.
        * tree-ssa-loop-prefetch.c (idx_analyze_ref): Adopt code to handle
        a tree instead of a HOST_WIDE_INT for step.
        * tree-ssa-loop-prefetch.c (gather_memory_references_ref): Handle
        tree instead of int and be prepared to see a NULL_TREE.
        * tree-ssa-loop-prefetch.c (prune_ref_by_self_reuse): Do not prune
        prefetches if the step cannot be calculated at compile time.
        * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Do not prune
        prefetches if the step cannot be calculated at compile time.
        * tree-ssa-loop-prefetch.c (issue_prefetch_ref): Issue prefetches
        for non-constant but loop-invariant steps.
 
Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** along with GCC; see the file COPYING3.  
*** 204,210 ****
  struct mem_ref_group
  {
    tree base;			/* Base of the reference.  */
!   HOST_WIDE_INT step;		/* Step of the reference.  */
    struct mem_ref *refs;		/* References in the group.  */
    struct mem_ref_group *next;	/* Next group of references.  */
  };
--- 204,210 ----
  struct mem_ref_group
  {
    tree base;			/* Base of the reference.  */
!   tree step;			/* Step of the reference.  */
    struct mem_ref *refs;		/* References in the group.  */
    struct mem_ref_group *next;	/* Next group of references.  */
  };
*************** dump_mem_ref (FILE *file, struct mem_ref
*** 248,254 ****
    fprintf (file, "  group %p (base ", (void *) ref->group);
    print_generic_expr (file, ref->group->base, TDF_SLIM);
    fprintf (file, ", step ");
!   fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->group->step);
    fprintf (file, ")\n");
  
    fprintf (file, "  delta ");
--- 248,258 ----
    fprintf (file, "  group %p (base ", (void *) ref->group);
    print_generic_expr (file, ref->group->base, TDF_SLIM);
    fprintf (file, ", step ");
!   if (cst_and_fits_in_hwi (ref->group->step))
!     fprintf (file, HOST_WIDE_INT_PRINT_DEC, int_cst_value (ref->group->step));
!   else
!     print_node_brief (file, "is non-constant and loop-invariant:",
! 		      ref->group->step, 0);
    fprintf (file, ")\n");
  
    fprintf (file, "  delta ");
*************** dump_mem_ref (FILE *file, struct mem_ref
*** 264,282 ****
     exist.  */
  
  static struct mem_ref_group *
! find_or_create_group (struct mem_ref_group **groups, tree base,
! 		      HOST_WIDE_INT step)
  {
    struct mem_ref_group *group;
  
    for (; *groups; groups = &(*groups)->next)
      {
!       if ((*groups)->step == step
  	  && operand_equal_p ((*groups)->base, base, 0))
  	return *groups;
  
!       /* Keep the list of groups sorted by decreasing step.  */
!       if ((*groups)->step < step)
  	break;
      }
  
--- 268,287 ----
     exist.  */
  
  static struct mem_ref_group *
! find_or_create_group (struct mem_ref_group **groups, tree base, tree step)
  {
    struct mem_ref_group *group;
  
    for (; *groups; groups = &(*groups)->next)
      {
!       if (operand_equal_p ((*groups)->step, step, 0)
  	  && operand_equal_p ((*groups)->base, base, 0))
  	return *groups;
  
!       /* If step is an integer constant, keep the list of groups sorted
!          by decreasing step.  */
!         if (cst_and_fits_in_hwi ((*groups)->step) && cst_and_fits_in_hwi (step)
!             && (int_cst_value ((*groups)->step) < int_cst_value (step)))
  	break;
      }
  
*************** struct ar_data
*** 361,367 ****
  {
    struct loop *loop;			/* Loop of the reference.  */
    gimple stmt;				/* Statement of the reference.  */
!   HOST_WIDE_INT *step;			/* Step of the memory reference.  */
    HOST_WIDE_INT *delta;			/* Offset of the memory reference.  */
  };
  
--- 366,372 ----
  {
    struct loop *loop;			/* Loop of the reference.  */
    gimple stmt;				/* Statement of the reference.  */
!   tree *step;				/* Step of the memory reference.  */
    HOST_WIDE_INT *delta;			/* Offset of the memory reference.  */
  };
  
*************** idx_analyze_ref (tree base, tree *index,
*** 373,379 ****
  {
    struct ar_data *ar_data = (struct ar_data *) data;
    tree ibase, step, stepsize;
!   HOST_WIDE_INT istep, idelta = 0, imult = 1;
    affine_iv iv;
  
    if (TREE_CODE (base) == MISALIGNED_INDIRECT_REF
--- 378,384 ----
  {
    struct ar_data *ar_data = (struct ar_data *) data;
    tree ibase, step, stepsize;
!   HOST_WIDE_INT idelta = 0, imult = 1;
    affine_iv iv;
  
    if (TREE_CODE (base) == MISALIGNED_INDIRECT_REF
*************** idx_analyze_ref (tree base, tree *index,
*** 381,395 ****
      return false;
  
    if (!simple_iv (ar_data->loop, loop_containing_stmt (ar_data->stmt),
! 		  *index, &iv, false))
      return false;
    ibase = iv.base;
    step = iv.step;
  
-   if (!cst_and_fits_in_hwi (step))
-     return false;
-   istep = int_cst_value (step);
- 
    if (TREE_CODE (ibase) == POINTER_PLUS_EXPR
        && cst_and_fits_in_hwi (TREE_OPERAND (ibase, 1)))
      {
--- 386,396 ----
      return false;
  
    if (!simple_iv (ar_data->loop, loop_containing_stmt (ar_data->stmt),
! 		  *index, &iv, true))
      return false;
    ibase = iv.base;
    step = iv.step;
  
    if (TREE_CODE (ibase) == POINTER_PLUS_EXPR
        && cst_and_fits_in_hwi (TREE_OPERAND (ibase, 1)))
      {
*************** idx_analyze_ref (tree base, tree *index,
*** 402,407 ****
--- 403,414 ----
        ibase = build_int_cst (TREE_TYPE (ibase), 0);
      }
  
+   if (*ar_data->step == NULL_TREE)
+     *ar_data->step = step;
+   else
+     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
+ 				  fold_convert (sizetype, *ar_data->step),
+ 				  fold_convert (sizetype, step));
    if (TREE_CODE (base) == ARRAY_REF)
      {
        stepsize = array_ref_element_size (base);
*************** idx_analyze_ref (tree base, tree *index,
*** 409,419 ****
  	return false;
        imult = int_cst_value (stepsize);
  
!       istep *= imult;
        idelta *= imult;
      }
  
-   *ar_data->step += istep;
    *ar_data->delta += idelta;
    *index = ibase;
  
--- 416,427 ----
  	return false;
        imult = int_cst_value (stepsize);
  
!       *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
! 				    fold_convert (sizetype, *ar_data->step),
! 				    fold_convert (sizetype, step));
        idelta *= imult;
      }
  
    *ar_data->delta += idelta;
    *index = ibase;
  
*************** idx_analyze_ref (tree base, tree *index,
*** 427,433 ****
  
  static bool
  analyze_ref (struct loop *loop, tree *ref_p, tree *base,
! 	     HOST_WIDE_INT *step, HOST_WIDE_INT *delta,
  	     gimple stmt)
  {
    struct ar_data ar_data;
--- 435,441 ----
  
  static bool
  analyze_ref (struct loop *loop, tree *ref_p, tree *base,
! 	     tree *step, HOST_WIDE_INT *delta,
  	     gimple stmt)
  {
    struct ar_data ar_data;
*************** analyze_ref (struct loop *loop, tree *re
*** 435,441 ****
    HOST_WIDE_INT bit_offset;
    tree ref = *ref_p;
  
!   *step = 0;
    *delta = 0;
  
    /* First strip off the component references.  Ignore bitfields.  */
--- 443,449 ----
    HOST_WIDE_INT bit_offset;
    tree ref = *ref_p;
  
!   *step = NULL_TREE;
    *delta = 0;
  
    /* First strip off the component references.  Ignore bitfields.  */
*************** static bool
*** 470,477 ****
  gather_memory_references_ref (struct loop *loop, struct mem_ref_group **refs,
  			      tree ref, bool write_p, gimple stmt)
  {
!   tree base;
!   HOST_WIDE_INT step, delta;
    struct mem_ref_group *agrp;
  
    if (get_base_address (ref) == NULL)
--- 478,485 ----
  gather_memory_references_ref (struct loop *loop, struct mem_ref_group **refs,
  			      tree ref, bool write_p, gimple stmt)
  {
!   tree base, step;
!   HOST_WIDE_INT delta;
    struct mem_ref_group *agrp;
  
    if (get_base_address (ref) == NULL)
*************** gather_memory_references_ref (struct loo
*** 479,484 ****
--- 487,495 ----
  
    if (!analyze_ref (loop, &ref, &base, &step, &delta, stmt))
      return false;
+   /* If analyze_ref fails the default is a NULL_TREE.  We can stop here.  */
+   if (step == NULL_TREE)
+     return false;
  
    /* Now we know that REF = &BASE + STEP * iter + DELTA, where DELTA and STEP
       are integer constants.  */
*************** gather_memory_references (struct loop *l
*** 553,560 ****
  static void
  prune_ref_by_self_reuse (struct mem_ref *ref)
  {
!   HOST_WIDE_INT step = ref->group->step;
!   bool backward = step < 0;
  
    if (step == 0)
      {
--- 564,579 ----
  static void
  prune_ref_by_self_reuse (struct mem_ref *ref)
  {
!   HOST_WIDE_INT step;
!   bool backward;
! 
!   /* If the step size is non constant, we cannot calculate prefetch_mod.  */
!   if (!cst_and_fits_in_hwi (ref->group->step))
!     return;
! 
!   step = int_cst_value (ref->group->step);
! 
!   backward = step < 0;
  
    if (step == 0)
      {
*************** static void
*** 638,645 ****
  prune_ref_by_group_reuse (struct mem_ref *ref, struct mem_ref *by,
  			  bool by_is_before)
  {
!   HOST_WIDE_INT step = ref->group->step;
!   bool backward = step < 0;
    HOST_WIDE_INT delta_r = ref->delta, delta_b = by->delta;
    HOST_WIDE_INT delta = delta_b - delta_r;
    HOST_WIDE_INT hit_from;
--- 657,664 ----
  prune_ref_by_group_reuse (struct mem_ref *ref, struct mem_ref *by,
  			  bool by_is_before)
  {
!   HOST_WIDE_INT step;
!   bool backward;
    HOST_WIDE_INT delta_r = ref->delta, delta_b = by->delta;
    HOST_WIDE_INT delta = delta_b - delta_r;
    HOST_WIDE_INT hit_from;
*************** prune_ref_by_group_reuse (struct mem_ref
*** 650,655 ****
--- 669,684 ----
    tree ref_type;
    int align_unit;
  
+   /* If the step is non constant we cannot calculate prefetch_before.  */
+   if (!cst_and_fits_in_hwi (ref->group->step)) {
+     return;
+   }
+ 
+   step = int_cst_value (ref->group->step);
+ 
+   backward = step < 0;
+ 
+ 
    if (delta == 0)
      {
        /* If the references has the same address, only prefetch the
*************** static void
*** 957,963 ****
  issue_prefetch_ref (struct mem_ref *ref, unsigned unroll_factor, unsigned ahead)
  {
    HOST_WIDE_INT delta;
!   tree addr, addr_base, write_p, local;
    gimple prefetch;
    gimple_stmt_iterator bsi;
    unsigned n_prefetches, ap;
--- 986,992 ----
  issue_prefetch_ref (struct mem_ref *ref, unsigned unroll_factor, unsigned ahead)
  {
    HOST_WIDE_INT delta;
!   tree addr, addr_base, write_p, local, forward;
    gimple prefetch;
    gimple_stmt_iterator bsi;
    unsigned n_prefetches, ap;
*************** issue_prefetch_ref (struct mem_ref *ref,
*** 980,992 ****
  
    for (ap = 0; ap < n_prefetches; ap++)
      {
!       /* Determine the address to prefetch.  */
!       delta = (ahead + ap * ref->prefetch_mod) * ref->group->step;
!       addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node,
! 			  addr_base, size_int (delta));
!       addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true, NULL,
! 				       true, GSI_SAME_STMT);
! 
        /* Create the prefetch instruction.  */
        prefetch = gimple_build_call (built_in_decls[BUILT_IN_PREFETCH],
  				    3, addr, write_p, local);
--- 1009,1036 ----
  
    for (ap = 0; ap < n_prefetches; ap++)
      {
!       if (cst_and_fits_in_hwi (ref->group->step))
!         {
!           /* Determine the address to prefetch.  */
!           delta = (ahead + ap * ref->prefetch_mod) *
! 		   int_cst_value (ref->group->step);
!           addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node,
!                               addr_base, size_int (delta));
!           addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true, NULL,
!                                            true, GSI_SAME_STMT);
!         }
!       else
!         {
!           /* The step size is non-constant but loop-invariant.  We use the
!              heuristic to simply prefetch ahead iterations ahead.  */
!           forward = fold_build2 (MULT_EXPR, sizetype,
!                                  fold_convert (sizetype, ref->group->step),
!                                  fold_convert (sizetype, size_int (ahead)));
!           addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node, addr_base,
! 			      forward);
!           addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true,
! 					   NULL, true, GSI_SAME_STMT);
!       }
        /* Create the prefetch instruction.  */
        prefetch = gimple_build_call (built_in_decls[BUILT_IN_PREFETCH],
  				    3, addr, write_p, local);

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v3] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 16:20     ` [patch 4/4 v3] " Christian Borntraeger
@ 2010-05-07 17:02       ` Zdenek Dvorak
  2010-05-07 20:04         ` [patch 4/4 v4] " Christian Borntraeger
  0 siblings, 1 reply; 45+ messages in thread
From: Zdenek Dvorak @ 2010-05-07 17:02 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: gcc-patches, Richard Guenther, Changpeng Fang, Andreas Krebbel

Hi,

> Am Freitag 07 Mai 2010 17:00:10 schrieb Zdenek Dvorak:
> > OK, modulo coding style fixes:
> Lets hope that I found everything.
> 
> > [...]
> > It might be useful to dump the step in this case, anyway.
> Ok, I now use print_node_brief and the output looks like:

please use print_generic_expr.

Zdenek

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 17:02       ` Zdenek Dvorak
@ 2010-05-07 20:04         ` Christian Borntraeger
  2010-05-07 21:10           ` Andreas Krebbel
  2010-05-19 10:57           ` Andreas Krebbel
  0 siblings, 2 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-07 20:04 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: gcc-patches, Richard Guenther, Changpeng Fang, Andreas Krebbel

> please use print_generic_expr.

Allow loop prefetch code to speculatively prefetch non constant steps

This patch changes the prefetch code to accept non-constant step sizes
for memory references in loops. If the step size is a constant integer
the patch should not change the current behaviour. 

This patch uses a heuristic to speculatively activate prefetching for
non-constant steps by prefetching "ahead" iterations in advance.

This patch improves some testcases where the old prefetch code did not
improve the performance (e.g. mcf).

Bootstrapped and tested on s390x-ibm-linux-gnu.

Christian.

2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c (mem_ref_group): Change step to tree.
        * tree-ssa-loop-prefetch.c (ar_data): Change step to tree.
        * tree-ssa-loop-prefetch.c (dump_mem_ref): Adopt debug code to 
        handle a tree as step.  This also checks for a constant int vs.
        non-constant but loop-invariant steps.
        * tree-ssa-loop-prefetch.c (find_or_create_group): Change the sort 
        algorithm to only consider steps that are constant ints.
        * tree-ssa-loop-prefetch.c (idx_analyze_ref): Adopt code to handle
        a tree instead of a HOST_WIDE_INT for step.
        * tree-ssa-loop-prefetch.c (gather_memory_references_ref): Handle
        tree instead of int and be prepared to see a NULL_TREE.
        * tree-ssa-loop-prefetch.c (prune_ref_by_self_reuse): Do not prune
        prefetches if the step cannot be calculated at compile time.
        * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Do not prune
        prefetches if the step cannot be calculated at compile time.
        * tree-ssa-loop-prefetch.c (issue_prefetch_ref): Issue prefetches
        for non-constant but loop-invariant steps.
 
Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** along with GCC; see the file COPYING3.  
*** 204,210 ****
  struct mem_ref_group
  {
    tree base;			/* Base of the reference.  */
!   HOST_WIDE_INT step;		/* Step of the reference.  */
    struct mem_ref *refs;		/* References in the group.  */
    struct mem_ref_group *next;	/* Next group of references.  */
  };
--- 204,210 ----
  struct mem_ref_group
  {
    tree base;			/* Base of the reference.  */
!   tree step;			/* Step of the reference.  */
    struct mem_ref *refs;		/* References in the group.  */
    struct mem_ref_group *next;	/* Next group of references.  */
  };
*************** dump_mem_ref (FILE *file, struct mem_ref
*** 248,254 ****
    fprintf (file, "  group %p (base ", (void *) ref->group);
    print_generic_expr (file, ref->group->base, TDF_SLIM);
    fprintf (file, ", step ");
!   fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->group->step);
    fprintf (file, ")\n");
  
    fprintf (file, "  delta ");
--- 248,257 ----
    fprintf (file, "  group %p (base ", (void *) ref->group);
    print_generic_expr (file, ref->group->base, TDF_SLIM);
    fprintf (file, ", step ");
!   if (cst_and_fits_in_hwi (ref->group->step))
!     fprintf (file, HOST_WIDE_INT_PRINT_DEC, int_cst_value (ref->group->step));
!   else
!     print_generic_expr (file, ref->group->step, TDF_TREE);
    fprintf (file, ")\n");
  
    fprintf (file, "  delta ");
*************** dump_mem_ref (FILE *file, struct mem_ref
*** 264,282 ****
     exist.  */
  
  static struct mem_ref_group *
! find_or_create_group (struct mem_ref_group **groups, tree base,
! 		      HOST_WIDE_INT step)
  {
    struct mem_ref_group *group;
  
    for (; *groups; groups = &(*groups)->next)
      {
!       if ((*groups)->step == step
  	  && operand_equal_p ((*groups)->base, base, 0))
  	return *groups;
  
!       /* Keep the list of groups sorted by decreasing step.  */
!       if ((*groups)->step < step)
  	break;
      }
  
--- 267,286 ----
     exist.  */
  
  static struct mem_ref_group *
! find_or_create_group (struct mem_ref_group **groups, tree base, tree step)
  {
    struct mem_ref_group *group;
  
    for (; *groups; groups = &(*groups)->next)
      {
!       if (operand_equal_p ((*groups)->step, step, 0)
  	  && operand_equal_p ((*groups)->base, base, 0))
  	return *groups;
  
!       /* If step is an integer constant, keep the list of groups sorted
!          by decreasing step.  */
!         if (cst_and_fits_in_hwi ((*groups)->step) && cst_and_fits_in_hwi (step)
!             && (int_cst_value ((*groups)->step) < int_cst_value (step)))
  	break;
      }
  
*************** struct ar_data
*** 361,367 ****
  {
    struct loop *loop;			/* Loop of the reference.  */
    gimple stmt;				/* Statement of the reference.  */
!   HOST_WIDE_INT *step;			/* Step of the memory reference.  */
    HOST_WIDE_INT *delta;			/* Offset of the memory reference.  */
  };
  
--- 365,371 ----
  {
    struct loop *loop;			/* Loop of the reference.  */
    gimple stmt;				/* Statement of the reference.  */
!   tree *step;				/* Step of the memory reference.  */
    HOST_WIDE_INT *delta;			/* Offset of the memory reference.  */
  };
  
*************** idx_analyze_ref (tree base, tree *index,
*** 373,379 ****
  {
    struct ar_data *ar_data = (struct ar_data *) data;
    tree ibase, step, stepsize;
!   HOST_WIDE_INT istep, idelta = 0, imult = 1;
    affine_iv iv;
  
    if (TREE_CODE (base) == MISALIGNED_INDIRECT_REF
--- 377,383 ----
  {
    struct ar_data *ar_data = (struct ar_data *) data;
    tree ibase, step, stepsize;
!   HOST_WIDE_INT idelta = 0, imult = 1;
    affine_iv iv;
  
    if (TREE_CODE (base) == MISALIGNED_INDIRECT_REF
*************** idx_analyze_ref (tree base, tree *index,
*** 381,395 ****
      return false;
  
    if (!simple_iv (ar_data->loop, loop_containing_stmt (ar_data->stmt),
! 		  *index, &iv, false))
      return false;
    ibase = iv.base;
    step = iv.step;
  
-   if (!cst_and_fits_in_hwi (step))
-     return false;
-   istep = int_cst_value (step);
- 
    if (TREE_CODE (ibase) == POINTER_PLUS_EXPR
        && cst_and_fits_in_hwi (TREE_OPERAND (ibase, 1)))
      {
--- 385,395 ----
      return false;
  
    if (!simple_iv (ar_data->loop, loop_containing_stmt (ar_data->stmt),
! 		  *index, &iv, true))
      return false;
    ibase = iv.base;
    step = iv.step;
  
    if (TREE_CODE (ibase) == POINTER_PLUS_EXPR
        && cst_and_fits_in_hwi (TREE_OPERAND (ibase, 1)))
      {
*************** idx_analyze_ref (tree base, tree *index,
*** 402,407 ****
--- 402,413 ----
        ibase = build_int_cst (TREE_TYPE (ibase), 0);
      }
  
+   if (*ar_data->step == NULL_TREE)
+     *ar_data->step = step;
+   else
+     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
+ 				  fold_convert (sizetype, *ar_data->step),
+ 				  fold_convert (sizetype, step));
    if (TREE_CODE (base) == ARRAY_REF)
      {
        stepsize = array_ref_element_size (base);
*************** idx_analyze_ref (tree base, tree *index,
*** 409,419 ****
  	return false;
        imult = int_cst_value (stepsize);
  
!       istep *= imult;
        idelta *= imult;
      }
  
-   *ar_data->step += istep;
    *ar_data->delta += idelta;
    *index = ibase;
  
--- 415,426 ----
  	return false;
        imult = int_cst_value (stepsize);
  
!       *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
! 				    fold_convert (sizetype, *ar_data->step),
! 				    fold_convert (sizetype, step));
        idelta *= imult;
      }
  
    *ar_data->delta += idelta;
    *index = ibase;
  
*************** idx_analyze_ref (tree base, tree *index,
*** 427,433 ****
  
  static bool
  analyze_ref (struct loop *loop, tree *ref_p, tree *base,
! 	     HOST_WIDE_INT *step, HOST_WIDE_INT *delta,
  	     gimple stmt)
  {
    struct ar_data ar_data;
--- 434,440 ----
  
  static bool
  analyze_ref (struct loop *loop, tree *ref_p, tree *base,
! 	     tree *step, HOST_WIDE_INT *delta,
  	     gimple stmt)
  {
    struct ar_data ar_data;
*************** analyze_ref (struct loop *loop, tree *re
*** 435,441 ****
    HOST_WIDE_INT bit_offset;
    tree ref = *ref_p;
  
!   *step = 0;
    *delta = 0;
  
    /* First strip off the component references.  Ignore bitfields.  */
--- 442,448 ----
    HOST_WIDE_INT bit_offset;
    tree ref = *ref_p;
  
!   *step = NULL_TREE;
    *delta = 0;
  
    /* First strip off the component references.  Ignore bitfields.  */
*************** static bool
*** 470,477 ****
  gather_memory_references_ref (struct loop *loop, struct mem_ref_group **refs,
  			      tree ref, bool write_p, gimple stmt)
  {
!   tree base;
!   HOST_WIDE_INT step, delta;
    struct mem_ref_group *agrp;
  
    if (get_base_address (ref) == NULL)
--- 477,484 ----
  gather_memory_references_ref (struct loop *loop, struct mem_ref_group **refs,
  			      tree ref, bool write_p, gimple stmt)
  {
!   tree base, step;
!   HOST_WIDE_INT delta;
    struct mem_ref_group *agrp;
  
    if (get_base_address (ref) == NULL)
*************** gather_memory_references_ref (struct loo
*** 479,484 ****
--- 486,494 ----
  
    if (!analyze_ref (loop, &ref, &base, &step, &delta, stmt))
      return false;
+   /* If analyze_ref fails the default is a NULL_TREE.  We can stop here.  */
+   if (step == NULL_TREE)
+     return false;
  
    /* Now we know that REF = &BASE + STEP * iter + DELTA, where DELTA and STEP
       are integer constants.  */
*************** gather_memory_references (struct loop *l
*** 553,560 ****
  static void
  prune_ref_by_self_reuse (struct mem_ref *ref)
  {
!   HOST_WIDE_INT step = ref->group->step;
!   bool backward = step < 0;
  
    if (step == 0)
      {
--- 563,578 ----
  static void
  prune_ref_by_self_reuse (struct mem_ref *ref)
  {
!   HOST_WIDE_INT step;
!   bool backward;
! 
!   /* If the step size is non constant, we cannot calculate prefetch_mod.  */
!   if (!cst_and_fits_in_hwi (ref->group->step))
!     return;
! 
!   step = int_cst_value (ref->group->step);
! 
!   backward = step < 0;
  
    if (step == 0)
      {
*************** static void
*** 638,645 ****
  prune_ref_by_group_reuse (struct mem_ref *ref, struct mem_ref *by,
  			  bool by_is_before)
  {
!   HOST_WIDE_INT step = ref->group->step;
!   bool backward = step < 0;
    HOST_WIDE_INT delta_r = ref->delta, delta_b = by->delta;
    HOST_WIDE_INT delta = delta_b - delta_r;
    HOST_WIDE_INT hit_from;
--- 656,663 ----
  prune_ref_by_group_reuse (struct mem_ref *ref, struct mem_ref *by,
  			  bool by_is_before)
  {
!   HOST_WIDE_INT step;
!   bool backward;
    HOST_WIDE_INT delta_r = ref->delta, delta_b = by->delta;
    HOST_WIDE_INT delta = delta_b - delta_r;
    HOST_WIDE_INT hit_from;
*************** prune_ref_by_group_reuse (struct mem_ref
*** 650,655 ****
--- 668,683 ----
    tree ref_type;
    int align_unit;
  
+   /* If the step is non constant we cannot calculate prefetch_before.  */
+   if (!cst_and_fits_in_hwi (ref->group->step)) {
+     return;
+   }
+ 
+   step = int_cst_value (ref->group->step);
+ 
+   backward = step < 0;
+ 
+ 
    if (delta == 0)
      {
        /* If the references has the same address, only prefetch the
*************** static void
*** 957,963 ****
  issue_prefetch_ref (struct mem_ref *ref, unsigned unroll_factor, unsigned ahead)
  {
    HOST_WIDE_INT delta;
!   tree addr, addr_base, write_p, local;
    gimple prefetch;
    gimple_stmt_iterator bsi;
    unsigned n_prefetches, ap;
--- 985,991 ----
  issue_prefetch_ref (struct mem_ref *ref, unsigned unroll_factor, unsigned ahead)
  {
    HOST_WIDE_INT delta;
!   tree addr, addr_base, write_p, local, forward;
    gimple prefetch;
    gimple_stmt_iterator bsi;
    unsigned n_prefetches, ap;
*************** issue_prefetch_ref (struct mem_ref *ref,
*** 980,992 ****
  
    for (ap = 0; ap < n_prefetches; ap++)
      {
!       /* Determine the address to prefetch.  */
!       delta = (ahead + ap * ref->prefetch_mod) * ref->group->step;
!       addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node,
! 			  addr_base, size_int (delta));
!       addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true, NULL,
! 				       true, GSI_SAME_STMT);
! 
        /* Create the prefetch instruction.  */
        prefetch = gimple_build_call (built_in_decls[BUILT_IN_PREFETCH],
  				    3, addr, write_p, local);
--- 1008,1035 ----
  
    for (ap = 0; ap < n_prefetches; ap++)
      {
!       if (cst_and_fits_in_hwi (ref->group->step))
!         {
!           /* Determine the address to prefetch.  */
!           delta = (ahead + ap * ref->prefetch_mod) *
! 		   int_cst_value (ref->group->step);
!           addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node,
!                               addr_base, size_int (delta));
!           addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true, NULL,
!                                            true, GSI_SAME_STMT);
!         }
!       else
!         {
!           /* The step size is non-constant but loop-invariant.  We use the
!              heuristic to simply prefetch ahead iterations ahead.  */
!           forward = fold_build2 (MULT_EXPR, sizetype,
!                                  fold_convert (sizetype, ref->group->step),
!                                  fold_convert (sizetype, size_int (ahead)));
!           addr = fold_build2 (POINTER_PLUS_EXPR, ptr_type_node, addr_base,
! 			      forward);
!           addr = force_gimple_operand_gsi (&bsi, unshare_expr (addr), true,
! 					   NULL, true, GSI_SAME_STMT);
!       }
        /* Create the prefetch instruction.  */
        prefetch = gimple_build_call (built_in_decls[BUILT_IN_PREFETCH],
  				    3, addr, write_p, local);

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 20:04         ` [patch 4/4 v4] " Christian Borntraeger
@ 2010-05-07 21:10           ` Andreas Krebbel
  2010-05-09 20:07             ` Christian Borntraeger
  2010-05-19 10:57           ` Andreas Krebbel
  1 sibling, 1 reply; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-07 21:10 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches

Hi Christian,

> Bootstrapped and tested on s390x-ibm-linux-gnu.

I assume you have enabled the pass when running the testsuite?!

> 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
>
>          * tree-ssa-loop-prefetch.c (mem_ref_group): Change step to tree.
>          * tree-ssa-loop-prefetch.c (ar_data): Change step to tree.
>          * tree-ssa-loop-prefetch.c (dump_mem_ref): Adopt debug code to
>          handle a tree as step.  This also checks for a constant int vs.
>          non-constant but loop-invariant steps.
>          * tree-ssa-loop-prefetch.c (find_or_create_group): Change the sort
>          algorithm to only consider steps that are constant ints.
>          * tree-ssa-loop-prefetch.c (idx_analyze_ref): Adopt code to handle
>          a tree instead of a HOST_WIDE_INT for step.
>          * tree-ssa-loop-prefetch.c (gather_memory_references_ref): Handle
>          tree instead of int and be prepared to see a NULL_TREE.
>          * tree-ssa-loop-prefetch.c (prune_ref_by_self_reuse): Do not prune
>          prefetches if the step cannot be calculated at compile time.
>          * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Do not prune
>          prefetches if the step cannot be calculated at compile time.
>          * tree-ssa-loop-prefetch.c (issue_prefetch_ref): Issue prefetches
>          for non-constant but loop-invariant steps.

No need to repeat the file for every function.

> !&&  (int_cst_value ((*groups)->step)<  int_cst_value (step)))

Superfluous brackets.


You don't need to re-post. I can do these changes for you when checking the 
patch in.

Thanks!

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 21:10           ` Andreas Krebbel
@ 2010-05-09 20:07             ` Christian Borntraeger
  0 siblings, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-09 20:07 UTC (permalink / raw)
  To: Andreas Krebbel; +Cc: gcc-patches

Am Freitag 07 Mai 2010 23:10:13 schrieb Andreas Krebbel:
> Hi Christian,
> 
> > Bootstrapped and tested on s390x-ibm-linux-gnu.
> 
> I assume you have enabled the pass when running the testsuite?!

fprefetch-loop-arrays cannot be combined with -Os causing ~500 testcases to
always fail, therefore I did both.
The normal run shows no difference at all, the run with /-fprefetch-loop-arrays
shows no regression on the successful testcases.

There are some strange effects, though, that there are 3 test cases which
causes a fail _without_ the patches are now "fixed" by the patches.

-FAIL: gcc.dg/tree-prof/pr34999.c compilation,  -fprofile-use -D_PROFILE_USE (internal compiler error)
-UNRESOLVED: gcc.dg/tree-prof/pr34999.c execution,    -fprofile-use -D_PROFILE_USE
-FAIL: gcc.dg/tree-ssa/loop-6.c scan-tree-dump-times optimized "else" 3

                === gcc Summary ===

-# of expected passes           65182
-# of unexpected failures       550
+# of expected passes           65185
+# of unexpected failures       548
 # of unexpected successes      20
 # of expected failures         114
-# of unresolved testcases      20
+# of unresolved testcases      19
 # of unsupported tests         712
 /home0/cborntra/REPOS/gcc/build-trunk/gcc/xgcc  version 4.6.0 20100507 (experimental) (GCC)

So from testing perspective the patches looks good. 

Christian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/4 v3] Set sane prefetch values for s390x
  2010-05-07 15:25     ` [patch 1/4 v3] " Christian Borntraeger
@ 2010-05-11  7:19       ` Andreas Krebbel
  2010-05-11 15:34         ` Christian Borntraeger
  0 siblings, 1 reply; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-11  7:19 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches

On 05/07/2010 05:25 PM, Christian Borntraeger wrote:
 > 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
 >
 >          * config/s390/s390.c: Define sane prefetch settings.
 >          * config/s390/s390.h: Declare that read can use write prefetch.

Since the prefeching pass is a win on S/390 please make -fprefetch-loop-arrays 
the default for -O3 on S/390.  Thanks!

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/4] Add debug information if prefetches are not issued
  2010-05-07 14:05 ` [patch 2/4] Add debug information if prefetches are not issued Christian Borntraeger
@ 2010-05-11  7:40   ` Andreas Krebbel
  0 siblings, 0 replies; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-11  7:40 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches, Changpeng Fang

On 05/07/2010 03:59 PM, Christian Borntraeger wrote:

 > 2010-05-07  Christian Borntraeger  <borntraeger@de.ibm.com>
 >
 >         * tree-ssa-loop-prefetch.c: Add debug for dropped prefetches.

Applied. Thanks!

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 3/4 v3] Prefetch: Ignore large values of prefetch_before
  2010-05-07 15:37     ` [patch 3/4 v3] " Christian Borntraeger
@ 2010-05-11  7:41       ` Andreas Krebbel
  2010-05-11 13:17         ` H.J. Lu
  0 siblings, 1 reply; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-11  7:41 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches, Changpeng Fang

> 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
>
>          * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Reset
> 	prefetch_before to PREFETCH_ALL if to accesses "meet" beyond
> 	cache size.

Applied. Thanks!

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 3/4 v3] Prefetch: Ignore large values of prefetch_before
  2010-05-11  7:41       ` Andreas Krebbel
@ 2010-05-11 13:17         ` H.J. Lu
  2010-05-11 13:43           ` Christian Borntraeger
  2010-05-12 16:02           ` [PATCH] fix PR44078 Christian Borntraeger
  0 siblings, 2 replies; 45+ messages in thread
From: H.J. Lu @ 2010-05-11 13:17 UTC (permalink / raw)
  To: Andreas Krebbel; +Cc: Christian Borntraeger, gcc-patches, Changpeng Fang

On Tue, May 11, 2010 at 12:41 AM, Andreas Krebbel
<krebbel@linux.vnet.ibm.com> wrote:
>> 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
>>
>>         * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Reset
>>        prefetch_before to PREFETCH_ALL if to accesses "meet" beyond
>>        cache size.
>
> Applied. Thanks!
>

I think one of those prefetch changes caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44078


-- 
H.J.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 3/4 v3] Prefetch: Ignore large values of prefetch_before
  2010-05-11 13:17         ` H.J. Lu
@ 2010-05-11 13:43           ` Christian Borntraeger
  2010-05-12 16:02           ` [PATCH] fix PR44078 Christian Borntraeger
  1 sibling, 0 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-11 13:43 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Andreas Krebbel, gcc-patches, Changpeng Fang

Am Dienstag 11 Mai 2010 15:17:18 schrieb H.J. Lu:
> On Tue, May 11, 2010 at 12:41 AM, Andreas Krebbel
> <krebbel@linux.vnet.ibm.com> wrote:
> >> 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
> >>
> >>         * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Reset
> >>        prefetch_before to PREFETCH_ALL if to accesses "meet" beyond
> >>        cache size.
> >
> > Applied. Thanks!
> >
> 
> I think one of those prefetch changes caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44078

From a first look this looks like that the test case scans for
"nontemporal store" which is also emitted by the new debug messages:

-    return false;
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);
+      return false;
+    }

I will check if that is the problem. If yes I will probably chang the testcase.

Christian

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/4 v3] Set sane prefetch values for s390x
  2010-05-11  7:19       ` Andreas Krebbel
@ 2010-05-11 15:34         ` Christian Borntraeger
  2010-05-17  7:54           ` Andreas Krebbel
  2010-05-25 15:36           ` Andreas Krebbel
  0 siblings, 2 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-11 15:34 UTC (permalink / raw)
  To: Andreas Krebbel; +Cc: gcc-patches

Am Dienstag 11 Mai 2010 09:19:34 schrieb Andreas Krebbel:
> On 05/07/2010 05:25 PM, Christian Borntraeger wrote:
>  > 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
>  >
>  >          * config/s390/s390.c: Define sane prefetch settings.
>  >          * config/s390/s390.h: Declare that read can use write prefetch.
> 
> Since the prefeching pass is a win on S/390 please make -fprefetch-loop-arrays 
> the default for -O3 on S/390.  Thanks!
Done.

Set sane prefetch values for s390x

This patch sets some sane values for -fprefetch-loop-arrays on s390.
Some of these settings also help to improve the performance of the
2 other performance patches.

Bootstrapped and tested on s390x-ibm-linux-gnu.

Christian.

2010-05-11  Christian Borntraeger  <borntraeger@de.ibm.com>

        * config/s390/s390.c: Define sane prefetch settings and activate
        flag_prefetch_loop_arrays on -O3.
        * config/s390/s390.h: Declare that read can use write prefetch.

Index: gcc/config/s390/s390.c
===================================================================
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*************** optimization_options (int level ATTRIBUT
*** 1475,1480 ****
--- 1475,1483 ----
       without maintaining a stack frame back-chain.  */
    flag_asynchronous_unwind_tables = 1;
  
+   if (HAVE_prefetch || optimize >= 3)
+       flag_prefetch_loop_arrays = 1;
+ 
    /* Use MVCLE instructions to decrease code size if requested.  */
    if (size != 0)
      target_flags |= MASK_MVCLE;
*************** override_options (void)
*** 1661,1666 ****
--- 1664,1682 ----
      }
  
    set_param_value ("max-pending-list-length", 256);
+   /* values for loop prefetching */
+   set_param_value ("l1-cache-line-size", 256);
+   if (!PARAM_SET_P (PARAM_L1_CACHE_SIZE))
+     set_param_value ("l1-cache-size", 128);
+   /* s390 has more than 2 levels and the size is much larger.  Since
+      we are always running virtualized assume that we only get a small
+      part of the caches above l1.  */
+   if (!PARAM_SET_P (PARAM_L2_CACHE_SIZE))
+     set_param_value ("l2-cache-size", 1500);
+   if (!PARAM_SET_P (PARAM_PREFETCH_MIN_INSN_TO_MEM_RATIO))
+     set_param_value ("prefetch-min-insn-to-mem-ratio", 2);
+   if (!PARAM_SET_P (PARAM_SIMULTANEOUS_PREFETCHES))
+     set_param_value ("simultaneous-prefetches", 6);
  }
  
  /* Map for smallest class containing reg regno.  */
Index: gcc/config/s390/s390.h
===================================================================
*** gcc/config/s390/s390.h.orig
--- gcc/config/s390/s390.h
*************** do {									\
*** 989,992 ****
--- 989,994 ----
    (TARGET_LONG_DISPLACEMENT? ((d) >= -524288 && (d) <= 524287) \
                             : ((d) >= 0 && (d) <= 4095))
  
+ /* Reads can reuse write prefetches, used by tree-ssa-prefetch-loops.c.  */
+ #define READ_CAN_USE_WRITE_PREFETCH 1
  #endif

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH] fix PR44078
  2010-05-11 13:17         ` H.J. Lu
  2010-05-11 13:43           ` Christian Borntraeger
@ 2010-05-12 16:02           ` Christian Borntraeger
  2010-05-17  7:55             ` Andreas Krebbel
  1 sibling, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-12 16:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: H.J. Lu, Andreas Krebbel, Changpeng Fang

H.J.

this patch works on my thinkpad. Can you confirm that this patch fixes PR44078?
If yes, ok to apply?


There always was
    fprintf (dump_file, "Marked reference %p as a nontemporal store.\n",
in the old code.

r159256 now adds
        fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);

The testcase prefetch-7.c matches both messages, but it should only match the
first one.

This patch changes the testcase by making the matching more strict.

FYI: For some reason I cannot change the bugzilla state. So can somebody close
that if this patch is being accepted.


2010-05-12  Christian Borntraeger  <borntraeger@de.ibm.com>

        PR 44078
        * gcc.dg/tree-ssa/prefetch-7.c: Change pattern to match only the old
        debug messages but not the newly introduced one.

Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c.orig
--- gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
*************** void test(int *p)
*** 44,50 ****
  
  /* { dg-final { scan-tree-dump-times "Issued prefetch" 5 "aprefetch" } } */
  /* { dg-final { scan-tree-dump-times "Issued nontemporal prefetch" 3 "aprefetch" } } */
! /* { dg-final { scan-tree-dump-times "nontemporal store" 2 "aprefetch" } } */
  
  /* { dg-final { scan-tree-dump-times "builtin_prefetch" 8 "optimized" } } */
  /* { dg-final { scan-tree-dump-times "=\\{nt\\}" 2 "optimized" } } */
--- 44,50 ----
  
  /* { dg-final { scan-tree-dump-times "Issued prefetch" 5 "aprefetch" } } */
  /* { dg-final { scan-tree-dump-times "Issued nontemporal prefetch" 3 "aprefetch" } } */
! /* { dg-final { scan-tree-dump-times "a nontemporal store" 2 "aprefetch" } } */
  
  /* { dg-final { scan-tree-dump-times "builtin_prefetch" 8 "optimized" } } */
  /* { dg-final { scan-tree-dump-times "=\\{nt\\}" 2 "optimized" } } */

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/4 v3] Set sane prefetch values for s390x
  2010-05-11 15:34         ` Christian Borntraeger
@ 2010-05-17  7:54           ` Andreas Krebbel
  2010-05-25 15:36           ` Andreas Krebbel
  1 sibling, 0 replies; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-17  7:54 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches

> 2010-05-11  Christian Borntraeger<borntraeger@de.ibm.com>
>
>          * config/s390/s390.c: Define sane prefetch settings and activate
>          flag_prefetch_loop_arrays on -O3.
>          * config/s390/s390.h: Declare that read can use write prefetch.

Applied. Thanks!

-Andreas-

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] fix PR44078
  2010-05-12 16:02           ` [PATCH] fix PR44078 Christian Borntraeger
@ 2010-05-17  7:55             ` Andreas Krebbel
  0 siblings, 0 replies; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-17  7:55 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches, H.J. Lu

> 2010-05-12  Christian Borntraeger<borntraeger@de.ibm.com>
>
>          PR 44078
>          * gcc.dg/tree-ssa/prefetch-7.c: Change pattern to match only the old
>          debug messages but not the newly introduced one.

Applied. Thanks!

-Andreas-

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-07 20:04         ` [patch 4/4 v4] " Christian Borntraeger
  2010-05-07 21:10           ` Andreas Krebbel
@ 2010-05-19 10:57           ` Andreas Krebbel
  2010-05-19 22:32             ` H.J. Lu
  1 sibling, 1 reply; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-19 10:57 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches

On 05/07/2010 10:04 PM, Christian Borntraeger wrote:
> 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
>
>          * tree-ssa-loop-prefetch.c (mem_ref_group): Change step to tree.
>          * tree-ssa-loop-prefetch.c (ar_data): Change step to tree.
>          * tree-ssa-loop-prefetch.c (dump_mem_ref): Adopt debug code to
>          handle a tree as step.  This also checks for a constant int vs.
>          non-constant but loop-invariant steps.
>          * tree-ssa-loop-prefetch.c (find_or_create_group): Change the sort
>          algorithm to only consider steps that are constant ints.
>          * tree-ssa-loop-prefetch.c (idx_analyze_ref): Adopt code to handle
>          a tree instead of a HOST_WIDE_INT for step.
>          * tree-ssa-loop-prefetch.c (gather_memory_references_ref): Handle
>          tree instead of int and be prepared to see a NULL_TREE.
>          * tree-ssa-loop-prefetch.c (prune_ref_by_self_reuse): Do not prune
>          prefetches if the step cannot be calculated at compile time.
>          * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Do not prune
>          prefetches if the step cannot be calculated at compile time.
>          * tree-ssa-loop-prefetch.c (issue_prefetch_ref): Issue prefetches
>          for non-constant but loop-invariant steps.

Applied to mainline. Thanks!

-Andreas-

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch  non constant steps
  2010-05-19 10:57           ` Andreas Krebbel
@ 2010-05-19 22:32             ` H.J. Lu
  2010-05-20  8:47               ` Christian Borntraeger
  2010-05-20 13:38               ` Christian Borntraeger
  0 siblings, 2 replies; 45+ messages in thread
From: H.J. Lu @ 2010-05-19 22:32 UTC (permalink / raw)
  To: Andreas Krebbel; +Cc: Christian Borntraeger, gcc-patches

On Wed, May 19, 2010 at 5:40 PM, Andreas Krebbel
<krebbel@linux.vnet.ibm.com> wrote:
> On 05/07/2010 10:04 PM, Christian Borntraeger wrote:
>>
>> 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
>>
>>         * tree-ssa-loop-prefetch.c (mem_ref_group): Change step to tree.
>>         * tree-ssa-loop-prefetch.c (ar_data): Change step to tree.
>>         * tree-ssa-loop-prefetch.c (dump_mem_ref): Adopt debug code to
>>         handle a tree as step.  This also checks for a constant int vs.
>>         non-constant but loop-invariant steps.
>>         * tree-ssa-loop-prefetch.c (find_or_create_group): Change the sort
>>         algorithm to only consider steps that are constant ints.
>>         * tree-ssa-loop-prefetch.c (idx_analyze_ref): Adopt code to handle
>>         a tree instead of a HOST_WIDE_INT for step.
>>         * tree-ssa-loop-prefetch.c (gather_memory_references_ref): Handle
>>         tree instead of int and be prepared to see a NULL_TREE.
>>         * tree-ssa-loop-prefetch.c (prune_ref_by_self_reuse): Do not prune
>>         prefetches if the step cannot be calculated at compile time.
>>         * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Do not
>> prune
>>         prefetches if the step cannot be calculated at compile time.
>>         * tree-ssa-loop-prefetch.c (issue_prefetch_ref): Issue prefetches
>>         for non-constant but loop-invariant steps.
>
> Applied to mainline. Thanks!
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203

-- 
H.J.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-19 22:32             ` H.J. Lu
@ 2010-05-20  8:47               ` Christian Borntraeger
  2010-05-20 10:06                 ` Zdenek Dvorak
  2010-05-20 13:38               ` Christian Borntraeger
  1 sibling, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-20  8:47 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Andreas Krebbel, gcc-patches

Am Donnerstag 20 Mai 2010 00:25:48 schrieb H.J. Lu:
> On Wed, May 19, 2010 at 5:40 PM, Andreas Krebbel
> <krebbel@linux.vnet.ibm.com> wrote:
> > On 05/07/2010 10:04 PM, Christian Borntraeger wrote:
> >>
> >> 2010-05-07  Christian Borntraeger<borntraeger@de.ibm.com>
> >>
> >>         * tree-ssa-loop-prefetch.c (mem_ref_group): Change step to tree.
> >>         * tree-ssa-loop-prefetch.c (ar_data): Change step to tree.
> >>         * tree-ssa-loop-prefetch.c (dump_mem_ref): Adopt debug code to
> >>         handle a tree as step.  This also checks for a constant int vs.
> >>         non-constant but loop-invariant steps.
> >>         * tree-ssa-loop-prefetch.c (find_or_create_group): Change the sort
> >>         algorithm to only consider steps that are constant ints.
> >>         * tree-ssa-loop-prefetch.c (idx_analyze_ref): Adopt code to handle
> >>         a tree instead of a HOST_WIDE_INT for step.
> >>         * tree-ssa-loop-prefetch.c (gather_memory_references_ref): Handle
> >>         tree instead of int and be prepared to see a NULL_TREE.
> >>         * tree-ssa-loop-prefetch.c (prune_ref_by_self_reuse): Do not prune
> >>         prefetches if the step cannot be calculated at compile time.
> >>         * tree-ssa-loop-prefetch.c (prune_ref_by_group_reuse): Do not
> >> prune
> >>         prefetches if the step cannot be calculated at compile time.
> >>         * tree-ssa-loop-prefetch.c (issue_prefetch_ref): Issue prefetches
> >>         for non-constant but loop-invariant steps.
> >
> > Applied to mainline. Thanks!
> >
> 
> This caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203

Indeed. I think I found a typo when handling array prefetches:

--- gcc/tree-ssa-loop-prefetch.c	(Revision 159557)
+++ gcc/tree-ssa-loop-prefetch.c	(Arbeitskopie)
@@ -440,7 +440,7 @@
 
       *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
 				    fold_convert (sizetype, *ar_data->step),
-				    fold_convert (sizetype, step));
+				    fold_convert (sizetype, stepsize));
       idelta *= imult;
     }
 

I will test and update this fix.


Christian

PS: most prefetch testcases contain something like 
/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
/* { dg-require-effective-target ilp32 } */
which explains why I have not seen these regressions on s390x. I will check if
I can adopt these tests to run on more platforms.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-20  8:47               ` Christian Borntraeger
@ 2010-05-20 10:06                 ` Zdenek Dvorak
  0 siblings, 0 replies; 45+ messages in thread
From: Zdenek Dvorak @ 2010-05-20 10:06 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: H.J. Lu, Andreas Krebbel, gcc-patches

Hi,

> > > Applied to mainline. Thanks!
> > >
> > 
> > This caused:
> > 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203
> 
> Indeed. I think I found a typo when handling array prefetches:
> 
> --- gcc/tree-ssa-loop-prefetch.c	(Revision 159557)
> +++ gcc/tree-ssa-loop-prefetch.c	(Arbeitskopie)
> @@ -440,7 +440,7 @@
>  
>        *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
>  				    fold_convert (sizetype, *ar_data->step),
> -				    fold_convert (sizetype, step));
> +				    fold_convert (sizetype, stepsize));
>        idelta *= imult;
>      }
>  
> 
> I will test and update this fix.
> 
> 
> Christian
> 
> PS: most prefetch testcases contain something like 
> /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> /* { dg-require-effective-target ilp32 } */
> which explains why I have not seen these regressions on s390x. I will check if
> I can adopt these tests to run on more platforms.

the reason for these constraints is that in many of the testcases, we check
for things like that the loops are unrolled the proper number of times, where
the "proper number" depends heavily on the target properties (most of which, like
basic type sizes, are not customizable by parameters).  But, it would certainly
be useful to make them run on more targets,

Zdenek

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-19 22:32             ` H.J. Lu
  2010-05-20  8:47               ` Christian Borntraeger
@ 2010-05-20 13:38               ` Christian Borntraeger
  2010-05-25 11:22                 ` Christian Borntraeger
  2010-05-25 11:46                 ` Eric Botcazou
  1 sibling, 2 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-20 13:38 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Andreas Krebbel, gcc-patches

Am Donnerstag 20 Mai 2010 00:25:48 schrieb H.J. Lu:
> This caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203

H.J.,

Can you verify the following patch?


2010-05-20  Christian Borntraeger  <borntraeger@de.ibm.com>

        PR 44203
        * gcc/tree-ssa-loop-prefetch.c: Fix logic for step calculation to
        match the original (and intended) behaviour before r159557.  This
        changeset changed a=a+b*c to a=(a+b)*b which was obviously wrong
        in two ways. 

Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** idx_analyze_ref (tree base, tree *index,
*** 425,449 ****
        ibase = build_int_cst (TREE_TYPE (ibase), 0);
      }
  
-   if (*ar_data->step == NULL_TREE)
-     *ar_data->step = step;
-   else
-     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
- 				  fold_convert (sizetype, *ar_data->step),
- 				  fold_convert (sizetype, step));
    if (TREE_CODE (base) == ARRAY_REF)
      {
        stepsize = array_ref_element_size (base);
        if (!cst_and_fits_in_hwi (stepsize))
  	return false;
        imult = int_cst_value (stepsize);
! 
!       *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
! 				    fold_convert (sizetype, *ar_data->step),
! 				    fold_convert (sizetype, step));
        idelta *= imult;
      }
  
    *ar_data->delta += idelta;
    *index = ibase;
  
--- 425,448 ----
        ibase = build_int_cst (TREE_TYPE (ibase), 0);
      }
  
    if (TREE_CODE (base) == ARRAY_REF)
      {
        stepsize = array_ref_element_size (base);
        if (!cst_and_fits_in_hwi (stepsize))
  	return false;
        imult = int_cst_value (stepsize);
!       step = fold_build2 (MULT_EXPR, sizetype,
! 			  fold_convert (sizetype, step),
! 			  fold_convert (sizetype, stepsize));
        idelta *= imult;
      }
  
+   if (*ar_data->step == NULL_TREE)
+     *ar_data->step = step;
+   else
+     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
+ 				  fold_convert (sizetype, *ar_data->step),
+ 				  fold_convert (sizetype, step));
    *ar_data->delta += idelta;
    *index = ibase;
  

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-20 13:38               ` Christian Borntraeger
@ 2010-05-25 11:22                 ` Christian Borntraeger
  2010-05-25 11:24                   ` Richard Guenther
  2010-05-25 11:46                 ` Eric Botcazou
  1 sibling, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-25 11:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: H.J. Lu, Andreas Krebbel, Zdenek Dvorak, Richard Guenther

Am Donnerstag 20 Mai 2010, 15:36:31 schrieb Christian Borntraeger:
> Am Donnerstag 20 Mai 2010 00:25:48 schrieb H.J. Lu:
> > This caused:
> > 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203
> 
> H.J.,
> 
> Can you verify the following patch?

Patch was successfully tested by H.J. (see bugzilla)

OK, to apply?

Christian


> 2010-05-20  Christian Borntraeger  <borntraeger@de.ibm.com>
> 
>         PR 44203
>         * gcc/tree-ssa-loop-prefetch.c: Fix logic for step calculation to
>         match the original (and intended) behaviour before r159557.  This
>         changeset changed a=a+b*c to a=(a+b)*b which was obviously wrong
>         in two ways. 
> 
> Index: gcc/tree-ssa-loop-prefetch.c
> ===================================================================
> *** gcc/tree-ssa-loop-prefetch.c.orig
> --- gcc/tree-ssa-loop-prefetch.c
> *************** idx_analyze_ref (tree base, tree *index,
> *** 425,449 ****
>         ibase = build_int_cst (TREE_TYPE (ibase), 0);
>       }
>   
> -   if (*ar_data->step == NULL_TREE)
> -     *ar_data->step = step;
> -   else
> -     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
> - 				  fold_convert (sizetype, *ar_data->step),
> - 				  fold_convert (sizetype, step));
>     if (TREE_CODE (base) == ARRAY_REF)
>       {
>         stepsize = array_ref_element_size (base);
>         if (!cst_and_fits_in_hwi (stepsize))
>   	return false;
>         imult = int_cst_value (stepsize);
> ! 
> !       *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
> ! 				    fold_convert (sizetype, *ar_data->step),
> ! 				    fold_convert (sizetype, step));
>         idelta *= imult;
>       }
>   
>     *ar_data->delta += idelta;
>     *index = ibase;
>   
> --- 425,448 ----
>         ibase = build_int_cst (TREE_TYPE (ibase), 0);
>       }
>   
>     if (TREE_CODE (base) == ARRAY_REF)
>       {
>         stepsize = array_ref_element_size (base);
>         if (!cst_and_fits_in_hwi (stepsize))
>   	return false;
>         imult = int_cst_value (stepsize);
> !       step = fold_build2 (MULT_EXPR, sizetype,
> ! 			  fold_convert (sizetype, step),
> ! 			  fold_convert (sizetype, stepsize));
>         idelta *= imult;
>       }
>   
> +   if (*ar_data->step == NULL_TREE)
> +     *ar_data->step = step;
> +   else
> +     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
> + 				  fold_convert (sizetype, *ar_data->step),
> + 				  fold_convert (sizetype, step));
>     *ar_data->delta += idelta;
>     *index = ibase;

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-25 11:22                 ` Christian Borntraeger
@ 2010-05-25 11:24                   ` Richard Guenther
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Guenther @ 2010-05-25 11:24 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: gcc-patches, H.J. Lu, Andreas Krebbel, Zdenek Dvorak

On Tue, 25 May 2010, Christian Borntraeger wrote:

> Am Donnerstag 20 Mai 2010, 15:36:31 schrieb Christian Borntraeger:
> > Am Donnerstag 20 Mai 2010 00:25:48 schrieb H.J. Lu:
> > > This caused:
> > > 
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44203
> > 
> > H.J.,
> > 
> > Can you verify the following patch?
> 
> Patch was successfully tested by H.J. (see bugzilla)
> 
> OK, to apply?

Ok.

Thanks,
Richard.
 
> Christian
> 
> 
> > 2010-05-20  Christian Borntraeger  <borntraeger@de.ibm.com>
> > 
> >         PR 44203
> >         * gcc/tree-ssa-loop-prefetch.c: Fix logic for step calculation to
> >         match the original (and intended) behaviour before r159557.  This
> >         changeset changed a=a+b*c to a=(a+b)*b which was obviously wrong
> >         in two ways. 
> > 
> > Index: gcc/tree-ssa-loop-prefetch.c
> > ===================================================================
> > *** gcc/tree-ssa-loop-prefetch.c.orig
> > --- gcc/tree-ssa-loop-prefetch.c
> > *************** idx_analyze_ref (tree base, tree *index,
> > *** 425,449 ****
> >         ibase = build_int_cst (TREE_TYPE (ibase), 0);
> >       }
> >   
> > -   if (*ar_data->step == NULL_TREE)
> > -     *ar_data->step = step;
> > -   else
> > -     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
> > - 				  fold_convert (sizetype, *ar_data->step),
> > - 				  fold_convert (sizetype, step));
> >     if (TREE_CODE (base) == ARRAY_REF)
> >       {
> >         stepsize = array_ref_element_size (base);
> >         if (!cst_and_fits_in_hwi (stepsize))
> >   	return false;
> >         imult = int_cst_value (stepsize);
> > ! 
> > !       *ar_data->step = fold_build2 (MULT_EXPR, sizetype,
> > ! 				    fold_convert (sizetype, *ar_data->step),
> > ! 				    fold_convert (sizetype, step));
> >         idelta *= imult;
> >       }
> >   
> >     *ar_data->delta += idelta;
> >     *index = ibase;
> >   
> > --- 425,448 ----
> >         ibase = build_int_cst (TREE_TYPE (ibase), 0);
> >       }
> >   
> >     if (TREE_CODE (base) == ARRAY_REF)
> >       {
> >         stepsize = array_ref_element_size (base);
> >         if (!cst_and_fits_in_hwi (stepsize))
> >   	return false;
> >         imult = int_cst_value (stepsize);
> > !       step = fold_build2 (MULT_EXPR, sizetype,
> > ! 			  fold_convert (sizetype, step),
> > ! 			  fold_convert (sizetype, stepsize));
> >         idelta *= imult;
> >       }
> >   
> > +   if (*ar_data->step == NULL_TREE)
> > +     *ar_data->step = step;
> > +   else
> > +     *ar_data->step = fold_build2 (PLUS_EXPR, sizetype,
> > + 				  fold_convert (sizetype, *ar_data->step),
> > + 				  fold_convert (sizetype, step));
> >     *ar_data->delta += idelta;
> >     *index = ibase;
> 
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-20 13:38               ` Christian Borntraeger
  2010-05-25 11:22                 ` Christian Borntraeger
@ 2010-05-25 11:46                 ` Eric Botcazou
  2010-05-26  8:50                   ` Eric Botcazou
  1 sibling, 1 reply; 45+ messages in thread
From: Eric Botcazou @ 2010-05-25 11:46 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches, H.J. Lu, Andreas Krebbel

> 2010-05-20  Christian Borntraeger  <borntraeger@de.ibm.com>
>
>         PR 44203
>         * gcc/tree-ssa-loop-prefetch.c: Fix logic for step calculation to
>         match the original (and intended) behaviour before r159557.  This
>         changeset changed a=a+b*c to a=(a+b)*b which was obviously wrong
>         in two ways.

That isn't a valid ChangeLog entry, see the numerous examples in ChangeLog.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/4 v3] Set sane prefetch values for s390x
  2010-05-11 15:34         ` Christian Borntraeger
  2010-05-17  7:54           ` Andreas Krebbel
@ 2010-05-25 15:36           ` Andreas Krebbel
  1 sibling, 0 replies; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-25 15:36 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches

On 05/11/2010 05:34 PM, Christian Borntraeger wrote:
> +   if (HAVE_prefetch || optimize>= 3)
> +       flag_prefetch_loop_arrays = 1;

The check needs to be done in (or after) override_options since the 
s390_arch_flags have to set up in order to make this work.  Also the '||' seems 
to be wrong.

I've applied the attached patch to mainline:

2010-05-25  Andreas Krebbel  <Andreas.Krebbel@de.ibm.com>

         * config/s390/s390.c (optimization_options): Fix and move the
         flag_prefetch_loop_arrays override ...
         (override_options): ... here.

Index: gcc/config/s390/s390.c
===================================================================
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*************** optimization_options (int level ATTRIBUT
*** 1475,1483 ****
        without maintaining a stack frame back-chain.  */
     flag_asynchronous_unwind_tables = 1;

-   if (HAVE_prefetch || optimize >= 3)
-       flag_prefetch_loop_arrays = 1;
-
     /* Use MVCLE instructions to decrease code size if requested.  */
     if (size != 0)
       target_flags |= MASK_MVCLE;
--- 1475,1480 ----
*************** override_options (void)
*** 1677,1682 ****
--- 1674,1684 ----
       set_param_value ("prefetch-min-insn-to-mem-ratio", 2);
     if (!PARAM_SET_P (PARAM_SIMULTANEOUS_PREFETCHES))
       set_param_value ("simultaneous-prefetches", 6);
+
+   /* This cannot reside in optimization_options since HAVE_prefetch
+      requires the arch flags to be evaluated already.  */
+   if (HAVE_prefetch && optimize >= 3)
+     flag_prefetch_loop_arrays = 1;
   }

   /* Map for smallest class containing reg regno.  */

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-25 11:46                 ` Eric Botcazou
@ 2010-05-26  8:50                   ` Eric Botcazou
  2010-05-26 11:44                     ` Andreas Krebbel
  0 siblings, 1 reply; 45+ messages in thread
From: Eric Botcazou @ 2010-05-26  8:50 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches, H.J. Lu, Andreas Krebbel

> That isn't a valid ChangeLog entry, see the numerous examples in ChangeLog.

Here's a valid entry:

2010-05-25  Christian Borntraeger  <borntraeger@de.ibm.com>

	PR 44203
	* tree-ssa-loop-prefetch.c (idx_analyze_ref): Fix logic for step
	calculation to match the intended behaviour before r159557.

To sum up: function name in parentheses and optional angle brackets for the 
specific part; describe only the "what", not the "why", and nothing more.

See http://www.gnu.org/prep/standards/standards.html#Change-Logs for the full 
reference.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-26  8:50                   ` Eric Botcazou
@ 2010-05-26 11:44                     ` Andreas Krebbel
  2010-05-26 18:27                       ` Eric Botcazou
  0 siblings, 1 reply; 45+ messages in thread
From: Andreas Krebbel @ 2010-05-26 11:44 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Christian Borntraeger, gcc-patches

On 05/26/2010 09:42 AM, Eric Botcazou wrote:
>> That isn't a valid ChangeLog entry, see the numerous examples in ChangeLog.
>
> Here's a valid entry:
>
> 2010-05-25  Christian Borntraeger<borntraeger@de.ibm.com>
>
> 	PR 44203
> 	* tree-ssa-loop-prefetch.c (idx_analyze_ref): Fix logic for step
> 	calculation to match the intended behaviour before r159557.
>
The patch has already been committed - just a minute after Richards approval. 
I've only fixed the filename - the rest stayed as in Christians posting.  Feel 
free to replace the changelog entry with your version and sorry for not looking 
closely enough at it before committing.

Bye,

-Andreas-

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/4 v4] Allow loop prefetch code to speculatively prefetch non constant steps
  2010-05-26 11:44                     ` Andreas Krebbel
@ 2010-05-26 18:27                       ` Eric Botcazou
  0 siblings, 0 replies; 45+ messages in thread
From: Eric Botcazou @ 2010-05-26 18:27 UTC (permalink / raw)
  To: Andreas Krebbel; +Cc: Christian Borntraeger, gcc-patches

> The patch has already been committed - just a minute after Richards
> approval. I've only fixed the filename - the rest stayed as in Christians
> posting.  Feel free to replace the changelog entry with your version and
> sorry for not looking closely enough at it before committing.

I don't particularly enjoy fixing ChangeLog entries or pinging folks about 
them, I have far more interesting things to do.  But broken entries just 
defeat the purpose of having ChangeLog.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/4] Add debug information if prefetches are not issued
  2010-05-05 15:10         ` Christian Borntraeger
@ 2010-05-05 15:13           ` Andrew Haley
  0 siblings, 0 replies; 45+ messages in thread
From: Andrew Haley @ 2010-05-05 15:13 UTC (permalink / raw)
  To: gcc-patches

On 05/05/2010 04:10 PM, Christian Borntraeger wrote:
> Am Mittwoch 05 Mai 2010 16:58:25 schrieb Richard Guenther:
>> !         fprintf (dump_file, "Ignoring %p due to prefetch_before\n", 
>> (void
>> *) ref);
>>
>> watch long lines.  Coding-style restricts us to 80 columns.
> 
> Redone.
> Coming from Linux kernel coding, is there some tool like
> checkpatch.pl for GNU coding style? This would be really helpful for
> people like me that contribute to projects with conflicting coding 
> styles (GNU and Linux differ a lot) just as a sanity check that I have
> finsihed the "context switch".
> 
> In addition, is there something like a changelog skeleton creator that
> uses diffstat to create an empty changelog?

C-x 4 a  at the change creates the ChangeLog entry along with the function
name.  You are using emacs, right?

Andrew.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/4] Add debug information if prefetches are not issued
  2010-05-05 14:58       ` Richard Guenther
@ 2010-05-05 15:10         ` Christian Borntraeger
  2010-05-05 15:13           ` Andrew Haley
  0 siblings, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-05 15:10 UTC (permalink / raw)
  To: Richard Guenther
  Cc: gcc-patches, Richard Guenther, Zdenek Dvorak, Changpeng Fang

[-- Attachment #1: Type: Text/Plain, Size: 659 bytes --]

Am Mittwoch 05 Mai 2010 16:58:25 schrieb Richard Guenther:
> !         fprintf (dump_file, "Ignoring %p due to prefetch_before\n", 
> (void
> *) ref);
> 
> watch long lines.  Coding-style restricts us to 80 columns.

Redone.
Coming from Linux kernel coding, is there some tool like
checkpatch.pl for GNU coding style? This would be really helpful for
people like me that contribute to projects with conflicting coding 
styles (GNU and Linux differ a lot) just as a sanity check that I have
finsihed the "context switch".

In addition, is there something like a changelog skeleton creator that
uses diffstat to create an empty changelog?

Thank you

Christian

[-- Attachment #2: prefetch_debug.patch --]
[-- Type: text/x-patch, Size: 1460 bytes --]

Add debug information if prefetches are not issued

These debug statements helped me to understand why a prefetch was not
issued. 

Bootstrapped and tested on s390x-ibm-linux-gnu. Ok to apply?

Christian.

2010-05-05  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c: Add debug for dropped prefetches.

Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** should_issue_prefetch_p (struct mem_ref 
*** 849,859 ****
    /* For now do not issue prefetches for only first few of the
       iterations.  */
    if (ref->prefetch_before != PREFETCH_ALL)
!     return false;
  
    /* Do not prefetch nontemporal stores.  */
    if (ref->storent_p)
!     return false;
  
    return true;
  }
--- 849,868 ----
    /* For now do not issue prefetches for only first few of the
       iterations.  */
    if (ref->prefetch_before != PREFETCH_ALL)
!     {
!       if (dump_file && (dump_flags & TDF_DETAILS))
!         fprintf (dump_file, "Ignoring %p due to prefetch_before\n",
! 		 (void *) ref);
!       return false;
!     }
  
    /* Do not prefetch nontemporal stores.  */
    if (ref->storent_p)
!     {
!       if (dump_file && (dump_flags & TDF_DETAILS))
!         fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);
!       return false;
!     }
  
    return true;
  }

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/4] Add debug information if prefetches are not issued
  2010-05-05 14:40     ` Christian Borntraeger
@ 2010-05-05 14:58       ` Richard Guenther
  2010-05-05 15:10         ` Christian Borntraeger
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Guenther @ 2010-05-05 14:58 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Richard Guenther, gcc-patches, Zdenek Dvorak, Changpeng Fang

On Wed, 5 May 2010, Christian Borntraeger wrote:

> Am Mittwoch 05 Mai 2010 14:31:31 schrieb Richard Guenther:
> > You should guard them with
> > 
> >  if (dump_file && (dump_flags & TDF_DETAILS))
> 
> Right, thank you Richard. 
> 
> Here is an updated version. I also fixed a coding style issue of if and open braces.

!         fprintf (dump_file, "Ignoring %p due to prefetch_before\n", 
(void
*) ref);

watch long lines.  Coding-style restricts us to 80 columns.

> Would that patch be ok for the 4.5 brach as well or is it only for 4.6?

For 4.6.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/4] Add debug information if prefetches are not issued
  2010-05-05 12:31   ` Richard Guenther
@ 2010-05-05 14:40     ` Christian Borntraeger
  2010-05-05 14:58       ` Richard Guenther
  0 siblings, 1 reply; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-05 14:40 UTC (permalink / raw)
  To: Richard Guenther
  Cc: gcc-patches, Richard Guenther, Zdenek Dvorak, Changpeng Fang

[-- Attachment #1: Type: Text/Plain, Size: 339 bytes --]

Am Mittwoch 05 Mai 2010 14:31:31 schrieb Richard Guenther:
> You should guard them with
> 
>  if (dump_file && (dump_flags & TDF_DETAILS))

Right, thank you Richard. 

Here is an updated version. I also fixed a coding style issue of if and open braces.

Would that patch be ok for the 4.5 brach as well or is it only for 4.6?

Christian



[-- Attachment #2: prefetch_debug.patch --]
[-- Type: text/x-patch, Size: 1455 bytes --]

Add debug information if prefetches are not issued

These debug statements helped me to understand why a prefetch was not
issued. 

Bootstrapped and tested on s390x-ibm-linux-gnu. Ok to apply?

Christian.

2010-05-05  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c: Add debug for dropped prefetches.

Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
*** gcc/tree-ssa-loop-prefetch.c.orig
--- gcc/tree-ssa-loop-prefetch.c
*************** should_issue_prefetch_p (struct mem_ref 
*** 849,859 ****
    /* For now do not issue prefetches for only first few of the
       iterations.  */
    if (ref->prefetch_before != PREFETCH_ALL)
!     return false;
  
    /* Do not prefetch nontemporal stores.  */
    if (ref->storent_p)
!     return false;
  
    return true;
  }
--- 849,867 ----
    /* For now do not issue prefetches for only first few of the
       iterations.  */
    if (ref->prefetch_before != PREFETCH_ALL)
!     {
!       if (dump_file && (dump_flags & TDF_DETAILS))
!         fprintf (dump_file, "Ignoring %p due to prefetch_before\n", (void *) ref);
!       return false;
!     }
  
    /* Do not prefetch nontemporal stores.  */
    if (ref->storent_p)
!     {
!       if (dump_file && (dump_flags & TDF_DETAILS))
!         fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);
!       return false;
!     }
  
    return true;
  }

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/4] Add debug information if prefetches are not issued
  2010-05-05 10:13 ` [patch 2/4] Add debug information if prefetches are not issued Christian Borntraeger
  2010-05-05 10:48   ` Zdenek Dvorak
@ 2010-05-05 12:31   ` Richard Guenther
  2010-05-05 14:40     ` Christian Borntraeger
  1 sibling, 1 reply; 45+ messages in thread
From: Richard Guenther @ 2010-05-05 12:31 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: gcc-patches, Richard Guenther, Zdenek Dvorak, Changpeng Fang

On Wed, May 5, 2010 at 12:09 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> These debug statements helped me to understand why a prefetch was not
> issued.
>
> Bootstrapped and tested on s390x-ibm-linux-gnu. Ok to apply?
>
> Christian.
>
> 2010-05-05  Christian Borntraeger  <borntraeger@de.ibm.com>
>
>        * tree-ssa-loop-prefetch.c: Add debug for dropped prefetches.

You should guard them with

 if (dump_file && (dump_flags & TDF_DETAILS))

> Index: b/gcc/tree-ssa-loop-prefetch.c
> ===================================================================
> *** a/gcc/tree-ssa-loop-prefetch.c
> --- b/gcc/tree-ssa-loop-prefetch.c
> *************** should_issue_prefetch_p (struct mem_ref
> *** 848,859 ****
>  {
>    /* For now do not issue prefetches for only first few of the
>       iterations.  */
> !   if (ref->prefetch_before != PREFETCH_ALL)
>      return false;
>
>    /* Do not prefetch nontemporal stores.  */
> !   if (ref->storent_p)
>      return false;
>
>    return true;
>  }
> --- 848,863 ----
>  {
>    /* For now do not issue prefetches for only first few of the
>       iterations.  */
> !   if (ref->prefetch_before != PREFETCH_ALL) {
> !     fprintf (dump_file, "Ignoring %p due to prefetch_before\n", (void *) ref);
>      return false;
> +   }
>
>    /* Do not prefetch nontemporal stores.  */
> !   if (ref->storent_p) {
> !     fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);
>      return false;
> +   }
>
>    return true;
>  }
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/4] Add debug information if prefetches are not issued
  2010-05-05 10:13 ` [patch 2/4] Add debug information if prefetches are not issued Christian Borntraeger
@ 2010-05-05 10:48   ` Zdenek Dvorak
  2010-05-05 12:31   ` Richard Guenther
  1 sibling, 0 replies; 45+ messages in thread
From: Zdenek Dvorak @ 2010-05-05 10:48 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: gcc-patches, Richard Guenther, Changpeng Fang

Hi,

> These debug statements helped me to understand why a prefetch was not
> issued. 
> 
> Bootstrapped and tested on s390x-ibm-linux-gnu. Ok to apply?

OK,

Zdenek

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 2/4] Add debug information if prefetches are not issued
  2010-05-05 10:13 [patch 0/4] RFC: current prefetch patches Christian Borntraeger
@ 2010-05-05 10:13 ` Christian Borntraeger
  2010-05-05 10:48   ` Zdenek Dvorak
  2010-05-05 12:31   ` Richard Guenther
  0 siblings, 2 replies; 45+ messages in thread
From: Christian Borntraeger @ 2010-05-05 10:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Guenther, Zdenek Dvorak, Changpeng Fang

[-- Attachment #1: prefetch_debug.patch --]
[-- Type: text/plain, Size: 1278 bytes --]

These debug statements helped me to understand why a prefetch was not
issued. 

Bootstrapped and tested on s390x-ibm-linux-gnu. Ok to apply?

Christian.

2010-05-05  Christian Borntraeger  <borntraeger@de.ibm.com>

        * tree-ssa-loop-prefetch.c: Add debug for dropped prefetches.

Index: b/gcc/tree-ssa-loop-prefetch.c
===================================================================
*** a/gcc/tree-ssa-loop-prefetch.c
--- b/gcc/tree-ssa-loop-prefetch.c
*************** should_issue_prefetch_p (struct mem_ref 
*** 848,859 ****
  {
    /* For now do not issue prefetches for only first few of the
       iterations.  */
!   if (ref->prefetch_before != PREFETCH_ALL)
      return false;
  
    /* Do not prefetch nontemporal stores.  */
!   if (ref->storent_p)
      return false;
  
    return true;
  }
--- 848,863 ----
  {
    /* For now do not issue prefetches for only first few of the
       iterations.  */
!   if (ref->prefetch_before != PREFETCH_ALL) {
!     fprintf (dump_file, "Ignoring %p due to prefetch_before\n", (void *) ref);
      return false;
+   }
  
    /* Do not prefetch nontemporal stores.  */
!   if (ref->storent_p) {
!     fprintf (dump_file, "Ignoring nontemporal store %p\n", (void *) ref);
      return false;
+   }
  
    return true;
  }

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2010-05-26 18:08 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-07 14:05 [patch 0/4] Prefetch patches for s390, debug and the heuristic V2 Christian Borntraeger
2010-05-07 14:05 ` [patch 4/4] Allow loop prefetch code to speculatively prefetch non constant steps Christian Borntraeger
2010-05-07 15:00   ` Zdenek Dvorak
2010-05-07 15:17     ` Christian Borntraeger
2010-05-07 16:20     ` [patch 4/4 v3] " Christian Borntraeger
2010-05-07 17:02       ` Zdenek Dvorak
2010-05-07 20:04         ` [patch 4/4 v4] " Christian Borntraeger
2010-05-07 21:10           ` Andreas Krebbel
2010-05-09 20:07             ` Christian Borntraeger
2010-05-19 10:57           ` Andreas Krebbel
2010-05-19 22:32             ` H.J. Lu
2010-05-20  8:47               ` Christian Borntraeger
2010-05-20 10:06                 ` Zdenek Dvorak
2010-05-20 13:38               ` Christian Borntraeger
2010-05-25 11:22                 ` Christian Borntraeger
2010-05-25 11:24                   ` Richard Guenther
2010-05-25 11:46                 ` Eric Botcazou
2010-05-26  8:50                   ` Eric Botcazou
2010-05-26 11:44                     ` Andreas Krebbel
2010-05-26 18:27                       ` Eric Botcazou
2010-05-07 14:05 ` [patch 1/4] Set sane prefetch values for s390x Christian Borntraeger
2010-05-07 14:40   ` Zdenek Dvorak
2010-05-07 14:46   ` Jakub Jelinek
2010-05-07 15:25     ` [patch 1/4 v3] " Christian Borntraeger
2010-05-11  7:19       ` Andreas Krebbel
2010-05-11 15:34         ` Christian Borntraeger
2010-05-17  7:54           ` Andreas Krebbel
2010-05-25 15:36           ` Andreas Krebbel
2010-05-07 14:05 ` [patch 3/4] Prefetch: Ignore large values of prefetch_before Christian Borntraeger
2010-05-07 14:51   ` Zdenek Dvorak
2010-05-07 15:37     ` [patch 3/4 v3] " Christian Borntraeger
2010-05-11  7:41       ` Andreas Krebbel
2010-05-11 13:17         ` H.J. Lu
2010-05-11 13:43           ` Christian Borntraeger
2010-05-12 16:02           ` [PATCH] fix PR44078 Christian Borntraeger
2010-05-17  7:55             ` Andreas Krebbel
2010-05-07 14:05 ` [patch 2/4] Add debug information if prefetches are not issued Christian Borntraeger
2010-05-11  7:40   ` Andreas Krebbel
  -- strict thread matches above, loose matches on Subject: below --
2010-05-05 10:13 [patch 0/4] RFC: current prefetch patches Christian Borntraeger
2010-05-05 10:13 ` [patch 2/4] Add debug information if prefetches are not issued Christian Borntraeger
2010-05-05 10:48   ` Zdenek Dvorak
2010-05-05 12:31   ` Richard Guenther
2010-05-05 14:40     ` Christian Borntraeger
2010-05-05 14:58       ` Richard Guenther
2010-05-05 15:10         ` Christian Borntraeger
2010-05-05 15:13           ` Andrew Haley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).