public inbox for gcc-patches@gcc.gnu.org
* [RFC] S/390: Alignment peeling prolog generation
@ 2017-04-11 14:38 Robin Dapp
  2017-04-11 14:57 ` Bin.Cheng
  0 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-04-11 14:38 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

Hi,

when looking at various vectorization examples on s390x I noticed that
we still peel vf/2 iterations for alignment even though vectorization
costs of unaligned loads and stores are the same as normal loads/stores.

A simple example is

void foo(int *restrict a, int *restrict b, unsigned int n)
{
  for (unsigned int i = 0; i < n; i++)
    {
      b[i] = a[i] * 2 + 1;
    }
}

which gets peeled unless __builtin_assume_aligned (a, 8) is used.
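For completeness, this is the variant with the hint, which is the form that avoids the peeling prolog (foo_hinted is just an illustrative name; the hint is a promise about the pointer's alignment, not a check, so callers really must pass 8-byte-aligned pointers or behavior is undefined):

```c
#include <assert.h>

/* Same loop as above, but with an alignment promise on the load pointer.
   With the hint the vectorizer no longer peels for alignment; without it
   the loop above gets a vf/2 peeling prolog on s390x.  */
void foo_hinted (int *restrict a, int *restrict b, unsigned int n)
{
  int *ah = __builtin_assume_aligned (a, 8);
  for (unsigned int i = 0; i < n; i++)
    b[i] = ah[i] * 2 + 1;
}
```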

In tree-vect-data-refs.c there are several checks that involve costs in
the peeling decision, none of which seems to suffice in this case. For a
loop with only read DRs there is a check that has been triggering (i.e.
disabling peeling) since we implemented the vectorization costs.

Here, we have DR_MISALIGNMENT (dr) == -1 for all DRs but the costs
should still dictate to never peel. I attached a tentative patch for
discussion which fixes the problem by checking the costs for npeel = 0
and npeel = vf/2 after ensuring we support all misalignments. Is there a
better way and place to do it? Are we missing something somewhere else
that would preclude the peeling from happening?

This is not intended for stage 4, obviously :)

Regards
 Robin

[-- Attachment #2: gcc-omit-peeling.diff --]
[-- Type: text/x-patch, Size: 2442 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 3fc762a..795c22c 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1418,6 +1418,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
   bool all_misalignments_unknown = true;
+  bool all_misalignments_supported = true;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1547,6 +1548,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                 }
 
               all_misalignments_unknown = false;
+
               /* Data-ref that was chosen for the case that all the
                  misalignments are unknown is not relevant anymore, since we
                  have a data-ref with known alignment.  */
@@ -1609,6 +1611,24 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
               break;
             }
         }
+
+      /* Check if target supports misaligned data access for current data
+	 reference.  */
+      vectype = STMT_VINFO_VECTYPE (stmt_info);
+      machine_mode mode = TYPE_MODE (vectype);
+      if (targetm.vectorize.
+	  support_vector_misalignment (mode, TREE_TYPE (DR_REF (dr)),
+				       DR_MISALIGNMENT (dr), false))
+	{
+	  vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
+				    dr, 0);
+	  /* Also insert vf/2 peeling that will be used when all
+	     misalignments are unknown. */
+	  vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
+				    dr, vf / 2);
+	}
+      else
+	all_misalignments_supported = false;
     }
 
   /* Check if we can possibly peel the loop.  */
@@ -1687,6 +1707,18 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
             dr0 = first_store;
         }
 
+      /* If the target supports accessing all data references in a misaligned
+	 way, check costs to see if we can leave them unaligned and do not
+	 perform any peeling.  */
+      if (all_misalignments_supported)
+	{
+	  dr0 = vect_peeling_hash_choose_best_peeling (&peeling_htab,
+						       loop_vinfo, &npeel,
+						       &body_cost_vec);
+	  if (!dr0 || !npeel)
+	    do_peeling = false;
+	}
+
       /* In case there are only loads with different unknown misalignments, use
          peeling only if it may help to align other accesses in the loop or
 	 if it may help improving load bandwith when we'd end up using

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-04-11 14:38 [RFC] S/390: Alignment peeling prolog generation Robin Dapp
@ 2017-04-11 14:57 ` Bin.Cheng
  2017-04-11 15:03   ` Robin Dapp
  2017-04-11 16:25   ` Richard Biener
  0 siblings, 2 replies; 51+ messages in thread
From: Bin.Cheng @ 2017-04-11 14:57 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches

On Tue, Apr 11, 2017 at 3:38 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Hi,
>
> when looking at various vectorization examples on s390x I noticed that
> we still peel vf/2 iterations for alignment even though vectorization
> costs of unaligned loads and stores are the same as normal loads/stores.
>
> A simple example is
>
> void foo(int *restrict a, int *restrict b, unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i++)
>     {
>       b[i] = a[i] * 2 + 1;
>     }
> }
>
> which gets peeled unless __builtin_assume_aligned (a, 8) is used.
>
> In tree-vect-data-refs.c there are several checks that involve costs in
> the peeling decision, none of which seems to suffice in this case. For a
> loop with only read DRs there is a check that has been triggering (i.e.
> disabling peeling) since we implemented the vectorization costs.
>
> Here, we have DR_MISALIGNMENT (dr) == -1 for all DRs but the costs
> should still dictate to never peel. I attached a tentative patch for
> discussion which fixes the problem by checking the costs for npeel = 0
> and npeel = vf/2 after ensuring we support all misalignments. Is there a
> better way and place to do it? Are we missing something somewhere else
> that would preclude the peeling from happening?
>
> This is not intended for stage 4, obviously :)
Hi Robin,
Seems Richi added code like below comparing costs between aligned and
unaligned loads, and only peeling if it's beneficial:

      /* In case there are only loads with different unknown misalignments, use
         peeling only if it may help to align other accesses in the loop or
         if it may help improving load bandwith when we'd end up using
         unaligned loads.  */
      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
      if (!first_store
          && !STMT_VINFO_SAME_ALIGN_REFS (
                vinfo_for_stmt (DR_STMT (dr0))).length ()
          && (vect_supportable_dr_alignment (dr0, false)
              != dr_unaligned_supported
              || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
                  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
        do_peeling = false;

I think similar code can be added for the store case too.

Thanks,
bin
>
> Regards
>  Robin


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-04-11 14:57 ` Bin.Cheng
@ 2017-04-11 15:03   ` Robin Dapp
  2017-04-11 15:07     ` Bin.Cheng
  2017-04-11 16:25   ` Richard Biener
  1 sibling, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-04-11 15:03 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: GCC Patches

Hi Bin,

> Seems Richi added code like below comparing costs between aligned and
> unaligned loads, and only peeling if it's beneficial:
>
>       /* In case there are only loads with different unknown misalignments, use
>          peeling only if it may help to align other accesses in the loop or
>          if it may help improving load bandwith when we'd end up using
>          unaligned loads.  */
>       tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
>       if (!first_store
>           && !STMT_VINFO_SAME_ALIGN_REFS (
>                 vinfo_for_stmt (DR_STMT (dr0))).length ()
>           && (vect_supportable_dr_alignment (dr0, false)
>               != dr_unaligned_supported
>               || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
>                   == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
>         do_peeling = false;

Yes, this is the "special case" I was referring to. It successfully
avoids peeling when there is no store (after we set the vectorization
costs). My patch tries to check the costs for all references.

Regards
 Robin


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-04-11 15:03   ` Robin Dapp
@ 2017-04-11 15:07     ` Bin.Cheng
  0 siblings, 0 replies; 51+ messages in thread
From: Bin.Cheng @ 2017-04-11 15:07 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches

On Tue, Apr 11, 2017 at 4:03 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Hi Bin,
>
>> Seems Richi added code like below comparing costs between aligned and
>> unaligned loads, and only peeling if it's beneficial:
>>
>>       /* In case there are only loads with different unknown misalignments, use
>>          peeling only if it may help to align other accesses in the loop or
>>          if it may help improving load bandwith when we'd end up using
>>          unaligned loads.  */
>>       tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
>>       if (!first_store
>>           && !STMT_VINFO_SAME_ALIGN_REFS (
>>                 vinfo_for_stmt (DR_STMT (dr0))).length ()
>>           && (vect_supportable_dr_alignment (dr0, false)
>>               != dr_unaligned_supported
>>               || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
>>                   == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
>>         do_peeling = false;
>
> yes this is the "special case" I was referring to. This successfully
> avoids peeling when there is no store (after we had set vectorization
> costs). My patch tries to check the costs for all references.
I am not sure all references need to be checked; on AArch64,
aligned/unaligned costs are set globally, so we only need to make one
check here.

Thanks,
bin
>
> Regards
>  Robin
>


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-04-11 14:57 ` Bin.Cheng
  2017-04-11 15:03   ` Robin Dapp
@ 2017-04-11 16:25   ` Richard Biener
  2017-04-12  7:51     ` Robin Dapp
  1 sibling, 1 reply; 51+ messages in thread
From: Richard Biener @ 2017-04-11 16:25 UTC (permalink / raw)
  To: gcc-patches, Bin.Cheng, Robin Dapp; +Cc: GCC Patches

On April 11, 2017 4:57:29 PM GMT+02:00, "Bin.Cheng" <amker.cheng@gmail.com> wrote:
>On Tue, Apr 11, 2017 at 3:38 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> Hi,
>>
>> when looking at various vectorization examples on s390x I noticed that
>> we still peel vf/2 iterations for alignment even though vectorization
>> costs of unaligned loads and stores are the same as normal loads/stores.
>>
>> A simple example is
>>
>> void foo(int *restrict a, int *restrict b, unsigned int n)
>> {
>>   for (unsigned int i = 0; i < n; i++)
>>     {
>>       b[i] = a[i] * 2 + 1;
>>     }
>> }
>>
>> which gets peeled unless __builtin_assume_aligned (a, 8) is used.
>>
>> In tree-vect-data-refs.c there are several checks that involve costs in
>> the peeling decision, none of which seems to suffice in this case. For a
>> loop with only read DRs there is a check that has been triggering (i.e.
>> disabling peeling) since we implemented the vectorization costs.
>>
>> Here, we have DR_MISALIGNMENT (dr) == -1 for all DRs but the costs
>> should still dictate to never peel. I attached a tentative patch for
>> discussion which fixes the problem by checking the costs for npeel = 0
>> and npeel = vf/2 after ensuring we support all misalignments. Is there a
>> better way and place to do it? Are we missing something somewhere else
>> that would preclude the peeling from happening?
>>
>> This is not intended for stage 4, obviously :)
>Hi Robin,
>Seems Richi added code like below comparing costs between aligned and
>unaligned loads, and only peeling if it's beneficial:
>
>      /* In case there are only loads with different unknown misalignments, use
>         peeling only if it may help to align other accesses in the loop or
>         if it may help improving load bandwith when we'd end up using
>         unaligned loads.  */
>      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
>      if (!first_store
>          && !STMT_VINFO_SAME_ALIGN_REFS (
>                vinfo_for_stmt (DR_STMT (dr0))).length ()
>          && (vect_supportable_dr_alignment (dr0, false)
>              != dr_unaligned_supported
>              || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
>                  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
>        do_peeling = false;
>
>I think similar code can be added for the store case too.

Note I was very conservative here to allow store-bandwidth-starved CPUs to benefit from aligning a store.

I think it would be reasonable to apply the same heuristic to the store case: only peel for the same cost if peeling would align at least two refs.
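Roughly, something like the following toy model of that heuristic (a standalone sketch, not the actual tree-vect-data-refs.c code; the function name and the "refs aligned by peeling" count are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: when an unaligned store costs the same as an aligned one,
   peeling is only worthwhile if it would bring at least two references
   into alignment.  If the unaligned access is genuinely more expensive,
   peeling can pay off on its own.  */
static bool
peel_for_store (unsigned aligned_cost, unsigned unaligned_cost,
                unsigned refs_aligned_by_peeling)
{
  if (unaligned_cost > aligned_cost)
    return true;                        /* peeling saves cost directly */
  return refs_aligned_by_peeling >= 2;  /* same cost: need two winners */
}
```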

Richard.

>Thanks,
>bin
>>
>> Regards
>>  Robin


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-04-11 16:25   ` Richard Biener
@ 2017-04-12  7:51     ` Robin Dapp
  2017-04-12  7:58       ` Richard Biener
  0 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-04-12  7:51 UTC (permalink / raw)
  To: Richard Biener, gcc-patches, Bin.Cheng

> Note I was very conservative here to allow store bandwidth starved
> CPUs to benefit from aligning a store.
> 
> I think it would be reasonable to apply the same heuristic to the
> store case that we only peel for same cost if peeling would at least
> align two refs.

Do you mean checking if peeling aligns >= 2 refs for sure? (i.e. with a
known misalignment) Or the same as currently via
STMT_VINFO_SAME_ALIGN_REFS just for stores and .length() >= 2?

Is checking via vect_peeling_hash_choose_best_peeling () too costly or
simply unnecessary if we already know the costs for aligned and
unaligned are the same?

Regards
 Robin


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-04-12  7:51     ` Robin Dapp
@ 2017-04-12  7:58       ` Richard Biener
  2017-05-04  9:04         ` Robin Dapp
                           ` (3 more replies)
  0 siblings, 4 replies; 51+ messages in thread
From: Richard Biener @ 2017-04-12  7:58 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Wed, Apr 12, 2017 at 9:50 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> Note I was very conservative here to allow store bandwidth starved
>> CPUs to benefit from aligning a store.
>>
>> I think it would be reasonable to apply the same heuristic to the
>> store case that we only peel for same cost if peeling would at least
>> align two refs.
>
> Do you mean checking if peeling aligns >= 2 refs for sure? (i.e. with a
> known misalignment) Or the same as currently via
> STMT_VINFO_SAME_ALIGN_REFS just for stores and .length() >= 2?

The latter.

> Is checking via vect_peeling_hash_choose_best_peeling () too costly or
> simply unnecessary if we already know the costs for aligned and
> unaligned are the same?

This one only works for known misalignment, otherwise it's overkill.

OTOH if with some refactoring we can end up using a single cost model
that would be great.  That is, for the SAME_ALIGN_REFS we want to
choose the unknown-misalignment ref with the maximum number of
SAME_ALIGN_REFS.  And if we know the misalignment of a single
ref then we still may want to align an unknown-misalignment ref if that
has more SAME_ALIGN_REFS (I think we always choose the known-misalignment
one currently).
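In code terms, that selection rule might look like this toy comparison (purely illustrative data structures; the real code works on data_reference/stmt_vec_info and the peeling hash table):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical peeling candidate: a data ref plus the number of other
   refs that share its (possibly unknown) misalignment, i.e. the size
   of its SAME_ALIGN_REFS set.  */
struct cand
{
  int known_misalign;        /* 1 if misalignment is known at compile time */
  unsigned same_align_refs;  /* refs that become aligned together with it */
};

/* Pick the candidate whose peeling aligns the most refs, rather than
   always preferring the known-misalignment one.  */
static struct cand
choose_peel_cand (const struct cand *cands, size_t n)
{
  struct cand best = cands[0];
  for (size_t i = 1; i < n; i++)
    if (cands[i].same_align_refs > best.same_align_refs)
      best = cands[i];
  return best;
}
```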

Richard.

> Regards
>  Robin
>


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-04-12  7:58       ` Richard Biener
@ 2017-05-04  9:04         ` Robin Dapp
  2017-05-05 11:04           ` Richard Biener
  2017-05-04  9:04         ` [PATCH 1/3] " Robin Dapp
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-04  9:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

Hi,

> This one only works for known misalignment, otherwise it's overkill.
>
> OTOH if with some refactoring we can end up using a single cost model
> that would be great.  That is for the SAME_ALIGN_REFS we want to
> choose the unknown misalignment with the maximum number of
> SAME_ALIGN_REFS.  And if we know the misalignment of a single
> ref then we still may want to align an unknown-misalignment ref if that
> has more SAME_ALIGN_REFS (I think we always choose the known-misalignment
> one currently).

[0/3]
Attempt to unify the peeling cost model as follows:

 - Keep the treatment of known misalignments.

 - Save the load and the store with the most frequent misalignment.
   - Compare their costs and keep the one the hardware prefers.

 - Choose the best peeling between the best peeling with known
   misalignment and the best with unknown misalignment, according to
   the number of data refs that end up aligned.

 - Calculate costs for leaving everything misaligned and compare with
   the best peeling so far.
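As a rough standalone model of the flow above (all types and costs hypothetical; the real implementation works on the peeling hash table and stmt_vector_for_cost):

```c
#include <assert.h>

/* Toy model of the unified decision: compare the best known-misalignment
   peeling, the best unknown-misalignment (vf/2) peeling, and the option
   of leaving everything misaligned, and keep the cheapest.  */
struct peel_plan
{
  unsigned npeel;        /* 0 means: do not peel at all */
  unsigned inside_cost;  /* loop-body cost for this choice */
};

static struct peel_plan
choose_cheapest (struct peel_plan known, struct peel_plan unknown,
                 struct peel_plan no_peel)
{
  struct peel_plan best = known;
  if (unknown.inside_cost < best.inside_cost)
    best = unknown;
  /* Ties go to not peeling: same cost means the prolog buys nothing.  */
  if (no_peel.inside_cost <= best.inside_cost)
    best = no_peel;
  return best;
}
```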

I also performed some refactoring that seemed necessary during writing
but which is not strictly necessary anymore ([1/3] and [2/3]), yet which
imho makes the code easier to understand.  The bulk of the changes is in [3/3].

The testsuite on i386 and s390x is clean.  I guess some additional test
cases won't hurt and I will add them later; however, I didn't succeed in
defining a test case with two datarefs with the same but unknown
misalignment.  How can this be done?


A thing I did not understand when going over the existing code: In
vect_get_known_peeling_cost() we have

/* If peeled iterations are known but number of scalar loop
         iterations are unknown, count a taken branch per peeled loop.  */

retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
				 NULL, 0, vect_prologue);
retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
				 NULL, 0, vect_epilogue);

In all uses of the function, prologue_cost_vec is discarded afterwards;
only the return value is used.  Should the second statement read
retval +=?  This is only executed when the number of loop iterations is
unknown.  Currently we indeed count one taken branch, but why then call
record_stmt_cost twice, or rather, why discard the first retval?

Regards
 Robin


* [PATCH 1/3] Vect peeling cost model
  2017-04-12  7:58       ` Richard Biener
  2017-05-04  9:04         ` Robin Dapp
@ 2017-05-04  9:04         ` Robin Dapp
  2017-05-05 10:32           ` Richard Biener
  2017-05-04  9:07         ` [PATCH 2/3] " Robin Dapp
  2017-05-04  9:14         ` [PATCH 3/3] " Robin Dapp
  3 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-04  9:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

Some refactoring and definitions to use for (unknown) DR_MISALIGNMENT,

gcc/ChangeLog:

2017-04-26  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-data-ref.h (struct data_reference): Create DR_HAS_NEGATIVE_STEP.
	* tree-vectorizer.h (dr_misalignment): Define DR_MISALIGNMENT.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use.
	(vect_update_misalignment_for_peel): Use.
	(vect_enhance_data_refs_alignment): Use.
	(vect_no_alias_p): Use.
	(vect_duplicate_ssa_name_ptr_info): Use.
	(known_alignment_for_access_p): Use.

[-- Attachment #2: gcc-peeling-p1.diff --]
[-- Type: text/x-patch, Size: 7701 bytes --]

diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 9003ea5..146853b 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -145,6 +145,9 @@ struct data_reference
 #define DR_PTR_INFO(DR)            (DR)->alias.ptr_info
 #define DR_ALIGNED_TO(DR)          (DR)->innermost.aligned_to
 #define DR_INNERMOST(DR)           (DR)->innermost
+#define DR_HAS_NEGATIVE_STEP(DR) \
+  tree_int_cst_compare (DR_STEP (DR), size_zero_node) < 0
+
 
 typedef struct data_reference *data_reference_p;
 
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index aa504b6..9ffae94 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -717,7 +717,7 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
     loop = LOOP_VINFO_LOOP (loop_vinfo);
 
   /* Initialize misalignment to unknown.  */
-  SET_DR_MISALIGNMENT (dr, -1);
+  SET_DR_MISALIGNMENT (dr, DR_MISALIGNMENT_UNKNOWN);
 
   if (tree_fits_shwi_p (DR_STEP (dr)))
     misalign = DR_INIT (dr);
@@ -947,7 +947,7 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
   if (known_alignment_for_access_p (dr)
       && known_alignment_for_access_p (dr_peel))
     {
-      bool negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) < 0;
+      bool negative = DR_HAS_NEGATIVE_STEP (dr);
       int misal = DR_MISALIGNMENT (dr);
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
       misal += negative ? -npeel * dr_size : npeel * dr_size;
@@ -957,8 +957,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
     }
 
   if (dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location, "Setting misalignment to -1.\n");
-  SET_DR_MISALIGNMENT (dr, -1);
+    dump_printf_loc (MSG_NOTE, vect_location, "Setting misalignment " \
+		     "to unknown (-1).\n");
+  SET_DR_MISALIGNMENT (dr, DR_MISALIGNMENT_UNKNOWN);
 }
 
 
@@ -1526,32 +1527,30 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
         {
           if (known_alignment_for_access_p (dr))
             {
-              unsigned int npeel_tmp;
-	      bool negative = tree_int_cst_compare (DR_STEP (dr),
-						    size_zero_node) < 0;
+              unsigned int npeel_tmp = 0;
+	      bool negative = DR_HAS_NEGATIVE_STEP (dr);
 
-              /* Save info about DR in the hash table.  */
               vectype = STMT_VINFO_VECTYPE (stmt_info);
               nelements = TYPE_VECTOR_SUBPARTS (vectype);
               mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE (
                                                 TREE_TYPE (DR_REF (dr))));
-              npeel_tmp = (negative
-			   ? (mis - nelements) : (nelements - mis))
-		  & (nelements - 1);
+	      if (DR_MISALIGNMENT (dr) != 0)
+		npeel_tmp = (negative ? (mis - nelements)
+			     : (nelements - mis)) & (nelements - 1);
 
               /* For multiple types, it is possible that the bigger type access
                  will have more than one peeling option.  E.g., a loop with two
                  types: one of size (vector size / 4), and the other one of
                  size (vector size / 8).  Vectorization factor will 8.  If both
-                 access are misaligned by 3, the first one needs one scalar
+                 accesses are misaligned by 3, the first one needs one scalar
                  iteration to be aligned, and the second one needs 5.  But the
 		 first one will be aligned also by peeling 5 scalar
                  iterations, and in that case both accesses will be aligned.
                  Hence, except for the immediate peeling amount, we also want
                  to try to add full vector size, while we don't exceed
                  vectorization factor.
-                 We do this automatically for cost model, since we calculate cost
-                 for every peeling option.  */
+                 We do this automatically for cost model, since we calculate
+		 cost for every peeling option.  */
               if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 		{
 		  if (STMT_SLP_TYPE (stmt_info))
@@ -1559,17 +1558,15 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 		      = (vf * GROUP_SIZE (stmt_info)) / nelements;
 		  else
 		    possible_npeel_number = vf / nelements;
-		}
 
-              /* Handle the aligned case. We may decide to align some other
-                 access, making DR unaligned.  */
-              if (DR_MISALIGNMENT (dr) == 0)
-                {
-                  npeel_tmp = 0;
-                  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
-                    possible_npeel_number++;
-                }
+		  /* NPEEL_TMP is 0 when there is no misalignment, increment
+		     the peeling amount by one in order to ...  */
+		  if (DR_MISALIGNMENT (dr) == 0)
+		    possible_npeel_number++;
+		}
 
+	      /* Save info about DR in the hash table.  Also include peeling
+	         amounts according to the explanation above.  */
               for (j = 0; j < possible_npeel_number; j++)
                 {
                   vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
@@ -1755,8 +1752,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
       if (known_alignment_for_access_p (dr0))
         {
-	  bool negative = tree_int_cst_compare (DR_STEP (dr0),
-						size_zero_node) < 0;
+	  bool negative = DR_HAS_NEGATIVE_STEP (dr0);
           if (!npeel)
             {
               /* Since it's known at compile time, compute the number of
@@ -3009,7 +3005,7 @@ vect_no_alias_p (struct data_reference *a, struct data_reference *b,
   /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
      bytes, e.g., int a[3] -> a[1] range is [a+4, a+16) instead of
      [a, a+12) */
-  if (tree_int_cst_compare (DR_STEP (a), size_zero_node) < 0)
+  if (DR_HAS_NEGATIVE_STEP (a))
     {
       tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
       seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
@@ -3020,7 +3016,7 @@ vect_no_alias_p (struct data_reference *a, struct data_reference *b,
   tree seg_b_min = DR_INIT (b);
   tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
 				seg_b_min, segment_length_b);
-  if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
+  if (DR_HAS_NEGATIVE_STEP (b))
     {
       tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
       seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
@@ -4161,7 +4157,7 @@ vect_duplicate_ssa_name_ptr_info (tree name, data_reference *dr,
   duplicate_ssa_name_ptr_info (name, DR_PTR_INFO (dr));
   unsigned int align = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmt_info));
   int misalign = DR_MISALIGNMENT (dr);
-  if (misalign == -1)
+  if (misalign == DR_MISALIGNMENT_UNKNOWN)
     mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
   else
     set_ptr_info_alignment (SSA_NAME_PTR_INFO (name), align, misalign);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 12bb904..0d9fe14 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1012,6 +1012,7 @@ dr_misalignment (struct data_reference *dr)
    taking into account peeling/versioning if applied.  */
 #define DR_MISALIGNMENT(DR) dr_misalignment (DR)
 #define SET_DR_MISALIGNMENT(DR, VAL) set_dr_misalignment (DR, VAL)
+#define DR_MISALIGNMENT_UNKNOWN (-1)
 
 /* Return TRUE if the data access is aligned, and FALSE otherwise.  */
 
@@ -1027,7 +1028,7 @@ aligned_access_p (struct data_reference *data_ref_info)
 static inline bool
 known_alignment_for_access_p (struct data_reference *data_ref_info)
 {
-  return (DR_MISALIGNMENT (data_ref_info) != -1);
+  return (DR_MISALIGNMENT (data_ref_info) != DR_MISALIGNMENT_UNKNOWN);
 }
 
 


* [PATCH 2/3] Vect peeling cost model
  2017-04-12  7:58       ` Richard Biener
  2017-05-04  9:04         ` Robin Dapp
  2017-05-04  9:04         ` [PATCH 1/3] " Robin Dapp
@ 2017-05-04  9:07         ` Robin Dapp
  2017-05-05 10:37           ` Richard Biener
  2017-05-04  9:14         ` [PATCH 3/3] " Robin Dapp
  3 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-04  9:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 370 bytes --]

Wrap some frequently used snippets in separate functions.

gcc/ChangeLog:

2017-04-26  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename.
	(vect_get_peeling_costs_all_drs): Create function.
	(vect_peeling_hash_get_lowest_cost):
	Use vect_get_peeling_costs_all_drs.
	(vect_peeling_supportable): Create function.

[-- Attachment #2: gcc-peeling-p2.diff --]
[-- Type: text/x-patch, Size: 8247 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 9ffae94..7b68582 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -903,7 +903,11 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
 }
 
 
-/* Function vect_update_misalignment_for_peel
+/* Function vect_update_misalignment_for_peel.
+   Sets DR's misalignment
+   - to 0 if it has the same alignment as DR_PEEL,
+   - to the misalignment computed using NPEEL if DR's misalignment is known,
+   - to -1 (unknown) otherwise.
 
    DR - the data reference whose misalignment is to be adjusted.
    DR_PEEL - the data reference whose misalignment is being made
@@ -916,7 +920,7 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
                                    struct data_reference *dr_peel, int npeel)
 {
   unsigned int i;
-  vec<dr_p> same_align_drs;
+  vec<dr_p> same_aligned_drs;
   struct data_reference *current_dr;
   int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
   int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel))));
@@ -932,9 +936,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
 
   /* It can be assumed that the data refs with the same alignment as dr_peel
      are aligned in the vector loop.  */
-  same_align_drs
+  same_aligned_drs
     = STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (DR_STMT (dr_peel)));
-  FOR_EACH_VEC_ELT (same_align_drs, i, current_dr)
+  FOR_EACH_VEC_ELT (same_aligned_drs, i, current_dr)
     {
       if (current_dr != dr)
         continue;
@@ -1234,27 +1238,23 @@ vect_peeling_hash_get_most_frequent (_vect_peel_info **slot,
   return 1;
 }
 
+/* Get the costs of peeling NPEEL iterations checking data access costs
+   for all data refs. */
 
-/* Traverse peeling hash table and calculate cost for each peeling option.
-   Find the one with the lowest cost.  */
-
-int
-vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
-				   _vect_peel_extended_info *min)
+static void
+vect_get_peeling_costs_all_drs (struct data_reference *dr0,
+				unsigned int *inside_cost,
+				unsigned int *outside_cost,
+				stmt_vector_for_cost *body_cost_vec,
+				unsigned int npeel, unsigned int vf)
 {
-  vect_peel_info elem = *slot;
-  int save_misalignment, dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
-  gimple *stmt = DR_STMT (elem->dr);
+  gimple *stmt = DR_STMT (dr0);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-  struct data_reference *dr;
-  stmt_vector_for_cost prologue_cost_vec, body_cost_vec, epilogue_cost_vec;
 
-  prologue_cost_vec.create (2);
-  body_cost_vec.create (2);
-  epilogue_cost_vec.create (2);
+  unsigned i;
+  data_reference *dr;
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
@@ -1272,12 +1272,40 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
 	continue;
 
+      int save_misalignment;
       save_misalignment = DR_MISALIGNMENT (dr);
-      vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
-      vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
-				 &body_cost_vec);
+      if (dr == dr0 && npeel == vf / 2)
+	SET_DR_MISALIGNMENT (dr, 0);
+      else
+	vect_update_misalignment_for_peel (dr, dr0, npeel);
+      vect_get_data_access_cost (dr, inside_cost, outside_cost,
+				 body_cost_vec);
       SET_DR_MISALIGNMENT (dr, save_misalignment);
     }
+}
+
+/* Traverse peeling hash table and calculate cost for each peeling option.
+   Find the one with the lowest cost.  */
+
+int
+vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
+				   _vect_peel_extended_info *min)
+{
+  vect_peel_info elem = *slot;
+  int dummy;
+  unsigned int inside_cost = 0, outside_cost = 0, i;
+  gimple *stmt = DR_STMT (elem->dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  stmt_vector_for_cost prologue_cost_vec, body_cost_vec,
+		       epilogue_cost_vec;
+
+  prologue_cost_vec.create (2);
+  body_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
+				  &body_cost_vec, elem->npeel, 0);
 
   outside_cost += vect_get_known_peeling_cost
     (loop_vinfo, elem->npeel, &dummy,
@@ -1292,7 +1320,8 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
   epilogue_cost_vec.release ();
 
   if (inside_cost < min->inside_cost
-      || (inside_cost == min->inside_cost && outside_cost < min->outside_cost))
+      || (inside_cost == min->inside_cost
+	  && outside_cost < min->outside_cost))
     {
       min->inside_cost = inside_cost;
       min->outside_cost = outside_cost;
@@ -1300,6 +1329,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
       min->body_cost_vec = body_cost_vec;
       min->peel_info.dr = elem->dr;
       min->peel_info.npeel = elem->npeel;
+      min->peel_info.count = elem->count;
     }
   else
     body_cost_vec.release ();
@@ -1342,6 +1372,54 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
    return res.peel_info.dr;
 }
 
+/* Return true if the new peeling NPEEL is supported.  */
+
+static bool
+vect_peeling_supportable (loop_vec_info loop_vinfo, struct data_reference *dr0,
+			  unsigned npeel)
+{
+  unsigned i;
+  struct data_reference *dr = NULL;
+  vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
+  gimple *stmt;
+  stmt_vec_info stmt_info;
+  enum dr_alignment_support supportable_dr_alignment;
+
+  /* Ensure that all data refs can be vectorized after the peel.  */
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    {
+      int save_misalignment;
+
+      if (dr == dr0)
+	continue;
+
+      stmt = DR_STMT (dr);
+      stmt_info = vinfo_for_stmt (stmt);
+      /* For interleaving, only the alignment of the first access
+	 matters.  */
+      if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+	  && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
+	continue;
+
+      /* Strided accesses perform only component accesses, alignment is
+	 irrelevant for them.  */
+      if (STMT_VINFO_STRIDED_P (stmt_info)
+	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
+	continue;
+
+      save_misalignment = DR_MISALIGNMENT (dr);
+      vect_update_misalignment_for_peel (dr, dr0, npeel);
+      supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
+      SET_DR_MISALIGNMENT (dr, save_misalignment);
+
+      if (!supportable_dr_alignment)
+	{
+	  return false;
+	}
+    }
+
+  return true;
+}
 
 /* Function vect_enhance_data_refs_alignment
 
@@ -1778,40 +1856,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                              "Try peeling by %d\n", npeel);
         }
 
-      /* Ensure that all data refs can be vectorized after the peel.  */
-      FOR_EACH_VEC_ELT (datarefs, i, dr)
-        {
-          int save_misalignment;
-
-	  if (dr == dr0)
-	    continue;
-
-	  stmt = DR_STMT (dr);
-	  stmt_info = vinfo_for_stmt (stmt);
-	  /* For interleaving, only the alignment of the first access
-            matters.  */
-	  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
-	      && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
-	    continue;
-
-	  /* Strided accesses perform only component accesses, alignment is
-	     irrelevant for them.  */
-	  if (STMT_VINFO_STRIDED_P (stmt_info)
-	      && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
-	    continue;
-
-	  save_misalignment = DR_MISALIGNMENT (dr);
-	  vect_update_misalignment_for_peel (dr, dr0, npeel);
-	  supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
-	  SET_DR_MISALIGNMENT (dr, save_misalignment);
-
-	  if (!supportable_dr_alignment)
-	    {
-	      do_peeling = false;
-	      break;
-	    }
-	}
+      /* Ensure that all datarefs can be vectorized after the peel.  */
+      if (!vect_peeling_supportable (loop_vinfo, dr0, npeel))
+	do_peeling = false;
 
+      /* Check if all datarefs are supportable and log.  */
       if (do_peeling && known_alignment_for_access_p (dr0) && npeel == 0)
         {
           stat = vect_verify_datarefs_alignment (loop_vinfo);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 3/3] Vect peeling cost model
  2017-04-12  7:58       ` Richard Biener
                           ` (2 preceding siblings ...)
  2017-05-04  9:07         ` [PATCH 2/3] " Robin Dapp
@ 2017-05-04  9:14         ` Robin Dapp
  3 siblings, 0 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-04  9:14 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 271 bytes --]

gcc/ChangeLog:

2017-04-26  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
	Change cost model.
	(vect_peeling_hash_choose_best_peeling): Return extended peel info.
	(vect_peeling_supportable): Return peeling status.

[-- Attachment #2: gcc-peeling-p3.diff --]
[-- Type: text/x-patch, Size: 18075 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7b68582..da49e35 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -904,9 +904,9 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
 
 
 /* Function vect_update_misalignment_for_peel.
-   Sets DR's misalignment
+   Set DR's misalignment
    - to 0 if it has the same alignment as DR_PEEL,
-   - to the misalignment computed using NPEEL if DR's salignment is known,
+   - to the misalignment computed using NPEEL if DR's misalignment is known,
    - to -1 (unknown) otherwise.
 
    DR - the data reference whose misalignment is to be adjusted.
@@ -1293,7 +1293,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 {
   vect_peel_info elem = *slot;
   int dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
+  unsigned int inside_cost = 0, outside_cost = 0;
   gimple *stmt = DR_STMT (elem->dr);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
@@ -1342,7 +1342,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
    choosing an option with the lowest cost (if cost model is enabled) or the
    option that aligns as many accesses as possible.  */
 
-static struct data_reference *
+static struct _vect_peel_extended_info
 vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_htab,
 				       loop_vec_info loop_vinfo,
                                        unsigned int *npeel,
@@ -1369,7 +1369,7 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
 
    *npeel = res.peel_info.npeel;
    *body_cost_vec = res.body_cost_vec;
-   return res.peel_info.dr;
+   return res;
 }
 
 /* Return true if the new peeling NPEEL is supported.  */
@@ -1518,7 +1518,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   enum dr_alignment_support supportable_dr_alignment;
-  struct data_reference *dr0 = NULL, *first_store = NULL;
+
+  struct data_reference *most_frequent_read = NULL;
+  unsigned int dr_read_count = 0;
+  struct data_reference *most_frequent_write = NULL;
+  unsigned int dr_write_count = 0;
   struct data_reference *dr;
   unsigned int i, j;
   bool do_peeling = false;
@@ -1527,11 +1531,13 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   gimple *stmt;
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
-  bool all_misalignments_unknown = true;
+  bool one_misalignment_known = false;
+  bool one_misalignment_unknown = false;
+
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
-  unsigned int nelements, mis, same_align_drs_max = 0;
+  unsigned int nelements, mis;
   stmt_vector_for_cost body_cost_vec = stmt_vector_for_cost ();
   hash_table<peel_info_hasher> peeling_htab (1);
 
@@ -1652,57 +1658,67 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                   npeel_tmp += nelements;
                 }
 
-              all_misalignments_unknown = false;
-              /* Data-ref that was chosen for the case that all the
-                 misalignments are unknown is not relevant anymore, since we
-                 have a data-ref with known alignment.  */
-              dr0 = NULL;
+	      one_misalignment_known = true;
             }
           else
             {
-              /* If we don't know any misalignment values, we prefer
-                 peeling for data-ref that has the maximum number of data-refs
-                 with the same alignment, unless the target prefers to align
-                 stores over load.  */
-              if (all_misalignments_unknown)
-                {
-		  unsigned same_align_drs
-		    = STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
-                  if (!dr0
-		      || same_align_drs_max < same_align_drs)
-                    {
-                      same_align_drs_max = same_align_drs;
-                      dr0 = dr;
-                    }
-		  /* For data-refs with the same number of related
-		     accesses prefer the one where the misalign
-		     computation will be invariant in the outermost loop.  */
-		  else if (same_align_drs_max == same_align_drs)
+	      /* If we don't know any misalignment values, we prefer
+		 peeling for data-ref that has the maximum number of data-refs
+		 with the same alignment, unless the target prefers to align
+		 stores over load.  */
+	      unsigned same_align_dr_count
+		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
+
+	      /* For data-refs with the same number of related
+		 accesses prefer the one where the misalign
+		 computation will be invariant in the outermost loop.  */
+	      struct loop *ivloop_max_read = NULL, *ivloop_max_write = NULL,
+			  *ivloop_dr = NULL;
+	      if (most_frequent_read)
+		ivloop_max_read = outermost_invariant_loop_for_expr
+		  (loop, DR_BASE_ADDRESS (most_frequent_read));
+	      if (most_frequent_write)
+		ivloop_max_write = outermost_invariant_loop_for_expr
+		  (loop, DR_BASE_ADDRESS (most_frequent_write));
+	      ivloop_dr = outermost_invariant_loop_for_expr
+		(loop, DR_BASE_ADDRESS (dr));
+
+	      if (DR_IS_READ (dr))
+		{
+		  if (!most_frequent_read
+		      || (same_align_dr_count > dr_read_count))
 		    {
-		      struct loop *ivloop0, *ivloop;
-		      ivloop0 = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr0));
-		      ivloop = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr));
-		      if ((ivloop && !ivloop0)
-			  || (ivloop && ivloop0
-			      && flow_loop_nested_p (ivloop, ivloop0)))
-			dr0 = dr;
+		      most_frequent_read = dr;
+		      dr_read_count = same_align_dr_count;
 		    }
+		  else if (same_align_dr_count == dr_read_count
+			   && ((ivloop_dr && !ivloop_max_read)
+			       || (ivloop_dr && ivloop_max_read
+				   && flow_loop_nested_p
+				   (ivloop_dr, ivloop_max_read))))
+		    {
+		      most_frequent_read = dr;
+		    }
+		}
+	      else if (DR_IS_WRITE (dr))
+		{
+		  if (!most_frequent_write
+		      || (same_align_dr_count > dr_write_count))
+		    {
+		      most_frequent_write = dr;
+		      dr_write_count = same_align_dr_count;
+		    }
+		  else if (same_align_dr_count == dr_write_count
+			   && ((ivloop_dr && !ivloop_max_write)
+			       || (ivloop_dr && ivloop_max_write
+				   && flow_loop_nested_p
+				   (ivloop_dr, ivloop_max_write))))
+		    {
+		      most_frequent_write = dr;
+		    }
+		}
 
-                  if (!first_store && DR_IS_WRITE (dr))
-                    first_store = dr;
-                }
-
-              /* If there are both known and unknown misaligned accesses in the
-                 loop, we choose peeling amount according to the known
-                 accesses.  */
-              if (!supportable_dr_alignment)
-                {
-                  dr0 = dr;
-                  if (!first_store && DR_IS_WRITE (dr))
-                    first_store = dr;
-                }
+	      one_misalignment_unknown = true;
             }
         }
       else
@@ -1723,111 +1739,207 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
-  if (do_peeling
-      && all_misalignments_unknown
-      && vect_supportable_dr_alignment (dr0, false))
+  struct _vect_peel_extended_info best_peel;
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_known_alignment.peel_info.dr = NULL;
+  struct _vect_peel_extended_info peel_for_unknown_alignment;
+  peel_for_unknown_alignment.peel_info.dr = NULL;
+
+  if (do_peeling && one_misalignment_known)
     {
-      /* Check if the target requires to prefer stores over loads, i.e., if
-         misaligned stores are more expensive than misaligned loads (taking
-         drs with same alignment into account).  */
-      if (first_store && DR_IS_READ (dr0))
-        {
-          unsigned int load_inside_cost = 0, load_outside_cost = 0;
-          unsigned int store_inside_cost = 0, store_outside_cost = 0;
-          unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
-          unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
+      /* Peeling is possible, but there is no data access that is not supported
+	 unless aligned. So we try to choose the best possible peeling.  */
+
+      /* Choose the best peeling from the hash table.  */
+      peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
+	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
+    }
+
+  if (do_peeling && one_misalignment_unknown)
+    {
+      /* Calculate the costs for aligning MOST_FREQUENT_READ, potentially
+         leaving everything else misaligned.  */
+      unsigned int align_mf_read_inside_cost = 0;
+      unsigned int align_mf_read_outside_cost = 0;
+
+      if (most_frequent_read)
+	{
 	  stmt_vector_for_cost dummy;
 	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (most_frequent_read,
+					  &align_mf_read_inside_cost,
+					  &align_mf_read_outside_cost,
+					  &dummy, vf / 2, vf);
+	  dummy.release();
+	}
+      else
+	{
+	  align_mf_read_inside_cost = UINT_MAX;
+	  align_mf_read_outside_cost = UINT_MAX;
+	}
 
-          vect_get_data_access_cost (dr0, &load_inside_cost, &load_outside_cost,
-				     &dummy);
-          vect_get_data_access_cost (first_store, &store_inside_cost,
-				     &store_outside_cost, &dummy);
+      /* Calculate the costs for aligning MOST_FREQUENT_WRITE, potentially
+         leaving everything else misaligned.  */
+      unsigned int align_mf_write_inside_cost = 0;
+      unsigned int align_mf_write_outside_cost = 0;
 
+      if (most_frequent_write)
+	{
+	  stmt_vector_for_cost dummy;
+	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (most_frequent_write,
+					  &align_mf_write_inside_cost,
+					  &align_mf_write_outside_cost,
+					  &dummy, vf / 2, vf);
 	  dummy.release ();
+	}
+      else
+	{
+	  align_mf_write_inside_cost = UINT_MAX;
+	  align_mf_write_outside_cost = UINT_MAX;
+	}
 
-          /* Calculate the penalty for leaving FIRST_STORE unaligned (by
-             aligning the load DR0).  */
-          load_inside_penalty = store_inside_cost;
-          load_outside_penalty = store_outside_cost;
-          for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-			  DR_STMT (first_store))).iterate (i, &dr);
-               i++)
-            if (DR_IS_READ (dr))
-              {
-                load_inside_penalty += load_inside_cost;
-                load_outside_penalty += load_outside_cost;
-              }
-            else
-              {
-                load_inside_penalty += store_inside_cost;
-                load_outside_penalty += store_outside_cost;
-              }
-
-          /* Calculate the penalty for leaving DR0 unaligned (by
-             aligning the FIRST_STORE).  */
-          store_inside_penalty = load_inside_cost;
-          store_outside_penalty = load_outside_cost;
-          for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-		      DR_STMT (dr0))).iterate (i, &dr);
-               i++)
-            if (DR_IS_READ (dr))
-              {
-                store_inside_penalty += load_inside_cost;
-                store_outside_penalty += load_outside_cost;
-              }
-            else
-              {
-                store_inside_penalty += store_inside_cost;
-                store_outside_penalty += store_outside_cost;
-              }
-
-          if (load_inside_penalty > store_inside_penalty
-              || (load_inside_penalty == store_inside_penalty
-                  && load_outside_penalty > store_outside_penalty))
-            dr0 = first_store;
-        }
+      /* Choose best peeling according to given load and store peeling
+	 costs.  */
+      if (align_mf_read_inside_cost > align_mf_write_inside_cost
+	  || (align_mf_read_inside_cost == align_mf_write_inside_cost
+	      && align_mf_read_outside_cost > align_mf_write_outside_cost))
+	{
+	  peel_for_unknown_alignment.peel_info.dr = most_frequent_write;
+	  peel_for_unknown_alignment.peel_info.count =
+	    1 + STMT_VINFO_SAME_ALIGN_REFS
+	    (vinfo_for_stmt (DR_STMT (most_frequent_write))).length ();
+	  peel_for_unknown_alignment.inside_cost = align_mf_write_inside_cost;
+	  peel_for_unknown_alignment.outside_cost =
+	    align_mf_write_outside_cost;
+	}
+      else
+	{
+	  if (most_frequent_read)
+	    {
+	      peel_for_unknown_alignment.peel_info.dr = most_frequent_read;
+	      peel_for_unknown_alignment.peel_info.count =
+		1 + STMT_VINFO_SAME_ALIGN_REFS
+		(vinfo_for_stmt (DR_STMT (most_frequent_read))).length ();
+	    }
+	  else
+	    {
+	      peel_for_unknown_alignment.peel_info.dr = most_frequent_write;
+	      peel_for_unknown_alignment.peel_info.count =
+		1 + STMT_VINFO_SAME_ALIGN_REFS
+		(vinfo_for_stmt (DR_STMT (most_frequent_write))).length ();
+	    }
+	  peel_for_unknown_alignment.inside_cost = align_mf_read_inside_cost;
+	  peel_for_unknown_alignment.outside_cost =
+	    align_mf_read_outside_cost;
+	}
+
+      /* Add prologue and epilogue costs.  */
+      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+      prologue_cost_vec.create (2);
+      epilogue_cost_vec.create (2);
+
+      int dummy;
+      peel_for_unknown_alignment.outside_cost
+	+= vect_get_known_peeling_cost (loop_vinfo, vf / 2, &dummy,
+	 &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	 &prologue_cost_vec, &epilogue_cost_vec);
 
-      /* In case there are only loads with different unknown misalignments, use
-         peeling only if it may help to align other accesses in the loop or
-	 if it may help improving load bandwith when we'd end up using
-	 unaligned loads.  */
-      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
-      if (!first_store
-	  && !STMT_VINFO_SAME_ALIGN_REFS (
-		  vinfo_for_stmt (DR_STMT (dr0))).length ()
-	  && (vect_supportable_dr_alignment (dr0, false)
-	      != dr_unaligned_supported
-	      || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
-		  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
-        do_peeling = false;
+      prologue_cost_vec.release ();
+      epilogue_cost_vec.release ();
+
+      /* The code below expects npeel == 0 when we plan to peel vf/2
+	 iterations, so do not set npeel = vf/2 here.  */
+      peel_for_unknown_alignment.peel_info.npeel = 0;
     }
 
-  if (do_peeling && !dr0)
+  /* At this point, we have to choose between peeling for the datarefs with
+     known alignment and the ones with unknown alignment.  Prefer the one
+     that aligns more datarefs in total.  */
+  struct data_reference *dr0 = NULL;
+  if (do_peeling)
     {
-      /* Peeling is possible, but there is no data access that is not supported
-         unless aligned. So we try to choose the best possible peeling.  */
+      bool peel_for_unknown_alignment_valid =
+	peel_for_unknown_alignment.peel_info.dr != NULL;
+      bool peel_for_known_alignment_valid =
+	peel_for_known_alignment.peel_info.dr != NULL;
 
-      /* We should get here only if there are drs with known misalignment.  */
-      gcc_assert (!all_misalignments_unknown);
+      gcc_assert (peel_for_known_alignment_valid
+		  || peel_for_unknown_alignment_valid);
 
-      /* Choose the best peeling from the hash table.  */
-      dr0 = vect_peeling_hash_choose_best_peeling (&peeling_htab,
-						   loop_vinfo, &npeel,
-						   &body_cost_vec);
-      if (!dr0 || !npeel)
-        do_peeling = false;
+      if (peel_for_known_alignment_valid && !peel_for_unknown_alignment_valid)
+	best_peel = peel_for_known_alignment;
+
+      else if
+	(!peel_for_known_alignment_valid && peel_for_unknown_alignment_valid)
+	best_peel = peel_for_unknown_alignment;
+
+      else
+	{
+	  /* Choose the best peeling for known and unknown alignment
+	     according to the number of aligned datarefs.  */
+	  if (peel_for_unknown_alignment.peel_info.count
+	      > peel_for_known_alignment.peel_info.count)
+	    best_peel = peel_for_unknown_alignment;
+	  else
+	    best_peel = peel_for_known_alignment;
+	}
     }
 
+  /* Calculate the penalty for no peeling, i.e. leaving everything
+     unaligned.
+     TODO: use something like an adapted vect_get_peeling_costs_all_drs.  */
+  unsigned nopeel_inside_cost = 0;
+  unsigned nopeel_outside_cost = 0;
+
+  stmt_vector_for_cost dummy;
+  dummy.create (2);
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    {
+      vect_get_data_access_cost (dr, &nopeel_inside_cost,
+				 &nopeel_outside_cost, &dummy);
+    }
+  dummy.release ();
+
+  /* Add epilogue costs.  As we do no peeling for alignment here, no prologue
+     costs will be recorded.  */
+  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+  prologue_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  int dummy2;
+  nopeel_outside_cost += vect_get_known_peeling_cost
+    (loop_vinfo, 0, &dummy2,
+     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+     &prologue_cost_vec, &epilogue_cost_vec);
+
+  prologue_cost_vec.release ();
+  epilogue_cost_vec.release ();
+
+
+  /* Check if doing no peeling is not more expensive than the best peeling we
+     have so far.  */
+  if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
+      && ((nopeel_inside_cost < best_peel.inside_cost)
+	  || (nopeel_inside_cost == best_peel.inside_cost
+	  && nopeel_outside_cost <= best_peel.outside_cost)))
+    {
+      do_peeling = false;
+      npeel = 0;
+    }
+
+
   if (do_peeling)
     {
+      dr0 = best_peel.peel_info.dr;
+      npeel = best_peel.peel_info.npeel;
+
       stmt = DR_STMT (dr0);
       stmt_info = vinfo_for_stmt (stmt);
       vectype = STMT_VINFO_VECTYPE (stmt_info);
       nelements = TYPE_VECTOR_SUBPARTS (vectype);
 
+      /* Define a peeling if not already set and log it.  */
       if (known_alignment_for_access_p (dr0))
         {
 	  bool negative = DR_HAS_NEGATIVE_STEP (dr0);


* Re: [PATCH 1/3] Vect peeling cost model
  2017-05-04  9:04         ` [PATCH 1/3] " Robin Dapp
@ 2017-05-05 10:32           ` Richard Biener
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-05 10:32 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Thu, May 4, 2017 at 11:04 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Some refactoring and definitions to use for (unknown) DR_MISALIGNMENT,

+#define DR_HAS_NEGATIVE_STEP(DR) \
+  tree_int_cst_compare (DR_STEP (DR), size_zero_node) < 0

this will ICE for non-constant DR_STEP so isn't a suitable define.

If you want sth shorter than tree_int_cst_compare (...) < 0 then
tree_int_cst_sgn (DR_STEP (..)) == -1 should work or
compare_tree_int (DR_STEP (...), 0) < 0.  But I'd rather leave
this unchanged.

The rest of the patch is ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2017-04-26  Robin Dapp  <rdapp@linux.vnet.ibm.com>
>
>         * tree-data-ref.h (struct data_reference): Create DR_HAS_NEGATIVE_STEP.
>         * tree-vectorizer.h (dr_misalignment): Define DR_MISALIGNMENT.
>         * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use.
>         (vect_update_misalignment_for_peel): Use.
>         (vect_enhance_data_refs_alignment): Use.
>         (vect_no_alias_p): Use.
>         (vect_duplicate_ssa_name_ptr_info): Use.
>         (known_alignment_for_access_p): Use.


* Re: [PATCH 2/3] Vect peeling cost model
  2017-05-04  9:07         ` [PATCH 2/3] " Robin Dapp
@ 2017-05-05 10:37           ` Richard Biener
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-05 10:37 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Thu, May 4, 2017 at 11:05 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Wrap some frequently used snippets in separate functions.

+/* Get the costs of peeling NPEEL iterations checking data access costs
+   for all data refs. */

-/* Traverse peeling hash table and calculate cost for each peeling option.
-   Find the one with the lowest cost.  */
-
-int
-vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
-                                  _vect_peel_extended_info *min)
+static void
+vect_get_peeling_costs_all_drs (struct data_reference *dr0,
+                               unsigned int *inside_cost,
+                               unsigned int *outside_cost,
+                               stmt_vector_for_cost *body_cost_vec,
+                               unsigned int npeel, unsigned int vf)
 {
-  vect_peel_info elem = *slot;
-  int save_misalignment, dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
-  gimple *stmt = DR_STMT (elem->dr);
+  gimple *stmt = DR_STMT (dr0);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);

ick.  Another case that shows why I like context diffs more ...

Patch looks ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2017-04-26  Robin Dapp  <rdapp@linux.vnet.ibm.com>
>
>         * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename.
>         (vect_get_peeling_costs_all_drs): Create function.
>         (vect_peeling_hash_get_lowest_cost):
>         Use vect_get_peeling_costs_all_drs.
>         (vect_peeling_supportable): Create function.


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-05-04  9:04         ` Robin Dapp
@ 2017-05-05 11:04           ` Richard Biener
  2017-05-08 16:12             ` Robin Dapp
                               ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-05 11:04 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Thu, May 4, 2017 at 10:59 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Hi,
>
>> This one only works for known misalignment, otherwise it's overkill.
>>
>> OTOH if with some refactoring we can end up using a single cost model
>> that would be great.  That is for the SAME_ALIGN_REFS we want to
>> choose the unknown misalignment with the maximum number of
>> SAME_ALIGN_REFS.  And if we know the misalignment of a single
>> ref then we still may want to align a unknown misalign ref if that has
>> more SAME_ALIGN_REFS (I think we always choose the known-misalign
>> one currently).
>
> [0/3]
> Attempt to unify the peeling cost model as follows:
>
>  - Keep the treatment of known misalignments.
>
>  - Save the load and store with the most frequent misalignment.
>   - Compare their costs and get the hardware-preferred one via costs.
>
>  - Choose the best peeling from the best peeling with known
>    misalignment and the best with unknown misalignment according to
>    the number of aligned data refs.
>
>  - Calculate costs for leaving everything misaligned and compare with
>    the best peeling so far.

So the new part is the last point?  There's a lot of refactoring in 3/3 that
makes it hard to see what is actually changed ...  you need to resist
in doing this, it makes review very hard.

> I also performed some refactoring that seemed necessary during writing
> but which is not strictly necessary anymore ([1/3] and [2/3]) yet imho
> simplifies understanding the code.  The bulk of the changes is in [3/3].
>
> Testsuite on i386 and s390x is clean.  I guess some additional test
> cases won't hurt and I will add them later, however I didn't succeed
> in defining a test case with two datarefs with the same but unknown
> misalignment.  How can this be done?

  a[i] += b[i]

should have the load DR of a[i] have the same misalignment as the
store DR of a[i].  I think that's the only case (load/store pair) where
this happens.  We might want to enhance the machinery to
have a[i] and a[i+4] be recorded for example in case the VF divides 4.
Richard's patch may have improved things here.

>
> A thing I did not understand when going over the existing code: In
> vect_get_known_peeling_cost() we have
>
> /* If peeled iterations are known but number of scalar loop
>          iterations are unknown, count a taken branch per peeled loop.  */
>
> retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
>                                  NULL, 0, vect_prologue);
> retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
>                                  NULL, 0, vect_epilogue);
>
> In all uses of the function, prologue_cost_vec is discarded afterwards,
> only the return value is used.  Should the second statement read retval
> +=?  This is only executed when the number of loop iterations is
> unknown.  Currently we indeed count one taken branch, but why then
> execute record_stmt_cost twice or rather not discard the first retval?

Yes, it should be +=.

It's also somewhat odd code that should be refactored given it is supposed
to be only called when we know the number of iterations to peel.  That is,
we can't use it to get an estimate on the cost of peeling when the prologue
iteration count is unknown (the vect_estimate_min_profitable_iters code has
this in a path not calling vect_get_known_peeling_cost).

Can you try producing a simpler patch that does the last '-' only, without
all the rest?

+  /* At this point, we have to choose between peeling for the datarefs with
+     known alignment and the ones with unknown alignment.  Prefer the one
+     that aligns more datarefs in total.  */
+  struct data_reference *dr0 = NULL;
+  if (do_peeling)
     {

I think it's always best to align a ref with known alignment as that simplifies
conditions and allows followup optimizations (unrolling of the
prologue / epilogue).
I think for this it's better to also compute full costs rather than relying on
sth as simple as "number of same aligned refs".

Does the code ever end up misaligning a previously known aligned ref?

Thanks,
Richard.

>
> Regards
>  Robin
>


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-05-05 11:04           ` Richard Biener
@ 2017-05-08 16:12             ` Robin Dapp
  2017-05-09 10:38               ` Richard Biener
  2017-05-08 16:13             ` [PATCH 3/4] " Robin Dapp
  2017-05-08 16:27             ` [PATCH 4/4] " Robin Dapp
  2 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-08 16:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

> So the new part is the last point?  There's a lot of refactoring in 3/3
> that makes it hard to see what is actually changed ...  you need to resist
> in doing this, it makes review very hard.

The new part is actually spread across the three last "-"s.  Attached is
a new version of [3/3] split up into two patches with hopefully less
blending of refactoring and new functionality.

[3/4] Computes full costs when peeling for unknown alignment, uses
either read or write and compares the better one with the peeling costs
for known alignment.  If the peeling for unknown alignment "aligns" more
than twice the number of datarefs, it is preferred over the peeling for
known alignment.

[4/4] Computes the costs for no peeling and compares them with the costs
of the best peeling so far.  If it is not more expensive, no peeling
will be performed.

> I think it's always best to align a ref with known alignment as that
> simplifies conditions and allows followup optimizations (unrolling of
> the prologue / epilogue).
> I think for this it's better to also compute full costs rather than
> relying on sth as simple as "number of same aligned refs".
>
> Does the code ever end up misaligning a previously known aligned ref?

The following case used to get aligned via the known alignment of dd but
would not anymore since peeling for unknown alignment aligns two
accesses.  I guess the determining factor is still up for scrutiny and
should probably be > 2.  Still, on e.g. s390x no peeling is performed due
to costs.

void foo(int *restrict a, int *restrict b, int *restrict c, int
*restrict d, unsigned int n)
{
  int *restrict dd = __builtin_assume_aligned (d, 8);
  for (unsigned int i = 0; i < n; i++)
    {
      b[i] = b[i] + a[i];
      c[i] = c[i] + b[i];
      dd[i] = a[i];
    }
}

Regards
 Robin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 3/4] Vect peeling cost model
  2017-05-05 11:04           ` Richard Biener
  2017-05-08 16:12             ` Robin Dapp
@ 2017-05-08 16:13             ` Robin Dapp
  2017-05-09 10:41               ` Richard Biener
  2017-05-08 16:27             ` [PATCH 4/4] " Robin Dapp
  2 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-08 16:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 325 bytes --]

gcc/ChangeLog:

2017-05-08  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
	Return peel info.
	(vect_enhance_data_refs_alignment):
	Compute full costs when peeling for unknown alignment, compare
	to costs for peeling for known alignment and choose the cheaper
	one.

[-- Attachment #2: gcc-peeling-p3.diff --]
[-- Type: text/x-patch, Size: 11258 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7b68582..786f826 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1342,7 +1342,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
    choosing an option with the lowest cost (if cost model is enabled) or the
    option that aligns as many accesses as possible.  */
 
-static struct data_reference *
+static struct _vect_peel_extended_info
 vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_htab,
 				       loop_vec_info loop_vinfo,
                                        unsigned int *npeel,
@@ -1369,7 +1369,7 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
 
    *npeel = res.peel_info.npeel;
    *body_cost_vec = res.body_cost_vec;
-   return res.peel_info.dr;
+   return res;
 }
 
 /* Return true if the new peeling NPEEL is supported.  */
@@ -1520,6 +1520,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
+  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1527,7 +1528,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   gimple *stmt;
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
-  bool all_misalignments_unknown = true;
+  bool one_misalignment_known = false;
+  bool one_misalignment_unknown = false;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1652,11 +1654,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                   npeel_tmp += nelements;
                 }
 
-              all_misalignments_unknown = false;
-              /* Data-ref that was chosen for the case that all the
-                 misalignments are unknown is not relevant anymore, since we
-                 have a data-ref with known alignment.  */
-              dr0 = NULL;
+	      one_misalignment_known = true;
             }
           else
             {
@@ -1664,35 +1662,32 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                  peeling for data-ref that has the maximum number of data-refs
                  with the same alignment, unless the target prefers to align
                  stores over load.  */
-              if (all_misalignments_unknown)
-                {
-		  unsigned same_align_drs
-		    = STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
-                  if (!dr0
-		      || same_align_drs_max < same_align_drs)
-                    {
-                      same_align_drs_max = same_align_drs;
-                      dr0 = dr;
-                    }
-		  /* For data-refs with the same number of related
-		     accesses prefer the one where the misalign
-		     computation will be invariant in the outermost loop.  */
-		  else if (same_align_drs_max == same_align_drs)
-		    {
-		      struct loop *ivloop0, *ivloop;
-		      ivloop0 = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr0));
-		      ivloop = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr));
-		      if ((ivloop && !ivloop0)
-			  || (ivloop && ivloop0
-			      && flow_loop_nested_p (ivloop, ivloop0)))
-			dr0 = dr;
-		    }
+	      unsigned same_align_drs
+		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
+	      if (!dr0
+		  || same_align_drs_max < same_align_drs)
+		{
+		  same_align_drs_max = same_align_drs;
+		  dr0 = dr;
+		}
+	      /* For data-refs with the same number of related
+		 accesses prefer the one where the misalign
+		 computation will be invariant in the outermost loop.  */
+	      else if (same_align_drs_max == same_align_drs)
+		{
+		  struct loop *ivloop0, *ivloop;
+		  ivloop0 = outermost_invariant_loop_for_expr
+		    (loop, DR_BASE_ADDRESS (dr0));
+		  ivloop = outermost_invariant_loop_for_expr
+		    (loop, DR_BASE_ADDRESS (dr));
+		  if ((ivloop && !ivloop0)
+		      || (ivloop && ivloop0
+			  && flow_loop_nested_p (ivloop, ivloop0)))
+		    dr0 = dr;
+		}
 
-                  if (!first_store && DR_IS_WRITE (dr))
-                    first_store = dr;
-                }
+	      if (!first_store && DR_IS_WRITE (dr))
+		first_store = dr;
 
               /* If there are both known and unknown misaligned accesses in the
                  loop, we choose peeling amount according to the known
@@ -1703,6 +1698,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                   if (!first_store && DR_IS_WRITE (dr))
                     first_store = dr;
                 }
+
+	      one_misalignment_unknown = true;
             }
         }
       else
@@ -1723,8 +1720,12 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
+  unsigned int unknown_align_inside_cost = UINT_MAX;
+  unsigned int unknown_align_outside_cost = UINT_MAX;
+  unsigned int unknown_align_count = 0;
+
   if (do_peeling
-      && all_misalignments_unknown
+      && one_misalignment_unknown
       && vect_supportable_dr_alignment (dr0, false))
     {
       /* Check if the target requires to prefer stores over loads, i.e., if
@@ -1732,62 +1733,54 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
          drs with same alignment into account).  */
       if (first_store && DR_IS_READ (dr0))
         {
-          unsigned int load_inside_cost = 0, load_outside_cost = 0;
-          unsigned int store_inside_cost = 0, store_outside_cost = 0;
-          unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
-          unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
+	  unsigned int load_inside_cost = 0;
+	  unsigned int load_outside_cost = 0;
+	  unsigned int store_inside_cost = 0;
+	  unsigned int store_outside_cost = 0;
 	  stmt_vector_for_cost dummy;
 	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (dr0,
+					  &load_inside_cost,
+					  &load_outside_cost,
+					  &dummy, vf / 2, vf);
+	  dummy.release ();
 
-          vect_get_data_access_cost (dr0, &load_inside_cost, &load_outside_cost,
-				     &dummy);
-          vect_get_data_access_cost (first_store, &store_inside_cost,
-				     &store_outside_cost, &dummy);
-
+	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (first_store,
+					  &store_inside_cost,
+					  &store_outside_cost,
+					  &dummy, vf / 2, vf);
 	  dummy.release ();
 
-          /* Calculate the penalty for leaving FIRST_STORE unaligned (by
-             aligning the load DR0).  */
-          load_inside_penalty = store_inside_cost;
-          load_outside_penalty = store_outside_cost;
-          for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-			  DR_STMT (first_store))).iterate (i, &dr);
-               i++)
-            if (DR_IS_READ (dr))
-              {
-                load_inside_penalty += load_inside_cost;
-                load_outside_penalty += load_outside_cost;
-              }
-            else
-              {
-                load_inside_penalty += store_inside_cost;
-                load_outside_penalty += store_outside_cost;
-              }
-
-          /* Calculate the penalty for leaving DR0 unaligned (by
-             aligning the FIRST_STORE).  */
-          store_inside_penalty = load_inside_cost;
-          store_outside_penalty = load_outside_cost;
-          for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-		      DR_STMT (dr0))).iterate (i, &dr);
-               i++)
-            if (DR_IS_READ (dr))
-              {
-                store_inside_penalty += load_inside_cost;
-                store_outside_penalty += load_outside_cost;
-              }
-            else
-              {
-                store_inside_penalty += store_inside_cost;
-                store_outside_penalty += store_outside_cost;
-              }
-
-          if (load_inside_penalty > store_inside_penalty
-              || (load_inside_penalty == store_inside_penalty
-                  && load_outside_penalty > store_outside_penalty))
-            dr0 = first_store;
+          if (load_inside_cost > store_inside_cost
+              || (load_inside_cost == store_inside_cost
+		  && load_outside_cost > store_outside_cost))
+	    {
+	      dr0 = first_store;
+	      unknown_align_inside_cost = store_inside_cost;
+	      unknown_align_outside_cost = store_outside_cost;
+	    }
+	  else
+	    {
+	      unknown_align_inside_cost = load_inside_cost;
+	      unknown_align_outside_cost = load_outside_cost;
+	    }
+
+	  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+	  prologue_cost_vec.create (2);
+	  epilogue_cost_vec.create (2);
+
+	  int dummy2;
+	  unknown_align_outside_cost += vect_get_known_peeling_cost
+	    (loop_vinfo, vf / 2, &dummy2,
+	     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	     &prologue_cost_vec, &epilogue_cost_vec);
+
+	  prologue_cost_vec.release ();
+	  epilogue_cost_vec.release ();
+
+	  unknown_align_count = 1 + STMT_VINFO_SAME_ALIGN_REFS
+	    (vinfo_for_stmt (DR_STMT (dr0))).length ();
         }
 
       /* In case there are only loads with different unknown misalignments, use
@@ -1805,22 +1798,43 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
         do_peeling = false;
     }
 
-  if (do_peeling && !dr0)
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_known_alignment.inside_cost = UINT_MAX;
+  peel_for_known_alignment.outside_cost = UINT_MAX;
+  peel_for_known_alignment.peel_info.count = 0;
+
+  if (do_peeling && one_misalignment_known)
     {
       /* Peeling is possible, but there is no data access that is not supported
          unless aligned. So we try to choose the best possible peeling.  */
 
-      /* We should get here only if there are drs with known misalignment.  */
-      gcc_assert (!all_misalignments_unknown);
-
       /* Choose the best peeling from the hash table.  */
-      dr0 = vect_peeling_hash_choose_best_peeling (&peeling_htab,
-						   loop_vinfo, &npeel,
-						   &body_cost_vec);
-      if (!dr0 || !npeel)
-        do_peeling = false;
+      peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
+	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
+      dr0_known_align = peel_for_known_alignment.peel_info.dr;
     }
 
+  /* Compare costs of peeling for known and unknown alignment. */
+  if (unknown_align_inside_cost > peel_for_known_alignment.inside_cost
+      || (unknown_align_inside_cost == peel_for_known_alignment.inside_cost
+	  && unknown_align_outside_cost > peel_for_known_alignment.outside_cost))
+    {
+      dr0 = dr0_known_align;
+    }
+
+  /* We might still want to try to align the datarefs with unknown
+     misalignment if peeling for known alignment aligns significantly
+     less datarefs.  */
+  if (peel_for_known_alignment.peel_info.count * 2 > unknown_align_count)
+    {
+      dr0 = dr0_known_align;
+    }
+
+  if (dr0 == dr0_known_align && !npeel)
+    do_peeling = false;
+  if (dr0 == NULL)
+    do_peeling = false;
+
   if (do_peeling)
     {
       stmt = DR_STMT (dr0);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 4/4] Vect peeling cost model
  2017-05-05 11:04           ` Richard Biener
  2017-05-08 16:12             ` Robin Dapp
  2017-05-08 16:13             ` [PATCH 3/4] " Robin Dapp
@ 2017-05-08 16:27             ` Robin Dapp
  2017-05-09 10:55               ` Richard Biener
  2 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-08 16:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 274 bytes --]

gcc/ChangeLog:

2017-05-08  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
	Remove unused variable.
	(vect_enhance_data_refs_alignment):
	Compare best peelings costs to doing no peeling and choose no
	peeling if equal.

[-- Attachment #2: gcc-peeling-p4.diff --]
[-- Type: text/x-patch, Size: 8739 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 786f826..67d2f57 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1293,7 +1293,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 {
   vect_peel_info elem = *slot;
   int dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
+  unsigned int inside_cost = 0, outside_cost = 0;
   gimple *stmt = DR_STMT (elem->dr);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
@@ -1520,7 +1520,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
-  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1720,6 +1719,9 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  struct _vect_peel_extended_info peel_for_unknown_alignment;
+  struct _vect_peel_extended_info best_peel;
   unsigned int unknown_align_inside_cost = UINT_MAX;
   unsigned int unknown_align_outside_cost = UINT_MAX;
   unsigned int unknown_align_count = 0;
@@ -1731,74 +1733,72 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       /* Check if the target requires to prefer stores over loads, i.e., if
          misaligned stores are more expensive than misaligned loads (taking
          drs with same alignment into account).  */
-      if (first_store && DR_IS_READ (dr0))
-        {
-	  unsigned int load_inside_cost = 0;
-	  unsigned int load_outside_cost = 0;
-	  unsigned int store_inside_cost = 0;
-	  unsigned int store_outside_cost = 0;
-	  stmt_vector_for_cost dummy;
-	  dummy.create (2);
-	  vect_get_peeling_costs_all_drs (dr0,
-					  &load_inside_cost,
-					  &load_outside_cost,
-					  &dummy, vf / 2, vf);
-	  dummy.release ();
-
+      unsigned int load_inside_cost = 0;
+      unsigned int load_outside_cost = 0;
+      unsigned int store_inside_cost = 0;
+      unsigned int store_outside_cost = 0;
+
+      stmt_vector_for_cost dummy;
+      dummy.create (2);
+      vect_get_peeling_costs_all_drs (dr0,
+				      &load_inside_cost,
+				      &load_outside_cost,
+				      &dummy, vf / 2, vf);
+      dummy.release ();
+
+      if (first_store)
+	{
 	  dummy.create (2);
 	  vect_get_peeling_costs_all_drs (first_store,
 					  &store_inside_cost,
 					  &store_outside_cost,
 					  &dummy, vf / 2, vf);
 	  dummy.release ();
+	}
+      else
+	{
+	  store_inside_cost = UINT_MAX;
+	  store_outside_cost = UINT_MAX;
+	}
 
-          if (load_inside_cost > store_inside_cost
-              || (load_inside_cost == store_inside_cost
-		  && load_outside_cost > store_outside_cost))
-	    {
-	      dr0 = first_store;
-	      unknown_align_inside_cost = store_inside_cost;
-	      unknown_align_outside_cost = store_outside_cost;
-	    }
-	  else
-	    {
-	      unknown_align_inside_cost = load_inside_cost;
-	      unknown_align_outside_cost = load_outside_cost;
-	    }
-
-	  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
-	  prologue_cost_vec.create (2);
-	  epilogue_cost_vec.create (2);
+      if (load_inside_cost > store_inside_cost
+	  || (load_inside_cost == store_inside_cost
+	      && load_outside_cost > store_outside_cost))
+	{
+	  dr0 = first_store;
+	  unknown_align_inside_cost = store_inside_cost;
+	  unknown_align_outside_cost = store_outside_cost;
+	}
+      else
+	{
+	  unknown_align_inside_cost = load_inside_cost;
+	  unknown_align_outside_cost = load_outside_cost;
+	}
 
-	  int dummy2;
-	  unknown_align_outside_cost += vect_get_known_peeling_cost
-	    (loop_vinfo, vf / 2, &dummy2,
-	     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-	     &prologue_cost_vec, &epilogue_cost_vec);
+      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+      prologue_cost_vec.create (2);
+      epilogue_cost_vec.create (2);
 
-	  prologue_cost_vec.release ();
-	  epilogue_cost_vec.release ();
+      int dummy2;
+      unknown_align_outside_cost += vect_get_known_peeling_cost
+	(loop_vinfo, vf / 2, &dummy2,
+	 &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	 &prologue_cost_vec, &epilogue_cost_vec);
 
-	  unknown_align_count = 1 + STMT_VINFO_SAME_ALIGN_REFS
-	    (vinfo_for_stmt (DR_STMT (dr0))).length ();
-        }
+      prologue_cost_vec.release ();
+      epilogue_cost_vec.release ();
 
-      /* In case there are only loads with different unknown misalignments, use
-         peeling only if it may help to align other accesses in the loop or
-	 if it may help improving load bandwith when we'd end up using
-	 unaligned loads.  */
-      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
-      if (!first_store
-	  && !STMT_VINFO_SAME_ALIGN_REFS (
-		  vinfo_for_stmt (DR_STMT (dr0))).length ()
-	  && (vect_supportable_dr_alignment (dr0, false)
-	      != dr_unaligned_supported
-	      || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
-		  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
-        do_peeling = false;
+      unknown_align_count = 1 + STMT_VINFO_SAME_ALIGN_REFS
+	(vinfo_for_stmt (DR_STMT (dr0))).length ();
     }
 
-  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_unknown_alignment.peel_info.count = unknown_align_count;
+  peel_for_unknown_alignment.inside_cost = unknown_align_inside_cost;
+  peel_for_unknown_alignment.outside_cost = unknown_align_outside_cost;
+  peel_for_unknown_alignment.peel_info.npeel = 0;
+  peel_for_unknown_alignment.peel_info.dr = dr0;
+  best_peel = peel_for_unknown_alignment;
+
   peel_for_known_alignment.inside_cost = UINT_MAX;
   peel_for_known_alignment.outside_cost = UINT_MAX;
   peel_for_known_alignment.peel_info.count = 0;
@@ -1811,15 +1811,14 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       /* Choose the best peeling from the hash table.  */
       peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
 	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
-      dr0_known_align = peel_for_known_alignment.peel_info.dr;
     }
 
   /* Compare costs of peeling for known and unknown alignment. */
-  if (unknown_align_inside_cost > peel_for_known_alignment.inside_cost
+  if (peel_for_unknown_alignment.inside_cost > peel_for_known_alignment.inside_cost
       || (unknown_align_inside_cost == peel_for_known_alignment.inside_cost
 	  && unknown_align_outside_cost > peel_for_known_alignment.outside_cost))
     {
-      dr0 = dr0_known_align;
+      best_peel = peel_for_known_alignment;
     }
 
   /* We might still want to try to align the datarefs with unknown
@@ -1827,13 +1826,53 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
      less datarefs.  */
   if (peel_for_known_alignment.peel_info.count * 2 > unknown_align_count)
     {
-      dr0 = dr0_known_align;
+      best_peel = peel_for_known_alignment;
     }
 
-  if (dr0 == dr0_known_align && !npeel)
-    do_peeling = false;
-  if (dr0 == NULL)
-    do_peeling = false;
+  /* Calculate the penalty for no peeling, i.e. leaving everything
+     unaligned.
+     TODO: use something like an adapted vect_get_peeling_costs_all_drs.  */
+  unsigned nopeel_inside_cost = 0;
+  unsigned nopeel_outside_cost = 0;
+
+  stmt_vector_for_cost dummy;
+  dummy.create (2);
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    {
+      vect_get_data_access_cost (dr, &nopeel_inside_cost,
+				 &nopeel_outside_cost, &dummy);
+    }
+  dummy.release ();
+
+  /* Add epilogue costs.  As we do no peeling for alignment here, no prologue
+     costs will be recorded.  */
+  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+  prologue_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  int dummy2;
+  nopeel_outside_cost += vect_get_known_peeling_cost
+    (loop_vinfo, vf / 2, &dummy2,
+     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+     &prologue_cost_vec, &epilogue_cost_vec);
+
+  prologue_cost_vec.release ();
+  epilogue_cost_vec.release ();
+
+  npeel = best_peel.peel_info.npeel;
+  dr0 = best_peel.peel_info.dr;
+
+  /* Check if doing no peeling is not more expensive than the best peeling we
+     have so far.  */
+  if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
+      && vect_supportable_dr_alignment (dr0, false)
+      && ((nopeel_inside_cost < best_peel.inside_cost)
+	  || (nopeel_inside_cost == best_peel.inside_cost
+	      && nopeel_outside_cost <= best_peel.outside_cost)))
+    {
+      do_peeling = false;
+      npeel = 0;
+    }
 
   if (do_peeling)
     {

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-05-08 16:12             ` Robin Dapp
@ 2017-05-09 10:38               ` Richard Biener
  2017-05-11 11:17                 ` Robin Dapp
                                   ` (5 more replies)
  0 siblings, 6 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-09 10:38 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Mon, May 8, 2017 at 6:11 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> So the new part is the last point?  There's a lot of refactoring in
>> 3/3 that makes it hard to see what is actually changed ...  you need
>> to resist in doing this, it makes review very hard.
>
> The new part is actually spread across the three last "-"s.  Attached is
> a new version of [3/3] split up into two patches with hopefully less
> blending of refactoring and new functionality.
>
> [3/4] Computes full costs when peeling for unknown alignment, uses
> either read or write and compares the better one with the peeling costs
> for known alignment.  If the peeling for unknown alignment "aligns" more
> than twice the number of datarefs, it is preferred over the peeling for
> known alignment.
>
> [4/4] Computes the costs for no peeling and compares them with the costs
> of the best peeling so far.  If it is not more expensive, no peeling
> will be performed.
>
>> I think it's always best to align a ref with known alignment as that
>> simplifies conditions and allows followup optimizations (unrolling of
>> the prologue / epilogue).
>> I think for this it's better to also compute full costs rather than
>> relying on sth as simple as "number of same aligned refs".
>>
>> Does the code ever end up misaligning a previously known aligned ref?
>
> The following case used to get aligned via the known alignment of dd but
> would not anymore since peeling for unknown alignment aligns two
> accesses.  I guess the determining factor is still up for scrutiny and
> should probably > 2.  Still, on e.g. s390x no peeling is performed due
> to costs.

Ok, in principle this makes sense if we manage to correctly compute
the costs.  What exactly is profitable or not is of course subject to
the target costs.

Richard.

> void foo(int *restrict a, int *restrict b, int *restrict c, int
> *restrict d, unsigned int n)
> {
>   int *restrict dd = __builtin_assume_aligned (d, 8);
>   for (unsigned int i = 0; i < n; i++)
>     {
>       b[i] = b[i] + a[i];
>       c[i] = c[i] + b[i];
>       dd[i] = a[i];
>     }
> }
>
> Regards
>  Robin
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 3/4] Vect peeling cost model
  2017-05-08 16:13             ` [PATCH 3/4] " Robin Dapp
@ 2017-05-09 10:41               ` Richard Biener
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-09 10:41 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Mon, May 8, 2017 at 6:12 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> gcc/ChangeLog:

+  /* Compare costs of peeling for known and unknown alignment. */
+  if (unknown_align_inside_cost > peel_for_known_alignment.inside_cost
+      || (unknown_align_inside_cost == peel_for_known_alignment.inside_cost
+         && unknown_align_outside_cost > peel_for_known_alignment.outside_cost))
+    {

no braces around single stmts.

+      dr0 = dr0_known_align;
+    }
+

I think when equal we should prefer dr0_known_align peeling.  That is,
I'd simply
use

   if (unknown_align_inside_cost >= peel_for_known_alignment.inside_cost)
     dr0 = dr0_known_align;

this is because followup optimizations are easier with the
prologue/epilogue having niters known.

+  /* We might still want to try to align the datarefs with unknown
+     misalignment if peeling for known alignment aligns significantly
+     less datarefs.  */
+  if (peel_for_known_alignment.peel_info.count * 2 > unknown_align_count)
+    {
+      dr0 = dr0_known_align;

the comment doesn't match the code.  I also think this heuristic is bogus and
instead the cost computation should have figured out the correct DR to peel
in the first place.

Otherwise this patch looks ok.

Thanks,
Richard.


> 2017-05-08  Robin Dapp  <rdapp@linux.vnet.ibm.com>
>
>         * tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
>         Return peel info.
>         (vect_enhance_data_refs_alignment):
>         Compute full costs when peeling for unknown alignment, compare
>         to costs for peeling for known alignment and choose the cheaper
>         one.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/4] Vect peeling cost model
  2017-05-08 16:27             ` [PATCH 4/4] " Robin Dapp
@ 2017-05-09 10:55               ` Richard Biener
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-09 10:55 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Mon, May 8, 2017 at 6:13 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> gcc/ChangeLog:
>
> 2017-05-08  Robin Dapp  <rdapp@linux.vnet.ibm.com>
>
>         * tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
>         Remove unused variable.
>         (vect_enhance_data_refs_alignment):
>         Compare best peelings costs to doing no peeling and choose no
>         peeling if equal.

no braces around single stmt ifs please.

+  /* Add epilogue costs.  As we do no peeling for alignment here, no prologue
+     costs will be recorded.  */
+  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+  prologue_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  int dummy2;
+  nopeel_outside_cost += vect_get_known_peeling_cost
+    (loop_vinfo, vf / 2, &dummy2,

^^ pass 0 instead of vf / 2?

+     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+     &prologue_cost_vec, &epilogue_cost_vec);

+  /* Check if doing no peeling is not more expensive than the best peeling we
+     have so far.  */
+  if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
+      && vect_supportable_dr_alignment (dr0, false)
+      && ((nopeel_inside_cost < best_peel.inside_cost)
+         || (nopeel_inside_cost == best_peel.inside_cost
+             && nopeel_outside_cost <= best_peel.outside_cost)))
+    {
+      do_peeling = false;
+      npeel = 0;
+    }

please on tie do no peeling, thus change to

 if (...
     && nopeel_inside_cost <= best_peel.inside_cost)

I'm not sure why you test for unlimited_cost_model here; as I said
elsewhere, I'm not sure what not cost modeling means for static
decisions.  The purpose of unlimited_cost_model is to always vectorize
when possible and omit the runtime profitability check.  So for peeling
I'd just always use the cost model.  Thus please drop this check.

Otherwise ok.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-05-09 10:38               ` Richard Biener
@ 2017-05-11 11:17                 ` Robin Dapp
  2017-05-11 12:15                   ` Richard Biener
  2017-05-11 11:17                 ` [PATCH 1/5] Vect peeling cost model Robin Dapp
                                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-11 11:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

Included the requested changes in the patches (to follow).  I have now
removed the alignment count check altogether.

> I'm not sure why you test for unlimited_cost_model here as I said
> elsewhere I'm not sure
> what not cost modeling means for static decisions.  The purpose of
> unlimited_cost_model
> is to always vectorize when possible and omit the runtime
> profitability check.  So for peeling
> I'd just always use the cost model.  Thus please drop this check.

Without that, I get one additional FAIL gcc.dg/vect/slp-25.c for x86.
It is caused by choosing no peeling (inside costs 0) over peeling for
known alignment with unlimited cost model (inside costs 0 as well).
Costs 0 for no peeling are caused by count == 0 or rather ncopies = vf /
nunits == 4 / 8 == 0 in record_stmt_cost ().  Shouldn't ncopies > 0
always hold?  Even 0.5 would have worked here to make no peeling more
expensive than 0.

Test suite on s390x is clean.

Regards
 Robin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/5] Vect peeling cost model
  2017-05-09 10:38               ` Richard Biener
  2017-05-11 11:17                 ` Robin Dapp
@ 2017-05-11 11:17                 ` Robin Dapp
  2017-05-11 11:18                 ` [PATCH 2/5] " Robin Dapp
                                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-11 11:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 421 bytes --]

gcc/ChangeLog:

2017-05-11  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vectorizer.h (dr_misalignment): Introduce
	DR_MISALIGNMENT_UNKNOWN.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Refactoring.
	(vect_update_misalignment_for_peel): Use DR_MISALIGNMENT_UNKNOWN.
	(vect_enhance_data_refs_alignment): Likewise.
	(vect_duplicate_ssa_name_ptr_info): Likewise.
	(known_alignment_for_access_p): Likewise.

[-- Attachment #2: gcc-peeling-p1.diff --]
[-- Type: text/x-patch, Size: 5346 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index aa504b6..0d94320 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -717,7 +717,7 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
     loop = LOOP_VINFO_LOOP (loop_vinfo);
 
   /* Initialize misalignment to unknown.  */
-  SET_DR_MISALIGNMENT (dr, -1);
+  SET_DR_MISALIGNMENT (dr, DR_MISALIGNMENT_UNKNOWN);
 
   if (tree_fits_shwi_p (DR_STEP (dr)))
     misalign = DR_INIT (dr);
@@ -957,8 +957,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
     }
 
   if (dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location, "Setting misalignment to -1.\n");
-  SET_DR_MISALIGNMENT (dr, -1);
+    dump_printf_loc (MSG_NOTE, vect_location, "Setting misalignment " \
+		     "to unknown (-1).\n");
+  SET_DR_MISALIGNMENT (dr, DR_MISALIGNMENT_UNKNOWN);
 }
 
 
@@ -1526,32 +1527,31 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
         {
           if (known_alignment_for_access_p (dr))
             {
-              unsigned int npeel_tmp;
+              unsigned int npeel_tmp = 0;
 	      bool negative = tree_int_cst_compare (DR_STEP (dr),
 						    size_zero_node) < 0;
 
-              /* Save info about DR in the hash table.  */
               vectype = STMT_VINFO_VECTYPE (stmt_info);
               nelements = TYPE_VECTOR_SUBPARTS (vectype);
               mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE (
                                                 TREE_TYPE (DR_REF (dr))));
-              npeel_tmp = (negative
-			   ? (mis - nelements) : (nelements - mis))
-		  & (nelements - 1);
+	      if (DR_MISALIGNMENT (dr) != 0)
+		npeel_tmp = (negative ? (mis - nelements)
+			     : (nelements - mis)) & (nelements - 1);
 
               /* For multiple types, it is possible that the bigger type access
                  will have more than one peeling option.  E.g., a loop with two
                  types: one of size (vector size / 4), and the other one of
                  size (vector size / 8).  Vectorization factor will 8.  If both
-                 access are misaligned by 3, the first one needs one scalar
+                 accesses are misaligned by 3, the first one needs one scalar
                  iteration to be aligned, and the second one needs 5.  But the
 		 first one will be aligned also by peeling 5 scalar
                  iterations, and in that case both accesses will be aligned.
                  Hence, except for the immediate peeling amount, we also want
                  to try to add full vector size, while we don't exceed
                  vectorization factor.
-                 We do this automatically for cost model, since we calculate cost
-                 for every peeling option.  */
+                 We do this automatically for cost model, since we calculate
+		 cost for every peeling option.  */
               if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 		{
 		  if (STMT_SLP_TYPE (stmt_info))
@@ -1559,17 +1559,15 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 		      = (vf * GROUP_SIZE (stmt_info)) / nelements;
 		  else
 		    possible_npeel_number = vf / nelements;
-		}
 
-              /* Handle the aligned case. We may decide to align some other
-                 access, making DR unaligned.  */
-              if (DR_MISALIGNMENT (dr) == 0)
-                {
-                  npeel_tmp = 0;
-                  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
-                    possible_npeel_number++;
-                }
+		  /* NPEEL_TMP is 0 when there is no misalignment, increment
+		     the peeling amount by one in order to ...  */
+		  if (DR_MISALIGNMENT (dr) == 0)
+		    possible_npeel_number++;
+		}
 
+	      /* Save info about DR in the hash table.  Also include peeling
+	         amounts according to the explanation above.  */
               for (j = 0; j < possible_npeel_number; j++)
                 {
                   vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
@@ -4161,7 +4159,7 @@ vect_duplicate_ssa_name_ptr_info (tree name, data_reference *dr,
   duplicate_ssa_name_ptr_info (name, DR_PTR_INFO (dr));
   unsigned int align = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmt_info));
   int misalign = DR_MISALIGNMENT (dr);
-  if (misalign == -1)
+  if (misalign == DR_MISALIGNMENT_UNKNOWN)
     mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
   else
     set_ptr_info_alignment (SSA_NAME_PTR_INFO (name), align, misalign);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 12bb904..0d9fe14 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1012,6 +1012,7 @@ dr_misalignment (struct data_reference *dr)
    taking into account peeling/versioning if applied.  */
 #define DR_MISALIGNMENT(DR) dr_misalignment (DR)
 #define SET_DR_MISALIGNMENT(DR, VAL) set_dr_misalignment (DR, VAL)
+#define DR_MISALIGNMENT_UNKNOWN (-1)
 
 /* Return TRUE if the data access is aligned, and FALSE otherwise.  */
 
@@ -1027,7 +1028,7 @@ aligned_access_p (struct data_reference *data_ref_info)
 static inline bool
 known_alignment_for_access_p (struct data_reference *data_ref_info)
 {
-  return (DR_MISALIGNMENT (data_ref_info) != -1);
+  return (DR_MISALIGNMENT (data_ref_info) != DR_MISALIGNMENT_UNKNOWN);
 }
 
 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 2/5] Vect peeling cost model
  2017-05-09 10:38               ` Richard Biener
  2017-05-11 11:17                 ` Robin Dapp
  2017-05-11 11:17                 ` [PATCH 1/5] Vect peeling cost model Robin Dapp
@ 2017-05-11 11:18                 ` Robin Dapp
  2017-05-11 11:19                 ` [PATCH 3/5] " Robin Dapp
                                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-11 11:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 344 bytes --]

gcc/ChangeLog:

2017-05-11  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Change
	comment and rename variable.
	(vect_get_peeling_costs_all_drs): New function.
	(vect_peeling_hash_get_lowest_cost): Use.
	(vect_peeling_supportable): New function.
	(vect_enhance_data_refs_alignment): Use.

[-- Attachment #2: gcc-peeling-p2.diff --]
[-- Type: text/x-patch, Size: 8234 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 0d94320..5907856 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -903,7 +903,11 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
 }
 
 
-/* Function vect_update_misalignment_for_peel
+/* Function vect_update_misalignment_for_peel.
+   Sets DR's misalignment
+   - to 0 if it has the same alignment as DR_PEEL,
+   - to the misalignment computed using NPEEL if DR's alignment is known,
+   - to -1 (unknown) otherwise.
 
    DR - the data reference whose misalignment is to be adjusted.
    DR_PEEL - the data reference whose misalignment is being made
@@ -916,7 +920,7 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
                                    struct data_reference *dr_peel, int npeel)
 {
   unsigned int i;
-  vec<dr_p> same_align_drs;
+  vec<dr_p> same_aligned_drs;
   struct data_reference *current_dr;
   int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
   int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel))));
@@ -932,9 +936,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
 
   /* It can be assumed that the data refs with the same alignment as dr_peel
      are aligned in the vector loop.  */
-  same_align_drs
+  same_aligned_drs
     = STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (DR_STMT (dr_peel)));
-  FOR_EACH_VEC_ELT (same_align_drs, i, current_dr)
+  FOR_EACH_VEC_ELT (same_aligned_drs, i, current_dr)
     {
       if (current_dr != dr)
         continue;
@@ -1234,27 +1238,23 @@ vect_peeling_hash_get_most_frequent (_vect_peel_info **slot,
   return 1;
 }
 
+/* Get the costs of peeling NPEEL iterations checking data access costs
+   for all data refs. */
 
-/* Traverse peeling hash table and calculate cost for each peeling option.
-   Find the one with the lowest cost.  */
-
-int
-vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
-				   _vect_peel_extended_info *min)
+static void
+vect_get_peeling_costs_all_drs (struct data_reference *dr0,
+				unsigned int *inside_cost,
+				unsigned int *outside_cost,
+				stmt_vector_for_cost *body_cost_vec,
+				unsigned int npeel, unsigned int vf)
 {
-  vect_peel_info elem = *slot;
-  int save_misalignment, dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
-  gimple *stmt = DR_STMT (elem->dr);
+  gimple *stmt = DR_STMT (dr0);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-  struct data_reference *dr;
-  stmt_vector_for_cost prologue_cost_vec, body_cost_vec, epilogue_cost_vec;
 
-  prologue_cost_vec.create (2);
-  body_cost_vec.create (2);
-  epilogue_cost_vec.create (2);
+  unsigned i;
+  data_reference *dr;
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
@@ -1272,12 +1272,40 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
 	continue;
 
+      int save_misalignment;
       save_misalignment = DR_MISALIGNMENT (dr);
-      vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
-      vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
-				 &body_cost_vec);
+      if (dr == dr0 && npeel == vf / 2)
+	SET_DR_MISALIGNMENT (dr, 0);
+      else
+	vect_update_misalignment_for_peel (dr, dr0, npeel);
+      vect_get_data_access_cost (dr, inside_cost, outside_cost,
+				 body_cost_vec);
       SET_DR_MISALIGNMENT (dr, save_misalignment);
     }
+}
+
+/* Traverse peeling hash table and calculate cost for each peeling option.
+   Find the one with the lowest cost.  */
+
+int
+vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
+				   _vect_peel_extended_info *min)
+{
+  vect_peel_info elem = *slot;
+  int dummy;
+  unsigned int inside_cost = 0, outside_cost = 0;
+  gimple *stmt = DR_STMT (elem->dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  stmt_vector_for_cost prologue_cost_vec, body_cost_vec,
+		       epilogue_cost_vec;
+
+  prologue_cost_vec.create (2);
+  body_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
+				  &body_cost_vec, elem->npeel, 0);
 
   outside_cost += vect_get_known_peeling_cost
     (loop_vinfo, elem->npeel, &dummy,
@@ -1292,7 +1320,8 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
   epilogue_cost_vec.release ();
 
   if (inside_cost < min->inside_cost
-      || (inside_cost == min->inside_cost && outside_cost < min->outside_cost))
+      || (inside_cost == min->inside_cost
+	  && outside_cost < min->outside_cost))
     {
       min->inside_cost = inside_cost;
       min->outside_cost = outside_cost;
@@ -1300,6 +1329,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
       min->body_cost_vec = body_cost_vec;
       min->peel_info.dr = elem->dr;
       min->peel_info.npeel = elem->npeel;
+      min->peel_info.count = elem->count;
     }
   else
     body_cost_vec.release ();
@@ -1342,6 +1372,52 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
    return res.peel_info.dr;
 }
 
+/* Return true if the new peeling NPEEL is supported.  */
+
+static bool
+vect_peeling_supportable (loop_vec_info loop_vinfo, struct data_reference *dr0,
+			  unsigned npeel)
+{
+  unsigned i;
+  struct data_reference *dr = NULL;
+  vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
+  gimple *stmt;
+  stmt_vec_info stmt_info;
+  enum dr_alignment_support supportable_dr_alignment;
+
+  /* Ensure that all data refs can be vectorized after the peel.  */
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    {
+      int save_misalignment;
+
+      if (dr == dr0)
+	continue;
+
+      stmt = DR_STMT (dr);
+      stmt_info = vinfo_for_stmt (stmt);
+      /* For interleaving, only the alignment of the first access
+	 matters.  */
+      if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+	  && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
+	continue;
+
+      /* Strided accesses perform only component accesses, alignment is
+	 irrelevant for them.  */
+      if (STMT_VINFO_STRIDED_P (stmt_info)
+	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
+	continue;
+
+      save_misalignment = DR_MISALIGNMENT (dr);
+      vect_update_misalignment_for_peel (dr, dr0, npeel);
+      supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
+      SET_DR_MISALIGNMENT (dr, save_misalignment);
+
+      if (!supportable_dr_alignment)
+	return false;
+    }
+
+  return true;
+}
 
 /* Function vect_enhance_data_refs_alignment
 
@@ -1780,40 +1856,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                              "Try peeling by %d\n", npeel);
         }
 
-      /* Ensure that all data refs can be vectorized after the peel.  */
-      FOR_EACH_VEC_ELT (datarefs, i, dr)
-        {
-          int save_misalignment;
-
-	  if (dr == dr0)
-	    continue;
-
-	  stmt = DR_STMT (dr);
-	  stmt_info = vinfo_for_stmt (stmt);
-	  /* For interleaving, only the alignment of the first access
-            matters.  */
-	  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
-	      && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
-	    continue;
-
-	  /* Strided accesses perform only component accesses, alignment is
-	     irrelevant for them.  */
-	  if (STMT_VINFO_STRIDED_P (stmt_info)
-	      && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
-	    continue;
-
-	  save_misalignment = DR_MISALIGNMENT (dr);
-	  vect_update_misalignment_for_peel (dr, dr0, npeel);
-	  supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
-	  SET_DR_MISALIGNMENT (dr, save_misalignment);
-
-	  if (!supportable_dr_alignment)
-	    {
-	      do_peeling = false;
-	      break;
-	    }
-	}
+      /* Ensure that all datarefs can be vectorized after the peel.  */
+      if (!vect_peeling_supportable (loop_vinfo, dr0, npeel))
+	do_peeling = false;
 
+      /* Check if all datarefs are supportable and log.  */
       if (do_peeling && known_alignment_for_access_p (dr0) && npeel == 0)
         {
           stat = vect_verify_datarefs_alignment (loop_vinfo);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 3/5] Vect peeling cost model
  2017-05-09 10:38               ` Richard Biener
                                   ` (2 preceding siblings ...)
  2017-05-11 11:18                 ` [PATCH 2/5] " Robin Dapp
@ 2017-05-11 11:19                 ` Robin Dapp
  2017-05-11 11:20                 ` [PATCH 4/5] " Robin Dapp
  2017-05-11 11:59                 ` [PATCH 5/5] " Robin Dapp
  5 siblings, 0 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-11 11:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]

gcc/ChangeLog:

2017-05-11  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
	Return peeling info and set costs to zero for unlimited cost
	model.
	(vect_enhance_data_refs_alignment): Also inspect all datarefs
	with unknown misalignment.  Compute costs for unknown
	misalignment, compare them to the costs for known misalignment
	and choose the cheapest for peeling.

[-- Attachment #2: gcc-peeling-p3.diff --]
[-- Type: text/x-patch, Size: 10947 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 5907856..1bba2b9 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1342,7 +1342,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
    choosing an option with the lowest cost (if cost model is enabled) or the
    option that aligns as many accesses as possible.  */
 
-static struct data_reference *
+static struct _vect_peel_extended_info
 vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_htab,
 				       loop_vec_info loop_vinfo,
                                        unsigned int *npeel,
@@ -1365,11 +1365,13 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
        res.peel_info.count = 0;
        peeling_htab->traverse <_vect_peel_extended_info *,
 	   		       vect_peeling_hash_get_most_frequent> (&res);
+       res.inside_cost = 0;
+       res.outside_cost = 0;
      }
 
    *npeel = res.peel_info.npeel;
    *body_cost_vec = res.body_cost_vec;
-   return res.peel_info.dr;
+   return res;
 }
 
 /* Return true if the new peeling NPEEL is supported.  */
@@ -1518,6 +1520,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
+  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1525,7 +1528,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   gimple *stmt;
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
-  bool all_misalignments_unknown = true;
+  bool one_misalignment_known = false;
+  bool one_misalignment_unknown = false;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1651,11 +1655,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                   npeel_tmp += nelements;
                 }
 
-              all_misalignments_unknown = false;
-              /* Data-ref that was chosen for the case that all the
-                 misalignments are unknown is not relevant anymore, since we
-                 have a data-ref with known alignment.  */
-              dr0 = NULL;
+	      one_misalignment_known = true;
             }
           else
             {
@@ -1663,35 +1663,32 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                  peeling for data-ref that has the maximum number of data-refs
                  with the same alignment, unless the target prefers to align
                  stores over load.  */
-              if (all_misalignments_unknown)
-                {
-		  unsigned same_align_drs
-		    = STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
-                  if (!dr0
-		      || same_align_drs_max < same_align_drs)
-                    {
-                      same_align_drs_max = same_align_drs;
-                      dr0 = dr;
-                    }
-		  /* For data-refs with the same number of related
-		     accesses prefer the one where the misalign
-		     computation will be invariant in the outermost loop.  */
-		  else if (same_align_drs_max == same_align_drs)
-		    {
-		      struct loop *ivloop0, *ivloop;
-		      ivloop0 = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr0));
-		      ivloop = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr));
-		      if ((ivloop && !ivloop0)
-			  || (ivloop && ivloop0
-			      && flow_loop_nested_p (ivloop, ivloop0)))
-			dr0 = dr;
-		    }
+	      unsigned same_align_drs
+		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
+	      if (!dr0
+		  || same_align_drs_max < same_align_drs)
+		{
+		  same_align_drs_max = same_align_drs;
+		  dr0 = dr;
+		}
+	      /* For data-refs with the same number of related
+		 accesses prefer the one where the misalign
+		 computation will be invariant in the outermost loop.  */
+	      else if (same_align_drs_max == same_align_drs)
+		{
+		  struct loop *ivloop0, *ivloop;
+		  ivloop0 = outermost_invariant_loop_for_expr
+		    (loop, DR_BASE_ADDRESS (dr0));
+		  ivloop = outermost_invariant_loop_for_expr
+		    (loop, DR_BASE_ADDRESS (dr));
+		  if ((ivloop && !ivloop0)
+		      || (ivloop && ivloop0
+			  && flow_loop_nested_p (ivloop, ivloop0)))
+		    dr0 = dr;
+		}
 
-                  if (!first_store && DR_IS_WRITE (dr))
-                    first_store = dr;
-                }
+	      if (!first_store && DR_IS_WRITE (dr))
+		first_store = dr;
 
               /* If there are both known and unknown misaligned accesses in the
                  loop, we choose peeling amount according to the known
@@ -1702,6 +1699,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                   if (!first_store && DR_IS_WRITE (dr))
                     first_store = dr;
                 }
+
+	      one_misalignment_unknown = true;
             }
         }
       else
@@ -1722,8 +1721,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
+  unsigned int unknown_align_inside_cost = INT_MAX;
+  unsigned int unknown_align_outside_cost = INT_MAX;
+
   if (do_peeling
-      && all_misalignments_unknown
+      && one_misalignment_unknown
       && vect_supportable_dr_alignment (dr0, false))
     {
       /* Check if the target requires to prefer stores over loads, i.e., if
@@ -1731,62 +1733,51 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
          drs with same alignment into account).  */
       if (first_store && DR_IS_READ (dr0))
         {
-          unsigned int load_inside_cost = 0, load_outside_cost = 0;
-          unsigned int store_inside_cost = 0, store_outside_cost = 0;
-          unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
-          unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
+	  unsigned int load_inside_cost = 0;
+	  unsigned int load_outside_cost = 0;
+	  unsigned int store_inside_cost = 0;
+	  unsigned int store_outside_cost = 0;
 	  stmt_vector_for_cost dummy;
 	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (dr0,
+					  &load_inside_cost,
+					  &load_outside_cost,
+					  &dummy, vf / 2, vf);
+	  dummy.release ();
 
-          vect_get_data_access_cost (dr0, &load_inside_cost, &load_outside_cost,
-				     &dummy);
-          vect_get_data_access_cost (first_store, &store_inside_cost,
-				     &store_outside_cost, &dummy);
-
+	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (first_store,
+					  &store_inside_cost,
+					  &store_outside_cost,
+					  &dummy, vf / 2, vf);
 	  dummy.release ();
 
-          /* Calculate the penalty for leaving FIRST_STORE unaligned (by
-             aligning the load DR0).  */
-          load_inside_penalty = store_inside_cost;
-          load_outside_penalty = store_outside_cost;
-          for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-			  DR_STMT (first_store))).iterate (i, &dr);
-               i++)
-            if (DR_IS_READ (dr))
-              {
-                load_inside_penalty += load_inside_cost;
-                load_outside_penalty += load_outside_cost;
-              }
-            else
-              {
-                load_inside_penalty += store_inside_cost;
-                load_outside_penalty += store_outside_cost;
-              }
-
-          /* Calculate the penalty for leaving DR0 unaligned (by
-             aligning the FIRST_STORE).  */
-          store_inside_penalty = load_inside_cost;
-          store_outside_penalty = load_outside_cost;
-          for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-		      DR_STMT (dr0))).iterate (i, &dr);
-               i++)
-            if (DR_IS_READ (dr))
-              {
-                store_inside_penalty += load_inside_cost;
-                store_outside_penalty += load_outside_cost;
-              }
-            else
-              {
-                store_inside_penalty += store_inside_cost;
-                store_outside_penalty += store_outside_cost;
-              }
-
-          if (load_inside_penalty > store_inside_penalty
-              || (load_inside_penalty == store_inside_penalty
-                  && load_outside_penalty > store_outside_penalty))
-            dr0 = first_store;
+          if (load_inside_cost > store_inside_cost
+              || (load_inside_cost == store_inside_cost
+		  && load_outside_cost > store_outside_cost))
+	    {
+	      dr0 = first_store;
+	      unknown_align_inside_cost = store_inside_cost;
+	      unknown_align_outside_cost = store_outside_cost;
+	    }
+	  else
+	    {
+	      unknown_align_inside_cost = load_inside_cost;
+	      unknown_align_outside_cost = load_outside_cost;
+	    }
+
+	  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+	  prologue_cost_vec.create (2);
+	  epilogue_cost_vec.create (2);
+
+	  int dummy2;
+	  unknown_align_outside_cost += vect_get_known_peeling_cost
+	    (loop_vinfo, vf / 2, &dummy2,
+	     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	     &prologue_cost_vec, &epilogue_cost_vec);
+
+	  prologue_cost_vec.release ();
+	  epilogue_cost_vec.release ();
         }
 
       /* In case there are only loads with different unknown misalignments, use
@@ -1804,22 +1795,35 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
         do_peeling = false;
     }
 
-  if (do_peeling && !dr0)
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_known_alignment.inside_cost = INT_MAX;
+  peel_for_known_alignment.outside_cost = INT_MAX;
+  peel_for_known_alignment.peel_info.count = 0;
+  peel_for_known_alignment.peel_info.dr = NULL;
+
+  if (do_peeling && one_misalignment_known)
     {
       /* Peeling is possible, but there is no data access that is not supported
          unless aligned. So we try to choose the best possible peeling.  */
 
-      /* We should get here only if there are drs with known misalignment.  */
-      gcc_assert (!all_misalignments_unknown);
-
       /* Choose the best peeling from the hash table.  */
-      dr0 = vect_peeling_hash_choose_best_peeling (&peeling_htab,
-						   loop_vinfo, &npeel,
-						   &body_cost_vec);
-      if (!dr0 || !npeel)
-        do_peeling = false;
+      peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
+	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
+      dr0_known_align = peel_for_known_alignment.peel_info.dr;
+    }
+
+  /* Compare costs of peeling for known and unknown alignment. */
+  if (dr0_known_align != NULL
+      && unknown_align_inside_cost >= peel_for_known_alignment.inside_cost)
+    {
+      dr0 = dr0_known_align;
+      if (!npeel)
+	do_peeling = false;
     }
 
+  if (dr0 == NULL)
+    do_peeling = false;
+
   if (do_peeling)
     {
       stmt = DR_STMT (dr0);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 4/5] Vect peeling cost model
  2017-05-09 10:38               ` Richard Biener
                                   ` (3 preceding siblings ...)
  2017-05-11 11:19                 ` [PATCH 3/5] " Robin Dapp
@ 2017-05-11 11:20                 ` Robin Dapp
  2017-05-11 15:30                   ` [PATCH 4/5 v2] " Robin Dapp
  2017-05-11 11:59                 ` [PATCH 5/5] " Robin Dapp
  5 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-11 11:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 289 bytes --]

gcc/ChangeLog:

2017-05-11  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
	Remove check for supportable_dr_alignment, compute costs for
	doing no peeling at all, compare to the best peeling costs so
	far and do no peeling if cheaper.

[-- Attachment #2: gcc-peeling-p4.diff --]
[-- Type: text/x-patch, Size: 9291 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 1bba2b9..00c4d5d 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1517,10 +1517,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 {
   vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
-  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1601,7 +1599,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
 	continue;
 
-      supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
       do_peeling = vector_alignment_reachable_p (dr);
       if (do_peeling)
         {
@@ -1690,16 +1687,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 	      if (!first_store && DR_IS_WRITE (dr))
 		first_store = dr;
 
-              /* If there are both known and unknown misaligned accesses in the
-                 loop, we choose peeling amount according to the known
-                 accesses.  */
-              if (!supportable_dr_alignment)
-                {
-                  dr0 = dr;
-                  if (!first_store && DR_IS_WRITE (dr))
-                    first_store = dr;
-                }
-
 	      one_misalignment_unknown = true;
             }
         }
@@ -1721,81 +1708,85 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
-  unsigned int unknown_align_inside_cost = INT_MAX;
-  unsigned int unknown_align_outside_cost = INT_MAX;
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  struct _vect_peel_extended_info peel_for_unknown_alignment;
+  struct _vect_peel_extended_info best_peel;
+
+  peel_for_unknown_alignment.inside_cost = INT_MAX;
+  peel_for_unknown_alignment.outside_cost = INT_MAX;
+  peel_for_unknown_alignment.peel_info.count = 0;
 
   if (do_peeling
-      && one_misalignment_unknown
-      && vect_supportable_dr_alignment (dr0, false))
+      && one_misalignment_unknown)
     {
       /* Check if the target requires to prefer stores over loads, i.e., if
          misaligned stores are more expensive than misaligned loads (taking
          drs with same alignment into account).  */
-      if (first_store && DR_IS_READ (dr0))
-        {
-	  unsigned int load_inside_cost = 0;
-	  unsigned int load_outside_cost = 0;
-	  unsigned int store_inside_cost = 0;
-	  unsigned int store_outside_cost = 0;
-	  stmt_vector_for_cost dummy;
-	  dummy.create (2);
-	  vect_get_peeling_costs_all_drs (dr0,
-					  &load_inside_cost,
-					  &load_outside_cost,
-					  &dummy, vf / 2, vf);
-	  dummy.release ();
-
+      unsigned int load_inside_cost = 0;
+      unsigned int load_outside_cost = 0;
+      unsigned int store_inside_cost = 0;
+      unsigned int store_outside_cost = 0;
+
+      stmt_vector_for_cost dummy;
+      dummy.create (2);
+      vect_get_peeling_costs_all_drs (dr0,
+				      &load_inside_cost,
+				      &load_outside_cost,
+				      &dummy, vf / 2, vf);
+      dummy.release ();
+
+      if (first_store)
+	{
 	  dummy.create (2);
 	  vect_get_peeling_costs_all_drs (first_store,
 					  &store_inside_cost,
 					  &store_outside_cost,
 					  &dummy, vf / 2, vf);
 	  dummy.release ();
+	}
+      else
+	{
+	  store_inside_cost = INT_MAX;
+	  store_outside_cost = INT_MAX;
+	}
 
-          if (load_inside_cost > store_inside_cost
-              || (load_inside_cost == store_inside_cost
-		  && load_outside_cost > store_outside_cost))
-	    {
-	      dr0 = first_store;
-	      unknown_align_inside_cost = store_inside_cost;
-	      unknown_align_outside_cost = store_outside_cost;
-	    }
-	  else
-	    {
-	      unknown_align_inside_cost = load_inside_cost;
-	      unknown_align_outside_cost = load_outside_cost;
-	    }
+      if (load_inside_cost > store_inside_cost
+	  || (load_inside_cost == store_inside_cost
+	      && load_outside_cost > store_outside_cost))
+	{
+	  dr0 = first_store;
+	  peel_for_unknown_alignment.inside_cost = store_inside_cost;
+	  peel_for_unknown_alignment.outside_cost = store_outside_cost;
+	}
+      else
+	{
+	  peel_for_unknown_alignment.inside_cost = load_inside_cost;
+	  peel_for_unknown_alignment.outside_cost = load_outside_cost;
+	}
 
-	  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
-	  prologue_cost_vec.create (2);
-	  epilogue_cost_vec.create (2);
+      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+      prologue_cost_vec.create (2);
+      epilogue_cost_vec.create (2);
 
-	  int dummy2;
-	  unknown_align_outside_cost += vect_get_known_peeling_cost
-	    (loop_vinfo, vf / 2, &dummy2,
-	     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-	     &prologue_cost_vec, &epilogue_cost_vec);
+      int dummy2;
+      peel_for_unknown_alignment.outside_cost += vect_get_known_peeling_cost
+	(loop_vinfo, vf / 2, &dummy2,
+	 &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	 &prologue_cost_vec, &epilogue_cost_vec);
 
-	  prologue_cost_vec.release ();
-	  epilogue_cost_vec.release ();
-        }
+      prologue_cost_vec.release ();
+      epilogue_cost_vec.release ();
 
-      /* In case there are only loads with different unknown misalignments, use
-         peeling only if it may help to align other accesses in the loop or
-	 if it may help improving load bandwith when we'd end up using
-	 unaligned loads.  */
-      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
-      if (!first_store
-	  && !STMT_VINFO_SAME_ALIGN_REFS (
-		  vinfo_for_stmt (DR_STMT (dr0))).length ()
-	  && (vect_supportable_dr_alignment (dr0, false)
-	      != dr_unaligned_supported
-	      || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
-		  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
-        do_peeling = false;
+      peel_for_unknown_alignment.peel_info.count = 1
+	+ STMT_VINFO_SAME_ALIGN_REFS
+	(vinfo_for_stmt (DR_STMT (dr0))).length ();
     }
 
-  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_unknown_alignment.peel_info.npeel = 0;
+  peel_for_unknown_alignment.peel_info.dr = dr0;
+
+  best_peel = peel_for_unknown_alignment;
+
   peel_for_known_alignment.inside_cost = INT_MAX;
   peel_for_known_alignment.outside_cost = INT_MAX;
   peel_for_known_alignment.peel_info.count = 0;
@@ -1804,24 +1795,52 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   if (do_peeling && one_misalignment_known)
     {
       /* Peeling is possible, but there is no data access that is not supported
-         unless aligned. So we try to choose the best possible peeling.  */
-
-      /* Choose the best peeling from the hash table.  */
+         unless aligned.  So we try to choose the best possible peeling from
+	 the hash table.  */
       peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
 	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
-      dr0_known_align = peel_for_known_alignment.peel_info.dr;
     }
 
   /* Compare costs of peeling for known and unknown alignment. */
-  if (dr0_known_align != NULL
-      && unknown_align_inside_cost >= peel_for_known_alignment.inside_cost)
-    {
-      dr0 = dr0_known_align;
-      if (!npeel)
-	do_peeling = false;
-    }
+  if (peel_for_known_alignment.peel_info.dr != NULL
+      && peel_for_unknown_alignment.inside_cost
+      >= peel_for_known_alignment.inside_cost)
+    best_peel = peel_for_known_alignment;
+
+  /* Calculate the penalty for no peeling, i.e. leaving everything
+     unaligned.
+     TODO: use something like an adapted vect_get_peeling_costs_all_drs.  */
+  unsigned nopeel_inside_cost = 0;
+  unsigned nopeel_outside_cost = 0;
+
+  stmt_vector_for_cost dummy;
+  dummy.create (2);
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    vect_get_data_access_cost (dr, &nopeel_inside_cost,
+			       &nopeel_outside_cost, &dummy);
+  dummy.release ();
+
+  /* Add epilogue costs.  As we do not peel for alignment here, no prologue
+     costs will be recorded.  */
+  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+  prologue_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  int dummy2;
+  nopeel_outside_cost += vect_get_known_peeling_cost
+    (loop_vinfo, 0, &dummy2,
+     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+     &prologue_cost_vec, &epilogue_cost_vec);
+
+  prologue_cost_vec.release ();
+  epilogue_cost_vec.release ();
+
+  npeel = best_peel.peel_info.npeel;
+  dr0 = best_peel.peel_info.dr;
 
-  if (dr0 == NULL)
+  /* If doing no peeling is not more expensive than the best peeling
+     we have so far, don't perform any peeling.  */
+  if (nopeel_inside_cost <= best_peel.inside_cost)
     do_peeling = false;
 
   if (do_peeling)
@@ -2000,7 +2019,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 	      break;
 	    }
 
-	  supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
+	  enum dr_alignment_support supportable_dr_alignment =
+	    vect_supportable_dr_alignment (dr, false);
 
           if (!supportable_dr_alignment)
             {

^ permalink raw reply	[flat|nested] 51+ messages in thread
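The decision logic the hunk above introduces can be sketched in isolation: compute an inside-loop cost for the best peeling variant and a cost for leaving every access unaligned, then peel only when peeling is strictly cheaper. The following standalone model is illustrative only; the struct and cost numbers are invented stand-ins, not GCC's actual `_vect_peel_extended_info` type:

```c
#include <assert.h>

/* Simplified stand-in for _vect_peel_extended_info: just the fields
   the final comparison in vect_enhance_data_refs_alignment uses.  */
struct peel_info
{
  unsigned inside_cost;   /* cost inside the vectorized loop body */
  unsigned outside_cost;  /* prologue/epilogue cost */
  unsigned npeel;         /* iterations peeled for alignment */
};

/* Mirror of the new check: do not peel when leaving everything
   unaligned is no more expensive inside the loop than the best
   peeling variant found so far.  */
static int
should_peel (unsigned nopeel_inside_cost, struct peel_info best)
{
  return nopeel_inside_cost > best.inside_cost;
}
```

On a target like z13, where unaligned vector loads and stores cost the same as aligned ones, `nopeel_inside_cost` equals the peeled inside cost and the comparison correctly suppresses the prolog.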

* [PATCH 5/5] Vect peeling cost model
  2017-05-09 10:38               ` Richard Biener
                                   ` (4 preceding siblings ...)
  2017-05-11 11:20                 ` [PATCH 4/5] " Robin Dapp
@ 2017-05-11 11:59                 ` Robin Dapp
  5 siblings, 0 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-11 11:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 130 bytes --]

gcc/testsuite/ChangeLog:

2017-05-11  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* gcc.target/s390/vector/vec-nopeel-2.c: New test.

[-- Attachment #2: gcc-peeling-p5.diff --]
[-- Type: text/x-patch, Size: 738 bytes --]

diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
new file mode 100644
index 0000000..9b67793
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target s390_vx } */
+/* { dg-options "-O2 -mzarch -march=z13 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
+
+void foo(int *restrict a, int *restrict b, unsigned int n)
+{
+  for (unsigned int i = 0; i < n; i++)
+    b[i] = a[i] * 2 + 1;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" } } */


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-05-11 11:17                 ` Robin Dapp
@ 2017-05-11 12:15                   ` Richard Biener
  2017-05-11 12:16                     ` Richard Biener
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Biener @ 2017-05-11 12:15 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Thu, May 11, 2017 at 1:15 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Included the requested changes in the patches (to follow).  I removed
> the alignment count check now altogether.
>
>> I'm not sure why you test for unlimited_cost_model here as I said
>> elsewhere I'm not sure
>> what not cost modeling means for static decisions.  The purpose of
>> unlimited_cost_model
>> is to always vectorize when possible and omit the runtime
>> profitability check.  So for peeling
>> I'd just always use the cost model.  Thus please drop this check.
>
> Without that, I get one additional FAIL gcc.dg/vect/slp-25.c for x86.
> It is caused by choosing no peeling (inside costs 0) over peeling for
> known alignment with unlimited cost model (inside costs 0 as well).
> Costs 0 for no peeling are caused by count == 0 or rather ncopies = vf /
> nunits == 4 / 8 == 0 in record_stmt_costs ().  Shouldn't always hold
> ncopies > 0? Even 0.5 would have worked here to make no peeling more
> expensive than 0.

That's odd.  I can't get

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c       (revision 247882)
+++ gcc/tree-vect-stmts.c       (working copy)
@@ -95,6 +96,7 @@ record_stmt_cost (stmt_vector_for_cost *
                  enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
                  int misalign, enum vect_cost_model_location where)
 {
+  gcc_assert (count > 0);
   if (body_cost_vec)
     {
       tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;

to ICE with the testcase (unpatched trunk)

Where's that record_stmt_cost call done?  You can't simply use vf/nunits
for SLP.

Richard.

> Test suite on s390x is clean.
>
> Regards
>  Robin
>


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-05-11 12:15                   ` Richard Biener
@ 2017-05-11 12:16                     ` Richard Biener
  2017-05-11 12:48                       ` Richard Biener
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Biener @ 2017-05-11 12:16 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Thu, May 11, 2017 at 2:14 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, May 11, 2017 at 1:15 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> Included the requested changes in the patches (to follow).  I removed
>> the alignment count check now altogether.
>>
>>> I'm not sure why you test for unlimited_cost_model here as I said
>>> elsewhere I'm not sure
>>> what not cost modeling means for static decisions.  The purpose of
>>> unlimited_cost_model
>>> is to always vectorize when possible and omit the runtime
>>> profitability check.  So for peeling
>>> I'd just always use the cost model.  Thus please drop this check.
>>
>> Without that, I get one additional FAIL gcc.dg/vect/slp-25.c for x86.
>> It is caused by choosing no peeling (inside costs 0) over peeling for
>> known alignment with unlimited cost model (inside costs 0 as well).
>> Costs 0 for no peeling are caused by count == 0 or rather ncopies = vf /
>> nunits == 4 / 8 == 0 in record_stmt_costs ().  Shouldn't always hold
>> ncopies > 0? Even 0.5 would have worked here to make no peeling more
>> expensive than 0.
>
> That's odd.  I can't get
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       (revision 247882)
> +++ gcc/tree-vect-stmts.c       (working copy)
> @@ -95,6 +96,7 @@ record_stmt_cost (stmt_vector_for_cost *
>                   enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
>                   int misalign, enum vect_cost_model_location where)
>  {
> +  gcc_assert (count > 0);
>    if (body_cost_vec)
>      {
>        tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>
> to ICE with the testcase (unpatched trunk)
>
> Where's that record_stmt_cost call done?  You can't simply use vf/nunits
> for SLP.

Ah, of course needs -fvect-cost-model.

I'll investigate.

Richard.

> Richard.
>
>> Test suite on s390x is clean.
>>
>> Regards
>>  Robin
>>


* Re: [RFC] S/390: Alignment peeling prolog generation
  2017-05-11 12:16                     ` Richard Biener
@ 2017-05-11 12:48                       ` Richard Biener
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-11 12:48 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Thu, May 11, 2017 at 2:15 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, May 11, 2017 at 2:14 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, May 11, 2017 at 1:15 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>>> Included the requested changes in the patches (to follow).  I removed
>>> the alignment count check now altogether.
>>>
>>>> I'm not sure why you test for unlimited_cost_model here as I said
>>>> elsewhere I'm not sure
>>>> what not cost modeling means for static decisions.  The purpose of
>>>> unlimited_cost_model
>>>> is to always vectorize when possible and omit the runtime
>>>> profitability check.  So for peeling
>>>> I'd just always use the cost model.  Thus please drop this check.
>>>
>>> Without that, I get one additional FAIL gcc.dg/vect/slp-25.c for x86.
>>> It is caused by choosing no peeling (inside costs 0) over peeling for
>>> known alignment with unlimited cost model (inside costs 0 as well).
>>> Costs 0 for no peeling are caused by count == 0 or rather ncopies = vf /
>>> nunits == 4 / 8 == 0 in record_stmt_costs ().  Shouldn't always hold
>>> ncopies > 0? Even 0.5 would have worked here to make no peeling more
>>> expensive than 0.
>>
>> That's odd.  I can't get
>>
>> Index: gcc/tree-vect-stmts.c
>> ===================================================================
>> --- gcc/tree-vect-stmts.c       (revision 247882)
>> +++ gcc/tree-vect-stmts.c       (working copy)
>> @@ -95,6 +96,7 @@ record_stmt_cost (stmt_vector_for_cost *
>>                   enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
>>                   int misalign, enum vect_cost_model_location where)
>>  {
>> +  gcc_assert (count > 0);
>>    if (body_cost_vec)
>>      {
>>        tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
>>
>> to ICE with the testcase (unpatched trunk)
>>
>> Where's that record_stmt_cost call done?  You can't simply use vf/nunits
>> for SLP.
>
> Ah, of course needs -fvect-cost-model.
>
> I'll investigate.

Ugh.  The vect_peeling_hash_get_lowest_cost isn't handling SLP in any
way; there's quite some refactoring necessary to fix that.

I suggest (eh) to do

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c   (revision 247734)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -1129,7 +1129,7 @@ vect_get_data_access_cost (struct data_r
   int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  int ncopies = vf / nunits;
+  int ncopies = MAX (1, vf / nunits); /* TODO: Handle SLP properly  */

   if (DR_IS_READ (dr))
     vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost,




> Richard.
>
>> Richard.
>>
>>> Test suite on s390x is clean.
>>>
>>> Regards
>>>  Robin
>>>
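The `MAX (1, vf / nunits)` guard suggested above works because plain C integer division truncates toward zero: with vf == 4 and nunits == 8 (the slp-25.c case discussed earlier), `vf / nunits` is 0, so every recorded statement cost becomes `count * 0 == 0` and "no peeling" looks spuriously free. A minimal standalone illustration of the truncation and the guard (not GCC code):

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* ncopies as computed in vect_get_data_access_cost before the fix:
   integer division truncates, yielding 0 when nunits exceeds vf.  */
static int
ncopies_unguarded (int vf, int nunits)
{
  return vf / nunits;
}

/* With the MAX guard, at least one copy is always costed, so the
   "no peeling" variant can never come out at cost zero.  */
static int
ncopies_guarded (int vf, int nunits)
{
  return MAX (1, vf / nunits);
}
```

The guard deliberately over-costs the SLP case rather than modeling it exactly, hence the "Handle SLP properly" TODO in the committed workaround.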


* [PATCH 4/5 v2] Vect peeling cost model
  2017-05-11 11:20                 ` [PATCH 4/5] " Robin Dapp
@ 2017-05-11 15:30                   ` Robin Dapp
  2017-05-12  9:36                     ` Richard Biener
  0 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-11 15:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 431 bytes --]

Included the workaround for SLP now. With it, testsuite is clean on x86
as well.

gcc/ChangeLog:

2017-05-11  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_get_data_access_cost):
	Workaround for SLP handling.
	(vect_enhance_data_refs_alignment):
	Remove check for supportable_dr_alignment, compute costs for
	doing no peeling at all, compare to the best peeling costs so
	far and do no peeling if cheaper.

[-- Attachment #2: gcc-peeling-p4.diff --]
[-- Type: text/x-patch, Size: 9751 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 1bba2b9..805de5d 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1134,7 +1134,7 @@ vect_get_data_access_cost (struct data_reference *dr,
   int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  int ncopies = vf / nunits;
+  int ncopies = MAX (1, vf / nunits); /* TODO: Handle SLP properly  */
 
   if (DR_IS_READ (dr))
     vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost,
@@ -1517,10 +1517,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 {
   vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
-  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1601,7 +1599,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
 	continue;
 
-      supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
       do_peeling = vector_alignment_reachable_p (dr);
       if (do_peeling)
         {
@@ -1690,16 +1687,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 	      if (!first_store && DR_IS_WRITE (dr))
 		first_store = dr;
 
-              /* If there are both known and unknown misaligned accesses in the
-                 loop, we choose peeling amount according to the known
-                 accesses.  */
-              if (!supportable_dr_alignment)
-                {
-                  dr0 = dr;
-                  if (!first_store && DR_IS_WRITE (dr))
-                    first_store = dr;
-                }
-
 	      one_misalignment_unknown = true;
             }
         }
@@ -1721,81 +1708,85 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
-  unsigned int unknown_align_inside_cost = INT_MAX;
-  unsigned int unknown_align_outside_cost = INT_MAX;
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  struct _vect_peel_extended_info peel_for_unknown_alignment;
+  struct _vect_peel_extended_info best_peel;
+
+  peel_for_unknown_alignment.inside_cost = INT_MAX;
+  peel_for_unknown_alignment.outside_cost = INT_MAX;
+  peel_for_unknown_alignment.peel_info.count = 0;
 
   if (do_peeling
-      && one_misalignment_unknown
-      && vect_supportable_dr_alignment (dr0, false))
+      && one_misalignment_unknown)
     {
       /* Check if the target requires to prefer stores over loads, i.e., if
          misaligned stores are more expensive than misaligned loads (taking
          drs with same alignment into account).  */
-      if (first_store && DR_IS_READ (dr0))
-        {
-	  unsigned int load_inside_cost = 0;
-	  unsigned int load_outside_cost = 0;
-	  unsigned int store_inside_cost = 0;
-	  unsigned int store_outside_cost = 0;
-	  stmt_vector_for_cost dummy;
-	  dummy.create (2);
-	  vect_get_peeling_costs_all_drs (dr0,
-					  &load_inside_cost,
-					  &load_outside_cost,
-					  &dummy, vf / 2, vf);
-	  dummy.release ();
-
+      unsigned int load_inside_cost = 0;
+      unsigned int load_outside_cost = 0;
+      unsigned int store_inside_cost = 0;
+      unsigned int store_outside_cost = 0;
+
+      stmt_vector_for_cost dummy;
+      dummy.create (2);
+      vect_get_peeling_costs_all_drs (dr0,
+				      &load_inside_cost,
+				      &load_outside_cost,
+				      &dummy, vf / 2, vf);
+      dummy.release ();
+
+      if (first_store)
+	{
 	  dummy.create (2);
 	  vect_get_peeling_costs_all_drs (first_store,
 					  &store_inside_cost,
 					  &store_outside_cost,
 					  &dummy, vf / 2, vf);
 	  dummy.release ();
+	}
+      else
+	{
+	  store_inside_cost = INT_MAX;
+	  store_outside_cost = INT_MAX;
+	}
 
-          if (load_inside_cost > store_inside_cost
-              || (load_inside_cost == store_inside_cost
-		  && load_outside_cost > store_outside_cost))
-	    {
-	      dr0 = first_store;
-	      unknown_align_inside_cost = store_inside_cost;
-	      unknown_align_outside_cost = store_outside_cost;
-	    }
-	  else
-	    {
-	      unknown_align_inside_cost = load_inside_cost;
-	      unknown_align_outside_cost = load_outside_cost;
-	    }
+      if (load_inside_cost > store_inside_cost
+	  || (load_inside_cost == store_inside_cost
+	      && load_outside_cost > store_outside_cost))
+	{
+	  dr0 = first_store;
+	  peel_for_unknown_alignment.inside_cost = store_inside_cost;
+	  peel_for_unknown_alignment.outside_cost = store_outside_cost;
+	}
+      else
+	{
+	  peel_for_unknown_alignment.inside_cost = load_inside_cost;
+	  peel_for_unknown_alignment.outside_cost = load_outside_cost;
+	}
 
-	  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
-	  prologue_cost_vec.create (2);
-	  epilogue_cost_vec.create (2);
+      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+      prologue_cost_vec.create (2);
+      epilogue_cost_vec.create (2);
 
-	  int dummy2;
-	  unknown_align_outside_cost += vect_get_known_peeling_cost
-	    (loop_vinfo, vf / 2, &dummy2,
-	     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-	     &prologue_cost_vec, &epilogue_cost_vec);
+      int dummy2;
+      peel_for_unknown_alignment.outside_cost += vect_get_known_peeling_cost
+	(loop_vinfo, vf / 2, &dummy2,
+	 &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	 &prologue_cost_vec, &epilogue_cost_vec);
 
-	  prologue_cost_vec.release ();
-	  epilogue_cost_vec.release ();
-        }
+      prologue_cost_vec.release ();
+      epilogue_cost_vec.release ();
 
-      /* In case there are only loads with different unknown misalignments, use
-         peeling only if it may help to align other accesses in the loop or
-	 if it may help improving load bandwith when we'd end up using
-	 unaligned loads.  */
-      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
-      if (!first_store
-	  && !STMT_VINFO_SAME_ALIGN_REFS (
-		  vinfo_for_stmt (DR_STMT (dr0))).length ()
-	  && (vect_supportable_dr_alignment (dr0, false)
-	      != dr_unaligned_supported
-	      || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
-		  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
-        do_peeling = false;
+      peel_for_unknown_alignment.peel_info.count = 1
+	+ STMT_VINFO_SAME_ALIGN_REFS
+	(vinfo_for_stmt (DR_STMT (dr0))).length ();
     }
 
-  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_unknown_alignment.peel_info.npeel = 0;
+  peel_for_unknown_alignment.peel_info.dr = dr0;
+
+  best_peel = peel_for_unknown_alignment;
+
   peel_for_known_alignment.inside_cost = INT_MAX;
   peel_for_known_alignment.outside_cost = INT_MAX;
   peel_for_known_alignment.peel_info.count = 0;
@@ -1804,24 +1795,52 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   if (do_peeling && one_misalignment_known)
     {
       /* Peeling is possible, but there is no data access that is not supported
-         unless aligned. So we try to choose the best possible peeling.  */
-
-      /* Choose the best peeling from the hash table.  */
+         unless aligned.  So we try to choose the best possible peeling from
+	 the hash table.  */
       peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
 	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
-      dr0_known_align = peel_for_known_alignment.peel_info.dr;
     }
 
   /* Compare costs of peeling for known and unknown alignment. */
-  if (dr0_known_align != NULL
-      && unknown_align_inside_cost >= peel_for_known_alignment.inside_cost)
-    {
-      dr0 = dr0_known_align;
-      if (!npeel)
-	do_peeling = false;
-    }
+  if (peel_for_known_alignment.peel_info.dr != NULL
+      && peel_for_unknown_alignment.inside_cost
+      >= peel_for_known_alignment.inside_cost)
+    best_peel = peel_for_known_alignment;
+
+  /* Calculate the penalty for no peeling, i.e. leaving everything
+     unaligned.
+     TODO: use something like an adapted vect_get_peeling_costs_all_drs.  */
+  unsigned nopeel_inside_cost = 0;
+  unsigned nopeel_outside_cost = 0;
+
+  stmt_vector_for_cost dummy;
+  dummy.create (2);
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    vect_get_data_access_cost (dr, &nopeel_inside_cost,
+			       &nopeel_outside_cost, &dummy);
+  dummy.release ();
+
+  /* Add epilogue costs.  As we do not peel for alignment here, no prologue
+     costs will be recorded.  */
+  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+  prologue_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  int dummy2;
+  nopeel_outside_cost += vect_get_known_peeling_cost
+    (loop_vinfo, 0, &dummy2,
+     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+     &prologue_cost_vec, &epilogue_cost_vec);
+
+  prologue_cost_vec.release ();
+  epilogue_cost_vec.release ();
+
+  npeel = best_peel.peel_info.npeel;
+  dr0 = best_peel.peel_info.dr;
 
-  if (dr0 == NULL)
+  /* If doing no peeling is not more expensive than the best peeling
+     we have so far, don't perform any peeling.  */
+  if (nopeel_inside_cost <= best_peel.inside_cost)
     do_peeling = false;
 
   if (do_peeling)
@@ -2000,7 +2019,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 	      break;
 	    }
 
-	  supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
+	  enum dr_alignment_support supportable_dr_alignment =
+	    vect_supportable_dr_alignment (dr, false);
 
           if (!supportable_dr_alignment)
             {


* Re: [PATCH 4/5 v2] Vect peeling cost model
  2017-05-11 15:30                   ` [PATCH 4/5 v2] " Robin Dapp
@ 2017-05-12  9:36                     ` Richard Biener
  2017-05-23 15:58                       ` [PATCH 2/5 v3] " Robin Dapp
                                         ` (5 more replies)
  0 siblings, 6 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-12  9:36 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng

On Thu, May 11, 2017 at 5:21 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Included the workaround for SLP now. With it, testsuite is clean on x86
> as well.

All patches in the series are ok now.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2017-05-11  Robin Dapp  <rdapp@linux.vnet.ibm.com>
>
>         * tree-vect-data-refs.c (vect_get_data_access_cost):
>         Workaround for SLP handling.
>         (vect_enhance_data_refs_alignment):
>         Remove check for supportable_dr_alignment, compute costs for
>         doing no peeling at all, compare to the best peeling costs so
>         far and do no peeling if cheaper.


* [PATCH 2/5 v3] Vect peeling cost model
  2017-05-12  9:36                     ` Richard Biener
@ 2017-05-23 15:58                       ` Robin Dapp
  2017-05-23 19:25                         ` Richard Sandiford
  2017-05-23 15:58                       ` [PATCH 1/5 " Robin Dapp
                                         ` (4 subsequent siblings)
  5 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-23 15:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 382 bytes --]

gcc/ChangeLog:

2017-05-23  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_update_misalignment_for_peel):
	Rename.
	(vect_get_peeling_costs_all_drs): Create function.
	(vect_peeling_hash_get_lowest_cost):
	Use vect_get_peeling_costs_all_drs.
	(vect_peeling_supportable): Create function.
	(vect_enhance_data_refs_alignment): Use
	vect_peeling_supportable.



[-- Attachment #2: gcc-peeling-p2.diff --]
[-- Type: text/x-patch, Size: 8234 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 874fdb5..fe398ea 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -903,7 +903,11 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
 }
 
 
-/* Function vect_update_misalignment_for_peel
+/* Function vect_update_misalignment_for_peel.
+   Sets DR's misalignment
+   - to 0 if it has the same alignment as DR_PEEL,
+   - to the misalignment computed using NPEEL if DR's alignment is known,
+   - to -1 (unknown) otherwise.
 
    DR - the data reference whose misalignment is to be adjusted.
    DR_PEEL - the data reference whose misalignment is being made
@@ -916,7 +920,7 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
                                    struct data_reference *dr_peel, int npeel)
 {
   unsigned int i;
-  vec<dr_p> same_align_drs;
+  vec<dr_p> same_aligned_drs;
   struct data_reference *current_dr;
   int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
   int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel))));
@@ -932,9 +936,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
 
   /* It can be assumed that the data refs with the same alignment as dr_peel
      are aligned in the vector loop.  */
-  same_align_drs
+  same_aligned_drs
     = STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (DR_STMT (dr_peel)));
-  FOR_EACH_VEC_ELT (same_align_drs, i, current_dr)
+  FOR_EACH_VEC_ELT (same_aligned_drs, i, current_dr)
     {
       if (current_dr != dr)
         continue;
@@ -1234,27 +1238,23 @@ vect_peeling_hash_get_most_frequent (_vect_peel_info **slot,
   return 1;
 }
 
+/* Get the costs of peeling NPEEL iterations checking data access costs
+   for all data refs. */
 
-/* Traverse peeling hash table and calculate cost for each peeling option.
-   Find the one with the lowest cost.  */
-
-int
-vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
-				   _vect_peel_extended_info *min)
+static void
+vect_get_peeling_costs_all_drs (struct data_reference *dr0,
+				unsigned int *inside_cost,
+				unsigned int *outside_cost,
+				stmt_vector_for_cost *body_cost_vec,
+				unsigned int npeel, unsigned int vf)
 {
-  vect_peel_info elem = *slot;
-  int save_misalignment, dummy;
-  unsigned int inside_cost = 0, outside_cost = 0, i;
-  gimple *stmt = DR_STMT (elem->dr);
+  gimple *stmt = DR_STMT (dr0);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-  struct data_reference *dr;
-  stmt_vector_for_cost prologue_cost_vec, body_cost_vec, epilogue_cost_vec;
 
-  prologue_cost_vec.create (2);
-  body_cost_vec.create (2);
-  epilogue_cost_vec.create (2);
+  unsigned i;
+  data_reference *dr;
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
@@ -1272,12 +1272,40 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
 	continue;
 
+      int save_misalignment;
       save_misalignment = DR_MISALIGNMENT (dr);
-      vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
-      vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
-				 &body_cost_vec);
+      if (dr == dr0 && npeel == vf / 2)
+	SET_DR_MISALIGNMENT (dr, 0);
+      else
+	vect_update_misalignment_for_peel (dr, dr0, npeel);
+      vect_get_data_access_cost (dr, inside_cost, outside_cost,
+				 body_cost_vec);
       SET_DR_MISALIGNMENT (dr, save_misalignment);
     }
+}
+
+/* Traverse peeling hash table and calculate cost for each peeling option.
+   Find the one with the lowest cost.  */
+
+int
+vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
+				   _vect_peel_extended_info *min)
+{
+  vect_peel_info elem = *slot;
+  int dummy;
+  unsigned int inside_cost = 0, outside_cost = 0;
+  gimple *stmt = DR_STMT (elem->dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  stmt_vector_for_cost prologue_cost_vec, body_cost_vec,
+		       epilogue_cost_vec;
+
+  prologue_cost_vec.create (2);
+  body_cost_vec.create (2);
+  epilogue_cost_vec.create (2);
+
+  vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
+				  &body_cost_vec, elem->npeel, 0);
 
   outside_cost += vect_get_known_peeling_cost
     (loop_vinfo, elem->npeel, &dummy,
@@ -1292,7 +1320,8 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
   epilogue_cost_vec.release ();
 
   if (inside_cost < min->inside_cost
-      || (inside_cost == min->inside_cost && outside_cost < min->outside_cost))
+      || (inside_cost == min->inside_cost
+	  && outside_cost < min->outside_cost))
     {
       min->inside_cost = inside_cost;
       min->outside_cost = outside_cost;
@@ -1300,6 +1329,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
       min->body_cost_vec = body_cost_vec;
       min->peel_info.dr = elem->dr;
       min->peel_info.npeel = elem->npeel;
+      min->peel_info.count = elem->count;
     }
   else
     body_cost_vec.release ();
@@ -1342,6 +1372,52 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
    return res.peel_info.dr;
 }
 
+/* Return true if the new peeling NPEEL is supported.  */
+
+static bool
+vect_peeling_supportable (loop_vec_info loop_vinfo, struct data_reference *dr0,
+			  unsigned npeel)
+{
+  unsigned i;
+  struct data_reference *dr = NULL;
+  vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
+  gimple *stmt;
+  stmt_vec_info stmt_info;
+  enum dr_alignment_support supportable_dr_alignment;
+
+  /* Ensure that all data refs can be vectorized after the peel.  */
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    {
+      int save_misalignment;
+
+      if (dr == dr0)
+	continue;
+
+      stmt = DR_STMT (dr);
+      stmt_info = vinfo_for_stmt (stmt);
+      /* For interleaving, only the alignment of the first access
+	 matters.  */
+      if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+	  && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
+	continue;
+
+      /* Strided accesses perform only component accesses, alignment is
+	 irrelevant for them.  */
+      if (STMT_VINFO_STRIDED_P (stmt_info)
+	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
+	continue;
+
+      save_misalignment = DR_MISALIGNMENT (dr);
+      vect_update_misalignment_for_peel (dr, dr0, npeel);
+      supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
+      SET_DR_MISALIGNMENT (dr, save_misalignment);
+
+      if (!supportable_dr_alignment)
+	return false;
+    }
+
+  return true;
+}
 
 /* Function vect_enhance_data_refs_alignment
 
@@ -1780,40 +1856,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                              "Try peeling by %d\n", npeel);
         }
 
-      /* Ensure that all data refs can be vectorized after the peel.  */
-      FOR_EACH_VEC_ELT (datarefs, i, dr)
-        {
-          int save_misalignment;
-
-	  if (dr == dr0)
-	    continue;
-
-	  stmt = DR_STMT (dr);
-	  stmt_info = vinfo_for_stmt (stmt);
-	  /* For interleaving, only the alignment of the first access
-            matters.  */
-	  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
-	      && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
-	    continue;
-
-	  /* Strided accesses perform only component accesses, alignment is
-	     irrelevant for them.  */
-	  if (STMT_VINFO_STRIDED_P (stmt_info)
-	      && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
-	    continue;
-
-	  save_misalignment = DR_MISALIGNMENT (dr);
-	  vect_update_misalignment_for_peel (dr, dr0, npeel);
-	  supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
-	  SET_DR_MISALIGNMENT (dr, save_misalignment);
-
-	  if (!supportable_dr_alignment)
-	    {
-	      do_peeling = false;
-	      break;
-	    }
-	}
+      /* Ensure that all datarefs can be vectorized after the peel.  */
+      if (!vect_peeling_supportable (loop_vinfo, dr0, npeel))
+	do_peeling = false;
 
+      /* Check if all datarefs are supportable and log.  */
       if (do_peeling && known_alignment_for_access_p (dr0) && npeel == 0)
         {
           stat = vect_verify_datarefs_alignment (loop_vinfo);

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/5 v3] Vect peeling cost model
  2017-05-12  9:36                     ` Richard Biener
  2017-05-23 15:58                       ` [PATCH 2/5 v3] " Robin Dapp
@ 2017-05-23 15:58                       ` Robin Dapp
  2017-05-23 15:58                       ` [PATCH 0/5 " Robin Dapp
                                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-23 15:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 387 bytes --]

gcc/ChangeLog:

2017-05-23  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
	Create DR_HAS_NEGATIVE_STEP.
	(vect_update_misalignment_for_peel): Define DR_MISALIGNMENT.
	(vect_enhance_data_refs_alignment): Use.
	(vect_duplicate_ssa_name_ptr_info): Use.
	* tree-vectorizer.h (dr_misalignment): Use.
	(known_alignment_for_access_p): Use.
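The core of p1 is replacing the magic value -1 with the named constant
DR_MISALIGNMENT_UNKNOWN.  A minimal standalone sketch of the pattern (the
struct and accessors are simplified stand-ins for GCC's data_reference and
the DR_MISALIGNMENT machinery in tree-vectorizer.h, not the real types):

```c
#include <assert.h>
#include <stdbool.h>

/* Sentinel for "misalignment not known at compile time"; mirrors the
   constant the patch introduces.  */
#define DR_MISALIGNMENT_UNKNOWN (-1)

/* Simplified stand-in for GCC's data_reference; the real struct lives
   in tree-data-ref.h and carries much more state.  */
struct data_reference
{
  int misalignment;  /* byte misalignment, or DR_MISALIGNMENT_UNKNOWN */
};

static inline void
set_dr_misalignment (struct data_reference *dr, int val)
{
  dr->misalignment = val;
}

static inline int
dr_misalignment (struct data_reference *dr)
{
  return dr->misalignment;
}

/* Return true if the misalignment is a known byte offset rather than
   the sentinel.  */
static inline bool
known_alignment_for_access_p (struct data_reference *dr)
{
  return dr_misalignment (dr) != DR_MISALIGNMENT_UNKNOWN;
}
```

The point is purely readability: every comparison against the sentinel now
says what it means instead of testing against a bare -1.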

[-- Attachment #2: gcc-peeling-p1.diff --]
[-- Type: text/x-patch, Size: 5334 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 67cc969..874fdb5 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -717,7 +717,7 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
     loop = LOOP_VINFO_LOOP (loop_vinfo);
 
   /* Initialize misalignment to unknown.  */
-  SET_DR_MISALIGNMENT (dr, -1);
+  SET_DR_MISALIGNMENT (dr, DR_MISALIGNMENT_UNKNOWN);
 
   if (tree_fits_shwi_p (DR_STEP (dr)))
     misalign = DR_INIT (dr);
@@ -957,8 +957,9 @@ vect_update_misalignment_for_peel (struct data_reference *dr,
     }
 
   if (dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location, "Setting misalignment to -1.\n");
-  SET_DR_MISALIGNMENT (dr, -1);
+    dump_printf_loc (MSG_NOTE, vect_location, "Setting misalignment " \
+		     "to unknown (-1).\n");
+  SET_DR_MISALIGNMENT (dr, DR_MISALIGNMENT_UNKNOWN);
 }
 
 
@@ -1526,32 +1527,31 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
         {
           if (known_alignment_for_access_p (dr))
             {
-              unsigned int npeel_tmp;
+              unsigned int npeel_tmp = 0;
 	      bool negative = tree_int_cst_compare (DR_STEP (dr),
 						    size_zero_node) < 0;
 
-              /* Save info about DR in the hash table.  */
               vectype = STMT_VINFO_VECTYPE (stmt_info);
               nelements = TYPE_VECTOR_SUBPARTS (vectype);
               mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE (
                                                 TREE_TYPE (DR_REF (dr))));
-              npeel_tmp = (negative
-			   ? (mis - nelements) : (nelements - mis))
-		  & (nelements - 1);
+	      if (DR_MISALIGNMENT (dr) != 0)
+		npeel_tmp = (negative ? (mis - nelements)
+			     : (nelements - mis)) & (nelements - 1);
 
               /* For multiple types, it is possible that the bigger type access
                  will have more than one peeling option.  E.g., a loop with two
                  types: one of size (vector size / 4), and the other one of
                  size (vector size / 8).  Vectorization factor will 8.  If both
-                 access are misaligned by 3, the first one needs one scalar
+                 accesses are misaligned by 3, the first one needs one scalar
                  iteration to be aligned, and the second one needs 5.  But the
 		 first one will be aligned also by peeling 5 scalar
                  iterations, and in that case both accesses will be aligned.
                  Hence, except for the immediate peeling amount, we also want
                  to try to add full vector size, while we don't exceed
                  vectorization factor.
-                 We do this automatically for cost model, since we calculate cost
-                 for every peeling option.  */
+                 We do this automatically for cost model, since we calculate
+		 cost for every peeling option.  */
               if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 		{
 		  if (STMT_SLP_TYPE (stmt_info))
@@ -1559,17 +1559,15 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 		      = (vf * GROUP_SIZE (stmt_info)) / nelements;
 		  else
 		    possible_npeel_number = vf / nelements;
-		}
 
-              /* Handle the aligned case. We may decide to align some other
-                 access, making DR unaligned.  */
-              if (DR_MISALIGNMENT (dr) == 0)
-                {
-                  npeel_tmp = 0;
-                  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
-                    possible_npeel_number++;
-                }
+		  /* NPEEL_TMP is 0 when there is no misalignment, also allow
+		     peeling off NELEMENTS below.  */
+		  if (DR_MISALIGNMENT (dr) == 0)
+		    possible_npeel_number++;
+		}
 
+	      /* Save info about DR in the hash table.  Also include peeling
+	         amounts according to the explanation above.  */
               for (j = 0; j < possible_npeel_number; j++)
                 {
                   vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
@@ -4182,7 +4180,7 @@ vect_duplicate_ssa_name_ptr_info (tree name, data_reference *dr,
   duplicate_ssa_name_ptr_info (name, DR_PTR_INFO (dr));
   unsigned int align = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmt_info));
   int misalign = DR_MISALIGNMENT (dr);
-  if (misalign == -1)
+  if (misalign == DR_MISALIGNMENT_UNKNOWN)
     mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
   else
     set_ptr_info_alignment (SSA_NAME_PTR_INFO (name), align, misalign);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index c0bc493..e8d143a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1012,6 +1012,7 @@ dr_misalignment (struct data_reference *dr)
    taking into account peeling/versioning if applied.  */
 #define DR_MISALIGNMENT(DR) dr_misalignment (DR)
 #define SET_DR_MISALIGNMENT(DR, VAL) set_dr_misalignment (DR, VAL)
+#define DR_MISALIGNMENT_UNKNOWN (-1)
 
 /* Return TRUE if the data access is aligned, and FALSE otherwise.  */
 
@@ -1027,7 +1028,7 @@ aligned_access_p (struct data_reference *data_ref_info)
 static inline bool
 known_alignment_for_access_p (struct data_reference *data_ref_info)
 {
-  return (DR_MISALIGNMENT (data_ref_info) != -1);
+  return (DR_MISALIGNMENT (data_ref_info) != DR_MISALIGNMENT_UNKNOWN);
 }
 
 


* [PATCH 0/5 v3] Vect peeling cost model
  2017-05-12  9:36                     ` Richard Biener
  2017-05-23 15:58                       ` [PATCH 2/5 v3] " Robin Dapp
  2017-05-23 15:58                       ` [PATCH 1/5 " Robin Dapp
@ 2017-05-23 15:58                       ` Robin Dapp
  2017-05-24  7:51                         ` Richard Biener
  2017-06-03 17:12                         ` Andreas Schwab
  2017-05-23 15:59                       ` [PATCH 5/5 " Robin Dapp
                                         ` (2 subsequent siblings)
  5 siblings, 2 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-23 15:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

The last version of the patch series caused some regressions for ppc64.
This was largely due to incorrect handling of unsupportable alignment
and should be fixed with the new version.

p2 and p5 have not changed but I'm posting the whole series again for
reference.  p1 only changed comment wording, p3 was adapted to apply on
the trunk.

No regressions on s390x, x86-64 and ppc64.  Bootstrapped.

Regards
 Robin


* [PATCH 4/5 v3] Vect peeling cost model
  2017-05-12  9:36                     ` Richard Biener
                                         ` (3 preceding siblings ...)
  2017-05-23 15:59                       ` [PATCH 5/5 " Robin Dapp
@ 2017-05-23 15:59                       ` Robin Dapp
  2017-05-31 13:56                         ` Christophe Lyon
  2017-05-23 16:02                       ` [PATCH 3/5 " Robin Dapp
  5 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-23 15:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 305 bytes --]

gcc/ChangeLog:

2017-05-23  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_get_data_access_cost):
	Workaround for SLP handling.
	(vect_enhance_data_refs_alignment):
	Compute costs for doing no peeling at all, compare to the best
	peeling costs so far and avoid peeling if cheaper.
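The decision p4 adds can be condensed into the following standalone sketch.
The struct and the helper names are invented for illustration (the real code
accumulates costs with vect_get_data_access_cost over all data refs and
stores them in _vect_peel_extended_info), but the comparison logic mirrors
the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified peeling candidate; carries only what the decision needs.  */
struct peel_candidate
{
  unsigned inside_cost;   /* cost of the vector loop body */
  unsigned outside_cost;  /* prologue/epilogue cost */
  unsigned npeel;         /* scalar iterations peeled */
};

/* Pick the cheaper of two candidates: lower body cost wins, ties are
   broken on outside cost, as in vect_peeling_hash_get_lowest_cost.  */
static struct peel_candidate
cheaper (struct peel_candidate a, struct peel_candidate b)
{
  if (a.inside_cost < b.inside_cost
      || (a.inside_cost == b.inside_cost
	  && a.outside_cost < b.outside_cost))
    return a;
  return b;
}

/* Peel only when the best peeling variant strictly beats leaving all
   accesses unaligned ("if no peeling is not more expensive than the
   best peeling we have so far, don't perform any peeling").  */
static bool
should_peel (struct peel_candidate best_peel,
	     unsigned nopeel_inside_cost)
{
  return nopeel_inside_cost > best_peel.inside_cost;
}
```

On a target like z13, where unaligned vector loads/stores cost the same as
aligned ones, nopeel_inside_cost equals the best peeling's inside cost and
the prolog is omitted, which is exactly the behavior the RFC asked for.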

[-- Attachment #2: gcc-peeling-p4.diff --]
[-- Type: text/x-patch, Size: 10285 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 8cd6edd..05f944a 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1134,7 +1134,7 @@ vect_get_data_access_cost (struct data_reference *dr,
   int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  int ncopies = vf / nunits;
+  int ncopies = MAX (1, vf / nunits); /* TODO: Handle SLP properly  */
 
   if (DR_IS_READ (dr))
     vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost,
@@ -1520,7 +1520,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
-  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1530,6 +1529,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   unsigned int npeel = 0;
   bool one_misalignment_known = false;
   bool one_misalignment_unknown = false;
+  bool one_dr_unsupportable = false;
+  struct data_reference *unsupportable_dr = NULL;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1687,20 +1688,18 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 		    dr0 = dr;
 		}
 
-	      if (!first_store && DR_IS_WRITE (dr))
-		first_store = dr;
+	      one_misalignment_unknown = true;
 
-              /* If there are both known and unknown misaligned accesses in the
-                 loop, we choose peeling amount according to the known
-                 accesses.  */
-              if (!supportable_dr_alignment)
-                {
-                  dr0 = dr;
-                  if (!first_store && DR_IS_WRITE (dr))
-                    first_store = dr;
-                }
+	      /* Check for data refs with unsupportable alignment that
+	         can be peeled.  */
+	      if (!supportable_dr_alignment)
+	      {
+		one_dr_unsupportable = true;
+		unsupportable_dr = dr;
+	      }
 
-	      one_misalignment_unknown = true;
+	      if (!first_store && DR_IS_WRITE (dr))
+		first_store = dr;
             }
         }
       else
@@ -1721,81 +1720,85 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
-  unsigned int unknown_align_inside_cost = INT_MAX;
-  unsigned int unknown_align_outside_cost = INT_MAX;
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  struct _vect_peel_extended_info peel_for_unknown_alignment;
+  struct _vect_peel_extended_info best_peel;
+
+  peel_for_unknown_alignment.inside_cost = INT_MAX;
+  peel_for_unknown_alignment.outside_cost = INT_MAX;
+  peel_for_unknown_alignment.peel_info.count = 0;
 
   if (do_peeling
-      && one_misalignment_unknown
-      && vect_supportable_dr_alignment (dr0, false))
+      && one_misalignment_unknown)
     {
       /* Check if the target requires to prefer stores over loads, i.e., if
          misaligned stores are more expensive than misaligned loads (taking
          drs with same alignment into account).  */
-      if (first_store && DR_IS_READ (dr0))
-        {
-	  unsigned int load_inside_cost = 0;
-	  unsigned int load_outside_cost = 0;
-	  unsigned int store_inside_cost = 0;
-	  unsigned int store_outside_cost = 0;
-	  stmt_vector_for_cost dummy;
-	  dummy.create (2);
-	  vect_get_peeling_costs_all_drs (dr0,
-					  &load_inside_cost,
-					  &load_outside_cost,
-					  &dummy, vf / 2, vf);
-	  dummy.release ();
-
+      unsigned int load_inside_cost = 0;
+      unsigned int load_outside_cost = 0;
+      unsigned int store_inside_cost = 0;
+      unsigned int store_outside_cost = 0;
+
+      stmt_vector_for_cost dummy;
+      dummy.create (2);
+      vect_get_peeling_costs_all_drs (dr0,
+				      &load_inside_cost,
+				      &load_outside_cost,
+				      &dummy, vf / 2, vf);
+      dummy.release ();
+
+      if (first_store)
+	{
 	  dummy.create (2);
 	  vect_get_peeling_costs_all_drs (first_store,
 					  &store_inside_cost,
 					  &store_outside_cost,
 					  &dummy, vf / 2, vf);
 	  dummy.release ();
+	}
+      else
+	{
+	  store_inside_cost = INT_MAX;
+	  store_outside_cost = INT_MAX;
+	}
 
-          if (load_inside_cost > store_inside_cost
-              || (load_inside_cost == store_inside_cost
-		  && load_outside_cost > store_outside_cost))
-	    {
-	      dr0 = first_store;
-	      unknown_align_inside_cost = store_inside_cost;
-	      unknown_align_outside_cost = store_outside_cost;
-	    }
-	  else
-	    {
-	      unknown_align_inside_cost = load_inside_cost;
-	      unknown_align_outside_cost = load_outside_cost;
-	    }
+      if (load_inside_cost > store_inside_cost
+	  || (load_inside_cost == store_inside_cost
+	      && load_outside_cost > store_outside_cost))
+	{
+	  dr0 = first_store;
+	  peel_for_unknown_alignment.inside_cost = store_inside_cost;
+	  peel_for_unknown_alignment.outside_cost = store_outside_cost;
+	}
+      else
+	{
+	  peel_for_unknown_alignment.inside_cost = load_inside_cost;
+	  peel_for_unknown_alignment.outside_cost = load_outside_cost;
+	}
 
-	  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
-	  prologue_cost_vec.create (2);
-	  epilogue_cost_vec.create (2);
+      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+      prologue_cost_vec.create (2);
+      epilogue_cost_vec.create (2);
 
-	  int dummy2;
-	  unknown_align_outside_cost += vect_get_known_peeling_cost
-	    (loop_vinfo, vf / 2, &dummy2,
-	     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-	     &prologue_cost_vec, &epilogue_cost_vec);
+      int dummy2;
+      peel_for_unknown_alignment.outside_cost += vect_get_known_peeling_cost
+	(loop_vinfo, vf / 2, &dummy2,
+	 &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	 &prologue_cost_vec, &epilogue_cost_vec);
 
-	  prologue_cost_vec.release ();
-	  epilogue_cost_vec.release ();
-        }
+      prologue_cost_vec.release ();
+      epilogue_cost_vec.release ();
 
-      /* Use peeling only if it may help to align other accesses in the loop or
-	 if it may help improving load bandwith when we'd end up using
-	 unaligned loads.  */
-      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
-      if (STMT_VINFO_SAME_ALIGN_REFS
-	    (vinfo_for_stmt (DR_STMT (dr0))).length () == 0
-	  && (vect_supportable_dr_alignment (dr0, false)
-	      != dr_unaligned_supported
-	      || (DR_IS_READ (dr0)
-		  && (builtin_vectorization_cost (vector_load, dr0_vt, 0)
-		      == builtin_vectorization_cost (unaligned_load,
-						     dr0_vt, -1)))))
-        do_peeling = false;
+      peel_for_unknown_alignment.peel_info.count = 1
+	+ STMT_VINFO_SAME_ALIGN_REFS
+	(vinfo_for_stmt (DR_STMT (dr0))).length ();
     }
 
-  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_unknown_alignment.peel_info.npeel = 0;
+  peel_for_unknown_alignment.peel_info.dr = dr0;
+
+  best_peel = peel_for_unknown_alignment;
+
   peel_for_known_alignment.inside_cost = INT_MAX;
   peel_for_known_alignment.outside_cost = INT_MAX;
   peel_for_known_alignment.peel_info.count = 0;
@@ -1804,25 +1807,69 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   if (do_peeling && one_misalignment_known)
     {
       /* Peeling is possible, but there is no data access that is not supported
-         unless aligned. So we try to choose the best possible peeling.  */
-
-      /* Choose the best peeling from the hash table.  */
+         unless aligned.  So we try to choose the best possible peeling from
+	 the hash table.  */
       peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
 	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
-      dr0_known_align = peel_for_known_alignment.peel_info.dr;
     }
 
   /* Compare costs of peeling for known and unknown alignment. */
-  if (dr0_known_align != NULL
-      && unknown_align_inside_cost >= peel_for_known_alignment.inside_cost)
+  if (peel_for_known_alignment.peel_info.dr != NULL
+      && peel_for_unknown_alignment.inside_cost
+      >= peel_for_known_alignment.inside_cost)
     {
-      dr0 = dr0_known_align;
-      if (!npeel)
+      best_peel = peel_for_known_alignment;
+
+      /* If the best peeling for known alignment has NPEEL == 0, perform no
+         peeling at all except if there is an unsupportable dr that we can
+         align.  */
+      if (best_peel.peel_info.npeel == 0 && !one_dr_unsupportable)
 	do_peeling = false;
     }
 
-  if (dr0 == NULL)
-    do_peeling = false;
+  /* If there is an unsupportable data ref, prefer this over all choices so far
+     since we'd have to discard a chosen peeling except when it accidentally
+     aligned the unsupportable data ref.  */
+  if (one_dr_unsupportable)
+    dr0 = unsupportable_dr;
+  else if (do_peeling)
+    {
+      /* Calculate the penalty for no peeling, i.e. leaving everything
+	 unaligned.
+	 TODO: Adapt vect_get_peeling_costs_all_drs and use here.  */
+      unsigned nopeel_inside_cost = 0;
+      unsigned nopeel_outside_cost = 0;
+
+      stmt_vector_for_cost dummy;
+      dummy.create (2);
+      FOR_EACH_VEC_ELT (datarefs, i, dr)
+	vect_get_data_access_cost (dr, &nopeel_inside_cost,
+				   &nopeel_outside_cost, &dummy);
+      dummy.release ();
+
+      /* Add epilogue costs.  As we do not peel for alignment here, no prologue
+	 costs will be recorded.  */
+      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+      prologue_cost_vec.create (2);
+      epilogue_cost_vec.create (2);
+
+      int dummy2;
+      nopeel_outside_cost += vect_get_known_peeling_cost
+	(loop_vinfo, 0, &dummy2,
+	 &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	 &prologue_cost_vec, &epilogue_cost_vec);
+
+      prologue_cost_vec.release ();
+      epilogue_cost_vec.release ();
+
+      npeel = best_peel.peel_info.npeel;
+      dr0 = best_peel.peel_info.dr;
+
+      /* If no peeling is not more expensive than the best peeling we
+	 have so far, don't perform any peeling.  */
+      if (nopeel_inside_cost <= best_peel.inside_cost)
+	do_peeling = false;
+    }
 
   if (do_peeling)
     {


* [PATCH 5/5 v3] Vect peeling cost model
  2017-05-12  9:36                     ` Richard Biener
                                         ` (2 preceding siblings ...)
  2017-05-23 15:58                       ` [PATCH 0/5 " Robin Dapp
@ 2017-05-23 15:59                       ` Robin Dapp
  2017-05-23 15:59                       ` [PATCH 4/5 " Robin Dapp
  2017-05-23 16:02                       ` [PATCH 3/5 " Robin Dapp
  5 siblings, 0 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-23 15:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 132 bytes --]

gcc/testsuite/ChangeLog:

2017-05-23  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* gcc.target/s390/vector/vec-nopeel-2.c: New test.



[-- Attachment #2: gcc-peeling-p5.diff --]
[-- Type: text/x-patch, Size: 738 bytes --]

diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
new file mode 100644
index 0000000..9b67793
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target s390_vx } */
+/* { dg-options "-O2 -mzarch -march=z13 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
+
+void foo(int *restrict a, int *restrict b, unsigned int n)
+{
+  for (unsigned int i = 0; i < n; i++)
+    b[i] = a[i] * 2 + 1;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" } } */


* [PATCH 3/5 v3] Vect peeling cost model
  2017-05-12  9:36                     ` Richard Biener
                                         ` (4 preceding siblings ...)
  2017-05-23 15:59                       ` [PATCH 4/5 " Robin Dapp
@ 2017-05-23 16:02                       ` Robin Dapp
  5 siblings, 0 replies; 51+ messages in thread
From: Robin Dapp @ 2017-05-23 16:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 429 bytes --]

gcc/ChangeLog:

2017-05-23  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
	Return peeling info and set costs to zero for unlimited cost
	model.
	(vect_enhance_data_refs_alignment): Also inspect all datarefs
	with unknown misalignment.  Compute costs for unknown
	misalignment, compare them to the costs for known misalignment,
	and choose the cheapest for peeling.



[-- Attachment #2: gcc-peeling-p3.diff --]
[-- Type: text/x-patch, Size: 10947 bytes --]

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index fe398ea..8cd6edd 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1342,7 +1342,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
    choosing an option with the lowest cost (if cost model is enabled) or the
    option that aligns as many accesses as possible.  */
 
-static struct data_reference *
+static struct _vect_peel_extended_info
 vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_htab,
 				       loop_vec_info loop_vinfo,
                                        unsigned int *npeel,
@@ -1365,11 +1365,13 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
        res.peel_info.count = 0;
        peeling_htab->traverse <_vect_peel_extended_info *,
 	   		       vect_peeling_hash_get_most_frequent> (&res);
+       res.inside_cost = 0;
+       res.outside_cost = 0;
      }
 
    *npeel = res.peel_info.npeel;
    *body_cost_vec = res.body_cost_vec;
-   return res.peel_info.dr;
+   return res;
 }
 
 /* Return true if the new peeling NPEEL is supported.  */
@@ -1518,6 +1520,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   enum dr_alignment_support supportable_dr_alignment;
   struct data_reference *dr0 = NULL, *first_store = NULL;
   struct data_reference *dr;
+  struct data_reference *dr0_known_align = NULL;
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
@@ -1525,7 +1528,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   gimple *stmt;
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
-  bool all_misalignments_unknown = true;
+  bool one_misalignment_known = false;
+  bool one_misalignment_unknown = false;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1651,11 +1655,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                   npeel_tmp += nelements;
                 }
 
-              all_misalignments_unknown = false;
-              /* Data-ref that was chosen for the case that all the
-                 misalignments are unknown is not relevant anymore, since we
-                 have a data-ref with known alignment.  */
-              dr0 = NULL;
+	      one_misalignment_known = true;
             }
           else
             {
@@ -1663,35 +1663,32 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                  peeling for data-ref that has the maximum number of data-refs
                  with the same alignment, unless the target prefers to align
                  stores over load.  */
-              if (all_misalignments_unknown)
-                {
-		  unsigned same_align_drs
-		    = STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
-                  if (!dr0
-		      || same_align_drs_max < same_align_drs)
-                    {
-                      same_align_drs_max = same_align_drs;
-                      dr0 = dr;
-                    }
-		  /* For data-refs with the same number of related
-		     accesses prefer the one where the misalign
-		     computation will be invariant in the outermost loop.  */
-		  else if (same_align_drs_max == same_align_drs)
-		    {
-		      struct loop *ivloop0, *ivloop;
-		      ivloop0 = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr0));
-		      ivloop = outermost_invariant_loop_for_expr
-			  (loop, DR_BASE_ADDRESS (dr));
-		      if ((ivloop && !ivloop0)
-			  || (ivloop && ivloop0
-			      && flow_loop_nested_p (ivloop, ivloop0)))
-			dr0 = dr;
-		    }
+	      unsigned same_align_drs
+		= STMT_VINFO_SAME_ALIGN_REFS (stmt_info).length ();
+	      if (!dr0
+		  || same_align_drs_max < same_align_drs)
+		{
+		  same_align_drs_max = same_align_drs;
+		  dr0 = dr;
+		}
+	      /* For data-refs with the same number of related
+		 accesses prefer the one where the misalign
+		 computation will be invariant in the outermost loop.  */
+	      else if (same_align_drs_max == same_align_drs)
+		{
+		  struct loop *ivloop0, *ivloop;
+		  ivloop0 = outermost_invariant_loop_for_expr
+		    (loop, DR_BASE_ADDRESS (dr0));
+		  ivloop = outermost_invariant_loop_for_expr
+		    (loop, DR_BASE_ADDRESS (dr));
+		  if ((ivloop && !ivloop0)
+		      || (ivloop && ivloop0
+			  && flow_loop_nested_p (ivloop, ivloop0)))
+		    dr0 = dr;
+		}
 
-                  if (!first_store && DR_IS_WRITE (dr))
-                    first_store = dr;
-                }
+	      if (!first_store && DR_IS_WRITE (dr))
+		first_store = dr;
 
               /* If there are both known and unknown misaligned accesses in the
                  loop, we choose peeling amount according to the known
@@ -1702,6 +1699,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                   if (!first_store && DR_IS_WRITE (dr))
                     first_store = dr;
                 }
+
+	      one_misalignment_unknown = true;
             }
         }
       else
@@ -1722,8 +1721,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
       || loop->inner)
     do_peeling = false;
 
+  unsigned int unknown_align_inside_cost = INT_MAX;
+  unsigned int unknown_align_outside_cost = INT_MAX;
+
   if (do_peeling
-      && all_misalignments_unknown
+      && one_misalignment_unknown
       && vect_supportable_dr_alignment (dr0, false))
     {
       /* Check if the target requires to prefer stores over loads, i.e., if
@@ -1731,62 +1733,51 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
          drs with same alignment into account).  */
       if (first_store && DR_IS_READ (dr0))
         {
-          unsigned int load_inside_cost = 0, load_outside_cost = 0;
-          unsigned int store_inside_cost = 0, store_outside_cost = 0;
-          unsigned int load_inside_penalty = 0, load_outside_penalty = 0;
-          unsigned int store_inside_penalty = 0, store_outside_penalty = 0;
+	  unsigned int load_inside_cost = 0;
+	  unsigned int load_outside_cost = 0;
+	  unsigned int store_inside_cost = 0;
+	  unsigned int store_outside_cost = 0;
 	  stmt_vector_for_cost dummy;
 	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (dr0,
+					  &load_inside_cost,
+					  &load_outside_cost,
+					  &dummy, vf / 2, vf);
+	  dummy.release ();
 
-          vect_get_data_access_cost (dr0, &load_inside_cost, &load_outside_cost,
-				     &dummy);
-          vect_get_data_access_cost (first_store, &store_inside_cost,
-				     &store_outside_cost, &dummy);
-
+	  dummy.create (2);
+	  vect_get_peeling_costs_all_drs (first_store,
+					  &store_inside_cost,
+					  &store_outside_cost,
+					  &dummy, vf / 2, vf);
 	  dummy.release ();
 
-          /* Calculate the penalty for leaving FIRST_STORE unaligned (by
-             aligning the load DR0).  */
-          load_inside_penalty = store_inside_cost;
-          load_outside_penalty = store_outside_cost;
-          for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-			  DR_STMT (first_store))).iterate (i, &dr);
-               i++)
-            if (DR_IS_READ (dr))
-              {
-                load_inside_penalty += load_inside_cost;
-                load_outside_penalty += load_outside_cost;
-              }
-            else
-              {
-                load_inside_penalty += store_inside_cost;
-                load_outside_penalty += store_outside_cost;
-              }
-
-          /* Calculate the penalty for leaving DR0 unaligned (by
-             aligning the FIRST_STORE).  */
-          store_inside_penalty = load_inside_cost;
-          store_outside_penalty = load_outside_cost;
-          for (i = 0;
-	       STMT_VINFO_SAME_ALIGN_REFS (vinfo_for_stmt (
-		      DR_STMT (dr0))).iterate (i, &dr);
-               i++)
-            if (DR_IS_READ (dr))
-              {
-                store_inside_penalty += load_inside_cost;
-                store_outside_penalty += load_outside_cost;
-              }
-            else
-              {
-                store_inside_penalty += store_inside_cost;
-                store_outside_penalty += store_outside_cost;
-              }
-
-          if (load_inside_penalty > store_inside_penalty
-              || (load_inside_penalty == store_inside_penalty
-                  && load_outside_penalty > store_outside_penalty))
-            dr0 = first_store;
+          if (load_inside_cost > store_inside_cost
+              || (load_inside_cost == store_inside_cost
+		  && load_outside_cost > store_outside_cost))
+	    {
+	      dr0 = first_store;
+	      unknown_align_inside_cost = store_inside_cost;
+	      unknown_align_outside_cost = store_outside_cost;
+	    }
+	  else
+	    {
+	      unknown_align_inside_cost = load_inside_cost;
+	      unknown_align_outside_cost = load_outside_cost;
+	    }
+
+	  stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
+	  prologue_cost_vec.create (2);
+	  epilogue_cost_vec.create (2);
+
+	  int dummy2;
+	  unknown_align_outside_cost += vect_get_known_peeling_cost
+	    (loop_vinfo, vf / 2, &dummy2,
+	     &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+	     &prologue_cost_vec, &epilogue_cost_vec);
+
+	  prologue_cost_vec.release ();
+	  epilogue_cost_vec.release ();
         }
 
       /* Use peeling only if it may help to align other accesses in the loop or
@@ -1804,22 +1795,35 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
         do_peeling = false;
     }
 
-  if (do_peeling && !dr0)
+  struct _vect_peel_extended_info peel_for_known_alignment;
+  peel_for_known_alignment.inside_cost = INT_MAX;
+  peel_for_known_alignment.outside_cost = INT_MAX;
+  peel_for_known_alignment.peel_info.count = 0;
+  peel_for_known_alignment.peel_info.dr = NULL;
+
+  if (do_peeling && one_misalignment_known)
     {
       /* Peeling is possible, but there is no data access that is not supported
          unless aligned. So we try to choose the best possible peeling.  */
 
-      /* We should get here only if there are drs with known misalignment.  */
-      gcc_assert (!all_misalignments_unknown);
-
       /* Choose the best peeling from the hash table.  */
-      dr0 = vect_peeling_hash_choose_best_peeling (&peeling_htab,
-						   loop_vinfo, &npeel,
-						   &body_cost_vec);
-      if (!dr0 || !npeel)
-        do_peeling = false;
+      peel_for_known_alignment = vect_peeling_hash_choose_best_peeling
+	(&peeling_htab, loop_vinfo, &npeel, &body_cost_vec);
+      dr0_known_align = peel_for_known_alignment.peel_info.dr;
+    }
+
+  /* Compare costs of peeling for known and unknown alignment.  */
+  if (dr0_known_align != NULL
+      && unknown_align_inside_cost >= peel_for_known_alignment.inside_cost)
+    {
+      dr0 = dr0_known_align;
+      if (!npeel)
+	do_peeling = false;
     }
 
+  if (dr0 == NULL)
+    do_peeling = false;
+
   if (do_peeling)
     {
       stmt = DR_STMT (dr0);

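As a side note for readers skimming the hunk above: the two comparisons it introduces can be sketched in plain C, independent of the vectorizer data structures. The struct and helper names below are invented for illustration and are not GCC's:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified stand-ins for the inside/outside loop-body costs the
   vectorizer computes per peeling candidate.  */
struct peel_costs
{
  unsigned inside;   /* cost inside the vectorized loop body */
  unsigned outside;  /* prologue/epilogue cost */
};

/* Mirrors the patch's choice between aligning the first load (dr0)
   and the first store: prefer the candidate with the lower inside
   cost, using the outside cost as a tie breaker.  Returns true when
   the store should be aligned instead of the load.  */
static bool
prefer_store_alignment (struct peel_costs load, struct peel_costs store)
{
  return load.inside > store.inside
	 || (load.inside == store.inside
	     && load.outside > store.outside);
}

/* Mirrors the later comparison: peeling for a known misalignment is
   chosen over peeling for an unknown one when its inside cost is no
   worse; npeel == 0 then means doing no peeling at all.  */
static bool
choose_known_alignment (unsigned unknown_inside, unsigned known_inside)
{
  return known_inside != UINT_MAX && unknown_inside >= known_inside;
}
```

The first helper corresponds to the store-vs-load decision, the second to the known-vs-unknown comparison further down in the hunk.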

* Re: [PATCH 2/5 v3] Vect peeling cost model
  2017-05-23 15:58                       ` [PATCH 2/5 v3] " Robin Dapp
@ 2017-05-23 19:25                         ` Richard Sandiford
  2017-05-24  7:37                           ` Robin Dapp
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Sandiford @ 2017-05-23 19:25 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel

Robin Dapp <rdapp@linux.vnet.ibm.com> writes:
> @@ -1272,12 +1272,40 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
>  	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
>  	continue;
>  
> +      int save_misalignment;
>        save_misalignment = DR_MISALIGNMENT (dr);
> -      vect_update_misalignment_for_peel (dr, elem->dr, elem->npeel);
> -      vect_get_data_access_cost (dr, &inside_cost, &outside_cost,
> -				 &body_cost_vec);
> +      if (dr == dr0 && npeel == vf / 2)
> +	SET_DR_MISALIGNMENT (dr, 0);
> +      else
> +	vect_update_misalignment_for_peel (dr, dr0, npeel);
> +      vect_get_data_access_cost (dr, inside_cost, outside_cost,
> +				 body_cost_vec);
>        SET_DR_MISALIGNMENT (dr, save_misalignment);
>      }

Not sure I've understood the series TBH, but is the npeel == vf / 2
there specifically for the "unknown number of peels" case?  How do
we distinguish that from the case in which the number of peels is
known to be vf / 2 at compile time?  Or have I missed the point
completely? (probably yes, sorry!)

Thanks,
Richard


* Re: [PATCH 2/5 v3] Vect peeling cost model
  2017-05-23 19:25                         ` Richard Sandiford
@ 2017-05-24  7:37                           ` Robin Dapp
  2017-05-24  7:53                             ` Richard Sandiford
  0 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-24  7:37 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel,
	richard.sandiford

> Not sure I've understood the series TBH, but is the npeel == vf / 2
> there specifically for the "unknown number of peels" case?  How do
> we distinguish that from the case in which the number of peels is
> known to be vf / 2 at compile time?  Or have I missed the point
> completely? (probably yes, sorry!)

Good point, that's not totally waterproof for future uses of
vect_get_peeling_costs_all_drs ().  Currently, however, a nonzero vf is
passed to it only when peeling for unknown alignment (and vf == 0 in
the known-alignment case), so we can distinguish the two cases.

In future, the whole vf/2 handling should be improved anyway since e.g.
it is hardcoded here as well as in tree-vect-loop.c.  npeel = 0 also has
a double meaning, namely not peeling when peeling for known alignment
and peeling vf/2 iters when peeling for unknown alignment.  Room for
improvement I guess :)
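To make the overloading concrete, here is a hedged C sketch (invented names, not the actual GCC routines) of how an explicit caller-supplied flag can replace the npeel == vf / 2 sentinel:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: compute a data ref's misalignment after peeling
   NPEEL iterations.  When the alignment of the peeled-for data ref is
   unknown at compile time, the vectorizer can only *assume* it becomes
   aligned after the runtime-sized prologue, so an explicit flag makes
   that assumption visible instead of encoding it in a magic npeel
   value.  A misalignment of -1 means "unknown".  */
static int
misalignment_after_peel (bool unknown_misalignment, int misalign,
			 int npeel, int elem_size, int target_align)
{
  if (unknown_misalignment)
    return 0;   /* assumed aligned after the prologue */
  if (misalign < 0)
    return -1;  /* unknown stays unknown for other data refs */
  return (misalign + npeel * elem_size) % target_align;
}
```

With the flag, npeel keeps a single meaning (the number of peeled iterations, possibly unknown) and the vf / 2 heuristic stays in one place in the caller.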

Regards
 Robin


* Re: [PATCH 0/5 v3] Vect peeling cost model
  2017-05-23 15:58                       ` [PATCH 0/5 " Robin Dapp
@ 2017-05-24  7:51                         ` Richard Biener
  2017-05-24 11:57                           ` Robin Dapp
  2017-06-03 17:12                         ` Andreas Schwab
  1 sibling, 1 reply; 51+ messages in thread
From: Richard Biener @ 2017-05-24  7:51 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

On Tue, May 23, 2017 at 5:57 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> The last version of the patch series caused some regressions for ppc64.
> This was largely due to incorrect handling of unsupportable alignment
> and should be fixed with the new version.
>
> p2 and p5 have not changed but I'm posting the whole series again for
> reference.  p1 only changed comment wording, p3 was adapted to apply on
> the trunk.
>
> No regressions on s390x, x86-64 and ppc64.  Bootstrapped.

So what did actually change?  I'd rather not diff the diffs.  Can you provide
an incremental change, aka p6 that would apply to the previous series instead?

Thanks,
Richard.

> Regards
>  Robin
>


* Re: [PATCH 2/5 v3] Vect peeling cost model
  2017-05-24  7:37                           ` Robin Dapp
@ 2017-05-24  7:53                             ` Richard Sandiford
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Sandiford @ 2017-05-24  7:53 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel

Robin Dapp <rdapp@linux.vnet.ibm.com> writes:
>> Not sure I've understood the series TBH, but is the npeel == vf / 2
>> there specifically for the "unknown number of peels" case?  How do
>> we distinguish that from the case in which the number of peels is
>> known to be vf / 2 at compile time?  Or have I missed the point
>> completely? (probably yes, sorry!)
>
> Good point, that's not totally waterproof for future uses of
> vect_get_peeling_costs_all_drs ().  Currently, however, a nonzero vf is
> passed to it only when peeling for unknown alignment (and vf == 0 in
> the known-alignment case), so we can distinguish the two cases.

Ah, makes sense now, thanks.  Would you mind putting something like
that last sentence in a comment?

> In future, the whole vf/2 handling should be improved anyway since e.g.
> it is hardcoded here as well as in tree-vect-loop.c.  npeel = 0 also has
> a double meaning, namely not peeling when peeling for known alignment
> and peeling vf/2 iters when peeling for unknown alignment.  Room for
> improvement I guess :)

Yeah :-)  But thanks for the series, looks like a nice improvement.

Richard


* Re: [PATCH 0/5 v3] Vect peeling cost model
  2017-05-24  7:51                         ` Richard Biener
@ 2017-05-24 11:57                           ` Robin Dapp
  2017-05-24 13:56                             ` Richard Biener
  0 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-24 11:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

[-- Attachment #1: Type: text/plain, Size: 759 bytes --]

> So what did actually change?  I'd rather not diff the diffs.  Can you
> provide an incremental change, aka p6 that would apply to the
> previous series instead?

-p6.diff attached which also addresses Richard's remark regarding vf/2.
Note that this applies to the old series but the old series itself (-p3)
doesn't apply to trunk anymore (because of the change in
vect_enhance_data_refs_alignment).

Regards
 Robin

--

gcc/ChangeLog:

2017-05-24  Robin Dapp  <rdapp@linux.vnet.ibm.com>

	* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs):
	Introduce unknown_misalignment parameter and remove vf.
	(vect_peeling_hash_get_lowest_cost):
	Pass unknown_misalignment parameter.
	(vect_enhance_data_refs_alignment):
	Fix unsupportable data ref treatment.


[-- Attachment #2: gcc-peeling-p6.diff --]
[-- Type: text/x-patch, Size: 9387 bytes --]

*** /tmp/qBXCWe_tree-vect-data-refs.c	2017-05-24 13:44:37.939055376 +0200
--- gcc/tree-vect-data-refs.c	2017-05-24 13:44:12.039055376 +0200
***************
*** 1239,1252 ****
  }
  
  /* Get the costs of peeling NPEEL iterations checking data access costs
!    for all data refs. */
  
  static void
  vect_get_peeling_costs_all_drs (struct data_reference *dr0,
  				unsigned int *inside_cost,
  				unsigned int *outside_cost,
  				stmt_vector_for_cost *body_cost_vec,
! 				unsigned int npeel, unsigned int vf)
  {
    gimple *stmt = DR_STMT (dr0);
    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
--- 1239,1254 ----
  }
  
  /* Get the costs of peeling NPEEL iterations checking data access costs
!    for all data refs.  If UNKNOWN_MISALIGNMENT is true, we assume DR0's
!    misalignment will be zero after peeling.  */
  
  static void
  vect_get_peeling_costs_all_drs (struct data_reference *dr0,
  				unsigned int *inside_cost,
  				unsigned int *outside_cost,
  				stmt_vector_for_cost *body_cost_vec,
! 				unsigned int npeel,
! 				bool unknown_misalignment)
  {
    gimple *stmt = DR_STMT (dr0);
    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
***************
*** 1274,1280 ****
  
        int save_misalignment;
        save_misalignment = DR_MISALIGNMENT (dr);
!       if (dr == dr0 && npeel == vf / 2)
  	SET_DR_MISALIGNMENT (dr, 0);
        else
  	vect_update_misalignment_for_peel (dr, dr0, npeel);
--- 1276,1282 ----
  
        int save_misalignment;
        save_misalignment = DR_MISALIGNMENT (dr);
!       if (unknown_misalignment && dr == dr0)
  	SET_DR_MISALIGNMENT (dr, 0);
        else
  	vect_update_misalignment_for_peel (dr, dr0, npeel);
***************
*** 1305,1311 ****
    epilogue_cost_vec.create (2);
  
    vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
! 				  &body_cost_vec, elem->npeel, 0);
  
    outside_cost += vect_get_known_peeling_cost
      (loop_vinfo, elem->npeel, &dummy,
--- 1307,1313 ----
    epilogue_cost_vec.create (2);
  
    vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
! 				  &body_cost_vec, elem->npeel, false);
  
    outside_cost += vect_get_known_peeling_cost
      (loop_vinfo, elem->npeel, &dummy,
***************
*** 1517,1522 ****
--- 1519,1525 ----
  {
    vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
    struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+   enum dr_alignment_support supportable_dr_alignment;
    struct data_reference *dr0 = NULL, *first_store = NULL;
    struct data_reference *dr;
    unsigned int i, j;
***************
*** 1528,1533 ****
--- 1531,1538 ----
    unsigned int npeel = 0;
    bool one_misalignment_known = false;
    bool one_misalignment_unknown = false;
+   bool one_dr_unsupportable = false;
+   struct data_reference *unsupportable_dr = NULL;
    unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
    unsigned possible_npeel_number = 1;
    tree vectype;
***************
*** 1599,1604 ****
--- 1604,1610 ----
  	  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
  	continue;
  
+       supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
        do_peeling = vector_alignment_reachable_p (dr);
        if (do_peeling)
          {
***************
*** 1637,1644 ****
  		  else
  		    possible_npeel_number = vf / nelements;
  
! 		  /* NPEEL_TMP is 0 when there is no misalignment, increment
! 		     the peeling amount by one in order to ...  */
  		  if (DR_MISALIGNMENT (dr) == 0)
  		    possible_npeel_number++;
  		}
--- 1643,1650 ----
  		  else
  		    possible_npeel_number = vf / nelements;
  
! 		  /* NPEEL_TMP is 0 when there is no misalignment, but also
! 		     allow peeling NELEMENTS.  */
  		  if (DR_MISALIGNMENT (dr) == 0)
  		    possible_npeel_number++;
  		}
***************
*** 1684,1693 ****
  		    dr0 = dr;
  		}
  
  	      if (!first_store && DR_IS_WRITE (dr))
  		first_store = dr;
- 
- 	      one_misalignment_unknown = true;
              }
          }
        else
--- 1690,1707 ----
  		    dr0 = dr;
  		}
  
+ 	      one_misalignment_unknown = true;
+ 
+ 	      /* Check for data refs with unsupportable alignment that
+ 	         can be peeled.  */
+ 	      if (!supportable_dr_alignment)
+ 	      {
+ 		one_dr_unsupportable = true;
+ 		unsupportable_dr = dr;
+ 	      }
+ 
  	      if (!first_store && DR_IS_WRITE (dr))
  		first_store = dr;
              }
          }
        else
***************
*** 1732,1738 ****
        vect_get_peeling_costs_all_drs (dr0,
  				      &load_inside_cost,
  				      &load_outside_cost,
! 				      &dummy, vf / 2, vf);
        dummy.release ();
  
        if (first_store)
--- 1746,1752 ----
        vect_get_peeling_costs_all_drs (dr0,
  				      &load_inside_cost,
  				      &load_outside_cost,
! 				      &dummy, vf / 2, true);
        dummy.release ();
  
        if (first_store)
***************
*** 1741,1747 ****
  	  vect_get_peeling_costs_all_drs (first_store,
  					  &store_inside_cost,
  					  &store_outside_cost,
! 					  &dummy, vf / 2, vf);
  	  dummy.release ();
  	}
        else
--- 1755,1761 ----
  	  vect_get_peeling_costs_all_drs (first_store,
  					  &store_inside_cost,
  					  &store_outside_cost,
! 					  &dummy, vf / 2, true);
  	  dummy.release ();
  	}
        else
***************
*** 1805,1847 ****
    if (peel_for_known_alignment.peel_info.dr != NULL
        && peel_for_unknown_alignment.inside_cost
        >= peel_for_known_alignment.inside_cost)
!     best_peel = peel_for_known_alignment;
  
!   /* Calculate the penalty for no peeling, i.e. leaving everything
!      unaligned.
!      TODO: use something like an adapted vect_get_peeling_costs_all_drs.  */
!   unsigned nopeel_inside_cost = 0;
!   unsigned nopeel_outside_cost = 0;
  
!   stmt_vector_for_cost dummy;
!   dummy.create (2);
!   FOR_EACH_VEC_ELT (datarefs, i, dr)
!     vect_get_data_access_cost (dr, &nopeel_inside_cost,
! 			       &nopeel_outside_cost, &dummy);
!   dummy.release ();
! 
!   /* Add epilogue costs.  As we do not peel for alignment here, no prologue
!      costs will be recorded.  */
!   stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
!   prologue_cost_vec.create (2);
!   epilogue_cost_vec.create (2);
  
!   int dummy2;
!   nopeel_outside_cost += vect_get_known_peeling_cost
!     (loop_vinfo, 0, &dummy2,
!      &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
!      &prologue_cost_vec, &epilogue_cost_vec);
  
!   prologue_cost_vec.release ();
!   epilogue_cost_vec.release ();
  
!   npeel = best_peel.peel_info.npeel;
!   dr0 = best_peel.peel_info.dr;
  
!   /* If no peeling is not more expensive than the best peeling we
!      have so far, don't perform any peeling.  */
!   if (nopeel_inside_cost <= best_peel.inside_cost)
!     do_peeling = false;
  
    if (do_peeling)
      {
--- 1819,1877 ----
    if (peel_for_known_alignment.peel_info.dr != NULL
        && peel_for_unknown_alignment.inside_cost
        >= peel_for_known_alignment.inside_cost)
!     {
!       best_peel = peel_for_known_alignment;
  
!       /* If the best peeling for known alignment has NPEEL == 0, perform no
!          peeling at all except if there is an unsupportable dr that we can
!          align.  */
!       if (best_peel.peel_info.npeel == 0 && !one_dr_unsupportable)
! 	do_peeling = false;
!     }
  
!   /* If there is an unsupportable data ref, prefer this over all choices so far
!      since we'd have to discard a chosen peeling except when it accidentally
!      aligned the unsupportable data ref.  */
!   if (one_dr_unsupportable)
!     dr0 = unsupportable_dr;
!   else if (do_peeling)
!     {
!       /* Calculate the penalty for no peeling, i.e. leaving everything
! 	 unaligned.
! 	 TODO: Adapt vect_get_peeling_costs_all_drs and use here.  */
!       unsigned nopeel_inside_cost = 0;
!       unsigned nopeel_outside_cost = 0;
  
!       stmt_vector_for_cost dummy;
!       dummy.create (2);
!       FOR_EACH_VEC_ELT (datarefs, i, dr)
! 	vect_get_data_access_cost (dr, &nopeel_inside_cost,
! 				   &nopeel_outside_cost, &dummy);
!       dummy.release ();
  
!       /* Add epilogue costs.  As we do not peel for alignment here, no prologue
! 	 costs will be recorded.  */
!       stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
!       prologue_cost_vec.create (2);
!       epilogue_cost_vec.create (2);
! 
!       int dummy2;
!       nopeel_outside_cost += vect_get_known_peeling_cost
! 	(loop_vinfo, 0, &dummy2,
! 	 &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
! 	 &prologue_cost_vec, &epilogue_cost_vec);
  
!       prologue_cost_vec.release ();
!       epilogue_cost_vec.release ();
  
!       npeel = best_peel.peel_info.npeel;
!       dr0 = best_peel.peel_info.dr;
! 
!       /* If no peeling is not more expensive than the best peeling we
! 	 have so far, don't perform any peeling.  */
!       if (nopeel_inside_cost <= best_peel.inside_cost)
! 	do_peeling = false;
!     }
  
    if (do_peeling)
      {
***************
*** 2019,2026 ****
  	      break;
  	    }
  
! 	  enum dr_alignment_support supportable_dr_alignment =
! 	    vect_supportable_dr_alignment (dr, false);
  
            if (!supportable_dr_alignment)
              {
--- 2049,2055 ----
  	      break;
  	    }
  
! 	  supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
  
            if (!supportable_dr_alignment)
              {

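The new selection order at the end of this hunk can be summarized in a small self-contained C sketch (names invented for illustration; the real code works on struct data_reference and the peeling hash table):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative selection of the data ref to peel for, following the
   hunk above: a data ref whose unaligned access is not supportable at
   all trumps every cost-based choice, since leaving it unaligned would
   forbid vectorization entirely.  Otherwise peel for the cheapest
   candidate, unless not peeling is at least as cheap.  Returns NULL
   when no peeling should be done.  */
static const char *
select_peeling_dr (bool one_dr_unsupportable,
		   const char *unsupportable_dr,
		   const char *best_cost_dr,
		   unsigned nopeel_inside_cost,
		   unsigned best_peel_inside_cost)
{
  if (one_dr_unsupportable)
    return unsupportable_dr;  /* must align it, whatever the cost */
  if (nopeel_inside_cost <= best_peel_inside_cost)
    return NULL;              /* not peeling is at least as cheap */
  return best_cost_dr;
}
```

This is only the decision skeleton; the actual patch additionally recomputes npeel and records the prologue/epilogue costs for the no-peeling case.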

* Re: [PATCH 0/5 v3] Vect peeling cost model
  2017-05-24 11:57                           ` Robin Dapp
@ 2017-05-24 13:56                             ` Richard Biener
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Biener @ 2017-05-24 13:56 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches, Bin.Cheng, Andreas Krebbel

On Wed, May 24, 2017 at 1:54 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> So what did actually change?  I'd rather not diff the diffs.  Can you
>> provide an incremental change, aka p6 that would apply to the
>> previous series instead?
>
> -p6.diff attached which also addresses Richard's remark regarding vf/2.
> Note that this applies to the old series but the old series itself (-p3)
> doesn't apply to trunk anymore (because of the change in
> vect_enhance_data_refs_alignment).

The series is ok.

Thanks,
Richard.

> Regards
>  Robin
>
> --
>
> gcc/ChangeLog:
>
> 2017-05-24  Robin Dapp  <rdapp@linux.vnet.ibm.com>
>
>         * tree-vect-data-refs.c (vect_get_peeling_costs_all_drs):
>         Introduce unknown_misalignment parameter and remove vf.
>         (vect_peeling_hash_get_lowest_cost):
>         Pass unknown_misalignment parameter.
>         (vect_enhance_data_refs_alignment):
>         Fix unsupportable data ref treatment.
>


* Re: [PATCH 4/5 v3] Vect peeling cost model
  2017-05-23 15:59                       ` [PATCH 4/5 " Robin Dapp
@ 2017-05-31 13:56                         ` Christophe Lyon
  2017-05-31 14:37                           ` Robin Dapp
  0 siblings, 1 reply; 51+ messages in thread
From: Christophe Lyon @ 2017-05-31 13:56 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel

Hi,

On 23 May 2017 at 17:59, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> gcc/ChangeLog:
>
> 2017-05-23  Robin Dapp  <rdapp@linux.vnet.ibm.com>
>
>         * tree-vect-data-refs.c (vect_get_data_access_cost):
>         Workaround for SLP handling.
>         (vect_enhance_data_refs_alignment):
>         Compute costs for doing no peeling at all, compare to the best
>         peeling costs so far and avoid peeling if cheaper.

Since this commit (r248678), I've noticed regressions on some arm targets.
  Executed from: gcc.dg/tree-ssa/tree-ssa.exp
    gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment
of access forced using peeling" 1
    gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect
"Vectorizing an unaligned access" 0
    gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment
of access forced using peeling" 1
    gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect
"Vectorizing an unaligned access" 0

For instance with --target arm-linux-gnueabihf --with-cpu=cortex-a5
--with-fpu=vfpv3-d16-fp16
(using cortex-a9+neon makes the test pass).

Thanks,

Christophe


* Re: [PATCH 4/5 v3] Vect peeling cost model
  2017-05-31 13:56                         ` Christophe Lyon
@ 2017-05-31 14:37                           ` Robin Dapp
  2017-05-31 14:49                             ` Christophe Lyon
  0 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-05-31 14:37 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel

> Since this commit (r248678), I've noticed regressions on some arm targets.
>   Executed from: gcc.dg/tree-ssa/tree-ssa.exp
>     gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment
> of access forced using peeling" 1
>     gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect
> "Vectorizing an unaligned access" 0
>     gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment
> of access forced using peeling" 1
>     gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect
> "Vectorizing an unaligned access" 0
> 
> For instance with --target arm-linux-gnueabihf --with-cpu=cortex-a5
> --with-fpu=vfpv3-d16-fp16
> (using cortex-a9+neon makes the test pass).

I do not have access to an arm machine for testing, but could these
regressions be "ok", i.e. we no longer perform peeling because the costs
for not peeling are <= the costs for peeling, and we still vectorize?
(Just guessing.)  Or are these real regressions that prevent
vectorization?  Does the "vectorized 1 loops" check fail?

Regards
 Robin


* Re: [PATCH 4/5 v3] Vect peeling cost model
  2017-05-31 14:37                           ` Robin Dapp
@ 2017-05-31 14:49                             ` Christophe Lyon
  0 siblings, 0 replies; 51+ messages in thread
From: Christophe Lyon @ 2017-05-31 14:49 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel

On 31 May 2017 at 16:27, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> Since this commit (r248678), I've noticed regressions on some arm targets.
>>   Executed from: gcc.dg/tree-ssa/tree-ssa.exp
>>     gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment
>> of access forced using peeling" 1
>>     gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect
>> "Vectorizing an unaligned access" 0
>>     gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment
>> of access forced using peeling" 1
>>     gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect
>> "Vectorizing an unaligned access" 0
>>
>> For instance with --target arm-linux-gnueabihf --with-cpu=cortex-a5
>> --with-fpu=vfpv3-d16-fp16
>> (using cortex-a9+neon makes the test pass).
>
> I do not have access to an arm machine for testing, but could these
> regressions be "ok", i.e. we no longer perform peeling because the costs
> for not peeling are <= the costs for peeling, and we still vectorize?
> (Just guessing.)  Or are these real regressions that prevent
> vectorization?  Does the "vectorized 1 loops" check fail?

I know it's not very practical, and I would also have to start a manual build
with the right config to get all the details because all my builds are
in temporary
workspaces.

I reported only the regressions, so yes "vectorized 1 loops" still passes.

Thanks,

Christophe

>
> Regards
>  Robin
>


* Re: [PATCH 0/5 v3] Vect peeling cost model
  2017-05-23 15:58                       ` [PATCH 0/5 " Robin Dapp
  2017-05-24  7:51                         ` Richard Biener
@ 2017-06-03 17:12                         ` Andreas Schwab
  2017-06-06  7:13                           ` Robin Dapp
  1 sibling, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2017-06-03 17:12 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel

> No regressions on s390x, x86-64 and ppc64.  Bootstrapped.

Patch 6 breaks no-vfa-vect-57.c on powerpc.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [PATCH 0/5 v3] Vect peeling cost model
  2017-06-03 17:12                         ` Andreas Schwab
@ 2017-06-06  7:13                           ` Robin Dapp
  2017-06-06 17:26                             ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-06-06  7:13 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel

> Patch 6 breaks no-vfa-vect-57.c on powerpc.

Which CPU model (power6/7/8?) and which compile options (-maltivec/
-mpower8-vector?) have been used for running and compiling the test?  As
discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80925
this has an influence on the cost function and therefore on the test result.

Regards
 Robin


* Re: [PATCH 0/5 v3] Vect peeling cost model
  2017-06-06  7:13                           ` Robin Dapp
@ 2017-06-06 17:26                             ` Andreas Schwab
  2017-06-07 10:50                               ` Robin Dapp
  0 siblings, 1 reply; 51+ messages in thread
From: Andreas Schwab @ 2017-06-06 17:26 UTC (permalink / raw)
  To: Robin Dapp; +Cc: Richard Biener, GCC Patches, Bin.Cheng, Andreas Krebbel

http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg00297.html

Andreas.



* Re: [PATCH 0/5 v3] Vect peeling cost model
  2017-06-06 17:26                             ` Andreas Schwab
@ 2017-06-07 10:50                               ` Robin Dapp
  2017-06-07 11:43                                 ` Andreas Schwab
  0 siblings, 1 reply; 51+ messages in thread
From: Robin Dapp @ 2017-06-07 10:50 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: GCC Patches

> http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg00297.html

What machine is this running on? power4 BE? The tests are compiled with
--with-cpu-64=power4 apparently.  I cannot reproduce this on power7
-m32.  Is it possible to get more detailed logs or machine access to
reproduce?

Regards
 Robin


* Re: [PATCH 0/5 v3] Vect peeling cost model
  2017-06-07 10:50                               ` Robin Dapp
@ 2017-06-07 11:43                                 ` Andreas Schwab
  0 siblings, 0 replies; 51+ messages in thread
From: Andreas Schwab @ 2017-06-07 11:43 UTC (permalink / raw)
  To: Robin Dapp; +Cc: GCC Patches

On Jun 07 2017, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:

>> http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg00297.html
>
> What machine is this running on?

On a G5.

Andreas.


^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2017-06-07 11:43 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 14:38 [RFC] S/390: Alignment peeling prolog generation Robin Dapp
2017-04-11 14:57 ` Bin.Cheng
2017-04-11 15:03   ` Robin Dapp
2017-04-11 15:07     ` Bin.Cheng
2017-04-11 16:25   ` Richard Biener
2017-04-12  7:51     ` Robin Dapp
2017-04-12  7:58       ` Richard Biener
2017-05-04  9:04         ` Robin Dapp
2017-05-05 11:04           ` Richard Biener
2017-05-08 16:12             ` Robin Dapp
2017-05-09 10:38               ` Richard Biener
2017-05-11 11:17                 ` Robin Dapp
2017-05-11 12:15                   ` Richard Biener
2017-05-11 12:16                     ` Richard Biener
2017-05-11 12:48                       ` Richard Biener
2017-05-11 11:17                 ` [PATCH 1/5] Vect peeling cost model Robin Dapp
2017-05-11 11:18                 ` [PATCH 2/5] " Robin Dapp
2017-05-11 11:19                 ` [PATCH 3/5] " Robin Dapp
2017-05-11 11:20                 ` [PATCH 4/5] " Robin Dapp
2017-05-11 15:30                   ` [PATCH 4/5 v2] " Robin Dapp
2017-05-12  9:36                     ` Richard Biener
2017-05-23 15:58                       ` [PATCH 2/5 v3] " Robin Dapp
2017-05-23 19:25                         ` Richard Sandiford
2017-05-24  7:37                           ` Robin Dapp
2017-05-24  7:53                             ` Richard Sandiford
2017-05-23 15:58                       ` [PATCH 1/5 " Robin Dapp
2017-05-23 15:58                       ` [PATCH 0/5 " Robin Dapp
2017-05-24  7:51                         ` Richard Biener
2017-05-24 11:57                           ` Robin Dapp
2017-05-24 13:56                             ` Richard Biener
2017-06-03 17:12                         ` Andreas Schwab
2017-06-06  7:13                           ` Robin Dapp
2017-06-06 17:26                             ` Andreas Schwab
2017-06-07 10:50                               ` Robin Dapp
2017-06-07 11:43                                 ` Andreas Schwab
2017-05-23 15:59                       ` [PATCH 5/5 " Robin Dapp
2017-05-23 15:59                       ` [PATCH 4/5 " Robin Dapp
2017-05-31 13:56                         ` Christophe Lyon
2017-05-31 14:37                           ` Robin Dapp
2017-05-31 14:49                             ` Christophe Lyon
2017-05-23 16:02                       ` [PATCH 3/5 " Robin Dapp
2017-05-11 11:59                 ` [PATCH 5/5] " Robin Dapp
2017-05-08 16:13             ` [PATCH 3/4] " Robin Dapp
2017-05-09 10:41               ` Richard Biener
2017-05-08 16:27             ` [PATCH 4/4] " Robin Dapp
2017-05-09 10:55               ` Richard Biener
2017-05-04  9:04         ` [PATCH 1/3] " Robin Dapp
2017-05-05 10:32           ` Richard Biener
2017-05-04  9:07         ` [PATCH 2/3] " Robin Dapp
2017-05-05 10:37           ` Richard Biener
2017-05-04  9:14         ` [PATCH 3/3] " Robin Dapp
