From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-451427-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 33083 invoked by alias); 11 Apr 2017 14:38:53 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
From: Robin Dapp <rdapp@linux.vnet.ibm.com>
Subject: [RFC] S/390: Alignment peeling prolog generation
To: GCC Patches <gcc-patches@gcc.gnu.org>
Date: Tue, 11 Apr 2017 14:38:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------3E03BCFBD719FBE4A2361FE3"
Message-Id: <0296a54f-cb8d-d9b8-380a-9cc553dbb6da@linux.vnet.ibm.com>
X-SW-Source: 2017-04/txt/msg00526.txt.bz2

This is a multi-part message in MIME format.
--------------3E03BCFBD719FBE4A2361FE3
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Content-length: 1193

Hi,

when looking at various vectorization examples on s390x I noticed that
we still peel vf/2 iterations for alignment even though the vectorization
costs of unaligned loads and stores are the same as those of aligned
loads and stores.

A simple example is

void foo(int *restrict a, int *restrict b, unsigned int n)
{
  for (unsigned int i = 0; i < n; i++)
    {
      b[i] = a[i] * 2 + 1;
    }
}

which gets peeled unless __builtin_assume_aligned (a, 8) is used.
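
For reference, a minimal sketch of the variant that avoids the peeling.
The name foo_aligned and the decision to annotate both pointers are my
assumptions for illustration; the only fact taken from above is that
__builtin_assume_aligned with an alignment of 8 is used.

```c
/* Sketch only: same loop as foo, but the caller promises that both
   arrays are 8-byte aligned.  foo_aligned is a hypothetical name.  */
void foo_aligned(int *restrict a, int *restrict b, unsigned int n)
{
  /* __builtin_assume_aligned returns its first argument and lets the
     compiler assume the result is aligned to at least 8 bytes.  */
  int *restrict pa = __builtin_assume_aligned(a, 8);
  int *restrict pb = __builtin_assume_aligned(b, 8);

  for (unsigned int i = 0; i < n; i++)
    {
      pb[i] = pa[i] * 2 + 1;
    }
}
```

Calling this with pointers that are not actually 8-byte aligned is
undefined behavior, so it is a workaround rather than a fix.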

In tree-vect-data-refs.c there are several checks that involve costs in
the peeling decision, none of which seems to suffice in this case. For a
loop with only read DRs there is a check that has been triggering (i.e.
disabling peeling) since we implemented the vectorization costs.

Here, we have DR_MISALIGNMENT (dr) == -1 for all DRs but the costs
should still dictate that we never peel. I attached a tentative patch for
discussion which fixes the problem by checking the costs for npeel = 0
and npeel = vf/2 after ensuring we support all misalignments. Is there a
better way and place to do it? Are we missing something somewhere else
that would preclude the peeling from happening?
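
To make the cost argument concrete, here is a toy model of the
comparison between npeel = 0 and npeel = vf/2. It is only a sketch:
struct toy_costs, toy_total_cost and toy_should_peel are hypothetical
names, the costs are made-up units, and none of this is the vectorizer's
real cost model. It assumes peeling aligns exactly one data reference.

```c
#include <stdbool.h>

/* Hypothetical per-operation costs; not GCC's target cost hooks.  */
struct toy_costs
{
  unsigned aligned_access;    /* one aligned vector load/store */
  unsigned unaligned_access;  /* one unaligned vector load/store */
  unsigned scalar_iter;       /* one peeled scalar iteration */
};

/* Prologue cost plus vector body cost, where n_aligned of the n_refs
   data references are known to be aligned after peeling.  */
static unsigned
toy_total_cost (const struct toy_costs *c, unsigned npeel,
                unsigned n_refs, unsigned n_aligned, unsigned vec_iters)
{
  unsigned prologue = npeel * c->scalar_iter;
  unsigned per_iter = n_aligned * c->aligned_access
                      + (n_refs - n_aligned) * c->unaligned_access;
  return prologue + vec_iters * per_iter;
}

/* Peel only if the vf/2 prologue pays for itself.  Assumption: without
   peeling no reference is aligned, with peeling exactly one is.  */
static bool
toy_should_peel (const struct toy_costs *c, unsigned vf,
                 unsigned n_refs, unsigned vec_iters)
{
  unsigned no_peel = toy_total_cost (c, 0, n_refs, 0, vec_iters);
  unsigned peel = toy_total_cost (c, vf / 2, n_refs, 1, vec_iters);
  return peel < no_peel;
}
```

With equal aligned and unaligned costs the body cost is identical in
both cases, so the prologue is pure overhead and the model never peels.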

This is not intended for stage 4, obviously :)

Regards
 Robin

--------------3E03BCFBD719FBE4A2361FE3
Content-Type: text/x-patch;
 name="gcc-omit-peeling.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="gcc-omit-peeling.diff"
Content-length: 2442

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 3fc762a..795c22c 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1418,6 +1418,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
   bool all_misalignments_unknown = true;
+  bool all_misalignments_supported = true;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1547,6 +1548,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                 }
 
               all_misalignments_unknown = false;
+
               /* Data-ref that was chosen for the case that all the
                  misalignments are unknown is not relevant anymore, since we
                  have a data-ref with known alignment.  */
@@ -1609,6 +1611,24 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
               break;
             }
         }
+
+      /* Check if target supports misaligned data access for current data
+	 reference.  */
+      vectype = STMT_VINFO_VECTYPE (stmt_info);
+      machine_mode mode = TYPE_MODE (vectype);
+      if (targetm.vectorize.
+	  support_vector_misalignment (mode, TREE_TYPE (DR_REF (dr)),
+				       DR_MISALIGNMENT (dr), false))
+	{
+	  vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
+				    dr, 0);
+	  /* Also insert vf/2 peeling that will be used when all
+	     misalignments are unknown.  */
+	  vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
+				    dr, vf / 2);
+	}
+      else
+	all_misalignments_supported = false;
     }
 
   /* Check if we can possibly peel the loop.  */
@@ -1687,6 +1707,18 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
             dr0 = first_store;
         }
 
+      /* If the target supports accessing all data references in a misaligned
+	 way, check costs to see if we can leave them unaligned and do not
+	 perform any peeling.  */
+      if (all_misalignments_supported)
+	{
+	  dr0 = vect_peeling_hash_choose_best_peeling (&peeling_htab,
+						       loop_vinfo, &npeel,
+						       &body_cost_vec);
+	  if (!dr0 || !npeel)
+	    do_peeling = false;
+	}
+
       /* In case there are only loads with different unknown misalignments, use
          peeling only if it may help to align other accesses in the loop or
 	 if it may help improving load bandwith when we'd end up using

--------------3E03BCFBD719FBE4A2361FE3--