From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Robin Dapp
Subject: [RFC] S/390: Alignment peeling prolog generation
To: GCC Patches
Date: Tue, 11 Apr 2017 14:38:00 -0000
Message-Id: <0296a54f-cb8d-d9b8-380a-9cc553dbb6da@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------3E03BCFBD719FBE4A2361FE3"

This is a multi-part message in MIME format.
--------------3E03BCFBD719FBE4A2361FE3
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Hi,

when looking at various vectorization examples on s390x I noticed that
we still peel vf/2 iterations for alignment even though vectorization
costs of unaligned loads and stores are the same as normal loads/stores.

A simple example is

void foo(int *restrict a, int *restrict b, unsigned int n)
{
  for (unsigned int i = 0; i < n; i++)
    {
      b[i] = a[i] * 2 + 1;
    }
}

which gets peeled unless __builtin_assume_aligned (a, 8) is used.

In tree-vect-data-refs.c there are several checks that involve costs in
the peeling decision, none of which seems to suffice in this case.  For a
loop with only read DRs there is a check that has been triggering (i.e.
disabling peeling) since we implemented the vectorization costs.  Here,
we have DR_MISALIGNMENT (dr) == -1 for all DRs, but the costs should
still dictate never to peel.

I attached a tentative patch for discussion which fixes the problem by
checking the costs for npeel = 0 and npeel = vf/2 after ensuring we
support all misalignments.  Is there a better way and place to do it?
Are we missing something somewhere else that would preclude the peeling
from happening?

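For readers less familiar with the transformation, the following is a
hand-written C sketch of what alignment peeling amounts to for foo above.
It is an illustration only, not the code GCC generates: foo_peeled and
VEC_BYTES (a 16-byte vector, i.e. vf = 4 for int) are assumptions made
for this example.  A scalar prologue runs until b reaches a vector
boundary, and only the remaining loop would be vectorized.  With unknown
misalignment the prologue runs between 0 and vf - 1 iterations, which is
presumably why vf/2 is the count used for costing.

```c
#include <stddef.h>
#include <stdint.h>

/* The loop from the example, unchanged.  */
void foo(int *restrict a, int *restrict b, unsigned int n)
{
  for (unsigned int i = 0; i < n; i++)
    b[i] = a[i] * 2 + 1;
}

/* Assumed vector size in bytes; hypothetical, chosen for illustration.  */
#define VEC_BYTES 16u

/* Hypothetical hand-peeled variant of foo.  */
void foo_peeled(int *restrict a, int *restrict b, unsigned int n)
{
  /* Number of scalar iterations until b sits on a VEC_BYTES boundary.  */
  unsigned int misalign = (unsigned int) ((uintptr_t) b % VEC_BYTES);
  unsigned int npeel
    = misalign ? (unsigned int) ((VEC_BYTES - misalign) / sizeof (int)) : 0;
  if (npeel > n)
    npeel = n;

  unsigned int i = 0;

  /* Peeling prologue: scalar iterations that only exist to align b.  */
  for (; i < npeel; i++)
    b[i] = a[i] * 2 + 1;

  /* Main loop: b + i is now aligned; this part would be vectorized.  */
  for (; i < n; i++)
    b[i] = a[i] * 2 + 1;
}
```

When unaligned accesses cost the same as aligned ones, the prologue is
pure overhead, which is the situation the patch tries to detect.
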
This is not intended for stage 4 obviously :)

Regards
 Robin

--------------3E03BCFBD719FBE4A2361FE3
Content-Type: text/x-patch; name="gcc-omit-peeling.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="gcc-omit-peeling.diff"

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 3fc762a..795c22c 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1418,6 +1418,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   stmt_vec_info stmt_info;
   unsigned int npeel = 0;
   bool all_misalignments_unknown = true;
+  bool all_misalignments_supported = true;
   unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
@@ -1547,6 +1548,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                     }
 
                   all_misalignments_unknown = false;
+
                   /* Data-ref that was chosen for the case that all the
                      misalignments are unknown is not relevant anymore, since we
                      have a data-ref with known alignment.  */
@@ -1609,6 +1611,24 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                   break;
                 }
             }
+
+      /* Check if target supports misaligned data access for current data
+         reference.  */
+      vectype = STMT_VINFO_VECTYPE (stmt_info);
+      machine_mode mode = TYPE_MODE (vectype);
+      if (targetm.vectorize.
+          support_vector_misalignment (mode, TREE_TYPE (DR_REF (dr)),
+                                       DR_MISALIGNMENT (dr), false))
+        {
+          vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
+                                    dr, 0);
+          /* Also insert vf/2 peeling that will be used when all
+             misalignments are unknown.  */
+          vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
+                                    dr, vf / 2);
+        }
+      else
+        all_misalignments_supported = false;
     }
 
   /* Check if we can possibly peel the loop.  */
@@ -1687,6 +1707,18 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
         dr0 = first_store;
     }
 
+  /* If the target supports accessing all data references in a misaligned
+     way, check costs to see if we can leave them unaligned and do not
+     perform any peeling.  */
+  if (all_misalignments_supported)
+    {
+      dr0 = vect_peeling_hash_choose_best_peeling (&peeling_htab,
+                                                   loop_vinfo, &npeel,
+                                                   &body_cost_vec);
+      if (!dr0 || !npeel)
+        do_peeling = false;
+    }
+
   /* In case there are only loads with different unknown misalignments, use
      peeling only if it may help to align other accesses in the loop or
      if it may help improving load bandwith when we'd end up using

--------------3E03BCFBD719FBE4A2361FE3--