From: "Bin.Cheng"
Date: Tue, 11 Apr 2017
 14:57:00 -0000
Subject: Re: [RFC] S/390: Alignment peeling prolog generation
To: Robin Dapp
Cc: GCC Patches

On Tue, Apr 11, 2017 at 3:38 PM, Robin Dapp wrote:
> Hi,
>
> when looking at various vectorization examples on s390x I noticed that
> we still peel vf/2 iterations for alignment even though vectorization
> costs of unaligned loads and stores are the same as normal loads/stores.
>
> A simple example is
>
> void foo(int *restrict a, int *restrict b, unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i++)
>     {
>       b[i] = a[i] * 2 + 1;
>     }
> }
>
> which gets peeled unless __builtin_assume_aligned (a, 8) is used.
>
> In tree-vect-data-refs.c there are several checks that involve costs in
> the peeling decision, none of which seems to suffice in this case. For a
> loop with only read DRs there is a check that has been triggering (i.e.
> disabling peeling) since we implemented the vectorization costs.
>
> Here, we have DR_MISALIGNMENT (dr) == -1 for all DRs but the costs
> should still dictate to never peel. I attached a tentative patch for
> discussion which fixes the problem by checking the costs for npeel = 0
> and npeel = vf/2 after ensuring we support all misalignments. Is there a
> better way and place to do it? Are we missing something somewhere else
> that would preclude the peeling from happening?
>
> This is not intended for stage 4 obviously :)

Hi Robin,
It seems Richi added code like the snippet below, which compares the costs
of aligned and unaligned loads and only peels if that is beneficial:

  /* In case there are only loads with different unknown misalignments, use
     peeling only if it may help to align other accesses in the loop or
     if it may help improving load bandwidth when we'd end up using
     unaligned loads.
     */
  tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
  if (!first_store
      && !STMT_VINFO_SAME_ALIGN_REFS (
              vinfo_for_stmt (DR_STMT (dr0))).length ()
      && (vect_supportable_dr_alignment (dr0, false)
          != dr_unaligned_supported
          || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
              == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
    do_peeling = false;

I think similar code can be added for the store case too.

Thanks,
bin

>
> Regards
> Robin
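
To illustrate the idea of extending the cost comparison from loads to stores, here is a minimal standalone sketch. It is not GCC code and not the patch under discussion: `struct cost_model`, its fields, and `peeling_may_help` are hypothetical names, the cost numbers stand in for what `builtin_vectorization_cost` would return, and the sketch assumes the target supports unaligned vector accesses at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-target cost table.  In GCC these values would come
   from builtin_vectorization_cost; here they are plain fields.  */
struct cost_model
{
  unsigned vector_load;      /* cost of an aligned vector load */
  unsigned unaligned_load;   /* cost of an unaligned vector load */
  unsigned vector_store;     /* cost of an aligned vector store */
  unsigned unaligned_store;  /* cost of an unaligned vector store */
};

/* Return true if aligning the accesses could lower the cost of the loop
   body, i.e. if the unaligned variant is strictly more expensive than
   the aligned one.  Assumes unaligned accesses are supported.  */
bool
peeling_may_help (const struct cost_model *c,
                  unsigned n_loads, unsigned n_stores)
{
  assert (c != NULL);

  unsigned aligned_cost
    = n_loads * c->vector_load + n_stores * c->vector_store;
  unsigned unaligned_cost
    = n_loads * c->unaligned_load + n_stores * c->unaligned_store;

  /* If both variants cost the same, a peeling prolog only adds
     overhead, so there is nothing to gain.  */
  return unaligned_cost > aligned_cost;
}
```

With a cost table where aligned and unaligned accesses are equally expensive, as described for s390x above, `peeling_may_help` returns false for any mix of loads and stores, so a loop like `foo` would not be peeled under this model.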