From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-451430-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 91794 invoked by alias); 11 Apr 2017 14:57:32 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
In-Reply-To: <0296a54f-cb8d-d9b8-380a-9cc553dbb6da@linux.vnet.ibm.com>
References: <0296a54f-cb8d-d9b8-380a-9cc553dbb6da@linux.vnet.ibm.com>
From: "Bin.Cheng" <amker.cheng@gmail.com>
Date: Tue, 11 Apr 2017 14:57:00 -0000
Message-ID: <CAHFci282BedKpc99pxk1+PLHc7OxkE0bFZHRn1rsE9X+-ihuDQ@mail.gmail.com>
Subject: Re: [RFC] S/390: Alignment peeling prolog generation
To: Robin Dapp <rdapp@linux.vnet.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2017-04/txt/msg00529.txt.bz2

On Tue, Apr 11, 2017 at 3:38 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Hi,
>
> when looking at various vectorization examples on s390x I noticed that
> we still peel vf/2 iterations for alignment even though vectorization
> costs of unaligned loads and stores are the same as normal loads/stores.
>
> A simple example is
>
> void foo(int *restrict a, int *restrict b, unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i++)
>     {
>       b[i] = a[i] * 2 + 1;
>     }
> }
>
> which gets peeled unless __builtin_assume_aligned (a, 8) is used.
>
> In tree-vect-data-refs.c there are several checks that involve costs in
> the peeling decision, none of which seems to suffice in this case. For a
> loop with only read DRs there is a check that has been triggering (i.e.
> disabling peeling) since we implemented the vectorization costs.
>
> Here, we have DR_MISALIGNMENT (dr) == -1 for all DRs but the costs
> should still dictate to never peel. I attached a tentative patch for
> discussion which fixes the problem by checking the costs for npeel = 0
> and npeel = vf/2 after ensuring we support all misalignments. Is there a
> better way and place to do it? Are we missing something somewhere else
> that would preclude the peeling from happening?
>
> This is not intended for stage 4, obviously :)
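For the `foo` example in the quoted message, the following hand-written C sketch illustrates what peeling for alignment amounts to. It is not GCC's actual output. It assumes 16-byte vector registers holding four `int`s (VF = 4) and assumes the vectorizer chooses to align the store to `b`. At compile time the misalignment is unknown (DR_MISALIGNMENT == -1), which is why the cost model reasons about roughly vf/2 peeled iterations; at run time the prologue length is computed from the actual address, as below. `foo_peeled`, `prologue_iters`, `VEC_BYTES` and `VF` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16-byte vector registers holding four ints, so VF = 4.  */
#define VEC_BYTES 16u
#define VF (VEC_BYTES / sizeof (int))

/* Hypothetical helper: number of scalar iterations to run before the
   store address is VEC_BYTES-aligned; 0 if B is already aligned.  */
static unsigned int
prologue_iters (const int *b)
{
  uintptr_t mis = (uintptr_t) b % VEC_BYTES;

  /* Peeling whole elements only works if B is at least int-aligned.  */
  assert (mis % sizeof (int) == 0);
  return mis == 0 ? 0 : (unsigned int) ((VEC_BYTES - mis) / sizeof (int));
}

/* Hand-peeled version of foo: scalar prologue to align the store to B,
   a main loop standing in for the vector body, and a scalar epilogue.  */
void
foo_peeled (int *restrict a, int *restrict b, unsigned int n)
{
  unsigned int i = 0;
  unsigned int npeel = prologue_iters (b);

  if (npeel > n)
    npeel = n;

  /* Prologue: the peeled scalar iterations.  */
  for (; i < npeel; i++)
    b[i] = a[i] * 2 + 1;

  /* Main loop: VF elements per step; &b[i] is now aligned, A may not be.
     A vectorizer would emit a vector load, the arithmetic and an aligned
     vector store here.  */
  for (; i + VF <= n; i += VF)
    for (unsigned int j = 0; j < VF; j++)
      b[i + j] = a[i + j] * 2 + 1;

  /* Epilogue: leftover scalar iterations.  */
  for (; i < n; i++)
    b[i] = a[i] * 2 + 1;
}
```

The prologue is the overhead in question: when unaligned vector accesses cost the same as aligned ones, the aligned main loop buys nothing in return for it.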
Hi Robin,
It seems Richi added code like the following, which compares the costs of
aligned and unaligned loads and only peels when that is beneficial:

      /* In case there are only loads with different unknown misalignments, use
         peeling only if it may help to align other accesses in the loop or
         if it may help improving load bandwidth when we'd end up using
         unaligned loads.  */
      tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
      if (!first_store
          && !STMT_VINFO_SAME_ALIGN_REFS (
                vinfo_for_stmt (DR_STMT (dr0))).length ()
          && (vect_supportable_dr_alignment (dr0, false)
              != dr_unaligned_supported
              || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
                  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
        do_peeling = false;

I think similar code could be added for the store case too.
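A minimal sketch of how the quoted load-only check might look if generalized to stores as suggested above. Everything here is hypothetical: `struct vec_costs`, `should_peel` and its parameters are illustrative stand-ins, not GCC internals. In GCC the costs would come from `builtin_vectorization_cost` with `vector_load`/`unaligned_load` or `vector_store`/`unaligned_store`, and `unaligned_supported` stands for `vect_supportable_dr_alignment (dr0, false) == dr_unaligned_supported`.

```c
#include <stdbool.h>

/* Hypothetical per-target cost table; the field names echo GCC's
   vect_cost_for_stmt kinds, but this is not a GCC data structure.  */
struct vec_costs
{
  int vector_load, unaligned_load;
  int vector_store, unaligned_store;
};

/* Sketch of the quoted check, generalized so it can also be asked about a
   store.  SAME_ALIGN_REFS counts other accesses that peeling for this one
   would align as well.  */
static bool
should_peel (bool is_store, unsigned int same_align_refs,
             bool unaligned_supported, const struct vec_costs *c)
{
  /* Peeling also aligns other accesses: keep it.  */
  if (same_align_refs > 0)
    return true;

  /* As in the quoted condition, anything other than plain unaligned
     support disables peeling.  */
  if (!unaligned_supported)
    return false;

  int aligned = is_store ? c->vector_store : c->vector_load;
  int unaligned = is_store ? c->unaligned_store : c->unaligned_load;

  /* Mirrors the quoted == test: equal costs mean nothing to gain.  */
  return aligned != unaligned;
}
```

With a target like the s390x case described, where all four costs are equal, `should_peel` returns false for loads and stores alike once no other access benefits from the peel.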

Thanks,
bin
>
> Regards
>  Robin