From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-379747-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7334 invoked by alias); 7 Oct 2014 07:46:59 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 7319 invoked by uid 89); 7 Oct 2014 07:46:58 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-wi0-f171.google.com
Received: from mail-wi0-f171.google.com (HELO mail-wi0-f171.google.com) (209.85.212.171) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Tue, 07 Oct 2014 07:46:56 +0000
Received: by mail-wi0-f171.google.com with SMTP id em10so6888183wid.16        for <gcc-patches@gcc.gnu.org>; Tue, 07 Oct 2014 00:46:53 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.194.58.229 with SMTP id u5mr1179812wjq.133.1412668013619; Tue, 07 Oct 2014 00:46:53 -0700 (PDT)
Received: by 10.194.20.74 with HTTP; Tue, 7 Oct 2014 00:46:53 -0700 (PDT)
In-Reply-To: <CAFiYyc19j5ZdBnQRWevtjQZpx_unrjxOH4VT+nnVGVCUTCO4zA@mail.gmail.com>
References: <541AC4D2.9040901@arm.com>	<CAFiYyc0wF_ep0sdTiqgn8KV5Jxzfg4BxmkXJPN7wKJ91Qotakw@mail.gmail.com>	<5432D1A5.6080208@arm.com>	<CAFiYyc19j5ZdBnQRWevtjQZpx_unrjxOH4VT+nnVGVCUTCO4zA@mail.gmail.com>
Date: Tue, 07 Oct 2014 07:46:00 -0000
Message-ID: <CAFiYyc2sJTMqH_68nWF4Zq=h_7nDE-DMswyCtJFQE2CV2PtXpQ@mail.gmail.com>
Subject: Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114
From: Richard Biener <richard.guenther@gmail.com>
To: Alan Lawrence <alan.lawrence@arm.com>
Cc: Marcus Shawcroft <Marcus.Shawcroft@arm.com>, 	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2014-10/txt/msg00522.txt.bz2

On Tue, Oct 7, 2014 at 9:45 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Oct 6, 2014 at 7:30 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> Ok, so unless there are objections, I plan to commit patches 1, 2, 4, 5, and
>> 6,
>> which have been previously approved, in that sequence. (Of those, all bar
>> patch
>> 2 are AArch64 only.) I think this is better than maintaining an
>> ever-expanding
>> patch series.
>
> Agreed.
>
>> Then I'll get to work on migrating all backends to the new _scal_ optab (and
>> removing the vector optab). Certainly I'd like to replace vec_shr/l with
>> vec_perm_expr too, but I'm conscious that the end of stage 1 is approaching!
>
> I suppose we all are.  It will last until end of October at least
> (stage1 of gcc 4.9
> ended Nov 22th, certainly a bit late).
>
> I do expect we will continue merging already developed / posted stuff through
> stage3 (as usual).
>
> That said, it would be really nice to get rid of VEC_RSHIFT_EXPR.

And you can fix performance regressions you introduce (badly handled
VEC_PERM) until the GCC 5 release happens (and even after that).
Heh.  Easy way out ;)

Richard.

> Thanks,
> Richard.
>
>> --Alan
>>
>>
>>
>>
>> Richard Biener wrote:
>>>
>>> On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence <alan.lawrence@arm.com>
>>> wrote:
>>>>
>>>> The end goal here is to remove this code from tree-vect-loop.c
>>>> (vect_create_epilog_for_reduction):
>>>>
>>>>       if (BYTES_BIG_ENDIAN)
>>>>         bitpos = size_binop (MULT_EXPR,
>>>>                              bitsize_int (TYPE_VECTOR_SUBPARTS (vectype)
>>>> -
>>>> 1),
>>>>                              TYPE_SIZE (scalar_type));
>>>>       else
>>>>
>>>> as this is the root cause of PR/61114 (see testcase there, failing on all
>>>> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard
>>>> Biener,
>>>> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
>>>> suspicious". The code snippet above is used on two paths:
>>>>
>>>> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
>>>> reduc_[us](plus|min|max)_optab.
>>>> The optab is documented as "the scalar result is stored in the least
>>>> significant bits of operand 0", but the tree code as "the first element
>>>> in
>>>> the vector holding the result of the reduction of all elements of the
>>>> operand". This mismatch means that when the tree code is folded, the code
>>>> snippet above reads the result from the wrong end of the vector.
>>>>
>>>> The strategy (as per
>>>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define
>>>> new
>>>> tree codes and optabs that produce scalar results directly; this seems
>>>> better than tying (the element of the vector into which the result is
>>>> placed) to (the endianness of the target), and avoids generating extra
>>>> moves
>>>> on current bigendian targets. However, the previous optabs are retained
>>>> for
>>>> now as a migration strategy so as not to break existing backends; moving
>>>> individual platforms over will follow.
>>>>
>>>> A complication here is on AArch64, where we directly generate
>>>> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
>>>> remove this folding in order to decouple the midend and AArch64 backend.
>>>
>>>
>>> Sounds fine.  I hope we can transition all backends for 5.0 and remove
>>> the vector variant optabs (maybe renaming the scalar ones).
>>>
>>>> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
>>>> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the
>>>> optab
>>>> is defined in an endianness-dependent way, leading to significant
>>>> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab
>>>> is
>>>> never used!). Few platforms appear to handle vec_shr_optab (and fewer
>>>> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
>>>> the existing optab to be endianness-neutral.
>>>>
>>>> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
>>>> updates that implementation to fit the new endianness-neutral
>>>> specification,
>>>> serving as a guide for other existing backends. Patches/RFCs 15 and 16
>>>> are
>>>> equivalents for MIPS and PowerPC; I haven't tested these but hope they
>>>> act
>>>> as useful pointers for the port maintainers.
>>>>
>>>> Finally patch 14 cleans up the affected part of tree-vect-loop.c
>>>> (vect_create_epilog_for_reduction).
>>>
>>>
>>> As said during the individual patches review I'd like the vectorizer to
>>> use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
>>> only whole-element amounts).  This means we can remove
>>> VEC_RSHIFT_EXPR.  It also means that if the backend defines
>>> vec_perm_const (which it really should) it can handle the special
>>> permutes that boil down to a possibly more efficient vector shift
>>> there (a good optimization anyway).  Until it does that all backends
>>> would at least create correct code (with the endian dependent
>>> vec_shr removed).
>>>
>>> Richard.
>>>
>>>> --Alan
>>>>
>>>
>>
>>
>> -- IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium.  Thank you.
>>
>> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>> Registered in England & Wales, Company No:  2557590
>> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>> Registered in England & Wales, Company No:  2548782
>>