From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 7334 invoked by alias); 7 Oct 2014 07:46:59 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 7319 invoked by uid 89); 7 Oct 2014 07:46:58 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-wi0-f171.google.com Received: from mail-wi0-f171.google.com (HELO mail-wi0-f171.google.com) (209.85.212.171) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Tue, 07 Oct 2014 07:46:56 +0000 Received: by mail-wi0-f171.google.com with SMTP id em10so6888183wid.16 for ; Tue, 07 Oct 2014 00:46:53 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.194.58.229 with SMTP id u5mr1179812wjq.133.1412668013619; Tue, 07 Oct 2014 00:46:53 -0700 (PDT) Received: by 10.194.20.74 with HTTP; Tue, 7 Oct 2014 00:46:53 -0700 (PDT) In-Reply-To: References: <541AC4D2.9040901@arm.com> <5432D1A5.6080208@arm.com> Date: Tue, 07 Oct 2014 07:46:00 -0000 Message-ID: Subject: Re: [PATCH 0/14+2][Vectorizer] Made reductions endianness-neutral, fixes PR/61114 From: Richard Biener To: Alan Lawrence Cc: Marcus Shawcroft , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2014-10/txt/msg00522.txt.bz2 On Tue, Oct 7, 2014 at 9:45 AM, Richard Biener wrote: > On Mon, Oct 6, 2014 at 7:30 PM, Alan Lawrence wrote: >> Ok, so unless there are objections, I plan to commit patches 1, 2, 4, 5, and >> 6, >> which have been previously approved, in that sequence. (Of those, all bar >> patch >> 2 are AArch64 only.) I think this is better than maintaining an >> ever-expanding >> patch series. > > Agreed. > >> Then I'll get to work on migrating all backends to the new _scal_ optab (and >> removing the vector optab). Certainly I'd like to replace vec_shr/l with >> vec_perm_expr too, but I'm conscious that the end of stage 1 is approaching! > > I suppose we all are. It will last until end of October at least > (stage1 of gcc 4.9 > ended Nov 22th, certainly a bit late). > > I do expect we will continue merging already developed / posted stuff through > stage3 (as usual). > > That said, it would be really nice to get rid of VEC_RSHIFT_EXPR. And you can fix performance regressions you introduce (badly handled VEC_PERM) until the GCC 5 release happens (and even after that). Heh. Easy way out ;) Richard. > Thanks, > Richard. > >> --Alan >> >> >> >> >> Richard Biener wrote: >>> >>> On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence >>> wrote: >>>> >>>> The end goal here is to remove this code from tree-vect-loop.c >>>> (vect_create_epilog_for_reduction): >>>> >>>> if (BYTES_BIG_ENDIAN) >>>> bitpos = size_binop (MULT_EXPR, >>>> bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) >>>> - >>>> 1), >>>> TYPE_SIZE (scalar_type)); >>>> else >>>> >>>> as this is the root cause of PR/61114 (see testcase there, failing on all >>>> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard >>>> Biener, >>>> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is >>>> suspicious". The code snippet above is used on two paths: >>>> >>>> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR = >>>> reduc_[us](plus|min|max)_optab. >>>> The optab is documented as "the scalar result is stored in the least >>>> significant bits of operand 0", but the tree code as "the first element >>>> in >>>> the vector holding the result of the reduction of all elements of the >>>> operand". This mismatch means that when the tree code is folded, the code >>>> snippet above reads the result from the wrong end of the vector. >>>> >>>> The strategy (as per >>>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define >>>> new >>>> tree codes and optabs that produce scalar results directly; this seems >>>> better than tying (the element of the vector into which the result is >>>> placed) to (the endianness of the target), and avoids generating extra >>>> moves >>>> on current bigendian targets. However, the previous optabs are retained >>>> for >>>> now as a migration strategy so as not to break existing backends; moving >>>> individual platforms over will follow. >>>> >>>> A complication here is on AArch64, where we directly generate >>>> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily >>>> remove this folding in order to decouple the midend and AArch64 backend. >>> >>> >>> Sounds fine. I hope we can transition all backends for 5.0 and remove >>> the vector variant optabs (maybe renaming the scalar ones). >>> >>>> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e. >>>> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the >>>> optab >>>> is defined in an endianness-dependent way, leading to significant >>>> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab >>>> is >>>> never used!). Few platforms appear to handle vec_shr_optab (and fewer >>>> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change >>>> the existing optab to be endianness-neutral. >>>> >>>> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13 >>>> updates that implementation to fit the new endianness-neutral >>>> specification, >>>> serving as a guide for other existing backends. Patches/RFCs 15 and 16 >>>> are >>>> equivalents for MIPS and PowerPC; I haven't tested these but hope they >>>> act >>>> as useful pointers for the port maintainers. >>>> >>>> Finally patch 14 cleans up the affected part of tree-vect-loop.c >>>> (vect_create_epilog_for_reduction). >>> >>> >>> As said during the individual patches review I'd like the vectorizer to >>> use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with >>> only whole-element amounts). This means we can remove >>> VEC_RSHIFT_EXPR. It also means that if the backend defines >>> vec_perm_const (which it really should) it can handle the special >>> permutes that boil down to a possibly more efficient vector shift >>> there (a good optimization anyway). Until it does that all backends >>> would at least create correct code (with the endian dependent >>> vec_shr removed). >>> >>> Richard. >>> >>>> --Alan >>>> >>> >> >> >> -- IMPORTANT NOTICE: The contents of this email and any attachments are >> confidential and may also be privileged. If you are not the intended >> recipient, please notify the sender immediately and do not disclose the >> contents to any other person, use it for any purpose, or store or copy the >> information in any medium. Thank you. >> >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, >> Registered in England & Wales, Company No: 2557590 >> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, >> Registered in England & Wales, Company No: 2548782 >>