Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Christophe Lyon
Date: Tue, 05 Sep 2017 08:47:00 -0000
Subject: Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
To: Kyrill Tkachov
Cc: Bernd Edlinger, Ramana Radhakrishnan, GCC Patches, Richard Earnshaw, Wilco Dijkstra
In-Reply-To: <59AD6894.7010605@foss.arm.com>

Hi Bernd,

On 4 September 2017 at 16:52, Kyrill Tkachov wrote:
>
> On 29/04/17 18:45, Bernd Edlinger wrote:
>>
>> Ping...
>>
>> I attached a rebased version since there was a merge conflict in
>> the xordi3 pattern; otherwise the patch is still identical.
>> It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
>> early when the target has no neon or iwmmxt.
>>
>>
>> Thanks
>> Bernd.
>>
>>
>>
>> On 11/28/16 20:42, Bernd Edlinger wrote:
>>>
>>> On 11/25/16 12:30, Ramana Radhakrishnan wrote:
>>>>
>>>> On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> This improves the stack usage on the sha512 test case for the case
>>>>> without hardware fpu and without iwmmxt by splitting all di-mode
>>>>> patterns right while expanding, which is similar to what the
>>>>> shift-pattern does. It does nothing in the case of iwmmxt and
>>>>> fpu=neon or vfp, as well as thumb1.
>>>>>
>>>> I would go further and do this in the absence of Neon; the VFP unit
>>>> being there doesn't help with DImode operations, i.e. we do not have
>>>> 64-bit integer arithmetic instructions without Neon. The main reason
>>>> why we have the DImode patterns split so late is to give folks who
>>>> want to do 64-bit arithmetic in Neon a chance to make this work, as
>>>> well as to support some of the 64-bit Neon intrinsics which IIRC map
>>>> down to these instructions. Doing this only for soft-float doesn't
>>>> improve the default case. I don't usually test iwmmxt and I'm not
>>>> sure who has the ability to do so, thus keeping this restriction for
>>>> iwmmxt is fine.
>>>>
>>>>
>>> Yes I understand, thanks for pointing that out.
>>>
>>> I was not aware that iwmmxt exists at all, but I noticed that most
>>> 64-bit expansions work completely differently, and would break if we
>>> split the pattern early.
>>>
>>> I can however only look at the assembler output for iwmmxt, and make
>>> sure that the stack usage does not get worse.
>>>
>>> Thus the new version of the patch keeps only thumb1, neon and iwmmxt as
>>> they are: around 1570 (thumb1), 2300 (neon) and 2200 (iwmmxt) bytes of
>>> stack for the test cases, and vfp and soft-float at around 270 bytes of
>>> stack usage.
>>>
>>>>> It reduces the stack usage from 2300 to a near-optimal 272 bytes (!).
>>>>>
>>>>> Note this also splits many ldrd/strd instructions and therefore I will
>>>>> post a follow-up patch that mitigates this effect by enabling the
>>>>> ldrd/strd peephole optimization after the necessary reg-testing.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>
>>>> What do you mean by arm-linux-gnueabihf? When folks say that, I
>>>> interpret it as --with-arch=armv7-a --with-float=hard
>>>> --with-fpu=vfpv3-d16 (or --with-fpu=neon).
>>>>
>>>> If you've really bootstrapped and regtested it on armhf, doesn't this
>>>> patch as it stands have no effect there, i.e. no change?
>>>> arm-linux-gnueabihf usually means to me someone has configured with
>>>> --with-float=hard, so there are no regressions in the hard float ABI
>>>> case.
>>>>
>>> I know it proves little. When I say arm-linux-gnueabihf
>>> I do in fact mean --enable-languages=all,ada,go,obj-c++
>>> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
>>> --with-float=hard.
>>>
>>> My main interest in the stack usage is of course not because of Linux,
>>> but because of eCos, where we have very small task stacks and in fact
>>> no fpu support by the O/S at all, so this patch is exactly what we need.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>
>
> The code is ok.
> AFAICS testing this with --with-fpu=vfpv3-d16 does exercise the new code,
> as the splits will happen for !TARGET_NEON (it is of course
> !TARGET_IWMMXT, and TARGET_IWMMXT2 is irrelevant here).
>
> So this is ok for trunk.
> Thanks, and sorry again for the delay.
> Kyrill
>

This patch (r251663) causes a regression on armeb-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu vfpv3-d16-fp16
FAIL: gcc.dg/vect/vect-singleton_1.c (internal compiler error)
FAIL: gcc.dg/vect/vect-singleton_1.c -flto -ffat-lto-objects (internal compiler error)

The test passes if GCC is configured --with-fpu neon-fp16.

Christophe

>>>
>>> Thanks
>>> Bernd.
>
>
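For readers less familiar with the ARM backend, the effect of splitting adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2 into word-sized operations can be modelled in plain C. This is a minimal sketch, not GCC's splitter: it assumes a DImode value occupies two 32-bit core registers (low and high word), and the names pair64, split, join, check_against_native and the *_split helpers are hypothetical. Only add and subtract have to carry state from the low half into the high half (on ARM this roughly corresponds to an ADDS/ADC or SUBS/SBC pair); the logical operations act on each half independently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a DImode value held in two 32-bit registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} pair64;

pair64 split(uint64_t v) {
    pair64 p = { (uint32_t) v, (uint32_t) (v >> 32) };
    return p;
}

uint64_t join(pair64 p) {
    return ((uint64_t) p.hi << 32) | p.lo;
}

/* adddi3 as two word operations: the carry out of the low-word add
   (detected by unsigned wrap-around) is added into the high word. */
pair64 add_split(pair64 a, pair64 b) {
    pair64 r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* subdi3 as two word operations: the borrow out of the low word is
   subtracted from the high word. */
pair64 sub_split(pair64 a, pair64 b) {
    pair64 r;
    r.lo = a.lo - b.lo;
    uint32_t borrow = a.lo < b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* anddi3, iordi3, xordi3 and one_cmpldi2 need no carry between halves. */
pair64 and_split(pair64 a, pair64 b) { pair64 r = { a.lo & b.lo, a.hi & b.hi }; return r; }
pair64 ior_split(pair64 a, pair64 b) { pair64 r = { a.lo | b.lo, a.hi | b.hi }; return r; }
pair64 xor_split(pair64 a, pair64 b) { pair64 r = { a.lo ^ b.lo, a.hi ^ b.hi }; return r; }
pair64 not_split(pair64 a) { pair64 r = { ~a.lo, ~a.hi }; return r; }

/* Check every split operation against native 64-bit arithmetic. */
void check_against_native(uint64_t a, uint64_t b) {
    pair64 pa = split(a), pb = split(b);
    assert(join(add_split(pa, pb)) == a + b);
    assert(join(sub_split(pa, pb)) == a - b);
    assert(join(and_split(pa, pb)) == (a & b));
    assert(join(ior_split(pa, pb)) == (a | b));
    assert(join(xor_split(pa, pb)) == (a ^ b));
    assert(join(not_split(pa)) == ~a);
}
```

For example, check_against_native(0x00000001FFFFFFFFull, 1) exercises the case where the low-word add wraps to zero and the carry has to move into the high word.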