Date: Tue, 09 Apr 2013 12:04:00 -0000
From: Ondřej Bílka
To: Richard Earnshaw
Cc: "Joseph S. Myers", "Shih-Yuan Lee (FourDollars)", "patches@eglibc.org", "libc-ports@sourceware.org", "rex.tsai@canonical.com", "jesse.sung@canonical.com", "yc.cheng@canonical.com", Shih-Yuan Lee
Subject: Re: [PATCH] ARM: NEON detected memcpy.
Message-ID: <20130409120418.GA6855@domone.kolej.mff.cuni.cz>
References: <5163D9B8.7030008@arm.com>
In-Reply-To: <5163D9B8.7030008@arm.com>

On Tue, Apr 09, 2013 at 10:04:56AM +0100, Richard Earnshaw wrote:
> On 03/04/13 16:08, Joseph S. Myers wrote:
> >I was previously told by people at ARM that NEON memcpy wasn't a good idea
> >in practice because of raised power consumption, context switch costs etc.
> >from using NEON in processes that otherwise didn't use it, even if it
> >appeared superficially beneficial in benchmarks.
>
> What really matters is system power increase vs performance gain and
> what you might be able to save if you finish sooner.  If a 10%
> improvement to memcpy performance comes at a 12% increase in CPU
> power, then that might seem like a net loss.  But if the CPU is only
> 50% of the system power, then the increase in system power
> is just half of that (i.e. 6%), but the performance improvement will
> still be 10%.  Note that 50% is just an example to make the figures
> easier here; I've no idea what the real numbers are, and they will
> be highly dependent on the other components in the system: a
> back-lit display, in particular, will use a significant amount of
> power.
>

I said a similar thing. I also added the threshold idea. From my previous mail:

"
You need to compare the speed of the NEON implementation against the other
implementation. Then determine the size at which NEON is faster once energy
cost and context switch cost are included.
My first estimate is to use NEON for copies larger than 4096 bytes.

However, to determine the context switch cost of NEON you must account for a
network effect. If you use NEON in one function that is called sufficiently
often (so that the registers are always saved), then adding NEON
implementations of additional functions does not increase the cost.
"

Ondra

> It's also necessary to think about how the Neon unit in the
> processor is managed.  Is it power gated or simply clock gated?
> Power gated regions are likely to have long power-up times (relative
> to normal CPU operations), but clock-gated regions are typically
> instantaneously available.
>
> Finally, you need to consider whether the unit is likely to be
> already in use.
> With the increasing trend to using the hard-float
> ABI, VFP (and Neon) are generally much more widely used in code now
> than they were, so the other potential cost of using Neon (lazy
> context switching) is also more likely to be a non-issue than if the
> unit were almost never touched.
>

memcpy is, after strcmp, the second most often called string function.

> R.
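
The size threshold and the power arithmetic discussed above can be sketched in C. This is only an illustration under stated assumptions: NEON_THRESHOLD uses the 4096-byte first estimate from the mail and would have to be measured on real hardware, copy_plain and copy_neon are hypothetical stand-ins (both simply call memcpy here), and system_power_increase just multiplies the CPU's share of system power by the relative increase in CPU power.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: cutoff taken from the "first estimate" in the mail. */
#define NEON_THRESHOLD 4096

/* Hypothetical stand-ins; a real port would supply assembly routines. */
static void *copy_plain(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static void *copy_neon(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Nonzero when the copy is strictly larger than the threshold. */
int use_neon(size_t n)
{
    return n > NEON_THRESHOLD;
}

/* Pick the implementation by size only; no CPU feature check is done. */
void *threshold_memcpy(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    return use_neon(n) ? copy_neon(dst, src, n) : copy_plain(dst, src, n);
}

/* Relative system power increase, given the CPU's share of system power
   (0..1) and the relative increase in CPU power (e.g. 0.12 for 12%). */
double system_power_increase(double cpu_share, double cpu_power_increase)
{
    return cpu_share * cpu_power_increase;
}
```

With the example figures quoted above, system_power_increase(0.5, 0.12) comes to about 0.06, i.e. the 6% rise in system power set against the 10% performance gain. A size-only test like use_neon ignores the network effect: if the NEON registers are already being saved for other frequently called functions, a lower cutoff may be justified.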