Message-ID: <4ADD8E47.1080305@ecoscentric.com>
Date: Tue, 20 Oct 2009 10:17:00 -0000
From: Ross Younger
To: Jonathan Larmour
CC: Rutger Hofman, Jürgen Lambrecht, eCos developers, Deroo Stijn
Subject: Re: NAND technical review
In-Reply-To: <4ADD14E1.3050702@jifvik.org>
Mailing-List: contact ecos-devel-help@ecos.sourceware.org; run by ezmlm

Jonathan Larmour wrote:
> To double check, you mean reading was slowest, programming was faster
> and erasing was fastest, even apparently faster than what may be the
> theoretical fastest time? (I use the term "fast" advisedly, mark).
>
> Are you sure there isn't a problem with your driver to cause such
> figures? :-)

Those are the raw numbers. Yes, I agree that they don't appear to make
sense. As I said, profiling - which will include figuring out what's
going on here - is languishing on the todo list ...

> I wonder if Rutger has the ability to compare with his YAFFS throughput.
> OTOH, as you say, the controller plays a large part, and there's no
> common ground with R so it's entirely possible no comparison can be fair
> for either implementation.

The YAFFS benchmarking is done by our yaffs5 test, which IIRC goes only
through fileio so ought to be trivially portable. It doesn't appear in my
last drop on the bz ticket, but will when I get round to freshening it.

>> After I taught the library to use h/w
>> ECC I immediately saw a 46% speedup on reads and 38% on writes when
>> compared with software ECC [...]
>
> Just to be sure, are the differences measured by these percentages
> purely in terms of overall data throughput per time?

These are from my raw NAND benchmarks (tests/rwbenchmark.c), which measure
the end-to-end time taken for a whole cyg_nand_page_read() / write /
block_erase call to return.

> I'm very interested in the fact that software changes you made, had such
> a relatively large change to the performance.
> [hardware ECC]
> Hence my surprise at E not having support, even in principle, before!
> But clearly you're at the stage where stuff is nearly working.

I was surprised too; but then I had been operating under the general
mantra of "first make it work, then make it work fast", and the speed
work is still in progress ...

To be clear: hwecc _is_ working well, on this customer port, and getting
it going on the STM3210E is on the cards so I have something I can
usefully share publicly.

> Just as an aside, you may find that improving eCos more generally to
> have e.g.
> assembler optimised implementation of memcpy/memmove/memset
> (and possibly others) may improve performance of these and other things
> across the board. GCC's intrinsics can only do so much. (FAOD actual
> implementations to use (at least to start with) can be found in newlib.)

The speedups in my NAND driver on this board came from a straightforward
Duff's device 8-way unroll of what had been HAL_{READ,WRITE}_UINT8_VECTOR;
16-way and 32-way unrolls seemed to add a smidgen more performance but
increased code size perhaps disproportionately. (Using the existing VECTOR
macro but with -funroll-loops gave a similar speed-up but more noticeable
code bloat across the board.)

The word copies in newlib's memcpy et al. look like they would boost
performance generally, but I have attempted to avoid copying data around
as far as possible in my layer. I don't see them helping at all with NAND
device access: you have to make a sequence of 8-bit or 16-bit writes to
the MMIO register, and that's that. This is pretty much the same situation
as Tom Duff found himself in ...

To try and fit with the eCos philosophy, I've left the localised unroll as
a CDL option in this driver, defaulting to off. I expect similar unrolls
would be profitable in other NAND drivers, but a more generalised solution
might be preferable: something like HAL_READ_UINT8_VECTOR_UNROLL, with
options to configure whether and how far it was unrolled?

Ross
-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071. www.ecoscentric.com