Message-ID: <4ADD8E47.1080305@ecoscentric.com>
Date: Tue, 20 Oct 2009 10:17:00 -0000
From: Ross Younger
To: Jonathan Larmour
CC: Rutger Hofman, Jürgen Lambrecht, eCos developers, Deroo Stijn
Subject: Re: NAND technical review
In-Reply-To: <4ADD14E1.3050702@jifvik.org>
Mailing-List: contact ecos-devel-help@ecos.sourceware.org; run by ezmlm

Jonathan Larmour wrote:
> To double check, you mean reading was slowest, programming was faster
> and erasing was fastest, even apparently faster than what may be the
> theoretical fastest time? (I use the term "fast" advisedly, mark).
>
> Are you sure there isn't a problem with your driver to cause such
> figures? :-)

Those are the raw numbers. Yes, I agree that they don't appear to make
sense. As I said, profiling - which will include figuring out what's
going on here - is languishing on the todo list ...

> I wonder if Rutger has the ability to compare with his YAFFS throughput.
> OTOH, as you say, the controller plays a large part, and there's no
> common ground with R so it's entirely possible no comparison can be fair
> for either implementation.

The YAFFS benchmarking is done by our yaffs5 test, which IIRC goes only
through fileio so ought to be trivially portable. It doesn't appear in my
last drop on the bz ticket, but will when I get round to freshening it.

>> After I taught the library to use h/w
>> ECC I immediately saw a 46% speedup on reads and 38% on writes when
>> compared with software ECC [...]
>
> Just to be sure, are the differences measured by these percentages
> purely in terms of overall data throughput per time?

These are from my raw NAND benchmarks (tests/rwbenchmark.c), which measure
the end-to-end time taken for a whole cyg_nand_page_read() / write /
block_erase call to return.

> I'm very interested in the fact that software changes you made, had such
> a relatively large change to the performance.
> [hardware ECC]
> Hence my surprise at E not having support, even in principle, before!
> But clearly you're at the stage where stuff is nearly working.

I was surprised too; but then I had been operating under the general
mantra of "first make it work, then make it work fast", and the speed
work is still in progress ...

To be clear: hwecc _is_ working well, on this customer port, and getting
it going on the STM3210E is on the cards so I have something I can
usefully share publicly.

> Just as an aside, you may find that improving eCos more generally to
> have e.g.
> assembler optimised implementation of memcpy/memmove/memset
> (and possibly others) may improve performance of these and other things
> across the board. GCC's intrinsics can only do so much. (FAOD actual
> implementations to use (at least to start with) can be found in newlib.)

The speedups in my NAND driver on this board came from a straightforward
Duff's device 8-way unroll of what had been HAL_{READ,WRITE}_UINT8_VECTOR;
16-way and 32-way unrolls seemed to add a smidgen more performance but
increased code size perhaps disproportionately. (Using the existing VECTOR
macro but with -funroll-loops gave a similar speed-up but more noticeable
code bloat across the board.)

The word copies in newlib's memcpy et al. look like they would boost
performance generally, but I have attempted to avoid copying data around
as far as possible in my layer. I don't see them helping at all with NAND
device access: you have to make a sequence of 8-bit or 16-bit writes to
the MMIO register, and that's that. This is pretty much the same situation
as Tom Duff found himself in ...

To try and fit with the eCos philosophy, I've left the localised unroll as
a CDL option in this driver, defaulting to off. I expect similar unrolls
would be profitable in other NAND drivers, but a more generalised solution
might be preferable: something like HAL_READ_UINT8_VECTOR_UNROLL, with
options to configure whether and how far it was unrolled?

Ross
-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071. www.ecoscentric.com