Message-ID: <4ADE6C92.9060300@jifvik.org>
Date: Wed, 21 Oct 2009 02:06:00 -0000
From: Jonathan Larmour
To: Ross Younger
Cc: Rutger Hofman, eCos developers
Subject: Re: NAND technical review
In-Reply-To: <4ADD8E47.1080305@ecoscentric.com>

Ross Younger wrote:
> Jonathan Larmour wrote:
>
>> To double check, you mean reading was slowest, programming was faster
>> and erasing was fastest, even apparently faster than what may be the
>> theoretical fastest time?
>> (I use the term "fast" advisedly, mark).
>>
>> Are you sure there isn't a problem with your driver to cause such
>> figures? :-)
>
> Those are the raw numbers. Yes, I agree that they don't appear to make
> sense. As I said, profiling - which will include figuring out what's
> going on here - is languishing on the todo list ...

Ok, although I think I may have to take those particular figures with a
pinch of salt, given they are... unexpected.

>> I wonder if Rutger has the ability to compare with his YAFFS throughput.
>> OTOH, as you say, the controller plays a large part, and there's no
>> common ground with R so it's entirely possible no comparison can be fair
>> for either implementation.
>
> The YAFFS benchmarking is done by our yaffs5 test, which IIRC goes only
> through fileio so ought to be trivially portable. It doesn't appear in my
> last drop on the bz ticket, but will when I get round to freshening it.

Ok. Although I'm not sure how long these discussions will continue for.
Although not ideal, running it on both the synth targets may be the only
way to compare.

> To be clear: hwecc _is_ working well, on this customer port, and getting
> it going on the STM3210E is on the cards so I have something I can
> usefully share publicly.

Can you at least shed light on the API changes (by cut'n'pasting relevant
sections of headers/code even if not the whole thing)? I feel this is a
key thing to get clarity on since I don't have a view on that yet and it's
an important feature. I know doc etc. will want updating, but in reality
we can probably get a good idea from an overview of the code, even if it's
not a complete self-contained package drop. It may also save you time.

I don't doubt it works on your port, but I think I need to get a view on
how well it would fit with other hardware ECC systems which I know about
and those which R goes to pains to support.
I also realise people have busy lives, so no worries if you can't do it
for a few days if there's more than a little effort involved; although I'd
have thought it wouldn't be much.

>> Just as an aside, you may find that improving eCos more generally to
>> have e.g. assembler-optimised implementations of memcpy/memmove/memset
>> (and possibly others) may improve performance of these and other things
>> across the board. GCC's intrinsics can only do so much. (FAOD, actual
>> implementations to use (at least to start with) can be found in newlib.)
>
> The speedups in my NAND driver on this board came from a straightforward
> Duff's device 8-way unroll of what had been HAL_{READ,WRITE}_UINT8_VECTOR;
> 16-way and 32-way unrolls seemed to add a smidgen more performance but
> increased code size perhaps disproportionately. (Using the existing VECTOR
> macro but with -funroll-loops gave a similar speed-up but more noticeable
> code bloat across the board.)

OOI you could add it to the per-package CFLAGS (unless you meant you
already did and the bloat was noticeable even just there).

> The word copies in newlib's memcpy et al look like they would boost
> performance generally, but I have attempted to avoid copying data around
> as far as possible in my layer.

I was mostly thinking of YAFFS in fact (and fileio on top), although I
haven't really looked at the extent to which they depend on bulk memory
moves.

> To try and fit with the eCos philosophy, I've left the localised unroll
> as a CDL option in this driver, defaulting to off. I expect similar
> unrolls would be profitable in other NAND drivers, but a more generalised
> solution might be preferable: something like HAL_READ_UINT8_VECTOR_UNROLL,
> with options to configure whether and how far it was unrolled?

Possible, although there are probably bigger fish to fry.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
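
[Editorial note: the Duff's-device 8-way unroll Ross describes is not shown
in the thread. A minimal sketch of the shape of such an unroll is below; the
function name, signature, and register handling are invented for
illustration and are not the actual eCos HAL_READ_UINT8_VECTOR code, which
operates on a device register address via HAL macros.]

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only: read 'count' bytes from a fixed byte-wide
 * data register into 'buf', with the loop unrolled 8-way using Duff's
 * device (a switch that jumps into the middle of the unrolled body to
 * handle the count % 8 remainder on the first pass). */
static void read_uint8_vector_unroll8(volatile const uint8_t *reg,
                                      uint8_t *buf, size_t count)
{
    if (count == 0)
        return;
    size_t passes = (count + 7) / 8;   /* total trips through the body */
    switch (count % 8) {               /* jump to the remainder entry point */
    case 0: do { *buf++ = *reg;
    case 7:      *buf++ = *reg;
    case 6:      *buf++ = *reg;
    case 5:      *buf++ = *reg;
    case 4:      *buf++ = *reg;
    case 3:      *buf++ = *reg;
    case 2:      *buf++ = *reg;
    case 1:      *buf++ = *reg;
            } while (--passes > 0);
    }
}
```

Whether the resulting code-size increase is worth the speed-up is exactly
the sort of per-target trade-off that a CDL option, as Ross has done in his
driver, lets each configuration decide.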