public inbox for ecos-devel@sourceware.org
* NAND technical review
@ 2009-10-02 15:51 Jonathan Larmour
  2009-10-06 13:51 ` Ross Younger
  2009-10-16  7:29 ` Simon Kallweit
  0 siblings, 2 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-02 15:51 UTC
  To: Ross Younger, eCos developers; +Cc: Rutger Hofman, Simon Kallweit

As per my ecos-discuss mail just now, I would like to get going straight 
away with a public discussion of the _technical_ merits of both NAND 
implementations. There is a risk of rehashing old ground, but I'm sure in 
both cases things have moved on a bit since the last time round, not least 
in response to comments, so it would also be good to clarify the current 
state.

I think at first the ball is really in Ross/eCosCentric's court to give 
the technical rationale for the decision, so I'd like to ask him first to 
give his rationale and his own perspective of the comparison of the 
pros/cons. I think the primary onus of the legwork is on eCosCentric, not 
least because they saw Rutger's version before implementation - although 
that was an early version, so it's entirely possible things have changed 
now. Obviously I would especially like Rutger's view on whether any 
purported benefits of eCosCentric's implementation are really the case, 
and any claimed disadvantages of his own are plausible. I suspect some of 
this will come down to subjective opinion, of course.

But this is an open discussion, so I'd appreciate anyone's views. I'd 
especially value Simon Kallweit's views as someone who has actually used 
both implementations, which gives him a very good perspective. 
Although if anyone wants to contribute, please keep it on topic, within 
this thread, and technical.

Thanks. Over to Ross....

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-02 15:51 NAND technical review Jonathan Larmour
@ 2009-10-06 13:51 ` Ross Younger
  2009-10-07  3:12   ` Jonathan Larmour
                     ` (3 more replies)
  2009-10-16  7:29 ` Simon Kallweit
  1 sibling, 4 replies; 58+ messages in thread
From: Ross Younger @ 2009-10-06 13:51 UTC
  To: Jonathan Larmour; +Cc: eCos developers

Jonathan Larmour wrote:
> I think at first the ball is really in Ross/eCosCentric's court to give
> the technical rationale for the decision, so I'd like to ask him first
> to give his rationale and his own perspective of the comparison of the
> pros/cons.

Here goes with a comparison between the two in something close to their
current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).
For brevity, I will refer to the two layers as "E" (eCosCentric) and "R"
(Rutger) from time to time.

Note that this is only really a comparison of the two NAND layers. I have
not attempted to compare the two YAFFS porting layers, though I do mention
them in a couple of places where it seemed relevant.

BTW: I will be off-net tomorrow and all next week, so please don't think I
am ignoring the discussion...


1. NAND 101 -------------------------------------------------------------

(Those familiar with NAND chips can skip this section, but I appreciate
that not everybody on-list is in the business of writing NAND device
drivers :-) )

(i) Conceptual

A chip comprises a number of blocks (a round power of two).

Each block comprises a number of pages (another power of two).

Each page has a "main" data area (512 or 2048 bytes on current devices) and
a "spare" - aka out-of-band or OOB - area (16 or 64 bytes respectively).
It's up to the driver and application to decide how they will use the spare
area, but it's usual for some of it to be given over to storing ECC data,
and there is space for a factory-bad marker (see below).

Programming the chip must be performed a page at a time (sometimes a 512
byte subpage).

Erasing must be performed a whole block at a time.

By way of illustration, in the chip spec sheet I have to hand (Samsung
K9F1G08 series):
* 1 page = 2k byte + 64 spare
* 1 block = 64 pages
* The whole chip has 1024 blocks, making for 128MB (1Gbit) of data and 4MB
(32Mbit) of spare area.
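
The arithmetic behind those figures can be checked with a few lines of C.
This is only a sketch: the struct and function names are made up for
illustration and belong to neither NAND layer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry descriptor; field names are illustrative only. */
struct nand_geometry {
    uint32_t page_bytes;       /* main data area per page */
    uint32_t spare_bytes;      /* spare (OOB) area per page */
    uint32_t pages_per_block;
    uint32_t blocks;
};

/* Total main-area capacity of the chip, in bytes. */
static uint64_t nand_data_bytes(const struct nand_geometry *g)
{
    assert(g != 0);
    return (uint64_t)g->page_bytes * g->pages_per_block * g->blocks;
}

/* Total spare-area capacity of the chip, in bytes. */
static uint64_t nand_spare_bytes(const struct nand_geometry *g)
{
    assert(g != 0);
    return (uint64_t)g->spare_bytes * g->pages_per_block * g->blocks;
}

/* The figures quoted above for the Samsung K9F1G08 series. */
static const struct nand_geometry k9f1g08 = { 2048, 64, 64, 1024 };
```

With these numbers, 2048 x 64 x 1024 bytes is 128MB of data and
64 x 64 x 1024 bytes is 4MB of spare.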


Now, I mentioned ECC data. NAND technology has a number of underlying
limitations, importantly that it has reliability issues. I don't have a full
picture - the manufacturers seem to be understandably coy - but my
understanding is that on each page, a driver ought to be able to cope with a
single bit having flipped either on programming or on reading. The
recommended way to achieve this is by storing an ECC in the spare area: the
algorithm published by Samsung is popular, requiring 22 bits of ECC per 256
bytes of data and able to correct a 1 bit error and detect a 2 bit error.
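
To make the "22 bits per 256 bytes" figure concrete, here is a minimal
sketch of a Hamming-style code with that budget. It is written for
clarity, not speed, and the bit layout is my own choice for illustration;
it is not claimed to match the layout in Samsung's published algorithm or
the one used by either NAND layer. Each of the 2048 data bits has an
11-bit address; for each address bit we keep one parity over the data
bits where that address bit is set and one where it is clear, giving
2 x 11 = 22 parity bits.

```c
#include <assert.h>
#include <stdint.h>

/* Compute a 22-bit code over 256 bytes.
 * Bits 0..10:  parity of data bits whose address has bit k set.
 * Bits 11..21: parity of data bits whose address has bit k clear. */
static uint32_t ecc256_calc(const uint8_t *data)
{
    uint32_t ecc = 0;
    unsigned addr;

    assert(data != 0);
    for (addr = 0; addr < 256u * 8u; addr++) {
        if ((data[addr >> 3] >> (addr & 7u)) & 1u) {
            /* A set data bit toggles exactly one parity of each pair. */
            ecc ^= (addr & 0x7FFu) | ((~addr & 0x7FFu) << 11);
        }
    }
    return ecc;
}

/* Check data against a stored code and repair a single-bit error.
 * Returns 0 if clean, 1 if one data bit was corrected, 2 if the stored
 * code itself has a single-bit error, -1 if uncorrectable. */
static int ecc256_repair(uint8_t *data, uint32_t stored)
{
    uint32_t diff = stored ^ ecc256_calc(data);
    uint32_t set_half = diff & 0x7FFu;
    uint32_t clear_half = (diff >> 11) & 0x7FFu;

    if (diff == 0)
        return 0;
    if ((set_half ^ clear_half) == 0x7FFu) {
        /* Every pair disagrees in exactly one member: the "set" half
         * spells out the address of the flipped data bit. */
        data[set_half >> 3] ^= (uint8_t)(1u << (set_half & 7u));
        return 1;
    }
    if ((diff & (diff - 1u)) == 0)
        return 2;           /* one flipped bit inside the code itself */
    return -1;              /* e.g. a two-bit error: detected only */
}
```

A two-bit data error makes both halves of the difference equal, so it
can never satisfy the single-bit test and is reported as uncorrectable.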

There is also the question of bad blocks. Again, full details are sketchy. A
chip may be shipped with a number of "factory-bad" blocks (e.g. up to 20 on
this Samsung chip); they are marked as such in their spare area. (What
constitutes a "bad" block is not published; one imagines that the factory
have access to more test information than users do and that there may be
statistical techniques involved in judging the likely reliability of the
block.) Blocks may also fail during the life of the device, usually by the
chip reporting a failure during a program or erase operation. Because of
this, the manufacturers recommend that chip drivers scan the device for
factory-bad markers then create and maintain a Bad Block Table throughout
the life of the device. How this is done is not prescribed, but the
behaviour of the Linux MTD layer is something approximating a de facto standard.
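
As a rough illustration of the scan-then-maintain idea, the sketch below
builds an in-memory table with two bits per block. Everything here is
hypothetical (names, states, the callback shape); it does not reproduce
the Linux MTD on-flash format or either layer's code, and persisting the
table to flash is left out. The example callback assumes the common
convention that a factory-bad block carries a marker byte other than
0xFF in its spare area.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_BLOCKS 1024u

enum demo_block_state { DEMO_OK = 0, DEMO_WORN_BAD = 1, DEMO_FACTORY_BAD = 3 };

struct demo_bbt {
    uint32_t nblocks;
    uint8_t  bits[DEMO_MAX_BLOCKS / 4];   /* 2 bits per block */
};

/* Callback: nonzero if the block carries a factory-bad marker. */
typedef int (*demo_factory_bad_fn)(uint32_t block, void *ctx);

static void demo_bbt_set(struct demo_bbt *t, uint32_t block,
                         enum demo_block_state s)
{
    unsigned shift = (block & 3u) * 2u;

    assert(block < t->nblocks);
    t->bits[block >> 2] = (uint8_t)((t->bits[block >> 2] & ~(3u << shift))
                                    | ((unsigned)s << shift));
}

static enum demo_block_state demo_bbt_get(const struct demo_bbt *t,
                                          uint32_t block)
{
    assert(block < t->nblocks);
    return (enum demo_block_state)
        ((t->bits[block >> 2] >> ((block & 3u) * 2u)) & 3u);
}

/* Build the table by scanning every block once; returns the number of
 * factory-bad blocks found. */
static uint32_t demo_bbt_scan(struct demo_bbt *t, uint32_t nblocks,
                              demo_factory_bad_fn is_bad, void *ctx)
{
    uint32_t b, bad = 0;

    assert(nblocks <= DEMO_MAX_BLOCKS);
    memset(t, 0, sizeof *t);
    t->nblocks = nblocks;
    for (b = 0; b < nblocks; b++) {
        if (is_bad(b, ctx)) {
            demo_bbt_set(t, b, DEMO_FACTORY_BAD);
            bad++;
        }
    }
    return bad;
}

/* Example callback for testing: ctx is an array holding one marker byte
 * per block, as if read from each block's spare area. */
static int demo_marker_is_bad(uint32_t block, void *ctx)
{
    const uint8_t *markers = ctx;
    return markers[block] != 0xFF;
}
```

A block that later fails a program or erase would be recorded with
demo_bbt_set(..., DEMO_WORN_BAD), keeping it distinguishable from the
factory-marked ones.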


(ii) Chip comms protocol

Getting data into and out of the chip involves a simple protocol sequence.

Commands are single bytes; addresses are sequences of a few bytes depending
on the chip size and the operation invoked.

For example, the sequence to read a page of data, per the spec sheet I
have to hand, is:
* Write 0x00 into the command latch
* Write the four address bytes in turn into the address latch
* Write 0x30 into the command latch
* Chip signals Busy; wait for it to signal Ready
* Read out (up to) 2112 bytes of data.
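
The steps above can be sketched in C against a set of low-level access
hooks. The hook struct and all names are hypothetical, and the split of
the four address bytes into two column bytes followed by two row (page)
bytes is an assumption that fits a 2112-byte page and 65536 pages; check
the data sheet for the part in question. The trace functions exist only
so the ordering can be exercised without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical low-level access hooks supplied by a platform. */
struct demo_nand_bus {
    void (*write_cmd)(void *ctx, uint8_t cmd);     /* command latch */
    void (*write_addr)(void *ctx, uint8_t byte);   /* address latch */
    void (*wait_ready)(void *ctx);                 /* busy -> ready */
    void (*read_data)(void *ctx, uint8_t *dst, size_t len);
    void *ctx;
};

/* Read len bytes starting at the given column of page 'row'. */
static void demo_read_page(const struct demo_nand_bus *bus, uint16_t column,
                           uint16_t row, uint8_t *dst, size_t len)
{
    assert((size_t)column + len <= 2112);
    bus->write_cmd(bus->ctx, 0x00);
    bus->write_addr(bus->ctx, (uint8_t)(column & 0xFF));
    bus->write_addr(bus->ctx, (uint8_t)((column >> 8) & 0x0F));
    bus->write_addr(bus->ctx, (uint8_t)(row & 0xFF));
    bus->write_addr(bus->ctx, (uint8_t)(row >> 8));
    bus->write_cmd(bus->ctx, 0x30);
    bus->wait_ready(bus->ctx);
    bus->read_data(bus->ctx, dst, len);
}

/* Test double: records the order of bus operations as letters. */
struct demo_trace { char log[16]; size_t n; };

static void demo_trace_put(struct demo_trace *t, char c)
{
    if (t->n + 1 < sizeof t->log)
        t->log[t->n++] = c;   /* stays NUL-terminated if zero-initialised */
}
static void demo_trace_cmd(void *ctx, uint8_t cmd)
{ (void)cmd; demo_trace_put(ctx, 'C'); }
static void demo_trace_addr(void *ctx, uint8_t byte)
{ (void)byte; demo_trace_put(ctx, 'A'); }
static void demo_trace_wait(void *ctx)
{ demo_trace_put(ctx, 'W'); }
static void demo_trace_read(void *ctx, uint8_t *dst, size_t len)
{ memset(dst, 0xFF, len); demo_trace_put(ctx, 'R'); }  /* erased flash */
```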

However, not all chips are quite the same. The ONFI initiative is an attempt
to standardise chip protocols and most new chips should comply with it. A
number of chips on the market are _nearly_ ONFI-compliant: deviations
typically occur over the format of the ReadID response and that of an
address. I believe that older chips did their own thing entirely.


(iii) Electrical

Most, if not all, NAND chips have the same broad electrical interface.

There is a master Chip Enable line; nothing happens if this is not active.

Data flows into and out of the chip via its data bus, which is 8 or 16 bits
wide, mediated by Read Enable and Write Enable lines.

Commands and addresses are sent on the data bus, but routed to the
appropriate latches by asserting the Address Latch Enable or Command Latch
Enable lines at the same time.

There is also a ready/busy line which the driver can use to tell when an
operation is in progress. Typical operation times from the Samsung spec
sheet I have to hand are 25us for a page read, 300us for a page program, and
2ms for a block erase.


(iv) Board hook-up

What's more interesting is how the lines are hooked up to the board.

It is quite commonplace for a board based on a SoC to make good use of an
onboard memory controller or dedicated NAND controller. This allows the
controller to be programmed with the electrical profile the chip expects,
which makes life easy for the device driver: often, you just have to write
bytes to the relevant MMIO register address as fast as you wish and the
controller takes care of the rest.

If the NAND lines are connected to the CPU only as GPIO, the driver has a
lot of work to do in conforming to the correct signal profile at every step
of the chip protocol. (I haven't had to produce such a port, and I don't
think Rutger has needed one either, though he has produced an untested
example driver.)

In the case of a dedicated NAND controller, it is common to provide
hardware-assistance for ECC calculation. Where available, this provides a
significant speed-up (about 40% per page in my benchmarking).

Sometimes the ready/busy line isn't wired in or requires a jumper to be set
to route it. This can be worked around: for a read operation, one can just
insert a delay loop for the prescribed maximum time, while for programs and
erases, most (all?) chips have a "Read Status" command which can be used to
query whether the operation has completed.
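
A minimal sketch of the Read Status workaround for programs and erases
follows. The command value 0x70 and the status bits used (bit 6 = ready,
bit 0 = fail) are the ones I'd expect on typical parts, but treat them as
assumptions to be checked against the data sheet; the port struct, the
poll limit and the fake chip are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CMD_READ_STATUS 0x70u
#define DEMO_STATUS_FAIL     0x01u   /* bit 0: last program/erase failed */
#define DEMO_STATUS_READY    0x40u   /* bit 6: chip is ready */

/* Hypothetical access hooks. */
struct demo_status_port {
    void    (*write_cmd)(void *ctx, uint8_t cmd);
    uint8_t (*read_byte)(void *ctx);
    void    *ctx;
};

/* Poll the status register until the chip reports ready.
 * Returns 0 on success, -1 if the chip reported failure,
 * -2 if it was still busy after max_polls reads. */
static int demo_wait_program_done(const struct demo_status_port *p,
                                  unsigned max_polls)
{
    assert(p != 0);
    p->write_cmd(p->ctx, DEMO_CMD_READ_STATUS);
    while (max_polls--) {
        uint8_t st = p->read_byte(p->ctx);
        if (st & DEMO_STATUS_READY)
            return (st & DEMO_STATUS_FAIL) ? -1 : 0;
    }
    return -2;
}

/* Fake chip for testing: busy for a number of polls, then final_status. */
struct demo_fake_chip {
    unsigned busy_polls;
    uint8_t  final_status;
    uint8_t  last_cmd;
};

static void demo_fake_cmd(void *ctx, uint8_t cmd)
{
    struct demo_fake_chip *c = ctx;
    c->last_cmd = cmd;
}

static uint8_t demo_fake_read(void *ctx)
{
    struct demo_fake_chip *c = ctx;
    if (c->busy_polls > 0) {
        c->busy_polls--;
        return 0x00;                 /* ready bit clear: still busy */
    }
    return c->final_status;
}
```

A real driver would bound the loop by time rather than by a poll count,
using the maximum operation time from the data sheet.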

It can be beneficial to be able to set up the ready/busy line as an
interrupt source, as opposed to having to poll it. Whilst there is an
overhead involved in context-switching, if other application threads have
much to do it may be advantageous overall for the thread waiting for the
NAND to sleep until woken by interrupt.

Of course, it is possible to put multiple chips on a board. In that case
there needs to be a way to route between them; I would expect this to be
done with the Chip Select line, addressed either by different MMIO addresses
or a separate GPIO or CPLD step. Theoretically, multiple chips could be
hooked up in parallel to give something that looks like a 16 or 32-bit
"wide" chip, but I have never encountered this in the NAND world, and it
would impose a certain extra level of complexity on the driver.


2. Application interface -----------------------------------------------

Both layers have broadly similar application interfaces.

In both layers, an application must first use a `lookup' call which provides
a pointer to a device context struct. In Rutger's layer, devices are
identified by device number; in eCosCentric's, by a textual name set in the
board HAL.

Both layers provide a means of finding out about the device. R's provides
a call which returns an info block; E's provides macros which retrieve
information from the device struct (which may also be queried directly).

The basic operations required are reading a page, programming a page and
erasing a block, and both layers provide these.

The page-oriented operations optionally allow read/write of the page spare
area. These operations also automatically calculate and check an ECC, if the
device has been configured to do so. Rutger's layer has an extra hook in
place where an application may explicitly request the use of cached reading
and writing where the device supports this.

Both layers also support the necessary ancillary operations of querying the
status of a block in the bad-block table, and marking a block as bad.


(a) Partitions

E's application interface also provides logic implementing partitions.
That is to say, all access to a NAND array must be via a `partition';
the NAND layer sanity-checks whether the requested flash page or block
address is within the given partition. This is quite a lightweight
layer and hasn't added much overhead of either code footprint or
execution time.
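
To show how little is involved, here is a sketch of the kind of range
check meant. The struct and function names are hypothetical, not E's
actual API, and it assumes device-absolute block and page numbers with an
inclusive block range per partition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical partition descriptor: an inclusive range of blocks. */
struct demo_partition {
    uint32_t first_block;
    uint32_t last_block;
};

/* Nonzero if the (device-absolute) block lies within the partition. */
static int demo_block_in_partition(const struct demo_partition *p,
                                   uint32_t block)
{
    assert(p != 0);
    return block >= p->first_block && block <= p->last_block;
}

/* Nonzero if the (device-absolute) page lies within the partition. */
static int demo_page_in_partition(const struct demo_partition *p,
                                  uint32_t page, uint32_t pages_per_block)
{
    assert(pages_per_block != 0);
    return demo_block_in_partition(p, page / pages_per_block);
}
```

The cost per call is a division and two comparisons, which is in line
with the small overhead described above.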

The presence of partitions in E's model was controversial, as are its
fine details. Nevertheless, some notion of partitioning turns out to be
essential on some boards. In some recent work for a customer we identified
three separate regions of NAND: somewhere to put the boot loader (primary,
as booted by ROM, and RedBoot), somewhere for the application image itself
(perhaps FIS-like rather than a full filesystem), and a filesystem for the
application to use as it pleases.


R's interface does not have such a facility. It appears that, in the event
that the flash is shared between two or more logical regions, it's up to
higher-level code to be configured with the correct block ranges to use.


(b) Dynamic memory allocation

R's layer mandates the provision of malloc and free, or compatible
functions. These must be provided to the cyg_nand_init() call.

E's doesn't; instead it declares a small number of static buffers.

Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a major
issue because the memory needs of that layer are well-bounded; I think I
broadly agree, though the situation is not ideal in that it forces somebody
who wants to use a lean, mean eCos configuration to work around.

Also note that if you're going to run a full file system like YAFFS, you
can't avoid needing malloc, but in an application making simpler use of
NAND, it's an overhead that you may prefer to avoid.


3. Driver model --------------------------------------------------------

The major architectural difference between the two NAND layers is in their
driver models and the degree of abstraction enforced.

In Rutger's layer, controllers and chips are both formally abstracted. The
application talks to the Abstract NAND Chip, which has (hard-coded) the
basic sequences of commands, addresses and data required to talk to a NAND
chip. This layer talks to a controller driver, which provides the nuts and
bolts of reading and writing to the device. The chip driver is also called
by the ANC layer, and provides the really chip-specific parts.

The call flow looks something like this (best viewed in fixed-width font):

Application --(H)-> ANC --(L)-> Controller driver
                       \
                        \-(C)-> Chip driver

H: high-level interface (read page, program page, erase block; chip
(de)selection)
L: low-level interface (read/write commands, addresses, data; query the busy
line)
C: chip-specific details (chip init, parse ReadID, query factory-bad marker)


In eCosCentric's layer, a NAND driver is a single abstraction covering chip
init and querying the factory-bad status as well as the high level functions
(reading a page, etc). It is left to the driver to determine the sequence of
commands to send. How the driver interacts with the device is considered to
be a contract only between the driver and the relevant platform HAL, so is
not formally abstracted by the NAND layer.

E's chip drivers are written as .inl files, intended to be included by the
relevant platform HALs by whichever source file provides the required
low-level functions. The lack of a formal abstraction is an attempt to
provide a leaner and meaner experience at runtime: the low-level functions
can be (and indeed are, so far) provided as static inlines.

The flow looks like this:

Application --(H1)-> NAND layer --(H2)-> NAND driver --(L*)-> Platform HAL

H1: high-level calls (read page, program page, erase block)
H2: high-level calls (as H1, plus device init and query factory-bad marker)
L*: low-level calls, like L above but not formally abstracted
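
The structural difference can be illustrated with a block erase written
both ways. This is a deliberately simplified sketch: none of the names
are taken from either codebase, the chip-specific ("C") part of R's model
is left out, and the erase sequence (0x60, two row address bytes, 0xD0,
wait) is an assumption modelled on a typical large-page part. The fake
hardware only counts accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Fake hardware shared by both sketches: it just counts accesses. */
struct demo_hw { uint8_t last_cmd; unsigned cmds, addrs, waits; };

/* E-style low-level access: static inlines from the platform HAL. */
static inline void demo_hal_cmd(struct demo_hw *hw, uint8_t c)
{ hw->last_cmd = c; hw->cmds++; }
static inline void demo_hal_addr(struct demo_hw *hw, uint8_t a)
{ (void)a; hw->addrs++; }
static inline void demo_hal_wait(struct demo_hw *hw)
{ hw->waits++; }

/* E-style: the chip driver owns the command sequence... */
static int demo_e_erase_block(struct demo_hw *hw, uint16_t row)
{
    demo_hal_cmd(hw, 0x60);
    demo_hal_addr(hw, (uint8_t)(row & 0xFF));
    demo_hal_addr(hw, (uint8_t)(row >> 8));
    demo_hal_cmd(hw, 0xD0);
    demo_hal_wait(hw);
    return 0;
}

/* ...and exposes only high-level entry points (H2) to the NAND layer. */
struct demo_e_driver {
    int (*erase_block)(struct demo_hw *hw, uint16_t row);
};
static const struct demo_e_driver demo_e = { demo_e_erase_block };

/* R-style: the controller driver exposes low-level ops (L) in a table. */
struct demo_r_ctrl_ops {
    void (*cmd)(void *hw, uint8_t c);
    void (*addr)(void *hw, uint8_t a);
    void (*wait_ready)(void *hw);
};
static void demo_r_cmd(void *hw, uint8_t c)  { demo_hal_cmd(hw, c); }
static void demo_r_addr(void *hw, uint8_t a) { demo_hal_addr(hw, a); }
static void demo_r_wait(void *hw)            { demo_hal_wait(hw); }
static const struct demo_r_ctrl_ops demo_r_ctrl =
    { demo_r_cmd, demo_r_addr, demo_r_wait };

/* R-style: the generic layer owns the sequence, shared by all chips,
 * and reaches the hardware through the ops table at run time. */
static int demo_r_erase_block(const struct demo_r_ctrl_ops *ops, void *hw,
                              uint16_t row)
{
    assert(ops != 0);
    ops->cmd(hw, 0x60);
    ops->addr(hw, (uint8_t)(row & 0xFF));
    ops->addr(hw, (uint8_t)(row >> 8));
    ops->cmd(hw, 0xD0);
    ops->wait_ready(hw);
    return 0;
}
```

Both paths put the same traffic on the bus; what differs is where the
sequence lives and whether each low-level step is an indirect call or
something the compiler can inline.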


The two models have pros and cons in both directions.

- As hinted at above, the static inline model of E's low-level access
functions is expected to turn out to have a lower function call (and,
generally, code size) overhead than R's.

- R's model shares the command sequence logic amongst all chips,
differentiating only between small- and large-page devices. (I do not know
whether this is correct for all current chips, though going forwards seems
less likely to be an issue as fully-ONFI-compliant chips become the norm.)
If multiple chips of different types are present in a build, E's model
potentially duplicates code (though this could be worked around; also, an
ONFI driver ought to be written).

- A corollary of arguably inconsequential import: R's model forces the synth
driver to emulate an entire NAND chip and its protocol. E's synth doesn't
need to.

- E's high-level driver interface makes it harder to add new functions
later, necessitating a change to that API (H2 above). R's does not; the
requisite logic would only need to be added to the ANC. It is not thought
that more than a handful of such changes will ever be required, and it may be
possible to maintain backwards compatibility. (As a case in point, support
for hardware ECC is currently work-in-progress within eCosCentric, and does
require such a change, but now is not the right time to discuss that.)


It would perhaps be interesting to compare the complexities of drivers for
the two models, but it's not readily apparent how we would do that fairly.

Perhaps porting a driver from one NAND layer to the other would be a useful
exercise, and would also allow us to compare code sizes. Any suggestions or
(he says hopefully) volunteers? I've got a lot on my plate this month...


4. Feature/implementation differences ------------------------------------

(I don't consider these to be significant issues; whilst noteworthy, I don't
think they would take much effort to resolve.)

(a) Documentation

The two layers' documentation differs in depth and layout; these are
difficult for me to compare objectively, and I would suggest that a fresh
pair of eyes compare them.

I can only offer the comment that I documented the E layer bearing in mind
what I considered to be missing from the R layer documentation: it was not
clear how the controller and chip layers inter-related, nor where to start
in creating a driver. (I also had a lot less experience of NAND chips then
than I do now, and what I need to know now is different from what a newbie
would.)

(b) Availability of drivers

R provides support for:
- One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
- One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
presumably only tested on the x8 chip on the BlackFin board?)
- A synthetic controller/chip package
- A template for a GPIO-based controller (untested, intended as an example only)

I seem to remember rumours of the existence of a driver for a further
chip+board combination, but I haven't seen it.

E provides support for:
- Two boards: Embedded Artists LPC2468 (very well tested); STM3210E (largely
complete, based on work by Simon K; some enhancements planned)
- Two chips: Samsung K9 family (large page, only x8 done so far); ST-Micro
NANDxxxx3A (small page, x8) (based on work by Simon K)
- Synthetic target. This offers more features than R's: bad block injection,
logging, and a GUI interface via the synth I/O auxiliary.
- Further (customer-confidential) board ports.

(c) RedBoot support

E have added some commands for NAND operations and tested on the EA LPC2468
board. (YAFFS support works via the existing RB fileio layer; nothing really
needed to be done.)

(d) Degree of testing

There are presumably differences of coverage here; both E and R assert they
have carried out stress tests. Properly comparing the depth of the two would
be a job for fresh eyes.

E have:
- a handful of unit and functional tests of the NAND layer, and a benchmarker
- a number of YAFFS functional tests, one of which includes benchmarking,
and a further severe YAFFS stress test: these indirectly test the NAND
layer. (The latter has been run under the synth driver with bad-block
injection turned on, and has revealed some subtle bugs which we probably
wouldn't otherwise have caught.)
- the ability to run continual test cycles in their test farm


5. Works in progress -----------------------------------------------------

I can of course only comment on eCosCentric's plans, but the following work
is in the pipeline:

* Expansion of the device interface to better allow efficient hardware ECC
support (in progress)
* Hardware ECC for the STM3210E board driver
* Performance tuning of software ECC and of NAND low-level drivers
* Partition addressing: make addressing relative to the start of the
partition, once and for all
* Simple raw NAND "filesystem" for use by RedBoot (see
http://ecos.sourceware.org/ml/ecos-devel/2009-07/msg00004.html et seq; those
are the latest public mails but not the latest version of my thinking, which
I will update in due course)
* More RedBoot NAND utility commands
* Support for booting Linux off NAND and for sharing a (YAFFS) NAND-resident
filesystem
* Part-page read support (would provide a big speed-up to parts of YAFFS2
inbandTags mode as needed by small-page devices like that on the STM3210E)

--------------------------------------------------------------------------


Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com


* Re: NAND technical review
  2009-10-06 13:51 ` Ross Younger
@ 2009-10-07  3:12   ` Jonathan Larmour
  2009-10-07 16:22     ` Rutger Hofman
       [not found]     ` <4ACDF868.7050706@ecoscentric.com>
  2009-10-07  9:40   ` Jürgen Lambrecht
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-07  3:12 UTC
  To: Ross Younger; +Cc: eCos developers, Rutger Hofman

Hi Ross,

First, thanks very much for all this. Quite a bit to digest but only 
because it's extremely useful. Sorry for the number of questions I have - 
it's not meant to be inquisitorial, but obviously I need to get to the 
bottom of certain issues.

I've added Rutger to the CC as he may be able to comment on some of the 
issues I raise.

You can assume tacit acceptance/understanding of whatever I haven't 
commented on.

Ross Younger wrote:
> Here goes with a comparison between the two in something close to their
> current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).

FWIW, Rutger is now up to r666.

> However, not all chips are quite the same. The ONFI initiative is an attempt
> to standardise chip protocols and most new chips should comply with it. A
> number of chips on the market are _nearly_ ONFI-compliant: deviations
> typically occur over the format of the ReadID response and that of an
> address. I believe that older chips did their own thing entirely.

Good ONFI support should be the highest priority as that's the way 
everything is likely to go, although we do need the others too. OTOH, my 
experience of NOR flash chip interfaces is that standard specs are all 
well and good, but manufacturers still like to add their own touches. So I 
suspect ONFI will probably correspond to a common subset of functionality, 
but more would want to be done to improve support for individual chips in 
due course.

> It can be beneficial to be able to set up the ready/busy line as an
> interrupt source, as opposed to having to poll it. Whilst there is an
> overhead involved in context-switching, if other application threads have
> much to do it may be advantageous overall for the thread waiting for the
> NAND to sleep until woken by interrupt.

Personally I would expect use as an interrupt line as the main role of the 
ready line.

> Of course, it is possible to put multiple chips on a board. In that case
> there needs to be a way to route between them; I would expect this to be
> done with the Chip Select line, addressed either by different MMIO addresses
> or a separate GPIO or CPLD step. Theoretically, multiple chips could be
> hooked up in parallel to give something that looks like a 16 or 32-bit
> "wide" chip, but I have never encountered this in the NAND world, and it
> would impose a certain extra level of complexity on the driver.

Have you found on-chip (SoC's) NAND controllers permit such a 
configuration? If not, I would assume that it's not an expected hardware 
configuration. Rutger's layer does allow multiple chips per controller, 
but AFAICT that's just in the straightforward way.

What problems would you see, if any, using your layer with the same 
controller and two completely different chips, of different geometry? Can 
you still have a common codebase with other (different) platforms?

Is anyone aware of NAND chips with different-sized blocks, analogous to 
bootblocks with NOR? (I'm not, but others will undoubtedly have seen more 
parts than I have.) Even if they're not around or common now, they may be 
in future. Unfortunately, from what I can tell, 
neither layer would be able to support that directly, although I think it 
may be possible for the eCosCentric layer to allow the driver to pretend 
there is a different NAND chip. Do you think so too?

> 2. Application interface -----------------------------------------------
> 
> Both layers have broadly similar application interfaces.
> 
> In both layers, an application must first use a `lookup' call which provides
> a pointer to a device context struct. In Rutger's layer, devices are
> identified by device number; in eCosCentric's, by a textual name set in the
> board HAL.

A device number does seem to be a bit limiting, and less deterministic. 
OTOH, a textual name arguably adds a little extra complexity.

I note Rutger's layer needs an explicit init call, whereas yours DTRT 
using a constructor, which is good.

> The basic operations required are reading a page, programming a page and
> erasing a block, and both layers provide these.

However I believe Rutger's supports partial page writes (use of 'column'), 
whereas I don't believe eCosCentric's does.

> The page-oriented operations optionally allow read/write of the page spare
> area. These operations also automatically calculate and check an ECC, if the
> device has been configured to do so. Rutger's layer has an extra hook in
> place where an application may explicitly request the use of cached reading
> and writing where the device supports this.

That seems like a useful potential optimisation, exploiting underlying 
capabilities. Any reason you didn't implement this?

I could also believe that NAND controllers can optimise by doing 
multiple block reads, where this hint would prove useful too.

> Both layers also support the necessary ancillary operations of querying the
> status of a block in the bad-block table, and marking a block as bad.

Does your layer _require_ a BBT in its current implementation? 
For simpler NAND usage, it may be overkill e.g. an application where the 
number of rewrites is very small, so the factory bad markers may be 
considered sufficient.

> (a) Partitions
[snip]
> R's interface does not have such a facility. It appears that, in the event
> that the flash is shared between two or more logical regions, it's up to
> higher-level code to be configured with the correct block ranges to use.

In yours, the block ranges must be configured in CDL. Is there much 
difference? I can see an advantage in writing platform-independent test 
programs. But in applications within products possibly less so. Especially 
since the flash geometry, including size, can be programmatically queried.

If there was to be a single firmware supporting multiple board 
revisions/configurations (as can definitely happen), which could include 
different sizes of NAND, I think R's implementation would be able to adapt 
better than E's, as the high-level program can divide up the sizes based 
on what it sees.
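
By way of illustration, this is the sort of thing a high-level program
could do once it has queried the block count at run time. It is a
hypothetical sketch, not code from either layer: the region names follow
the boot/image/filesystem split Ross described, the boot and image sizes
are fixed by the caller, and the filesystem takes whatever is left.

```c
#include <assert.h>
#include <stdint.h>

struct demo_region { uint32_t first_block; uint32_t nblocks; };

enum { DEMO_BOOT, DEMO_IMAGE, DEMO_FS, DEMO_NREGIONS };

/* Divide a device of total_blocks into boot, image and filesystem
 * regions. Returns 0 on success, or -1 if the device is too small to
 * leave at least one block for the filesystem. */
static int demo_carve(uint32_t total_blocks, uint32_t boot_blocks,
                      uint32_t image_blocks,
                      struct demo_region out[DEMO_NREGIONS])
{
    assert(out != 0);
    if (boot_blocks > total_blocks ||
        image_blocks >= total_blocks - boot_blocks)
        return -1;

    out[DEMO_BOOT].first_block  = 0;
    out[DEMO_BOOT].nblocks      = boot_blocks;
    out[DEMO_IMAGE].first_block = boot_blocks;
    out[DEMO_IMAGE].nblocks     = image_blocks;
    out[DEMO_FS].first_block    = boot_blocks + image_blocks;
    out[DEMO_FS].nblocks        = total_blocks - boot_blocks - image_blocks;
    return 0;
}
```

The same firmware would then give a 1024-block part a 952-block
filesystem and a 2048-block part a 1976-block one, with no rebuild.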

> (b) Dynamic memory allocation
> 
> R's layer mandates the provision of malloc and free, or compatible
> functions. These must be provided to the cyg_nand_init() call.

That's unfortunate - that limits its use in smaller boot loaders - a key 
application.

> E's doesn't; instead it declares a small number of static buffers.

I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are no 
other variables. Again I'm thinking of the scenario of single firmware - 
different board revs. Can you confirm?

> Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a major
> issue because the memory needs of that layer are well-bounded; I think I
> broadly agree, though the situation is not ideal in that it forces somebody
> who wants to use a lean, mean eCos configuration to work around.

The overhead of including something like malloc/free in the image may 
compare badly with the amount of memory R's needs to allocate in the first 
place. I also note that if R's implementation has program verifies enabled 
it allocates and frees a page _every_ time. If nothing else this could 
lead to heap fragmentation.

OTOH your implementation doesn't support program verifies in the higher 
level anyway (I note your code comment about it being unnecessary as the 
device should report a successful program - your faith in correct hardware 
behaviour is considerable :-) ).

> Also note that if you're going to run a full file system like YAFFS, you
> can't avoid needing malloc, but in an application making simpler use of
> NAND, it's an overhead that you may prefer to avoid.

It's true that YAFFS is likely to be the most common application though.

> 3. Driver model --------------------------------------------------------
> 
[snip]
> 
> In eCosCentric's layer, a NAND driver is a single abstraction covering chip
> init and querying the factory-bad status as well as the high level functions
> (reading a page, etc). It is left to the driver to determine the sequence of
> commands to send. How the driver interacts with the device is considered to
> be a contract only between the driver and the relevant platform HAL, so is
> not formally abstracted by the NAND layer.

Indeed it's not dissimilar to the existing NOR flash layer.

> - R's model shares the command sequence logic amongst all chips,
> differentiating only between small- and large-page devices. (I do not know
> whether this is correct for all current chips, though going forwards seems
> less likely to be an issue as fully-ONFI-compliant chips become the norm.)

Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it may 
be too prescriptive to be robustly future-proof.

> If multiple chips of different types are present in a build, E's model
> potentially duplicates code (though this could be worked around; also, an
> ONFI driver ought to be written).

Worked around in a way likely to increase single-device footprint though. 
Shame about the lack of an ONFI driver, although I guess the parts still 
aren't widely used which can't help. The Samsung K9 is close at least.

> - A corollary of arguably inconsequential import: R's model forces the synth
> driver to emulate an entire NAND chip and its protocol. E's synth doesn't
> need to.

One could say that makes it a more realistic emulation. But yes I can see 
disadvantages with a somewhat rigid world view. Thinking out loud, I 
wonder if Rutger's layer could work with something like Samsung OneNAND.

> - E's high-level driver interface makes it harder to add new functions
> later, necessitating a change to that API (H2 above). R's does not; the
> requisite logic would only need to be added to the ANC. It is not thought
> that more than a handful of such changes will ever be required, and it may be
> possible to maintain backwards compatibility. (As a case in point, support
> for hardware ECC is currently work-in-progress within eCosCentric, and does
> require such a change, but now is not the right time to discuss that.)

In my view allowing hardware ECC support is a vital part of an API. If an 
API doesn't permit exploiting hardware ECC that would be quite a negative. 
R's does appear to. OTOH I can't imagine it being a difficult thing to add 
in yours. In fact, because of the requirement for the drivers to call 
CYG_NAND_FUNS, it doesn't seem difficult at all to be backwardly 
compatible. Am I right? Nevertheless, it would be unfortunate to have an 
API which already needs its low level driver interface updating to a rev 2.

Incidentally I note Rutger has a "Samsung" ECC implementation, whereas you 
support Samsung K9 chips, but use the normal ECC algorithm. Did Samsung 
change their practice?

> 4. Feature/implementation differences ------------------------------------
> 
> (I don't consider these to be significant issues; whilst noteworthy, I don't
> think they would take much effort to resolve.)
> 
> (a) Documentation
> 
> The two layers' documentation differs in depth and layout; these are
> difficult for me to compare objectively, and I would suggest that a fresh
> pair of eyes compare them.

Your documentation does appear very thorough and well-structured (although 
the Samsung and EA LPC2468 docs really should be broken out into their own 
packages). Rutger's does also seem fine though so I don't think there's a 
strong difference either way.

> I can only offer the comment that I documented the E layer bearing in mind
> what I considered to be missing from the R layer documentation: it was not
> clear how the controller and chip layers inter-related, nor where to start
> in creating a driver. (I also had a lot less experience of NAND chips then
> than I do now, and what I need to know now is different from what a newbie
> would.)

It's possible that those layer interrelations were at the level where 
really the code would be the better guide. Although there's always room 
for improvement.

That being said, experience shows that the best "documentation" for driver 
internals (i.e. beneath the application API) is in fact real concrete 
drivers, which brings us to...

> (b) Availability of drivers
> 
> R provides support for:
> - One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
> presumably only tested on the x8 chip on the BlackFin board?)
> - A synthetic controller/chip package
> - A template for a GPIO-based controller (untested, intended as an example only)
> 
> I seem to remember rumours of the existence of a driver for a further
> chip+board combination, but I haven't seen it.
> 
> E provides support for:
> - Two boards: Embedded Artists LPC2468 (very well tested); STM3210E (largely
> complete, based on work by Simon K; some enhancements planned)
> - Two chips: Samsung K9 family (large page, only x8 done so far); ST-Micro
> NANDxxxx3A (small page, x8) (based on work by Simon K)
> - Synthetic target. This offers more features than R's: bad block injection,
> logging, and a GUI interface via the synth I/O auxiliary.
> - Further (customer-confidential) board ports.

I would certainly appreciate feedback from anyone who has used R's layer. 
What you say would seem to imply that both small page and ONFI are 
untested in R's layer.

> (c) RedBoot support
> 
> E have added some commands for NAND operations and tested on the EA LPC2468
> board. (YAFFS support works via the existing RB fileio layer; nothing really
> needed to be done.)

I think that patch needs some work (I can go into detail if you like), but 
its presence is still a positive thing.

> (d) Degree of testing
> 
> There are presumably differences of coverage here; both E and R assert they
> have carried out stress tests. Properly comparing the depth of the two would
> be a job for fresh eyes.
> 
> E have:
> - a handful of unit and functional tests of the NAND layer, and a benchmarker
> - a number of YAFFS functional tests, one of which includes benchmarking,
> and a further severe YAFFS stress test: these indirectly test the NAND
> layer. (The latter has been run under the synth driver with bad-block
> injection turned on, and has revealed some subtle bugs which we probably
> wouldn't otherwise have caught.)
> - the ability to run continual test cycles in their test farm

Bad block injection sounds like an extremely useful feature. I infer from 
the latter that we're now talking about many hours of testing?

I'd need feedback from Rutger as to what level of testing has been done 
with his.

> 5. Works in progress -----------------------------------------------------
> 
> I can of course only comment on eCosCentric's plans, but the following work
> is in the pipeline:
> 
> * Expansion of the device interface to better allow efficient hardware ECC
> support (in progress)

Rough ETA? All I'm interested in knowing is whether the device interface 
changes for this are likely to be concluded within the timeframe of this 
discussion.

> * Partition addressing: make addressing relative to the start of the
> partition, once and for all

That's quite a major API change, which seems problematic to me.

> * Part-page read support (would provide a big speed-up to parts of YAFFS2
> inbandTags mode as needed by small-page devices like that on the STM3210E)

Do you foresee this happening within any particular timeframe? Do you 
expect the changes to be backwardly compatible?

If you got this far, well done! Since you say you'll be away, you may 
prefer to reply to this email in sections rather than sucking up your time 
and doing it all at once.

Thanks in advance.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: NAND technical review
  2009-10-06 13:51 ` Ross Younger
  2009-10-07  3:12   ` Jonathan Larmour
@ 2009-10-07  9:40   ` Jürgen Lambrecht
  2009-10-07 16:27     ` Rutger Hofman
  2009-10-13  2:44     ` Jonathan Larmour
  2009-10-07 12:11   ` Rutger Hofman
  2009-10-08  8:16   ` Jürgen Lambrecht
  3 siblings, 2 replies; 58+ messages in thread
From: Jürgen Lambrecht @ 2009-10-07  9:40 UTC (permalink / raw)
  To: Ross Younger, Rutger Hofman
  Cc: Jonathan Larmour, eCos developers, Deroo Stijn

Ross Younger wrote:
> Jonathan Larmour wrote:
>   
>> I think at first the ball is really in Ross/eCosCentric's court to give
>> the technical rationale for the decision, so I'd like to ask him first
>> to give his rationale and his own perspective of the comparison of the
>> pros/cons.
>>     
>
> Here goes with a comparison between the two in something close to their
> current states (my 26/08 push to bugzilla 1000770, and Rutger's r659).
> For brevity, I will refer to the two layers as "E" (eCosCentric) and "R"
> (Rutger) from time to time.
>
> Note that this is only really a comparison of the two NAND layers. I have
> not attempted to compare the two YAFFS porting layers, though I do mention
> them in a couple of places where it seemed relevant.
>
> BTW: I will be off-net tomorrow and all next week, so please don't think I
> am ignoring the discussion...
>   
<snip>

> (a) Partitions
>
> E's application interface also provides logic implementing partitions.
> That is to say, all access to a NAND array must be via a `partition';
> the NAND layer sanity-checks whether the requested flash page or block
> address is within the given partition. This is quite a lightweight
> layer and hasn't added much overhead of either code footprint or
> execution time.
>
> The presence of partitions in E's model was controversial, as are its
> fine details. Nevertheless, some notion of partitioning turns out to be
> essential on some boards. In some recent work for a customer we identified
> three separate regions of NAND: somewhere to put the boot loader (primary,
> as booted by ROM, and RedBoot), somewhere for the application image itself
> (perhaps FIS-like rather than a full filesystem), and a filesystem for the
> application to use as it pleases.
>
>
> R's interface does not have such a facility. It appears that, in the event
> that the flash is shared between two or more logical regions, it's up to
> higher-level code to be configured with the correct block ranges to use.
>
>
> (b) Dynamic memory allocation
>
> R's layer mandates the provision of malloc and free, or compatible
> functions. These must be provided to the cyg_nand_init() call.
>
> E's doesn't; instead it declares a small number of static buffers.
>
> Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a major
> issue because the memory needs of that layer are well-bounded; I think I
> broadly agree, though the situation is not ideal in that it forces somebody
> who wants to use a lean, mean eCos configuration to work around.
>
> Also note that if you're going to run a full file system like YAFFS, you
> can't avoid needing malloc, but in an application making simpler use of
> NAND, it's an overhead that you may prefer to avoid.
>
>
> 3. Driver model --------------------------------------------------------
>
> The major architectural difference between the two NAND layers is in their
> driver models and the degree of abstraction enforced.
>
> In Rutger's layer, controllers and chips are both formally abstracted. The
> application talks to the Abstract NAND Chip, which has (hard-coded) the
> basic sequences of commands, addresses and data required to talk to a NAND
> chip. This layer talks to a controller driver, which provides the nuts and
> bolts of reading and writing to the device. The chip driver is also called
> by the ANC layer, and provides the really chip-specific parts.
>
> The call flow looks something like this (best viewed in fixed-width font):
>
> Application --(H)-> ANC --(L)-> Controller driver
>                        \
>                         \-(C)-> Chip driver
>
> H: high-level interface (read page, program page, erase block; chip
> (de)selection)
> L: low-level interface (read/write commands, addresses, data; query the busy
> line)
> C: chip-specific details (chip init, parse ReadID, query factory-bad marker)
>
>
> In eCosCentric's layer, a NAND driver is a single abstraction covering chip
> init and querying the factory-bad status as well as the high level functions
> (reading a page, etc). It is left to the driver to determine the sequence of
> commands to send. How the driver interacts with the device is considered to
> be a contract only between the driver and the relevant platform HAL, so is
> not formally abstracted by the NAND layer.
>
> E's chip drivers are written as .inl files, intended to be included by the
> relevant platform HALs by whichever source file provides the required
> low-level functions. The lack of a formal abstraction is an attempt to
> provide a leaner and meaner experience at runtime: the low-level functions
> can be (and indeed are, so far) provided as static inlines.
>
> The flow looks like this:
>
> Application --(H1)-> NAND layer --(H2)-> NAND driver --(L*)-> Platform HAL
>
> H1: high-level calls (read page, program page, erase block)
> H2: high-level calls (as H1, plus device init and query factory-bad marker)
> L*: low-level calls, like L above but not formally abstracted
>
>
> The two models have pros and cons in both directions.
>
> - As hinted at above, the static inline model of E's low-level access
> functions is expected to turn out to have a lower function call (and,
> generally, code size) overhead than R's.
>
> - R's model shares the command sequence logic amongst all chips,
> differentiating only between small- and large-page devices. (I do not know
> whether this is correct for all current chips, though going forwards seems
> less likely to be an issue as fully-ONFI-compliant chips become the norm.)
> If multiple chips of different types are present in a build, E's model
> potentially duplicates code (though this could be worked around; also, an
> ONFI driver ought to be written).
>
> - A corollary of arguably inconsequential import: R's model forces the synth
> driver to emulate an entire NAND chip and its protocol. E's synth doesn't
> need to.
>
> - E's high-level driver interface makes it harder to add new functions
> later, necessitating a change to that API (H2 above). R's does not; the
> requisite logic would only need to be added to the ANC. It is not thought
> that more than a handful of such changes will ever be required, and it may be
> possible to maintain backwards compatibility. (As a case in point, support
> for hardware ECC is currently work-in-progress within eCosCentric, and does
> require such a change, but now is not the right time to discuss that.)
>
>
>   
Therefore we prefer R's model.

Is it possible that R's model follows better the "general" structure of 
drivers in eCos?
I mean: (I follow our CVS, could maybe differ from the final commit of 
Rutger to eCos)
1. with the low-level chip-specific code in /devs 
(devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and 
devs/flash/micron/nand)
2. with the "middleware" in /io (io/flash_nand/current/src and there 
/anc, /chip, /controller)
3. with the high-level code in /fs

Is it correct that R's abstraction makes it possible to add partitioning 
easily?
(because that is an interesting feature of E's implementation)

We also prefer R's model of course because we started with R's model and 
use it now.
> It would perhaps be interesting to compare the complexities of drivers for
> the two models, but it's not readily apparent how we would do that fairly.
>
> Perhaps porting a driver from one NAND layer to the other would be a useful
> exercise, and would also allow us to compare code sizes. Any suggestions or
> (he says hopefully) volunteers? I've got a lot on my plate this month...
>   
same for us, no time now - beginning of next year?
>
> 4. Feature/implementation differences ------------------------------------
>
> (I don't consider these to be significant issues; whilst noteworthy, I don't
> think they would take much effort to resolve.)
>
> (a) Documentation
>
> The two layers' documentation differ in their depth and layout; these are
> difficult for me to compare objectively, and I would suggest that a fresh
> pair of eyes compare them.
>
> I can only offer the comment that I documented the E layer bearing in mind
> what I considered to be missing from the R layer documentation: it was not
> clear how the controller and chip layers inter-related, nor where to start
> in creating a driver. (I also had a lot less experience of NAND chips then
> than I do now, and what I need to know now is different from what a newbie
> would.)
>
> (b) Availability of drivers
>
> R provides support for:
> - One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
>   
- Two: also our "automatic announcement" board to store mp3's with an 
Atmel ARM9 AT91SAM9260 with 16MB of SDRAM.
> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
> presumably only tested on the x8 chip on the BlackFin board?)
>   
- Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB 
page size, x8)
Because of this chip, Rutger adapted the hardware ECC controller code, 
because our chip uses more bits (for details, ask Stijn or Rutger).
> - A synthetic controller/chip package
> - A template for a GPIO-based controller (untested, intended as an example only)
>
> I seem to remember rumours of the existence of a driver for a further
> chip+board combination, but I haven't seen it.
>
> E provides support for:
> - Two boards: Embedded Artists LPC2468 (very well tested); STM3210E (largely
> complete, based on work by Simon K; some enhancements planned)
> - Two chips: Samsung K9 family (large page, only x8 done so far); ST-Micro
> NANDxxxx3A (small page, x8) (based on work by Simon K)
> - Synthetic target. This offers more features than R's: bad block injection,
> logging, and a GUI interface via the synth I/O auxiliary.
> - Further (customer-confidential) board ports.
>
> (c) RedBoot support
>
> E have added some commands for NAND operations and tested on the EA LPC2468
> board. (YAFFS support works via the existing RB fileio layer; nothing really
> needed to be done.)
>
> (d) Degree of testing
>
> There are presumably differences of coverage here; both E and R assert they
> have carried out stress tests. Properly comparing the depth of the two would
> be a job for fresh eyes.
>
> E have:
> - a handful of unit and functional tests of the NAND layer, and a benchmarker
> - a number of YAFFS functional tests, one of which includes benchmarking,
> and a further severe YAFFS stress test: these indirectly test the NAND
> layer. (The latter has been run under the synth driver with bad-block
> injection turned on, and has revealed some subtle bugs which we probably
> wouldn't otherwise have caught.)
> - the ability to run continual test cycles in their test farm
>   
We have tested it very thoroughly, amongst others with:
- an automatic (continual) nand-flash test in a climate chamber
- stress tests: filling it over and over again via FTP (both with 
a few big and many small files) and checking the heap remaining:
  * Put 25 files with a filesize of 10.000.000 bytes on the filesystem
  * Put 2500 files with a filesize of 100.000 bytes on the filesystem
  * Put 7000 files with a filesize of 10.000 bytes on the filesystem
  Conclusion: storing smaller files needs more heap, but we still have 
plenty left with our 16MB
  * Write a bundle of files over and over again on the filesystem. Each 
time we put 1000 files of 100.000 bytes filesize on the flash drive.
- used in the final mp3-player application

Kind regards,
Jürgen
>
> 5. Works in progress -----------------------------------------------------
>
> I can of course only comment on eCosCentric's plans, but the following work
> is in the pipeline:
>
> * Expansion of the device interface to better allow efficient hardware ECC
> support (in progress)
> * Hardware ECC for the STM3210E board driver
> * Performance tuning of software ECC and of NAND low-level drivers
> * Partition addressing: make addressing relative to the start of the
> partition, once and for all
> * Simple raw NAND "filesystem" for use by RedBoot (see
> http://ecos.sourceware.org/ml/ecos-devel/2009-07/msg00004.html et seq; those
> are the latest public mails but not the latest version of my thinking, which
> I will update in due course)
> * More RedBoot NAND utility commands
> * Support for booting Linux off NAND and for sharing a (YAFFS) NAND-resident
> filesystem
> * Part-page read support (would provide a big speed-up to parts of YAFFS2
> inbandTags mode as needed by small-page devices like that on the STM3210E)
>
> --------------------------------------------------------------------------
>
>
> Ross
>
> --
> Embedded Software Engineer, eCosCentric Limited.
> Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
> Registered in England no. 4422071.                  www.ecoscentric.com
>   


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-06 13:51 ` Ross Younger
  2009-10-07  3:12   ` Jonathan Larmour
  2009-10-07  9:40   ` Jürgen Lambrecht
@ 2009-10-07 12:11   ` Rutger Hofman
  2009-10-08 12:31     ` Ross Younger
  2009-10-08  8:16   ` Jürgen Lambrecht
  3 siblings, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-07 12:11 UTC (permalink / raw)
  To: Ross Younger; +Cc: Jonathan Larmour, eCos developers

Ross Younger wrote:
[snip]
> Getting data into and out of the chip involves a simple protocol sequence.
> 
> Commands are single bytes; addresses are sequences of a few bytes depending
> on the chip size and the operation invoked.
> 
> For example, the sequence to read a page of data on the spec sheet I have to hand is:
> * Write 0x00 into the command latch
> * Write the four address bytes in turn into the address latch
> * Write 0x30 into the command latch
> * Chip signals Busy; wait for it to signal Ready
> * Read out (up to) 2112 bytes of data.
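
The read sequence quoted above can be sketched in C roughly as follows. The `nand_bus` hooks and the trace helpers are hypothetical names made up for this example, and the split of the four address bytes into two column and two row cycles is an assumption that depends on the chip; neither layer's real interface is shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-level bus hooks that a platform would supply. */
struct nand_bus {
    void    (*write_cmd)(void *ctx, uint8_t cmd);
    void    (*write_addr)(void *ctx, uint8_t byte);
    int     (*is_ready)(void *ctx);       /* non-zero when the chip is Ready */
    uint8_t (*read_data)(void *ctx);
    void    *ctx;
};

/* Read len bytes from column 0 of a page, following the quoted sequence. */
static void nand_read_page(const struct nand_bus *bus, uint32_t page,
                           uint8_t *buf, size_t len)
{
    assert(bus != NULL && buf != NULL && len <= 2112);
    bus->write_cmd(bus->ctx, 0x00);                        /* read setup */
    bus->write_addr(bus->ctx, 0x00);                       /* column, low byte */
    bus->write_addr(bus->ctx, 0x00);                       /* column, high byte */
    bus->write_addr(bus->ctx, (uint8_t)(page & 0xff));     /* row, low byte */
    bus->write_addr(bus->ctx, (uint8_t)((page >> 8) & 0xff)); /* row, high byte */
    bus->write_cmd(bus->ctx, 0x30);                        /* read confirm */
    while (!bus->is_ready(bus->ctx))
        ;                              /* real code would bound this wait */
    for (size_t i = 0; i < len; i++)
        buf[i] = bus->read_data(bus->ctx);
}

/* Fake bus for trying the sequence on a host: records command and address
 * cycles in order and returns 0, 1, 2, ... as data. */
struct trace { uint8_t cycles[8]; size_t n; uint8_t next; };

static void trace_cycle(void *c, uint8_t b)
{
    struct trace *t = c;
    if (t->n < sizeof t->cycles)
        t->cycles[t->n++] = b;
}
static int trace_ready(void *c) { (void)c; return 1; }
static uint8_t trace_data(void *c)
{
    struct trace *t = c;
    return t->next++;
}
```

The fake bus stands in for real hardware only so that the cycle order can be checked on a host; a real port would replace it with register accesses.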

AFAIK, there are two kinds of chips on the market: Large-page chips (2K 
data pages) and Small-page chips (512B pages). These speak a different 
command language, but in their wiring they are the same. The large-page 
chips are (nearly) ONFI-compliant, the Small-page chip command language 
is different. Ancient chips aside, if a chip gives its Device Type Byte, 
NAND flash code can look up in its tables what the chip parameters are 
(page size, block size, number of blocks, 8 or 16 bit data bus, etc). 
Miracle: Device Type Bytes are shared across manufacturers, so the table 
is limited in size.
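
Such a lookup could look roughly like the sketch below. The struct, the function names and the table entries are assumptions made for illustration: the ID values and geometries are typical of commonly published tables but must be checked against the datasheets, and this is not the table from R's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nand_geometry {
    uint8_t  device_id;   /* Device Type Byte from the ReadID response */
    uint32_t page_size;   /* data bytes per page */
    uint32_t block_size;  /* data bytes per erase block */
    uint32_t chip_mib;    /* total capacity in MiB */
    uint8_t  bus_width;   /* 8 or 16 */
};

/* Illustrative entries only; verify every row against a datasheet. */
static const struct nand_geometry nand_id_table[] = {
    { 0x73,  512,  16 * 1024,  16,  8 },   /* small page */
    { 0xF1, 2048, 128 * 1024, 128,  8 },   /* large page */
    { 0xDA, 2048, 128 * 1024, 256,  8 },
    { 0xCA, 2048, 128 * 1024, 256, 16 },
};

/* Return the table entry for a Device Type Byte, or NULL if unknown. */
static const struct nand_geometry *nand_lookup(uint8_t device_id)
{
    size_t n = sizeof nand_id_table / sizeof nand_id_table[0];
    for (size_t i = 0; i < n; i++)
        if (nand_id_table[i].device_id == device_id)
            return &nand_id_table[i];
    return NULL;
}

/* Number of erase blocks implied by a table entry. */
static uint32_t nand_block_count(const struct nand_geometry *g)
{
    assert(g != NULL && g->block_size > 0);
    return (uint32_t)(((uint64_t)g->chip_mib * 1024u * 1024u) / g->block_size);
}
```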

I saw an announcement of 4K-page chips, but the datasheets are 
confidential. Is there anybody who can comment on these?

> However, not all chips are quite the same. The ONFI initiative is an attempt
> to standardise chip protocols and most new chips should comply with it. A
> number of chips on the market are _nearly_ ONFI-compliant: deviations
> typically occur over the format of the ReadID response and that of an
> address. I believe that older chips did their own thing entirely.

[snip]

> 3. Driver model --------------------------------------------------------
> 
> The major architectural difference between the two NAND layers is in their
> driver models and the degree of abstraction enforced.
> 
> In Rutger's layer, controllers and chips are both formally abstracted. The
> application talks to the Abstract NAND Chip, which has (hard-coded) the
> basic sequences of commands, addresses and data required to talk to a NAND
> chip. This layer talks to a controller driver, which provides the nuts and
> bolts of reading and writing to the device. The chip driver is also called
> by the ANC layer, and provides the really chip-specific parts.
> 
> The call flow looks something like this (best viewed in fixed-width font):
> 
> Application --(H)-> ANC --(L)-> Controller driver
>                        \
>                         \-(C)-> Chip driver

The code aims at both flexibility and code reuse. Its structure is 
as follows:

Application --(H)-> ANC --(H2)-> Controller Common --(L)-> Controller 
device-specific --(L)-> Chip

= ANC just wants to hide the presence of multiple controllers and 
multiple chips, in any degree of heterogeneity.

= Controller Common implements the command languages for Large-page 
chips and Small-page chips, does ECC generation/checking/repair. Its API 
is much like the ANC's API: page_read, page_write, block_erase, but on a 
specific controller+chip.

= Controller device-specific is (usually) the only part that must be 
ported for a new controller/board/setup. Its API is in terms of the 
commands described by Ross: push a command on the chip's bus, push/read 
data on the chip's bus etc. The sample GPIO driver that I bundled shows 
how little work can be involved in doing a port. I think that support 
for hardware ECC of some controllers may add more to the device-specific 
code than the command implementation!

= Chip has support for ONFI, Large-page, and Small page. Only chips 
that don't fit in these categories (and there will be museums that have 
them) require writing a chip driver.

I realize that support for various chip and ECC types increases the 
code. It will be trivial to add a few #ifdef's to disable unneeded code 
for your configuration; the .cdl can specify what is needed (like: only 
large-page 'regular' chips, which means: no small-page, no ONFI 
interrogation).
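
A minimal sketch of what such #ifdef guards might look like is below. The macro names are hypothetical stand-ins for whatever a .cdl option would define, not options that exist in the current code.

```c
#include <assert.h>

/* Hypothetical configuration macros, as a .cdl option might define them.
 * Here only large-page 'regular' chips are enabled. */
#define NANDCFG_SUPPORT_LARGE_PAGE 1
/* #define NANDCFG_SUPPORT_SMALL_PAGE 1 */
/* #define NANDCFG_SUPPORT_ONFI 1 */

enum nand_family { NAND_LARGE_PAGE, NAND_SMALL_PAGE, NAND_ONFI };

/* Return 1 if support for the chip family was compiled in, else 0. */
static int nand_family_supported(enum nand_family f)
{
    assert(f <= NAND_ONFI);
    switch (f) {
#ifdef NANDCFG_SUPPORT_LARGE_PAGE
    case NAND_LARGE_PAGE:
        return 1;
#endif
#ifdef NANDCFG_SUPPORT_SMALL_PAGE
    case NAND_SMALL_PAGE:
        return 1;
#endif
#ifdef NANDCFG_SUPPORT_ONFI
    case NAND_ONFI:
        return 1;
#endif
    default:
        return 0;
    }
}
```

The same guards would wrap the small-page command language and the ONFI interrogation code, so that a build without them pays no code-size cost.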

[snip]

> - E's high-level driver interface makes it harder to add new functions
> later, necessitating a change to that API (H2 above). R's does not; the
> requisite logic would only need to be added to the ANC.

'ANC' should read: Controller Common code.

> ... (As a case in point, support
> for hardware ECC is currently work-in-progress within eCosCentric, and does
> require such a change, but now is not the right time to discuss that.)

Use of the hardware ECC support for R's BlackFin's on-board ECC was 
included in R from the start. The interface between Common Controller 
and device-specific controller code is designed to support this flexibly.

> It would perhaps be interesting to compare the complexities of drivers for
> the two models, but it's not readily apparent how we would do that fairly.
> 
> Perhaps porting a driver from one NAND layer to the other would be a useful
> exercise, and would also allow us to compare code sizes. Any suggestions or
> (he says hopefully) volunteers? I've got a lot on my plate this month...

Yes, this would definitely be interesting. Would there be benefits in 
R's attempts at ease-of-port and code reuse?

> (b) Availability of drivers
> 
> R provides support for:
> - One board: BlackFin EZ-Kit BF548 (which is not in anoncvs?)
> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
> presumably only tested on the x8 chip on the BlackFin board?)

Correction: any 'regular' chip, of which the ST Micro is an example.

I tested on my synth target with x8 and x16 chips, also different ones 
in one 'board'. I tested with various page sizes, also different ones on 
one 'board'.

> - A synthetic controller/chip package
> - A template for a GPIO-based controller (untested, intended as an example only)
> 
> I seem to remember rumours of the existence of a driver for a further
> chip+board combination, but I haven't seen it.

See Jurgen Lambrecht's response.

[snip]

> 5. Works in progress -----------------------------------------------------
> 
> I can of course only comment on eCosCentric's plans, but the following work
> is in the pipeline:
> 
> * Expansion of the device interface to better allow efficient hardware ECC
> support (in progress)
> * Hardware ECC for the STM3210E board driver
> * Performance tuning of software ECC and of NAND low-level drivers
> * Partition addressing: make addressing relative to the start of the
> partition, once and for all
> * Simple raw NAND "filesystem" for use by RedBoot (see
> http://ecos.sourceware.org/ml/ecos-devel/2009-07/msg00004.html et seq; those
> are the latest public mails but not the latest version of my thinking, which
> I will update in due course)
> * More RedBoot NAND utility commands
> * Support for booting Linux off NAND and for sharing a (YAFFS) NAND-resident
> filesystem
> * Part-page read support (would provide a big speed-up to parts of YAFFS2
> inbandTags mode as needed by small-page devices like that on the STM3210E)

R is designed with support for hardware ECC in mind.

R has part-read and part-write support. One thing that has always 
puzzled me is how this interacts with ECC. ECC often works on a complete 
subpage, like 256 bytes on a 2KB page chip; then I understand. But what 
if the read/write is not of such a subpage?
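
One possible answer, offered as an assumption and not as a description of what either layer does, is to widen the access to whole ECC chunks, check or generate ECC over those, and then use only the requested bytes. The arithmetic can be sketched as below; `ecc_cover` and `ecc_span` are hypothetical names, and the page size is assumed to be a multiple of the chunk size.

```c
#include <assert.h>
#include <stdint.h>

struct ecc_span {
    uint32_t first_chunk;  /* index of the first ECC chunk touched */
    uint32_t chunk_count;  /* number of ECC chunks touched */
    uint32_t start;        /* byte offset where the widened access begins */
    uint32_t length;       /* byte length of the widened access */
};

/* Widen [offset, offset + len) to whole ECC chunks of `chunk` bytes. */
static struct ecc_span ecc_cover(uint32_t offset, uint32_t len,
                                 uint32_t chunk, uint32_t page_size)
{
    struct ecc_span s;
    assert(chunk > 0 && len > 0 && offset + len <= page_size);
    uint32_t first = offset / chunk;
    uint32_t last = (offset + len - 1) / chunk;
    s.first_chunk = first;
    s.chunk_count = last - first + 1;
    s.start = first * chunk;
    s.length = s.chunk_count * chunk;
    return s;
}
```

For example, with 256-byte chunks a 100-byte read at offset 200 straddles two chunks, so 512 bytes starting at offset 0 would have to be read and checked.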

Rutger

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-07  3:12   ` Jonathan Larmour
@ 2009-10-07 16:22     ` Rutger Hofman
  2009-10-08  7:15       ` Jürgen Lambrecht
  2009-10-15  3:49       ` Jonathan Larmour
       [not found]     ` <4ACDF868.7050706@ecoscentric.com>
  1 sibling, 2 replies; 58+ messages in thread
From: Rutger Hofman @ 2009-10-07 16:22 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, eCos developers

I should have stated this in my first mail...

I am not at all qualified to say anything about E's work, because I 
didn't have time to do any kind of review of it. So, I will mainly limit 
myself to comments on things that concern R's work, and where I say 
anything on E it will be based on E's mails on the list.

Jonathan Larmour wrote:
[snip]
> A device number does seem to be a bit limiting, and less deterministic. 
> OTOH, a textual name arguably adds a little extra complexity.

This will be straightforward to change either way.

> I note Rutger's layer needs an explicit init call, whereas yours DTRT using a constructor, which is good.

I followed flash v2 in this. If the experts think a constructor is 
better, that's easy to change too.

> Does your implementation _require_ a BBT in its current implementation? 
> For simpler NAND usage, it may be overkill e.g. an application where the 
> number of rewrites is very small, so the factory bad markers may be 
> considered sufficient.

This is a bit hairy in my opinion, and one reason is that there is no 
Standard Layout for the spare areas. One case where a BBT is forced: my 
BlackFin NFC can be used to boot from NAND, but it enforces a spare 
layout that is incompatible with MTD or anybody. It is even incompatible 
with most chips' specification that the first byte of spare in the first 
page of the block is the Bad Block Marker. BlackFin's boot layout uses 
this first byte in a way that suits it, and it may be 0 -- which would 
otherwise mean Bad Block.

Also, what to do if a block grows bad during usage, and that block 
doesn't allow writing a marker in its spare area? BBT seems a solution.
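
As a rough sketch of the idea, an in-RAM BBT can be as small as one bit per block, filled once from the factory markers and updated when a block goes bad later. All names here are hypothetical, the "anything other than 0xFF means bad" rule is the common convention and not universal (as the BlackFin boot layout shows), and storing the table back to flash is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BBT_MAX_BLOCKS 2048u   /* hypothetical upper bound for this sketch */

struct bbt {
    uint8_t  bits[BBT_MAX_BLOCKS / 8];  /* one bit per block, 1 = bad */
    uint32_t nblocks;
};

static void bbt_init(struct bbt *t, uint32_t nblocks)
{
    assert(t != NULL && nblocks <= BBT_MAX_BLOCKS);
    memset(t->bits, 0, sizeof t->bits);
    t->nblocks = nblocks;
}

/* Record a block as bad, e.g. after a failed erase or program. */
static void bbt_mark_bad(struct bbt *t, uint32_t block)
{
    assert(block < t->nblocks);
    t->bits[block / 8] |= (uint8_t)(1u << (block % 8));
}

static int bbt_is_bad(const struct bbt *t, uint32_t block)
{
    assert(block < t->nblocks);
    return (t->bits[block / 8] >> (block % 8)) & 1;
}

/* Feed in the first spare byte of the first page of a block, as read
 * during the initial scan. Assumes the common 0xFF = good convention. */
static void bbt_note_factory_marker(struct bbt *t, uint32_t block,
                                    uint8_t first_spare_byte)
{
    if (first_spare_byte != 0xFF)
        bbt_mark_bad(t, block);
}
```

Because the table lives outside the blocks it describes, a block that can no longer take a marker in its own spare area can still be recorded as bad.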

>> (b) Dynamic memory allocation
>>
>> R's layer mandates the provision of malloc and free, or compatible
>> functions. These must be provided to the cyg_nand_init() call.
> 
> That's unfortunate - that limits its use in smaller boot loaders - a key 
> application.

Well, it is certainly possible to calculate statically how much space 
R's NAND layer is going to use, to allocate that statically, and write a 
tiny function to hand it out piecemeal at the NAND layer's request. 
There is no call to free() here except at shutdown, so nothing 
malloc-like is necessary. (An exception is in the debug handling, see 
below.)
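
Such a tiny hand-out function could be sketched like this. The pool size and the names are hypothetical; the real budget would have to be calculated from the configured chips and controllers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAND_POOL_BYTES 4096u   /* hypothetical, statically calculated budget */

/* Pool declared as 64-bit words so that every piece is 8-byte aligned. */
static uint64_t nand_pool[NAND_POOL_BYTES / sizeof(uint64_t)];
static size_t nand_pool_used;

/* Hand out the next piece of the static pool; there is no free(). */
static void *nand_pool_alloc(size_t size)
{
    assert(size > 0);
    size_t rounded = (size + 7u) & ~(size_t)7u;   /* keep 8-byte alignment */
    if (rounded < size || rounded > NAND_POOL_BYTES - nand_pool_used)
        return NULL;                              /* overflow or pool exhausted */
    void *p = (uint8_t *)nand_pool + nand_pool_used;
    nand_pool_used += rounded;
    return p;
}
```

A function of this shape could be supplied in place of malloc, with a do-nothing counterpart in place of free.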

>> E's doesn't; instead it declares a small number of static buffers.
> 
> I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are 
> no other variables. Again I'm thinking of the scenario of single 
> firmware - different board revs. Can you confirm?
> 
>> Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a 
>> major
>> issue because the memory needs of that layer are well-bounded; I think I
>> broadly agree, though the situation is not ideal in that it forces 
>> somebody
>> who wants to use a lean, mean eCos configuration to work around.
> 
> The overhead of including something like malloc/free in the image may 
> compare badly with the amount of memory R's needs to allocate in the 
> first place. I also note that if R's implementation has program verifies 
> enabled it allocates and frees a page _every_ time. If nothing else this 
> could lead to heap fragmentation.

Program verifies should be considered a very deep debugging trait. 
Still, another possible implementation for this page buffer would be on 
the stack (not!), or in the controller struct. That would grow then by 
8KB + spare.

[snip]

>> - R's model shares the command sequence logic amongst all chips,
>> differentiating only between small- and large-page devices. (I do not 
>> know
>> whether this is correct for all current chips, though going forwards 
>> seems
>> less likely to be an issue as fully-ONFI-compliant chips become the 
>> norm.)
> 
> Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it 
> may be too prescriptive to be robustly future-proof.

Well, there is no way I can see into the future, but I definitely think 
that the wire command model for NAND chips is going to stay -- it is in 
ONFI, after all. Besides, all except the 1 or 2 most pioneering museum 
NAND chips use it too. There are chips that use a different interface, 
like SSD or MMC or OneNand, but then these chips come with on-chip bad 
block management, wear leveling of some kind, and are completely 
different in the way they must be handled. I'd say E's and R's 
implementations are concerned only with 'raw' NAND chips.

> One could say that makes it a more realistic emulation. But yes I can 
> see disadvantages with a somewhat rigid world view. Thinking out loud, I 
> wonder if Rutger's layer could work with something like Samsung OneNAND.

See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says: 
"MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array 
using a NOR Flash interface."

> Incidentally I note Rutger has a "Samsung" ECC implementation, whereas 
> you support Samsung K9 chips, but use the normal ECC algorithm. Did 
> Samsung change their practice?

The ECC algorithm is not something that is related to chips. It is 
either software, or it is in the controller's ECC hardware and may need 
software support. Controller EEC hardware seems to use one of two public 
algorithms that are known as 'Toshiba ECC' and 'Samsung ECC'.

> I would certainly appreciate feedback from anyone who has used R's 
> layer. What you say would seem to imply that both small page and ONFI 
> are untested in R's layer.

That is correct. I would love some small-page testing. I have seen no 
ONFI chips on the market yet, so testing will be future work for both E 
and R.

> I'd need feedback from Rutger as to what level of testing has been done 
> with his.

I ran YAFFS tests, some took more than an hour to complete on my 
BlackFin. But for serious testing, see Jurgen Lambrecht's mail.

Rutger


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-07  9:40   ` Jürgen Lambrecht
@ 2009-10-07 16:27     ` Rutger Hofman
  2009-10-13  2:44     ` Jonathan Larmour
  1 sibling, 0 replies; 58+ messages in thread
From: Rutger Hofman @ 2009-10-07 16:27 UTC (permalink / raw)
  To: Jürgen Lambrecht
  Cc: Ross Younger, Jonathan Larmour, eCos developers, Deroo Stijn

Jürgen Lambrecht wrote:
> Ross Younger wrote:
>> Jonathan Larmour wrote:
[snip]

> Is it possible that R's model follows better the "general" structure of 
> drivers in eCos?
> I mean: (I follow our CVS, could maybe differ from the final commit of 
> Rutger to eCos)
> 1. with the low-level chip-specific code in /devs 
> (devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and 
> devs/flash/micron/nand)
> 2. with the "middleware" in /io (io/flash_nand/current/src and there 
> /anc, /chip, /controller)
> 3. with the high-level code in /fs

As far as I know, this has been the case for some releases already.

> Is it correct that R's abstraction makes it possible to add partitioning 
> easily?
> (because that is an interesting feature of E's implementation)

I think it would not be hard to add. It might involve a change in API 
though, which is no problem as long as the number of clients is small, 
and all the more when those clients desire it.

Rutger

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: NAND technical review
  2009-10-07 16:22     ` Rutger Hofman
@ 2009-10-08  7:15       ` Jürgen Lambrecht
  2009-10-15  3:53         ` Jonathan Larmour
  2009-10-15  3:49       ` Jonathan Larmour
  1 sibling, 1 reply; 58+ messages in thread
From: Jürgen Lambrecht @ 2009-10-08  7:15 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Jonathan Larmour, Ross Younger, eCos developers

Rutger Hofman wrote:

<snip>
>>> - R's model shares the command sequence logic amongst all chips,
>>> differentiating only between small- and large-page devices. (I do not
>>> know
>>> whether this is correct for all current chips, though going forwards
>>> seems
>>> less likely to be an issue as fully-ONFI-compliant chips become the
>>> norm.)
>>>       
>> Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it
>> may be too prescriptive to be robustly future-proof.
>>     
>
> Well, there is no way I can see into the future, but I definitely think
> that the wire command model for NAND chips is going to stay -- it is in
> ONFI, after all. Besides, all except the 1 or 2 most pioneering museum
> NAND chips use it too. There are chips that use a different interface,
> like SSD or MMC or OneNand, but then these chips come with on-chip bad
> block management, wear leveling of some kind, and are completely
> different in the way they must be handled. I'd say E's and R's
> implementations are concerned only with 'raw' NAND chips.
>
>   
Correct, only for raw NAND chips to be soldered on a board. The others 
have an embedded controller and are already packaged.
>> One could say that makes it a more realistic emulation. But yes I can
>> see disadvantages with a somewhat rigid world view. Thinking out loud, I
>> wonder if Rutger's layer could work with something like Samsung OneNAND.
>>     
>
> See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says:
> "MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array
> using a NOR Flash interface."
>
>   
Indeed, a OneNAND is to be treated as a NOR flash, like a pseudoSRAM is 
a DRAM with SRAM interface.
And SSD has a hard disk drive interface, just like MMC and SD card; they 
mostly have a FAT file system on them but also UFS ...

Kind regards,
Jürgen

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: Re: NAND technical review
  2009-10-06 13:51 ` Ross Younger
                     ` (2 preceding siblings ...)
  2009-10-07 12:11   ` Rutger Hofman
@ 2009-10-08  8:16   ` Jürgen Lambrecht
  2009-10-12  1:13     ` Jonathan Larmour
  3 siblings, 1 reply; 58+ messages in thread
From: Jürgen Lambrecht @ 2009-10-08  8:16 UTC (permalink / raw)
  To: eCos developers

Just some explanatory remarks below, hardware related.

Ross Younger wrote:

<snip>
> 1. NAND 101 -------------------------------------------------------------
>
> (Those familiar with NAND chips can skip this section, but I appreciate
> that not everybody on-list is in the business of writing NAND device
> drivers :-) )
>
> (i) Conceptual
>   
<snip>
>
> Now, I mentioned ECC data. NAND technology has a number of underlying
> limitations, importantly that it has reliability issues. I don't have a full
> picture - the manufacturers seem to be understandably coy - but my
> understanding is that on each page, a driver ought to be able to cope with a
> single bit having flipped either on programming or on reading. The
>   
Such a "broken bit" is because the transistor that contains the bit is 
physically broken, and is stuck at 1 or at 0 (I don't know if it can be 
both). So you cannot anymore erase it (flip it back to 1) or program it 
(flip to 0).

I thought only programming or erasing could break it, not reading?
Is somebody sure about this?
> recommended way to achieve this is by storing an ECC in the spare area: the
> algorithm published by Samsung is popular, requiring 22 bits of ECC per 256
> bytes of data and able to correct a 1 bit error and detect a 2 bit error.
>
> There is also the question of bad blocks. Again, full details are sketchy. A
> chip may be shipped with a number of "factory-bad" blocks (e.g. up to 20 on
> this Samsung chip); they are marked as such in their spare area. (What
> constitutes a "bad" block is not published; one imagines that the factory
> have access to more test information than users do and that there may be
> statistical techniques involved in judging the likely reliability of the
> block.) Blocks may also fail during the life of the device, usually by the
>   
NAND flash chips are very dense chips (many bits on a small size) and 
there is a trade-off in manufacturing between reliability and density. 
To make them dense (hence cheap) faults have to be tolerated.
The manufacturer just tries to program all bits a first time to check 
for manufacturing errors. When a broken bit is discovered, the entire 
block is marked bad.
> chip reporting a failure during a program or erase operation. Because of
> this, the manufacturers recommend that chip drivers scan the device for
> factory-bad markers then create and maintain a Bad Block Table throughout
> the life of the device. How this is done is not prescribed, but the
> behaviour of the Linux MTD layer is something approximating a de facto standard.
>   
<snip>
> (iii) Electrical
>
> Most, if not all, NAND chips have the same broad electrical interface.
>
> There is a master Chip Enable line; nothing happens if this is not active.
>   
(below a hardware designer note :-)
Be careful on this: a standard chip enable is only active during the 
actual read or write. But an access to a NAND flash is a complete cycle 
during which the NAND flash embedded control logic needs to keep its state!
Therefore, the Chip Enable (or Chip Select) of the NAND flash is (on my 
ARM9 anyhow) connected to a GPIO pin (general-purpose input/output pin). 
Therefore the SW has to assert this pin at the start of an access and 
de-assert it at the end.
The real hardware Chip Select pin is not connected.
(In R's SW in the io/flash_nand/../controller: cyg_nand_ctl_chip_select, 
that calls chip_select implemented in the board-specific driver in 
/devs/flash/[uC brand])
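
As a rough illustration of the point above, a driver might bracket every 
NAND access with the GPIO-driven Chip Enable as sketched below. The names 
nand_ce, nand_with_chip_enabled and the fake pin are hypothetical, made up 
for this example; they are not taken from R's or E's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical board hook: drives the GPIO wired to the NAND Chip Enable. */
struct nand_ce {
    void (*set)(void *priv, int asserted);
    void *priv;
};

typedef int (*nand_access_fn)(void *arg);

/* Keep CE asserted for the whole command/address/data cycle: as described
 * above, the chip must stay selected for the complete access. */
int nand_with_chip_enabled(const struct nand_ce *ce,
                           nand_access_fn access, void *arg)
{
    assert(ce != NULL && ce->set != NULL && access != NULL);
    ce->set(ce->priv, 1);
    int rv = access(arg);
    ce->set(ce->priv, 0);   /* released on success and on failure alike */
    return rv;
}

/* Fake pin and fake access, standing in for real hardware. */
struct fake_pin { int asserted; int edges; };

void fake_pin_set(void *priv, int asserted)
{
    struct fake_pin *pin = priv;
    pin->asserted = asserted;
    pin->edges++;
}

/* Succeeds only if CE was asserted at the time of the access. */
int fake_access(void *arg)
{
    const struct fake_pin *pin = arg;
    return pin->asserted ? 0 : -1;
}
```

On a real board the set hook would write the GPIO register, and the access 
callback would issue the command, address and data phases.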
> Data flows into and out of the chip via its data bus, which is 8 or 16 bits
> wide, mediated by Read Enable and Write Enable lines.
>
> Commands and addresses are sent on the data bus, but routed to the
> appropriate latches by asserting the Address Latch Enable or Command Latch
> Enable lines at the same time.
>
> There is also a ready/busy line which the driver can use to tell when an
> operation is in progress. Typical operation times from the Samsung spec
> sheet I have to hand are 25us for a page read, 300us for a page program, and
> 2ms for a block erase.
>
>
> (iv) Board hook-up
>   
<snip>
> Sometimes the ready/busy line isn't wired in or requires a jumper to be set
> to route it. This can be worked around: for a read operation, one can just
> insert a delay loop for the prescribed maximum time, while for programs and
> erases, most (all?) chips have a "Read Status" command which can be used to
> query whether the operation has completed.
>   
We started our driver this way.
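
A minimal sketch of the Read Status workaround follows, assuming the 
usual command value 0x70 with status bit 6 meaning "ready" and bit 0 
meaning "fail" (check the datasheet of the chip in question). The 
nand_bus structure and the fake chip are hypothetical stand-ins for a 
real controller driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAND_CMD_READ_STATUS 0x70u  /* assumed: common Read Status opcode */
#define NAND_STATUS_FAIL     0x01u  /* assumed: bit 0 = last op failed */
#define NAND_STATUS_READY    0x40u  /* assumed: bit 6 = chip is ready */

/* Hypothetical low-level bus hooks supplied by the board/controller code. */
struct nand_bus {
    void (*write_cmd)(void *priv, uint8_t cmd);
    uint8_t (*read_data)(void *priv);
    void *priv;
};

/* Poll the status register instead of the ready/busy line.
 * Returns 0 on success, -1 if the chip reports failure, -2 on timeout. */
int nand_wait_status(const struct nand_bus *bus, unsigned max_polls)
{
    assert(bus != NULL && bus->write_cmd != NULL && bus->read_data != NULL);
    bus->write_cmd(bus->priv, NAND_CMD_READ_STATUS);
    for (unsigned i = 0; i < max_polls; i++) {
        uint8_t st = bus->read_data(bus->priv);
        if (st & NAND_STATUS_READY)
            return (st & NAND_STATUS_FAIL) ? -1 : 0;
    }
    return -2;
}

/* Fake chip: reports busy for a number of reads, then a final status. */
struct fake_chip { unsigned busy_reads; uint8_t final_status; uint8_t last_cmd; };

void fake_write_cmd(void *priv, uint8_t cmd)
{
    ((struct fake_chip *)priv)->last_cmd = cmd;
}

uint8_t fake_read_data(void *priv)
{
    struct fake_chip *c = priv;
    if (c->busy_reads > 0) {
        c->busy_reads--;
        return 0x00;
    }
    return c->final_status;
}
```

The poll limit would in practice be derived from the maximum program or 
erase time in the datasheet.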
> It can be beneficial to be able to set up the ready/busy line as an
> interrupt source, as opposed to having to poll it. Whilst there is an
> overhead involved in context-switching, if other application threads have
> much to do it may be advantageous overall for the thread waiting for the
> NAND to sleep until woken by interrupt.
>   
To speed things up, we now poll the ready/busy line. Using it as an 
interrupt is still to do.
> Of course, it is possible to put multiple chips on a board. In that case
> there needs to be a way to route between them; I would expect this to be
> done with the Chip Select line, addressed either by different MMIO addresses
> or a separate GPIO or CPLD step. Theoretically, multiple chips could be
> hooked up in parallel to give something that looks like a 16 or 32-bit
> "wide" chip, but I have never encountered this in the NAND world, and it
> would impose a certain extra level of complexity on the driver.
>   
Indeed, this would be difficult: unlike a NOR flash or SRAM, a NAND is not 
a simple memory-mapped device that is easy to put in parallel.
Bad block management alone makes it difficult: the chips cannot be 
paralleled in hardware, they need to be addressed separately. They must 
then be made parallel virtually, in software.

Regards,
Jürgen


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-07 12:11   ` Rutger Hofman
@ 2009-10-08 12:31     ` Ross Younger
  0 siblings, 0 replies; 58+ messages in thread
From: Ross Younger @ 2009-10-08 12:31 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Jonathan Larmour, eCos developers

Rutger Hofman wrote:
> R has part-read and part-write support. One thing that has always
> puzzled me is how this interacts with ECC. ECC often works on a complete
> subpage, like 256 bytes on a 2KB page chip; then I understand. But what
> if the read/write is not of such a subpage?

This is a very good question - I revisited it the other day when working on
hardware ECC support for the customer port I'm working on - and I don't have
a particularly good answer for it.

If the read is less than an ECC stride[*], one could perhaps fill in the ECC
calculation by reading the rest of that stride's worth anyway and not
passing it to the caller. Similarly, a write that is less than a stride
could be "filled in" with 0xFF for the purposes of computing its ECC. How
this would be achieved efficiently is an exercise for the reader as a bit of
refactoring is likely to be involved...
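
For the write case, the "fill in with 0xFF" idea could look roughly like 
the sketch below. It assumes a 256-byte ECC stride; pad_partial_stride is 
a hypothetical helper, not something in either layer. The resulting ECC 
only matches what is on the chip if the rest of that stride is still in 
the erased (all 0xFF) state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECC_STRIDE 256u   /* assumed stride: data bytes per ECC calculation */

/* Build the full stride that the ECC calculation will see for a partial
 * write: the caller's bytes at `offset`, 0xFF (the erased state)
 * everywhere else. Programming 0xFF leaves NAND cells unchanged. */
void pad_partial_stride(uint8_t stride_buf[ECC_STRIDE], size_t offset,
                        const uint8_t *data, size_t len)
{
    assert(stride_buf != NULL && data != NULL);
    assert(offset <= ECC_STRIDE && len <= ECC_STRIDE - offset);
    memset(stride_buf, 0xFF, ECC_STRIDE);
    memcpy(stride_buf + offset, data, len);
}
```

The padded buffer would then be handed to whichever ECC routine is in 
use, in place of the caller's shorter buffer.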

[*] I'm using "stride" here to mean the amount of data that an ECC
calculation operates over. The Samsung algorithm which computes 22 bits of
ECC over 256 bytes of data is common, not least because that's the
one used by the Linux MTD layer.

I did wonder about not supporting less-than-page reads and writes at all,
but my code currently tries its best on the grounds of being liberal in what
it accepts.

In passing, I note that some large page devices allow the data and spare
areas to be written in subpages (e.g. this Samsung K9 chip to hand - 2048
main + 64 spare per page - allows writes in units of 512 main and 16 spare);
there might be a use to be found here in allowing an application to treat a
large page device as if it were a small-page device.


Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
       [not found]     ` <4ACDF868.7050706@ecoscentric.com>
@ 2009-10-09  8:27       ` Ross Younger
  2009-10-13  2:21         ` Jonathan Larmour
  0 siblings, 1 reply; 58+ messages in thread
From: Ross Younger @ 2009-10-09  8:27 UTC (permalink / raw)
  To: ecos-devel

(resend, having fallen foul of sourceware's spamtrap)

Jonathan Larmour wrote:
 > > Good ONFI support should be the highest priority as that's the way
 > > everything is likely to go, although we do need the others too.

Agreed. As the Samsung K9 is nearly ONFI already, adapting my driver is
likely to be very quick; all other things being equal, I would just do this
as and when there was a demand (and suitable hardware on my desk).


 > > Personally I would expect use as an interrupt line as the main role of
 > > the ready line.

IMLE the overhead of sleeping and context switching is quite significant. In
the drivers I've written to date, where there is a possibility to use the
ready line as an interrupt source I have provided this as an option in CDL.


 >> >> Theoretically, multiple chips could be
 >> >> hooked up in parallel to give something that looks like a 16 or 32-bit
 >> >> "wide" chip, but I have never encountered this in the NAND world [...]
 > >
 > > Have you found on-chip (SoC's) NAND controllers permit such a
 > > configuration? If not, I would assume that it's not an expected hardware
 > > configuration.

Not on the small number of controllers I have looked at in detail.


 > > What problems would you see, if any, using your layer with the same
 > > controller and two completely different chips, of different geometry?
 > > Can you still have a common codebase with other (different) platforms?

I don't see any issue: controllers don't IME care about the chip geometry,
they just take care of the electrical side, and some calculate ECC in
passing. For that matter I don't see an issue with a single controller on
one board driving two chips of different geometries at once.


 > > Is anyone aware of NAND chips with different sized blocks? Analogous to
 > > bootblocks with NOR (I haven't, but others will undoubtedly have seen
 > > more parts than I). Although it's possible that even if they're not
 > > around or common now, they may be in future.

I don't think there's a way to express such a chip in the ONFI chip
interrogation logic, and such a chip would I think comprehensively break the
Linux MTD layer into the bargain.

 > > Unfortunately from what I
 > > can tell neither layer would be able to support that directly, although
 > > I think it may be possible for the eCosCentric layer to allow the driver
 > > to pretend there is a different NAND chip. Do you think so too?

Two chip drivers exposing different geometries but with essentially the same
underlying access functions would probably do the trick. There would have to
be careful address translation or partitioning between the two, and a single
mutex protecting both devices in the chip driver layer, but I think it'd be
a goer.


 >> >> 2. Application interface -----------------------------------------------

 >> >> The basic operations required are reading a page, programming a page and
 >> >> erasing a block, and both layers provide these.
 > >
 > > However I believe Rutger's supports partial page writes (use of
 > > 'column'), whereas I don't believe eCosCentric's does.

As covered in the other subthread, is this actually useful, and how to sort
out the ECC?


 >> >> Rutger's layer has an extra hook in
 >> >> place where an application may explicitly request the use of cached reading
 >> >> and writing where the device supports this.
 > >
 > > That seems like a useful potential optimisation, exploiting underlying
 > > capabilities. Any reason you didn't implement this?
 > >
 > > I could also believe that NAND controllers can also optimise by doing
 > > multiple block reads, where this hint would also prove useful.

Not particularly. Looking at cache-assisted read and program operations for
multi-page operations is sitting on my TODO list, languishing :-). I would
note in passing that YAFFS doesn't make use of these, preferring only to
read and write single pages fully synchronously; this might be a worthwhile
enhancement in dealing with larger files, though YAFFS's own internal NAND
interface is strictly page-oriented at the moment and so this would require
a bit of brain surgery - something best done in conjunction with Charles
Manning, I think.


 > > Does your implementation _require_ a BBT in its current implementation?
 > > For simpler NAND usage, it may be overkill e.g. an application where the
 > > number of rewrites is very small, so the factory bad markers may be
 > > considered sufficient.

I suppose it would be possible to provide a CDL option to switch the
persistent BBT off if you really wanted to. Caution is required, though:
after you have ever written to the chip, it can be impossible to distinguish
a genuine factory-bad marker from application data in the OOB area that
happens to resemble it. This can be worked around with very careful
management of what the application puts into the OOB or by tweaking the OOB
layout to simply avoid ever writing to the relevant byte(s).
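
To illustrate what a BBT-less mode would be relying on, here is a minimal 
sketch of a factory-bad marker scan. It assumes the common large-page 
convention (marker in spare byte 0 of the first page of each block, 
anything other than 0xFF meaning bad); small-page chips commonly use a 
different byte, so the datasheet decides. All names are hypothetical, and 
the fake device merely stands in for a real spare-area read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BAD_MARKER_OFFSET 0u   /* assumption: large-page marker position */
#define SPARE_BYTES       64u  /* assumption: 64-byte spare area */

/* Hypothetical hook: read the spare area of the first page of `block`. */
typedef int (*read_spare_fn)(void *priv, uint32_t block,
                             uint8_t *spare, size_t len);

/* Fill a one-bit-per-block bitmap (1 = bad).
 * Returns the number of bad blocks found, or -1 on a read error. */
int scan_factory_bad(read_spare_fn read_spare, void *priv,
                     uint32_t nblocks, uint8_t *bitmap)
{
    assert(read_spare != NULL && bitmap != NULL);
    int bad = 0;
    memset(bitmap, 0, (nblocks + 7u) / 8u);
    for (uint32_t b = 0; b < nblocks; b++) {
        uint8_t spare[SPARE_BYTES];
        if (read_spare(priv, b, spare, sizeof spare) != 0)
            return -1;
        if (spare[BAD_MARKER_OFFSET] != 0xFF) {
            bitmap[b / 8u] |= (uint8_t)(1u << (b % 8u));
            bad++;
        }
    }
    return bad;
}

/* Fake device: one marker byte per block, rest of the spare erased. */
struct fake_nand { const uint8_t *markers; };

int fake_read_spare(void *priv, uint32_t block, uint8_t *spare, size_t len)
{
    const struct fake_nand *dev = priv;
    memset(spare, 0xFF, len);
    spare[BAD_MARKER_OFFSET] = dev->markers[block];
    return 0;
}
```

As the paragraph above says, such a scan is only trustworthy before 
anything has written to the marker byte.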


 >> >> (a) Partitions
 > > [snip]
 >> >> R's interface does not have such a facility. It appears that, in the
 >> >> event
 >> >> that the flash is shared between two or more logical regions, it's up to
 >> >> higher-level code to be configured with the correct block ranges to use.
 > >
 > > In yours, the block ranges must be configured in CDL. Is there much
 > > difference? I can see an advantage in writing platform-independent test
 > > programs. But in applications within products possibly less so.

I provide CDL for manual config, but have included a partition layout
initialisation hook. If there was an on-chip partition table, all that's
needed would be some code to go into that hook to interrogate it and
translate to my layer's in-memory layout. This is admittedly not well
documented, but hinted at by "Planning a port"
(http://www.ecoscentric.com/ecospro/doc.cgi/html/ecospro-ref/nand-devs-writing.html)
and should be readily apparent on examining code for existing chip drivers.

 > > Especially since the flash geometry, including size, can be
 > > programmatically queried.

Flash geometry can only be programmatically queried up to a point in
non-ONFI chips. Look at the k9_devinit function in k9fxx08x08.inl: while the
ReadID response of Samsung chips encodes the page, block and spare area
sizes, it doesn't tell you about the chip block count or overall size - you
have to know based on the device identifier byte. Linux, for example, has a
big table of these in drivers/mtd/nand/nand_ids.c.
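
For illustration, the split between "decoded from ReadID" and "looked up 
by device identifier" might be sketched as below. The bit layout of the 
extended ID byte is the one I understand Samsung large-page parts and the 
Linux MTD code to use, and the three table entries are illustrative 
only; both should be checked against the datasheet. The names nand_geom 
and nand_decode_id are hypothetical and are not the k9_devinit code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nand_geom {
    uint32_t page_size;    /* bytes of main area per page */
    uint32_t spare_size;   /* bytes of spare area per page */
    uint32_t block_size;   /* bytes of main area per block */
    unsigned bus_width;    /* 8 or 16 */
    uint32_t chip_mib;     /* overall size in MiB, from the table */
};

/* Illustrative device-ID to size table; a real driver needs the full list. */
struct id_entry { uint8_t dev_id; uint32_t mib; };
const struct id_entry id_table[] = {
    { 0xF1, 128 }, { 0xDA, 256 }, { 0xDC, 512 },
};

/* Decode the extended ID byte, then look the chip size up by device ID.
 * Returns 0 on success, -1 if the device ID is not in the table. */
int nand_decode_id(uint8_t dev_id, uint8_t ext, struct nand_geom *g)
{
    assert(g != NULL);
    g->page_size  = 1024u << (ext & 0x3u);
    g->spare_size = (8u << ((ext >> 2) & 0x1u)) * (g->page_size / 512u);
    g->block_size = 65536u << ((ext >> 4) & 0x3u);
    g->bus_width  = (ext & 0x40u) ? 16u : 8u;
    g->chip_mib   = 0;
    for (size_t i = 0; i < sizeof id_table / sizeof id_table[0]; i++) {
        if (id_table[i].dev_id == dev_id) {
            g->chip_mib = id_table[i].mib;
            return 0;
        }
    }
    return -1;   /* geometry per page/block known, total size unknown */
}
```

The block count then follows as chip size divided by block size, which is 
exactly the part the ReadID response alone cannot supply.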

 > > If there was to be a single firmware supporting multiple board
 > > revisions/configurations (as can definitely happen), which could include
 > > different sizes of NAND, I think R's implementation would be able to
 > > adapt better than E's, as the high-level program can divide up the sizes
 > > based on what it sees.

I see no reason why E's wouldn't adapt just as well, given suitably written
driver(s) and init hooks.


 >> >> (b) Dynamic memory allocation
 >> >>
 >> >> R's layer mandates the provision of malloc and free, or compatible
 >> >> functions. These must be provided to the cyg_nand_init() call.
 > >
 > > That's unfortunate - that limits its use in smaller boot loaders - a key
 > > application.
 > >

 >> >> E's doesn't; instead it declares a small number of static buffers.
 > >
 > > I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are
 > > no other variables. Again I'm thinking of the scenario of single
 > > firmware - different board revs. Can you confirm?

Chip drivers are expected to require in CDL that CYGNUM_NAND_PAGEBUFFER be
large enough, and to set up a static byte array for their Bad Block Table.
Efficiently supporting two differently-sized chips on a single board - I
mean only allocating enough static space for the largest known BBT - would
not be difficult.


 > > OTOH your implementation doesn't support program verifies in the higher
 > > level anyway (I note your code comment about it being unnecessary as the
 > > device should report a successful program - your faith in correct
 > > hardware behaviour is considerable :-) ).

Verifying after programming is also on my todo list :-)



 >> >> If multiple chips of different types are present in a build, E's model
 >> >> potentially duplicates code (though this could be worked around; also, an
 >> >> ONFI driver ought to be written).
 > >
 > > Worked around in a way likely to increase single-device footprint
 > > though. Shame about the lack of ONFI driver, although I guess the parts
 > > still aren't widely used which can't help. The Samsung K9 is close at
 > > least.

As I said, when one lands on my desk I'll gladly get writing :-)

 > > In fact, because of the requirement for the
 > > drivers to call CYG_NAND_FUNS, it doesn't seem difficult at all to be
 > > backwardly compatible. Am I right? Nevertheless, it would be unfortunate
 > > to have an API which already needs its low level driver interface
 > > updating to a rev 2.

Adding hardware ECC support and making the driver interface
backwards-compatible turned out to break layering, so I chose to change the
interface.

It's a relatively straightforward change in that I have broken up page read
and program operations into three: initialise, to read/write a stride of
data (length chosen by the NAND layer to mesh with whatever ECC length is
provided by the controller), and finalise. The flow inside my NAND layer for
programming a page becomes:

* Call chip driver to initialise the write (we expect it to send the command
and address)
* For each ECC-sized stride of data:
** If hardware ECC, call the ECC driver to tell it we're about to start a stride
** Call chip driver to write a stride of data
** If hardware ECC, call the ECC driver to get the ECC for the stride now
completed and stash it away

* If software ECC, compute it for the page
* Finalise the spare layout using the ECC wherever it came from
* Call chip driver to finalise the write, passing the final spare layout (we
expect it to write the spare area and send the program-confirm command).
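
The flow above can be sketched in C roughly as follows. This is only an 
illustration under assumptions: the chip_ops and ecc_ops structures and 
every function name are invented here and are not the actual driver 
interface, and the ECC bytes are simply placed at the start of the spare 
buffer rather than following a real spare layout. The fakes at the end 
only record the order of calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chip driver hooks (names invented for this sketch). */
struct chip_ops {
    int (*write_begin)(void *priv, uint32_t page);   /* command + address */
    int (*write_stride)(void *priv, const uint8_t *data, size_t len);
    int (*write_finish)(void *priv, const uint8_t *spare, size_t len);
};

/* Hypothetical ECC hooks: set the hw_* pair for hardware ECC, or sw_calc
 * for software ECC, and leave the others NULL. */
struct ecc_ops {
    size_t stride;    /* data bytes per ECC calculation */
    size_t ecc_len;   /* ECC bytes produced per stride */
    void (*hw_start)(void *priv);
    void (*hw_read)(void *priv, uint8_t *ecc_out);
    void (*sw_calc)(const uint8_t *data, size_t len, uint8_t *ecc_out);
};

int nand_program_page(const struct chip_ops *chip, const struct ecc_ops *ecc,
                      void *priv, uint32_t page,
                      const uint8_t *data, size_t page_len,
                      uint8_t *spare, size_t spare_len)
{
    assert(ecc->stride != 0 && page_len % ecc->stride == 0);
    size_t nstrides = page_len / ecc->stride;
    assert(nstrides * ecc->ecc_len <= spare_len);

    int rv = chip->write_begin(priv, page);
    if (rv != 0)
        return rv;
    for (size_t i = 0; i < nstrides; i++) {
        if (ecc->hw_start != NULL)
            ecc->hw_start(priv);            /* about to start a stride */
        rv = chip->write_stride(priv, data + i * ecc->stride, ecc->stride);
        if (rv != 0)
            return rv;
        if (ecc->hw_read != NULL)           /* stash the hardware ECC */
            ecc->hw_read(priv, spare + i * ecc->ecc_len);
    }
    if (ecc->sw_calc != NULL)               /* software ECC for the page */
        for (size_t i = 0; i < nstrides; i++)
            ecc->sw_calc(data + i * ecc->stride, ecc->stride,
                         spare + i * ecc->ecc_len);
    /* Spare area write and program-confirm happen in the chip driver. */
    return chip->write_finish(priv, spare, spare_len);
}

/* Fakes that only record the order of calls. */
struct trace { char log[64]; size_t n; };

void trace_note(void *priv, char c)
{
    struct trace *t = priv;
    if (t->n + 1 < sizeof t->log) {
        t->log[t->n++] = c;
        t->log[t->n] = '\0';
    }
}

int fake_begin(void *p, uint32_t page)
{ (void)page; trace_note(p, 'B'); return 0; }
int fake_stride(void *p, const uint8_t *d, size_t len)
{ (void)d; (void)len; trace_note(p, 'S'); return 0; }
int fake_finish(void *p, const uint8_t *s, size_t len)
{ (void)s; (void)len; trace_note(p, 'F'); return 0; }
void fake_hw_start(void *p) { trace_note(p, 'h'); }
void fake_hw_read(void *p, uint8_t *ecc)
{ memset(ecc, 0xFF, 3); trace_note(p, 'e'); }
void fake_sw_calc(const uint8_t *d, size_t len, uint8_t *ecc)
{ (void)d; (void)len; memset(ecc, 0xAA, 3); }
```

Splitting the operation this way lets the NAND layer pick the stride to 
match the controller's ECC unit while the chip driver stays unaware of 
which kind of ECC is in use.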


I have not yet finished this work, but will update all my existing drivers
when it is done. In a way, the drawn-out nature of this process has provided
extra time for my state of the art to evolve ;-)


 > > Incidentally I note Rutger has a "Samsung" ECC implementation, whereas
 > > you support Samsung K9 chips, but use the normal ECC algorithm. Did
 > > Samsung change their practice?

The "Samsung" ECC implementation has nothing to do with the underlying chip;
it's just an algorithm whose details they published, I think in conjunction
with some of the higher-level NAND-based products they ship which feature an
FTL (USB sticks, SD cards, etc). There is in general no requirement to use
any particular ECC algorithm with any particular chip; all the spec sheets
tend to say is "use ECC".

If I have understood the code correctly, Rutger provides two ECC algorithms:

* nand_ecc.c implements the "standard" Linux MTD algorithm (indeed the code
is lifted, with acknowledgement). This is an algorithm created by Toshiba,
with a 256 byte data block and 22 bit ECC and the layer uses it by default
where no other algorithm is provided.

* io_nand_ecc_samsung.c provides a Samsung algorithm of the same parameters,
  used by the BlackFin board driver.

My layer only provides the Linux MTD algorithm at the moment (also by
lifting the code with acknowledgement).

In passing I note that the 22 bits for 256 bytes algorithm is a bit wasteful
of space as it's relatively simple to add an extra pair of row-parity bits and
have 24 bits of ECC for 512 bytes of data. If you were happy that the chip
wouldn't suffer too many single-bit dropouts at once, and you decided you
didn't want to worry about subpage support you could go for 26/1024 or
28/2048. Would you believe it, writing 24/512 (and perhaps 26/1024 and
28/2048) algorithms is also on my todo list ...
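
The arithmetic behind those figures can be sketched as follows, on the 
assumption that the code is the usual row/column parity scheme in which 
every bit of the bit address within the stride gets one pair of parity 
bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ECC bits needed for a stride of `bytes` data bytes (a power of two):
 * one parity pair per bit of the bit address, i.e. 2 * log2(bytes * 8). */
unsigned ecc_bits_for_stride(size_t bytes)
{
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
    unsigned log2_bits = 3;          /* 8 bits per byte */
    while (bytes > 1) {
        bytes >>= 1;
        log2_bits++;
    }
    return 2 * log2_bits;
}

/* Spare-area bytes used by ECC for one page, with each stride's ECC
 * rounded up to whole bytes. */
size_t ecc_bytes_per_page(size_t page_bytes, size_t stride_bytes)
{
    assert(stride_bytes != 0 && page_bytes % stride_bytes == 0);
    size_t strides = page_bytes / stride_bytes;
    size_t bytes_per_stride = (ecc_bits_for_stride(stride_bytes) + 7) / 8;
    return strides * bytes_per_stride;
}
```

On a 2048-byte page this gives 24 bytes of ECC with 256-byte strides, 
against 12 bytes with 512-byte strides, which is the saving referred to 
above; the price is that only one bit error per (larger) stride can be 
corrected.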


 > > Your documentation does appear very thorough and well-structured
 > > (although the Samsung and EA LPC2468 docs really should be broken out
 > > into their own packages). Rutger's does also seem fine though so I don't
 > > think there's a strong difference either way.

The Samsung K9 is in its own (single-chapter) docs package as of a few weeks
ago, and the board-specific bits for the EA LPC2468 have been moved into
that HAL.


 > > [synth target]
 > > Bad block injection sounds like an extremely useful feature. I infer
 > > from the latter that we're now talking about many hours of testing?

We are. We have run our YAFFS severe stress testing with bad block injection
for over a week at a time.


 >> >> * Expansion of the device interface to better allow efficient hardware
 >> >> ECC support (in progress)
 > >
 > > Rough ETA? All I'm interested in knowing is whether the device interface
 > > changes for this are likely to be concluded within the timeframe of this
 > > discussion.

It's part and parcel of the customer port that I'm currently working on, so
"real soon now" - top of my priority list apart from this discussion ;-)
With a following wind I would hope to be able to finish it up, synch my
changes with the anoncvs side and push out maybe a week or so after I'm back
from holiday.


 >> >> * Partition addressing: make addressing relative to the start of the
 >> >> partition, once and for all
 > >
 > > That's quite a major API change, which seems problematic to me.

This is why it has to be worked out sooner rather than later, and is
currently very close to the top of my todo list ;-). Bart in particular has
been encouraging me to make this change for a while.


 >> >> * Part-page read support (would provide a big speed-up to parts of YAFFS2
 >> >> inbandTags mode as needed by small-page devices like that on the
 >> >> STM3210E)
 > >
 > > Do you foresee this happening within any particular timeframe? Do you
 > > expect the changes to be backwardly compatible?

No timescale as yet as it's relatively far down my todo list. I think
support would require an addition to the device interface to support reading
from a column address, not a break - so existing drivers would continue
working. But I need to think about this a bit more when I get there, as it
may require work on the YAFFS side, and it tickles the sleeping dragon that
is support for ECC on part-pages.


Cheers,


Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                 www.ecoscentric.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-08  8:16   ` Jürgen Lambrecht
@ 2009-10-12  1:13     ` Jonathan Larmour
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-12  1:13 UTC (permalink / raw)
  To: Jürgen Lambrecht; +Cc: eCos developers

Jürgen Lambrecht wrote:
> Ross Younger wrote:
>> Now, I mentioned ECC data. NAND technology has a number of underlying
>> limitations, importantly that it has reliability issues. I don't have 
>> a full
>> picture - the manufacturers seem to be understandably coy - but my
>> understanding is that on each page, a driver ought to be able to cope 
>> with a
>> single bit having flipped either on programming or on reading. The
>>   
> 
> Such a "broken bit" is because the transistor that contains the bit is 
> physically broken, and is stuck at 1 or at 0 (I don't know if it can be 
> both). So you cannot anymore erase it (flip it back to 1) or program it 
> (flip to 0).
> 
> I thought only programming or erasing could break it, not reading?
> Is somebody sure about this?

I've had experience of dodgy flash that spontaneously started getting bit 
errors either over time or on reads - couldn't tell which. Really it was 
NOR, rather than NAND, but that should be /more/ reliable! I think it's 
probably best to assume that if it's hardware, it can go wrong :-).

[ NB I'll be replying to other mails in this thread tomorrow, but it's a 
bit late here at the moment for me to start ]

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-09  8:27       ` Ross Younger
@ 2009-10-13  2:21         ` Jonathan Larmour
  2009-10-13 13:35           ` Rutger Hofman
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-13  2:21 UTC (permalink / raw)
  To: Ross Younger; +Cc: ecos-devel

[ Lots of snippage throughout - assume "ack" or comprehension]
Ross Younger wrote:
> Jonathan Larmour wrote:
>  > > Personally I would expect use as an interrupt line as the main role of
>  > > the ready line.
> 
> IMLE the overhead of sleeping and context switching is quite 
> significant. In
> the drivers I've written to date, where there is a possibility to use the
> ready line as an interrupt source I have provided this as an option in CDL.

For reads polling is good sure, for programs interrupts are probably 
better, for erases interrupts will almost certainly be better. I note that 
that's what you arrange for interrupt mode on the EA2468 port example, 
which is good.

But I digress, as this isn't something specific to your implementation.

>  > > What problems would you see, if any, using your layer with the same
>  > > controller and two completely different chips, of different geometry?
>  > > Can you still have a common codebase with other (different) platforms?
> 
> I don't see any issue: controllers don't IME care about the chip geometry,
> they just take care of the electrical side, and some calculate ECC in
> passing. For that matter I don't see an issue with a single controller on
> one board driving two chips of different geometries at once.

Hmm, I guess the key thing here is that in E's implementation most of the 
complexity has been pushed into the lower layers; at least compared to 
R's. R's has a more consistent interface through the layers. Albeit at the 
expense of some rigidity and noticeable function overhead.

It's not likely E's will be able to easily share controller code, given of 
course you don't know what chips, and so what chip driver APIs they'll be 
connected to. But OTOH, maybe this isn't a big deal since a lot of the 
controller-specific munging is likely to be platform-specific anyway due 
to characteristics of the attached NAND (e.g. timings etc.) and the only 
bits that would be sensibly shared would potentially happen in the 
processor HAL anyway at startup time. What's left may not be that much and 
isn't a problem in the platform HAL. However the likely exception to that 
is hardware-assisted ECC. A semi-formal API for that would be desirable.

>  >> >> 2. Application interface 
> -----------------------------------------------
> 
>  >> >> The basic operations required are reading a page, programming a 
> page and
>  >> >> erasing a block, and both layers provide these.
>  > >
>  > > However I believe Rutger's supports partial page writes (use of
>  > > 'column'), whereas I don't believe eCosCentric's does.
> 
> As covered in the other subthread, is this actually useful, and how to sort
> out the ECC?

Read back the whole page (which is a drop in the ocean compared to the 
time to do a full page program of course). memcmp the partially written 
section for validity, then regenerate the ECC. Unless the partial write 
was most of the page anyway (and a heuristic could deal with that), you 
should still end up ahead.
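
A minimal sketch of that read-back approach, assuming a 2048-byte page 
that has already been read back into RAM. The function name, the ecc_fn 
callback type and the one-byte XOR "ECC" are all hypothetical; the XOR is 
only a stand-in so the example is self-contained, and a real driver would 
plug in its actual ECC routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAND_PAGE_BYTES 2048u   /* assumed page size */

/* Hypothetical ECC callback: computes ECC over a whole page. */
typedef void (*ecc_fn)(const uint8_t *page, size_t len, uint8_t *ecc_out);

/* After a partial program of [col, col + len): compare the written
 * section against the page as read back, then regenerate the ECC from
 * the read-back page. Returns 0 on success, -1 on a verify mismatch. */
int verify_partial_and_regen_ecc(const uint8_t *readback, size_t col,
                                 const uint8_t *written, size_t len,
                                 ecc_fn calc, uint8_t *ecc_out)
{
    assert(readback != NULL && written != NULL && calc != NULL);
    assert(col <= NAND_PAGE_BYTES && len <= NAND_PAGE_BYTES - col);
    if (memcmp(readback + col, written, len) != 0)
        return -1;
    calc(readback, NAND_PAGE_BYTES, ecc_out);
    return 0;
}

/* Stand-in "ECC": XOR of all bytes. NOT a real error-correcting code. */
void toy_xor_ecc(const uint8_t *page, size_t len, uint8_t *ecc_out)
{
    uint8_t x = 0;
    for (size_t i = 0; i < len; i++)
        x ^= page[i];
    ecc_out[0] = x;
}
```

The regenerated ECC would then have to be written to the spare area, 
which is only possible if the relevant spare bytes can still be 
programmed to the new value.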

Alternatively, some people may not want or need ECC. Higher layers may be 
able to deal or have their own checking. Or the write patterns could be 
sufficiently infrequent that it's not an issue worth solving (e.g. 
firmware upgrades). In some cases you may not use ECC in one part managed 
by e.g. a simple boot loader which you want to keep small; and then in a 
different region on the same NAND there's a filesystem which does exploit 
ECCs.

>  >> >> Rutger's layer has an extra hook in
>  >> >> place where an application may explicitly request the use of 
> cached reading
>  >> >> and writing where the device supports this.
>  > >
>  > > That seems like a useful potential optimisation, exploiting underlying
>  > > capabilities. Any reason you didn't implement this?
>  > >
>  > > I could also believe that NAND controllers can also optimise by doing
>  > > multiple block reads, where this hint would also prove useful.
> 
> Not particularly. Looking at cache-assisted read and program operations for
> multi-page operations is sitting on my TODO list, languishing :-). I would
> note in passing that YAFFS doesn't make use of these, preferring only to
> read and write single pages fully synchronously; this might be a worthwhile
>  enhancement in dealing with larger files, though YAFFS's own internal NAND
> interface is strictly page-oriented at the moment and so this would require
> a bit of brain surgery - something best done in conjunction with Charles
> Manning, I think.

Looking at things like
http://osdir.com/ml/linux.file-systems.yaffs/2008-09/msg00010.html, this 
may well change in future.

Plus contiguous reads are more likely to be useful in other NAND-using 
applications than a general-purpose FS. Contiguous writes admittedly would 
be less useful to exploit, but if you can have the facility for reads you 
may as well have the writes.

>  > > Does your implementation _require_ a BBT in its current 
> implementation?
>  > > For simpler NAND usage, it may be overkill e.g. an application 
> where the
>  > > number of rewrites is very small, so the factory bad markers may be
>  > > considered sufficient.
> 
> I suppose it would be possible to provide a CDL option to switch the
> persistent BBT off if you really wanted to. Caution is required, though:
> after you have ever written to the chip, it can be impossible to 
> distinguish
> a genuine factory-bad marker from application data in the OOB area that
> happens to resemble it. This can be worked around with very careful
> management of what the application puts into the OOB or by tweaking the OOB
> layout to simply avoid ever writing to the relevant byte(s).

Oh I'm sure that most people will use a BBT if they can, but for simple 
booting applications it may be overkill and the management has a penalty. 
Factory markers and use of the OOB in appropriate ways can avoid the need 
for a BBT for simple applications e.g. by relying only on ECCs, or its own 
"this verified ok" marker in the OOB area.

>  >> >> (a) Partitions
>  > > [snip]
>  >> >> R's interface does not have such a facility. It appears that, in the
>  >> >> event
>  >> >> that the flash is shared between two or more logical regions, 
> it's up to
>  >> >> higher-level code to be configured with the correct block ranges 
> to use.
>  > >
>  > > In yours, the block ranges must be configured in CDL. Is there much
>  > > difference? I can see an advantage in writing platform-independent 
> test
>  > > programs. But in applications within products possibly less so.
> 
> I provide CDL for manual config, but have included a partition layout
> initialisation hook. If there was an on-chip partition table, all that's
> needed would be some code to go into that hook to interrogate it and
> translate to my layer's in-memory layout. This is admittedly not well
> documented, but hinted at by "Planning a port"
> (http://www.ecoscentric.com/ecospro/doc.cgi/html/ecospro-ref/nand-devs-writing.html) 
> 
> and should be readily apparent on examining code for existing chip drivers.

Ok, that sounds like quite a good thing. It also sounds harder for R's to 
play nicely with Linux.

>  > > Especially since the flash geometry, including size, can be
>  > > programmatically queried.
> 
> Flash geometry can only be programmatically queried up to a point in
> non-ONFI chips. Look at the k9_devinit function in k9fxx08x08.inl: while 
> the
> ReadID response of Samsung chips encodes the page, block and spare area
> sizes, it doesn't tell you about the chip block count or overall size - you
> have to know based on the device identifier byte. Linux, for example, has a
> big table of these in drivers/mtd/nand/nand_ids.c.

Ahh, ok.
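
For illustration, the kind of lookup being described might look like the sketch below. The struct and function names are hypothetical, and the three table entries are examples of the large-page x8 IDs as I recall them from the Linux table; they should be checked against the data sheets before anyone relies on them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nand_id_entry {
    uint8_t  dev_id;     /* device identifier byte from the ReadID response */
    uint32_t size_mib;   /* overall chip size, which ReadID does not encode */
};

/* Illustrative entries only; verify against the chip data sheets. */
static const struct nand_id_entry nand_ids[] = {
    { 0xF1, 128 },
    { 0xDA, 256 },
    { 0xDC, 512 },
};

/* Returns the chip size in MiB, or 0 if the identifier is not known. */
static uint32_t nand_size_from_id(uint8_t dev_id)
{
    for (size_t i = 0; i < sizeof nand_ids / sizeof nand_ids[0]; i++)
        if (nand_ids[i].dev_id == dev_id)
            return nand_ids[i].size_mib;
    return 0;
}

/* Combine the table size with the block size decoded from ReadID to get
 * the block count. Returns 0 if either input is unusable. */
static uint32_t nand_block_count(uint8_t dev_id, uint32_t block_kib)
{
    uint32_t mib = nand_size_from_id(dev_id);

    assert(block_kib <= 1024u * 1024u);   /* sanity bound on block size */
    if (mib == 0 || block_kib == 0)
        return 0;
    return (mib * 1024u) / block_kib;
}
```

So a 256MiB part with 128KiB blocks comes out at 2048 blocks, but only because the size was known from the table in the first place.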

>  > > If there was to be a single firmware supporting multiple board
>  > > revisions/configurations (as can definitely happen), which could 
> include
>  > > different sizes of NAND, I think R's implementation would be able to
>  > > adapt better than E's, as the high-level program can divide up the 
> sizes
>  > > based on what it sees.
> 
> I see no reason why E's wouldn't adapt just as well, given suitably written
> driver(s) and init hooks.

Ok. I also see both your chip drivers possess these hooks - which is good 
as people will tend to use existing drivers as templates rather than write 
their own from scratch.

>  > > In fact, because of the requirement for the
>  > > drivers to call CYG_NAND_FUNS, it doesn't seem difficult at all to be
>  > > backwardly compatible. Am I right? Nevertheless, it would be 
> unfortunate
>  > > to have an API which already needs its low level driver interface
>  > > updating to a rev 2.
> 
> Adding hardware ECC support and making the driver interface
> backwards-compatible turned out to break layering, so I chose to change the
> interface.
>
> It's a relatively straightforward change in that I have broken up page read
> and program operations into three: initialise, to read/write a stride of
> data (length chosen by the NAND layer to mesh with whatever ECC length is
> provided by the controller), and finalise. The flow inside my NAND layer
> for programming a page becomes:
>
> * Call chip driver to initialise the write (we expect it to send the
>   command and address)
> * For each ECC-sized stride of data:
> ** If hardware ECC, call the ECC driver to tell it we're about to start
>    a stride
> ** Call chip driver to write a stride of data
> ** If hardware ECC, call the ECC driver to get the ECC for the stride now
>    completed and stash it away
> * If software ECC, compute it for the page
> * Finalise the spare layout using the ECC wherever it came from
> * Call chip driver to finalise the write, passing the final spare layout
>   (we expect it to write the spare area and send the program-confirm
>   command).

NB Some hardware ECC's will only compute for the whole page, e.g. AT91SAM9's.
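
To check my understanding of the flow quoted above, here is a rough sketch of it against a simulated chip. Everything in it is an assumption for illustration: the function names, the 256-byte stride, and the one-byte XOR "ECC" are mine, not E's actual driver interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 2048u
#define STRIDE    256u                    /* assumed ECC stride length */
#define NSTRIDES  (PAGE_SIZE / STRIDE)

/* Simulated chip plus ECC engine; every name here is hypothetical. */
struct sim_chip {
    uint8_t page[PAGE_SIZE];
    uint8_t spare[NSTRIDES];
    size_t  pos;        /* next data offset within the page */
    uint8_t ecc_acc;    /* running "hardware" ECC for the current stride */
    int     committed;
};

/* Toy one-byte code (XOR of all bytes), standing in for a real ECC. */
static uint8_t toy_ecc(const uint8_t *buf, size_t len)
{
    uint8_t x = 0;
    for (size_t i = 0; i < len; i++)
        x ^= buf[i];
    return x;
}

static void chip_write_begin(struct sim_chip *c)
{
    c->pos = 0;          /* real driver: send program command and address */
    c->committed = 0;
}

static void chip_write_stride(struct sim_chip *c, const uint8_t *buf, size_t len)
{
    assert(c->pos + len <= PAGE_SIZE);
    memcpy(c->page + c->pos, buf, len);
    c->ecc_acc ^= toy_ecc(buf, len);   /* the engine sees data as it passes */
    c->pos += len;
}

static void chip_write_finish(struct sim_chip *c, const uint8_t *spare, size_t len)
{
    assert(len <= NSTRIDES);
    memcpy(c->spare, spare, len);      /* real driver: write spare, confirm */
    c->committed = 1;
}

static void    ecc_start(struct sim_chip *c)  { c->ecc_acc = 0; }
static uint8_t ecc_result(struct sim_chip *c) { return c->ecc_acc; }

/* The three-phase program: initialise, per-stride writes, finalise. */
static void program_page(struct sim_chip *c, const uint8_t *data, int use_hw_ecc)
{
    uint8_t spare[NSTRIDES];

    chip_write_begin(c);
    for (size_t s = 0; s < NSTRIDES; s++) {
        if (use_hw_ecc)
            ecc_start(c);
        chip_write_stride(c, data + s * STRIDE, STRIDE);
        if (use_hw_ecc)
            spare[s] = ecc_result(c);
    }
    if (!use_hw_ecc)                   /* software ECC computed afterwards */
        for (size_t s = 0; s < NSTRIDES; s++)
            spare[s] = toy_ecc(data + s * STRIDE, STRIDE);
    chip_write_finish(c, spare, sizeof spare);
}
```

For an engine that only computes over the whole page, I would expect the stride simply to equal the page size, so the loop body runs once.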

> I am not yet finished this work, but will update all my existing drivers
> when it is done. In a way, the drawn-out nature of this process has 
> provided
> extra time for my state of the art to evolve ;-)

Well that's fair enough. I think it's fair to make allowances for work 
that's actually under active development (rather than vapourware or just 
promises). Especially since you say further down your mail that it is 
likely to be done in the next couple of weeks. (I'm not asking you for a 
concrete commitment - as with anything involving volunteer effort).

>  > > Incidentally I note Rutger has a "Samsung" ECC implementation, whereas
>  > > you support Samsung K9 chips, but use the normal ECC algorithm. Did
>  > > Samsung change their practice?
> 
> The "Samsung" ECC implementation has nothing to do with the underlying 
> chip;
> it's just an algorithm whose details they published,

Indeed, but I sort of expected them to be using it in that context :).

> I think in conjunction
> with some of the higher-level NAND-based products they ship which 
> feature an
> FTL (USB sticks, SD cards, etc). There is in general no requirement to use
> any particular ECC algorithm with any particular chip; all the spec sheets
> tend to say is "use ECC".

Sure. But I was anticipating it may be industry practice, e.g. if 
Linux-MTD does the same. Maybe due to...

> * io_nand_ecc_samsung.c provides a Samsung algorithm of the same 
> parameters,
>  used by the BlackFin board driver.

...it is indeed industry practice, but perhaps only rarely.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-07  9:40   ` Jürgen Lambrecht
  2009-10-07 16:27     ` Rutger Hofman
@ 2009-10-13  2:44     ` Jonathan Larmour
  2009-10-13  6:35       ` Jürgen Lambrecht
                         ` (2 more replies)
  1 sibling, 3 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-13  2:44 UTC (permalink / raw)
  To: Jürgen Lambrecht
  Cc: Ross Younger, Rutger Hofman, eCos developers, Deroo Stijn

Jürgen Lambrecht wrote:
> Ross Younger wrote:
>> - E's high-level driver interface makes it harder to add new functions
>> later, necessitating a change to that API (H2 above). R's does not; the
>> requisite logic would only need to be added to the ANC. It is not thought
>> that more than a handful such changes will ever be required, and it 
>> may be
>> possible to maintain backwards compatibility. (As a case in point, 
>> support
>> for hardware ECC is currently work-in-progress within eCosCentric, and 
>> does
>> require such a change, but now is not the right time to discuss that.)
> 
> Therefore we prefer R's model.
> 
> Is it possible that R's model follows better the "general" structure of 
> drivers in eCos?
> I mean: (I follow our CVS, could maybe differ from the final commit of 
> Rutger to eCos)
> 1. with the low-level chip-specific code in /devs 
> (devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and 
> devs/flash/micron/nand)
> 2. with the "middleware" in /io (io/flash_nand/current/src and there 
> /anc, /chip, /controller)
> 3. with the high-level code in /fs

I don't see E's model as being much different in that perspective. There 
is stuff in devs/flash, io/nand and (presumably) fs as well.

The difference is more the separation out of the controller functionality 
into a different layer.

> Is it correct that R's abstraction makes it possible to add partitioning 
> easily?
> (because that is an interesting feature of E's implementation)

As Rutger said, it could be done - there's nothing in his design which 
prevents it. It's not there now though, so unless someone's working on it 
it's probably not something to consider in the decision process. 
Especially since it would be a big user API change.

> We also prefer R's model of course because we started with R's model and 
> use it now.

You haven't done any profiling by any luck have you? Or code size 
analysis? Although I haven't got into the detail of R's version yet (since 
I was starting with dissecting E's), both the footprint and the cumulative 
function call and indirection time overhead are concerns of mine.

>> (b) Availability of drivers
[snip]
>> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
>> presumably only tested on the x8 chip on the BlackFin board?)
>>   
> 
> - Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB 
> page size, x8)
> Because of this chip, Rutger adapted the hardware ECC controller code, 
> because our chip uses more bits (for details, ask Stijn or Rutger).

I'd be interested in what the issue was. From admittedly a quick look I 
can't find anything about this in the code.

>> (d) Degree of testing
[snip]
> We have it very well tested, amongst others
> - an automatic (continual) nand-flash test in a climate chamber
> - stress tests: filling it over and over again via FTP (both with 
> a few big and many small files) and checking the heap remaining:
>  * Put 25 files with a filesize of 10.000.000 bytes on the filesystem
>  * Put 2500 files with a filesize of 100.000 bytes on the filesystem
>  * Put 7000 files with a filesize of 10.000 bytes on the filesystem
>  Conclusion: storing smaller files needs more heap, but we still have 
> plenty left with our 16MB
>  * Write a bundle of files over and over again on the filesystem. We put 
> every time 1000 files of 100.000 bytes filesize on the flash drive.
> - used in the final mp3-player application

That's extremely useful to know, thanks! But a couple of further questions 
on this: Did any bad blocks show up at any point? Were you using a bad 
block table? Presumably there were factory-marked bad blocks on some?

Thanks,

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-13  2:44     ` Jonathan Larmour
@ 2009-10-13  6:35       ` Jürgen Lambrecht
  2009-10-15  3:55         ` Jonathan Larmour
  2009-10-13 12:59       ` Rutger Hofman
  2009-10-13 14:19       ` Rutger Hofman
  2 siblings, 1 reply; 58+ messages in thread
From: Jürgen Lambrecht @ 2009-10-13  6:35 UTC (permalink / raw)
  To: Jonathan Larmour
  Cc: Ross Younger, Rutger Hofman, eCos developers, Deroo Stijn

Jonathan Larmour wrote:
> Jürgen Lambrecht wrote:
>   
>> Ross Younger wrote:
>>     
>>> - E's high-level driver interface makes it harder to add new functions
>>> later, necessitating a change to that API (H2 above). R's does not; the
>>> requisite logic would only need to be added to the ANC. It is not thought
>>> that more than a handful such changes will ever be required, and it
>>> may be
>>> possible to maintain backwards compatibility. (As a case in point,
>>> support
>>> for hardware ECC is currently work-in-progress within eCosCentric, and
>>> does
>>> require such a change, but now is not the right time to discuss that.)
>>>       
>> Therefore we prefer R's model.
>>
>> Is it possible that R's model follows better the "general" structure of
>> drivers in eCos?
>> I mean: (I follow our CVS, could maybe differ from the final commit of
>> Rutger to eCos)
>> 1. with the low-level chip-specific code in /devs
>> (devs/flash/arm/at91/[board] and devs/flash/arm/at91/nfc, and
>> devs/flash/micron/nand)
>> 2. with the "middleware" in /io (io/flash_nand/current/src and there
>> /anc, /chip, /controller)
>> 3. with the high-level code in /fs
>>     
>
> I don't see E's model as being much different in that perspective. There
> is stuff in devs/flash, io/nand and (presumably) fs as well.
>
> The difference is more the separation out of the controller functionality
> into a different layer.
>
>   
>> Is it correct that R's abstraction makes it possible to add partitioning
>> easily?
>> (because that is an interesting feature of E's implementation)
>>     
>
> As Rutger said, it could be done - there's nothing in his design which
> prevents it. It's not there now though, so unless someone's working on it
> it's probably not something to consider in the decision process.
> Especially since it would be a big user API change.
>
>   
>> We also prefer R's model of course because we started with R's model and
>> use it now.
>>     
>
> You haven't done any profiling by any luck have you? Or code size
> analysis? Although I haven't got into the detail of R's version yet (since
> I was starting with dissecting E's), both the footprint and the cumulative
> function call and indirection time overhead are concerns of mine.
>
>   
No...
>>> (b) Availability of drivers
>>>       
> [snip]
>   
>>> - One chip: the ST Micro 0xG chip (large page, x8 and x16 present but
>>> presumably only tested on the x8 chip on the BlackFin board?)
>>>
>>>       
>> - Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB
>> page size, x8)
>> Because of this chip, Rutger adapted the hardware ECC controller code,
>> because our chip uses more bits (for details, ask Stijn or Rutger).
>>     
>
> I'd be interested in what the issue was. From admittedly a quick look I
> can't find anything about this in the code.
>   
Maybe Rutger can answer this better. Otherwise Stijn can look up his mail on 
this issue.
>   
>>> (d) Degree of testing
>>>       
> [snip]
>   
>> We have it very well tested, amongst others
>> - an automatic (continual) nand-flash test in a climate chamber
>> - stress tests: filling it over and over again via FTP (both with
>> a few big and many small files) and checking the heap remaining:
>>  * Put 25 files with a filesize of 10.000.000 bytes on the filesystem
>>  * Put 2500 files with a filesize of 100.000 bytes on the filesystem
>>  * Put 7000 files with a filesize of 10.000 bytes on the filesystem
>>  Conclusion: storing smaller files needs more heap, but we still have
>> plenty left with our 16MB
>>  * Write a bundle of files over and over again on the filesystem. We put
>> every time 1000 files of 100.000 bytes filesize on the flash drive.
>> - used in the final mp3-player application
>>     
>
> That's extremely useful to know, thanks! But a couple of further questions
> on this: (1) Did any bad blocks show up at any point? (2) Were you using a bad
> block table? (3) Presumably there were factory-marked bad blocks on some?
>   
(3) Yes, there are almost always factory-marked bad blocks.
(2) yes
(1) Yes, certainly! We have from time to time bad blocks, and they are 
handled correctly.

Kind regards,
Jürgen
> Thanks,
>
> Jifl
> --
> --["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
>   
totally agree ;-)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-13  2:44     ` Jonathan Larmour
  2009-10-13  6:35       ` Jürgen Lambrecht
@ 2009-10-13 12:59       ` Rutger Hofman
  2009-10-15  4:41         ` Jonathan Larmour
  2009-10-13 14:19       ` Rutger Hofman
  2 siblings, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-13 12:59 UTC (permalink / raw)
  To: Jonathan Larmour
  Cc: Jürgen Lambrecht, Ross Younger, eCos developers, Deroo Stijn

Jonathan Larmour wrote:
[snip]
>> We also prefer R's model of course because we started with R's model 
>> and use it now.
> 
> You haven't done any profiling by any luck have you? Or code size 
> analysis? Although I haven't got into the detail of R's version yet 
> (since I was starting with dissecting E's), both the footprint and the 
> cumulative function call and indirection time overhead are concerns of 
> mine.

In a first step in mitigating the 'footprint pressure', I have added CDL 
options to configure in/out support for the various chip types, to wit: 
- ONFI chips;
- 'regular' large-page chips;
- 'regular' small-page chips.
It is in r678 on my download page 
(http://www.cs.vu.nl/~rutger/software/ecos/nand-flash/). As I had 
suggested before, this was a very small refactoring (although code has 
moved about in io_nand_chip.c to save on the number of #ifdefs).

One more candidate for a reduction in code footprint: I can add a CDL 
option to configure out support for heterogeneous controllers/chips. The 
ANC layer will become paper-thin then. If this change will make any 
difference, I will do it within, say, a week's time.

As regards the concerns for (indirect) function call overhead: my 
intuition is that the NAND operations themselves (page read, page write, 
block erase) will dominate. It takes 200..500us just to transfer a page 
over the data bus to the NAND chip; one recent data sheet mentions 
program time 200us, erase time 1.5ms. I think only a very slow CPU would 
show the overhead of less than 10 indirect function calls.

Rutger

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-13  2:21         ` Jonathan Larmour
@ 2009-10-13 13:35           ` Rutger Hofman
  2009-10-16  4:04             ` Jonathan Larmour
  0 siblings, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-13 13:35 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, ecos-devel

Jonathan Larmour wrote:
> Hmm, I guess the key thing here is that in E's implementation most of 
> the complexity has been pushed into the lower layers; at least compared 
> to R's. R's has a more consistent interface through the layers. Albeit 
> at the expense of some rigidity and noticeable function overhead.
> 
> It's not likely E's will be able to easily share controller code, given 
> of course you don't know what chips, and so what chip driver APIs 
> they'll be connected to. But OTOH, maybe this isn't a big deal since a 
> lot of the controller-specific munging is likely to be platform-specific 
> anyway due to characteristics of the attached NAND (e.g. timings etc.) 
> and the only bits that would be sensibly shared would potentially happen 
> in the processor HAL anyway at startup time. What's left may not be that 
> much and isn't a problem in the platform HAL. However the likely 
> exception to that is hardware-assisted ECC. A semi-formal API for that 
> would be desirable.

This is the largest difference in design philosophy between E and R. Is 
it OK if I expand?

NAND chips are all identical in their wire setup. They all have a data 
'bus', and control lines to indicate whether what is on the bus is a 
command, an address, or data.

NAND chips differ in how their command language works, but only so far. 
What is on the market now is 'regular' large-page chips that all speak 
the same command language, and small-page chips that have a somewhat 
different command language. ONFI chips are large-page chips except in 
interrogation at startup and in bad-block marking.

E.g. a page read for a large-page chip (my running example) looks like this:
. write a command 0x00 (READ_START)
. write address bytes of the page(+offset) to be read
. write a command 0x30 (READ_CONFIRM)
. read the data on the bus
. insofar as supported retrieve hw-calculated ECC
For small-page chips the sequence is different because a page's data is 
read in multiple chunks, using READ_1_A (0x00), READ_1_B (0x01), and for 
spare area READ_2 (0x05).
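
As a sketch only, the large-page sequence above could be written against a controller-style interface like this. The struct and function names are made up for this mail and are not the ones in my code; five address cycles (two column, three row) are assumed, which depends on the chip size; and a wait for the chip to become ready is added, which the list above leaves implicit. ECC retrieval is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_READ_START   0x00
#define CMD_READ_CONFIRM 0x30

/* Hypothetical controller interface: the wire-level primitives only. */
struct nand_ctrl {
    void (*write_cmd)(struct nand_ctrl *c, uint8_t cmd);
    void (*write_addr)(struct nand_ctrl *c, uint8_t addr);
    void (*wait_ready)(struct nand_ctrl *c);
    void (*read_data)(struct nand_ctrl *c, uint8_t *buf, size_t len);
};

/* Generic large-page read, shared by any controller offering the above. */
static void large_page_read(struct nand_ctrl *c, uint32_t page,
                            uint16_t column, uint8_t *buf, size_t len)
{
    c->write_cmd(c, CMD_READ_START);
    c->write_addr(c, (uint8_t)(column & 0xFF));          /* column, low  */
    c->write_addr(c, (uint8_t)((column >> 8) & 0xFF));   /* column, high */
    c->write_addr(c, (uint8_t)(page & 0xFF));            /* row cycles   */
    c->write_addr(c, (uint8_t)((page >> 8) & 0xFF));
    c->write_addr(c, (uint8_t)((page >> 16) & 0xFF));
    c->write_cmd(c, CMD_READ_CONFIRM);
    c->wait_ready(c);
    c->read_data(c, buf, len);
}

/* Fake controller that just logs command and address bytes in order. */
struct log_ctrl {
    struct nand_ctrl ops;   /* first member, so the casts below are valid */
    uint8_t log[16];
    size_t  n;
};

static void log_byte(struct nand_ctrl *c, uint8_t v)
{
    struct log_ctrl *l = (struct log_ctrl *)c;
    assert(l->n < sizeof l->log);
    l->log[l->n++] = v;
}
static void log_cmd(struct nand_ctrl *c, uint8_t v)  { log_byte(c, v); }
static void log_addr(struct nand_ctrl *c, uint8_t v) { log_byte(c, v); }
static void log_wait(struct nand_ctrl *c)            { (void)c; }
static void log_read(struct nand_ctrl *c, uint8_t *buf, size_t len)
{
    (void)c;
    for (size_t i = 0; i < len; i++)
        buf[i] = 0xFF;      /* erased-flash pattern */
}
```

The point of the split is that large_page_read() never needs to know which controller sits underneath; only the four primitives differ per controller.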

These 2 languages are all the variation there is for NAND chips (plus, 
at another level, 2 timing values for read cycle and write cycle)! The 
wide-ranging differences for devices for NAND are in the controllers.

The way controllers work is that they accept input like 'write a command of 
value 0x..', 'write an address of value 0x.....', etc, and do their job 
on the NAND chip's wires. They cannot really operate at a higher level, 
if only because they must support both small-page and large-page chips 
(and ONFI), and this is the level of common protocol for the chips.

So controller code has to bridge between API calls like page_read and 
the interface of the controller as described above. R's implementation 
presumes that a lot of the code to make this translation is generic: a 
large-page read translates to the controller steps as given above in the 
running example, in any controller implementation. Moreover, the generic 
code handles spare layout: where in the spare is the application's spare 
data folded, where is the ECC, where is the bad-block mark. OTOH, the 
generic code has hooks for handling any ECC that the controller has 
computed in hardware -- how ECC is supported in hardware varies across 
controllers. But the way the ECC check is handled (case in point is 
where a correctible bit error is flagged) is generic again.

So, lots of code can (and will) be shared across controller 
implementations -- whether by code sharing or by code duplication.

Rutger

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-13  2:44     ` Jonathan Larmour
  2009-10-13  6:35       ` Jürgen Lambrecht
  2009-10-13 12:59       ` Rutger Hofman
@ 2009-10-13 14:19       ` Rutger Hofman
  2009-10-13 19:58         ` Lambrecht Jürgen
  2 siblings, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-13 14:19 UTC (permalink / raw)
  To: Jonathan Larmour
  Cc: Jürgen Lambrecht, Ross Younger, eCos developers, Deroo Stijn

Jonathan Larmour wrote:
> Jürgen Lambrecht wrote:
[snip]
>> - Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB 
>> page size, x8)
>> Because of this chip, Rutger adapted the hardware ECC controller code, 
>> because our chip uses more bits (for details, ask Stijn or Rutger).
> 
> I'd be interested in what the issue was. From admittedly a quick look I 
> can't find anything about this in the code.

As things go with NAND, this was not a chip issue but a controller 
issue. This controller has a different approach to hardware ECC than 
most; it doesn't export the ECC sum values, but the ECC syndromes -- 
values that in their bit pattern indicate where any bit errors are. I 
added ECC_SYNDROME support to my generic controller code. If I compare 
with MTD, I think with this addition, R kind of covers the range of ECC 
hardware support types that currently are in existence.

I don't know whether Televic (Stijn) actually uses the ECC_SYNDROME 
code. Last thing I heard, coincident with my adding ECC_SYNDROME, is 
that they had already solved their performance issues differently, but I 
don't know what happened after that.

Rutger

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: NAND technical review
  2009-10-13 14:19       ` Rutger Hofman
@ 2009-10-13 19:58         ` Lambrecht Jürgen
  0 siblings, 0 replies; 58+ messages in thread
From: Lambrecht Jürgen @ 2009-10-13 19:58 UTC (permalink / raw)
  To: 'Rutger Hofman', Jonathan Larmour
  Cc: Ross Younger, eCos developers, Deroo Stijn



> -----Original Message-----
> From: ecos-devel-owner@ecos.sourceware.org [mailto:ecos-devel-
> owner@ecos.sourceware.org] On Behalf Of Rutger Hofman
> Sent: dinsdag 13 oktober 2009 16:25
> To: Jonathan Larmour
> Cc: Lambrecht Jürgen; Ross Younger; eCos developers; Deroo Stijn
> Subject: Re: NAND technical review
>
> Jonathan Larmour wrote:
> > Jürgen Lambrecht wrote:
> [snip]
> >> - Two: also the Micron MT29F2G08AACWP-ET:D 256MB 3V3 NAND FLASH (2kB
> >> page size, x8)
> >> Because of this chip, Rutger adapted the hardware ECC controller
> code,
> >> because our chip uses more bits (for details, ask Stijn or Rutger).
> >
> > I'd be interested in what the issue was. From admittedly a quick look
> I
> > can't find anything about this in the code.
>
> As things go with NAND, this was not a chip issue but a controller
> issue. This controller has a different approach to hardware ECC than
> most; it doesn't export the ECC sum values, but the ECC syndromes --
> values that in their bit pattern indicate where any bit errors are. I
> added ECC_SYNDROME support to my generic controller code. If I compare
> with MTD, I think with this addition, R kind of covers the range of ECC
> hardware support types that currently are in existence.
>
> I don't know whether Televic (Stijn) actually uses the ECC_SYNDROME
> code. Last thing I heard, coincident with my adding ECC_SYNDROME, is
> that they had already solved their performance issues differently, but
> I
> don't know what happened after that.
Indeed, we have not yet used it. Maybe by the end of the year.
Regards,
Jürgen

>
> Rutger

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-07 16:22     ` Rutger Hofman
  2009-10-08  7:15       ` Jürgen Lambrecht
@ 2009-10-15  3:49       ` Jonathan Larmour
  2009-10-15 14:36         ` Rutger Hofman
  2009-10-15 15:43         ` Rutger Hofman
  1 sibling, 2 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-15  3:49 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Ross Younger, eCos developers

[ Sorry for getting back to this late - I wanted to continue with Ross 
before he went on holiday ]

Rutger Hofman wrote:
> Jonathan Larmour wrote:
> 
>> A device number does seem to be a bit limiting, and less 
>> deterministic. OTOH, a textual name arguably adds a little extra 
>> complexity.
> 
> 
> This will be straightforward to change either way.

Noted, thanks.

>> I note Rutger's layer needs an explicit init call, whereas yours DTRT 
>> using a constructor, which is good.
> 
> 
> I followed flash v2 in this. If the experts think a constructor is 
> better, that's easy to change too.

Flash v2 doesn't use a constructor for legacy reasons and only because of 
some last minute discussions before the v3 release which couldn't reach a 
conclusion about constructor priority, given things like SPI flash. 
cyg_flash_init() is going to be properly eliminated in due course.

These issues don't really affect your layer so much as you don't have any 
legacy burden, so moving straight to a constructor is better.

>> Does your implementation _require_ a BBT in its current 
>> implementation? For simpler NAND usage, it may be overkill e.g. an 
>> application where the number of rewrites is very small, so the factory 
>> bad markers may be considered sufficient.
> 
> 
> This is a bit hairy in my opinion, and one reason is that there is no 
> Standard Layout for the spare areas. One case where a BBT is forced: my 
> BlackFin NFC can be used to boot from NAND, but it enforces a spare 
> layout that is incompatible with MTD or anybody. It is even incompatible 
> with most chips' specification that the first byte of spare in the first 
> page of the block is the Bad Block Marker. BlackFin's boot layout uses 
> this first byte in a way that suits it, and it may be 0 -- which would 
> otherwise mean Bad Block.

I infer that your layer can cope with that? I didn't see the handling for 
that in io_nand_chip_bad_block.c.

Is your BBT compatible with Linux MTD? Including your use of a mirror?

> Also, what to do if a block grows bad during usage, and that block 
> doesn't allow writing a marker in its spare area? BBT seems a solution.

Well I was making the explicit assumption that it wasn't rewritten very 
often in the lifetime of the device. Think of things like in-field 
firmware upgrades.
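
For that sort of rarely-rewritten use, what I have in mind is roughly the scan below. It is a sketch under assumptions: the device is simulated, the names are hypothetical, and the factory marker is taken to be the first spare byte of a block's first page, with anything other than 0xFF meaning bad (which, as noted above, does not hold for every layout):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_BLOCKS 8u

/* Hypothetical device: only the marker byte of each block is modelled. */
struct sim_nand {
    uint8_t marker[SIM_BLOCKS];   /* 0xFF = good, anything else = bad */
};

/* Assumption: marker is spare byte 0 of the block's first page. */
static int block_is_factory_bad(const struct sim_nand *dev, uint32_t block)
{
    assert(block < SIM_BLOCKS);
    return dev->marker[block] != 0xFF;
}

/* Return the first good block at or after 'start', or -1 if none is left.
 * A simple boot loader could use this to skip bad blocks without a BBT. */
static int32_t next_good_block(const struct sim_nand *dev, uint32_t start)
{
    for (uint32_t b = start; b < SIM_BLOCKS; b++)
        if (!block_is_factory_bad(dev, b))
            return (int32_t)b;
    return -1;
}
```

This only works while the marker byte is never overwritten by application data, which is the caution Ross raised.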

>>> (b) Dynamic memory allocation
>>>
>>> R's layer mandates the provision of malloc and free, or compatible
>>> functions. These must be provided to the cyg_nand_init() call.
>>
>>
>> That's unfortunate - that limits its use in smaller boot loaders - a 
>> key application.
> 
> 
> Well, it is certainly possible to calculate statically how much space 
> R's NAND layer is going to use, to allocate that statically, and write a 
> tiny function to hand it out piecemeal at the NAND layer's request. 

If you know what it's going to be (at most), it could just be allocated 
statically and just used directly surely? That's got the lowest overheads.

E's implementation had a good idea of a CDL variable for the maximum 
supported block size. Then individual HALs or driver packages can use a 
CDL 'requires' to ensure it's >= the block size of the chips really in use.
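
For comparison, the "hand it out piecemeal" variant you describe could be as small as the sketch below. The names and the pool size are hypothetical; in practice the size would be derived from CDL as discussed above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAND_POOL_BYTES 4096u   /* hypothetical; would be set via CDL */

/* Used only to force alignment suitable for any ordinary object. */
union pool_align { long long ll; long double ld; void *p; };

static union {
    union pool_align align;
    uint8_t bytes[NAND_POOL_BYTES];
} nand_pool;
static size_t nand_pool_used;

/* malloc-compatible hand-out from the static pool; NULL when exhausted. */
static void *nand_pool_alloc(size_t size)
{
    const size_t a = sizeof(union pool_align);
    size_t rounded;
    void *p;

    assert(nand_pool_used <= NAND_POOL_BYTES);
    if (size == 0 || size > NAND_POOL_BYTES)
        return NULL;
    rounded = (size + a - 1) / a * a;      /* keep every chunk aligned */
    if (rounded > NAND_POOL_BYTES - nand_pool_used)
        return NULL;
    p = nand_pool.bytes + nand_pool_used;
    nand_pool_used += rounded;
    return p;
}

/* free-compatible no-op: the pool is never returned. */
static void nand_pool_free(void *p)
{
    (void)p;
}
```

Since nothing is ever freed, a per-call allocate and free (as with program verify) would exhaust a pool like this, which is another argument for plain static buffers.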

>>> E's doesn't; instead it declares a small number of static buffers.
>>
>> I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are 
>> no other variables. Again I'm thinking of the scenario of single 
>> firmware - different board revs. Can you confirm?
>>
>>> Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a 
>>> major
>>> issue because the memory needs of that layer are well-bounded; I think I
>>> broadly agree, though the situation is not ideal in that it forces 
>>> somebody
>>> who wants to use a lean, mean eCos configuration to work around.
>>
>>
>> The overhead of including something like malloc/free in the image may 
>> compare badly with the amount of memory R's needs to allocate in the 
>> first place. I also note that if R's implementation has program 
>> verifies enabled it allocates and frees a page _every_ time. If 
>> nothing else this could lead to heap fragmentation.
> 
> 
> Program verifies should be considered a very deep debugging trait. 

I'm not sure about that. Experience with NOR Flash has shown that despite 
promises of error reporting in the datasheets, sometimes the only way to 
be sure of data integrity is an explicit verify step. It's up to the user, 
but I would consider it to have more use than just for debugging a driver.

> Still, another possible implementation for this page buffer would be on 
> the stack (not!), or in the controller struct. That would grow then by 
> 8KB + spare.

Or a single one for all chips maybe (since chances of clashes seem pretty 
small, so just protected with a mutex). And only if the program verify 
option is enabled of course. As per above, the page buffer size could be 
derived from the configuration, with appropriate CDL.
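
A rough sketch of a program-verify step using one shared read-back buffer follows. The "device" is a simulated page in RAM, the lock macros are no-op stand-ins for a real kernel mutex, and every name (program_and_verify, verify_buf, inject_fault) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 2048   /* assumption: would be derived from configuration */

/* Stand-ins for a real mutex from the target's kernel API; no-ops here. */
#define VERIFY_LOCK()   ((void)0)
#define VERIFY_UNLOCK() ((void)0)

/* One read-back buffer shared by all chips, instead of a malloc per write. */
static unsigned char verify_buf[PAGE_SIZE];

/* Toy device: a single page of simulated NAND. */
static unsigned char fake_nand[PAGE_SIZE];
int inject_fault;        /* test hook: make the next write go silently wrong */

static void dev_program(const unsigned char *src)
{
    memcpy(fake_nand, src, PAGE_SIZE);
    if (inject_fault)
        fake_nand[0] ^= 0x01;   /* simulate a bit that failed to program */
}

static void dev_read(unsigned char *dst)
{
    memcpy(dst, fake_nand, PAGE_SIZE);
}

/* Program a page, read it back into the shared buffer and compare.
 * Returns 0 if the data matches, -1 on a mismatch. */
int program_and_verify(const unsigned char *src)
{
    int rc;

    assert(src != NULL);        /* a NULL source is a programming error */
    dev_program(src);

    VERIFY_LOCK();              /* the shared buffer is the contended resource */
    dev_read(verify_buf);
    rc = (memcmp(verify_buf, src, PAGE_SIZE) == 0) ? 0 : -1;
    VERIFY_UNLOCK();

    return rc;
}
```

The buffer and the lock would only be compiled in when the program-verify option is enabled.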

> [snip]
> 
>>> - R's model shares the command sequence logic amongst all chips,
>>> differentiating only between small- and large-page devices. (I do not
>>> know whether this is correct for all current chips, though going
>>> forwards it seems less likely to be an issue as fully-ONFI-compliant
>>> chips become the norm.)
>>
>>
>> Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it 
>> may be too prescriptive to be robustly future-proof.
> 
> 
> Well, there is no way I can see into the future, but I definitely think 
> that the wire command model for NAND chips is going to stay -- it is in 
> ONFI, after all. Besides, all except the 1 or 2 most pioneering museum 
> NAND chips use it too.

I don't entirely disagree. But people do have a habit of inventing new 
things, particularly if it allows them to differentiate their products 
from their competitors.

> There are chips that use a different interface, 
> like SSD or MMC or OneNand, but then these chips come with on-chip bad 
> block management, wear leveling of some kind, and are completely 
> different in the way they must be handled. I'd say E's and R's 
> implementations are concerned only with 'raw' NAND chips.
>
>> One could say that makes it a more realistic emulation. But yes I can 
>> see disadvantages with a somewhat rigid world view. Thinking out loud, 
>> I wonder if Rutger's layer could work with something like Samsung 
>> OneNAND.
> 
> 
> See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says: 
> "MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array 
> using a NOR Flash interface."

OneNAND isn't like SSD or MMC which essentially provide a block interface 
and an advanced controller hiding the details of NAND. It isn't like NOR 
flash because you can't address the entire array - as shown by the fact it 
only has a 16-bit address bus. Instead with OneNAND you get an SRAM buffer 
as a "window" into the NAND array. There are commands to load data from 
NAND pages into the SRAM buffers, or write them back. It has onboard ECC 
logic, but it has a very different way of controlling the NAND. You do get 
access to both data and spare areas too.

You can consider this the sort of thing I mean when I say that 
manufacturers can come up with interesting things which break rigid 
assumptions of how you talk to NAND chips. So my concern is not (just) 
that your layer can't support OneNAND, but it couldn't support anything 
which also had a different interface.

Obviously you already support small versus large page, which require 
different protocols, but they are still relatively similar in how they're 
controlled. Would it even be possible to sensibly extend your generic 
layer to support something like OneNAND? Without having a large number of 
kludges?

>> I would certainly appreciate feedback from anyone who has used R's 
>> layer. What you say would seem to imply that both small page and ONFI 
>> are untested in R's layer.
> 
> 
> That is correct. I would love some small-page testing. I have seen no 
> ONFI chips on the market yet, so testing will be future work for both E 
> and R.

Ross said that the Samsung K9 is pretty similar to ONFI, other than how 
you read the device ID etc. Is your layer equally close?

Thanks,

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-08  7:15       ` Jürgen Lambrecht
@ 2009-10-15  3:53         ` Jonathan Larmour
  2009-10-15 11:54           ` Jürgen Lambrecht
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-15  3:53 UTC (permalink / raw)
  To: Jürgen Lambrecht; +Cc: Rutger Hofman, Ross Younger, eCos developers

Jürgen Lambrecht wrote:
> Rutger Hofman wrote:
>>
>> Well, there is no way I can see into the future, but I definitely think
>> that the wire command model for NAND chips is going to stay -- it is in
>> ONFI, after all. Besides, all except the 1 or 2 most pioneering museum
>> NAND chips use it too. There are chips that use a different interface,
>> like SSD or MMC or OneNand, but then these chips come with on-chip bad
>> block management, wear leveling of some kind, and are completely
>> different in the way they must be handled. I'd say E's and R's
>> implementations are concerned only with 'raw' NAND chips.
> 
> Correct, only for raw NAND chips to be soldered on a board. The others 
> have an embedded controller and are already packaged.

I don't think E's implementation would have the same problem with OneNAND 
as R's (see below). Yes it has a sort of controller, but it's not as 
advanced as an MMC or SSD one - instead it's there as logic to manage 
exchanges between its SRAM and the NAND array.

>>> One could say that makes it a more realistic emulation. But yes I can
>>> see disadvantages with a somewhat rigid world view. Thinking out loud, I
>>> wonder if Rutger's layer could work with something like Samsung OneNAND.
>>
>> See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says:
>> "MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array
>> using a NOR Flash interface."
> 
> Indeed, a oneNAND is to be treated as a NOR flash, like a pseudoSRAM is 
> a DRAM with SRAM interface.
> And SSD has a hard disk drive interface, just like MMC and SD card; they 
> mostly have a FAT file system on them but also UFS ...

FAOD, I don't believe that's true. You only get a view into a small window 
of the NAND. That small window is memory mapped, but that's all. It's 
certainly not controlled like a NOR flash. It's a NAND, but not one with 
the wire command model that R's implementation assumes.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-13  6:35       ` Jürgen Lambrecht
@ 2009-10-15  3:55         ` Jonathan Larmour
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-15  3:55 UTC (permalink / raw)
  To: Jürgen Lambrecht; +Cc: eCos developers, Deroo Stijn

Jürgen Lambrecht wrote:
> Jonathan Larmour wrote:
>> Jürgen Lambrecht wrote:
[snip]
>>> We have it very well tested, amongst others
[snip]
>>
>> That's extremely useful to know, thanks! But a couple of further
>> questions on this: (1) Did any bad blocks show up at any point? (2) Were
>> you using a bad block table? (3) Presumably there were factory-marked bad
>> blocks on some?
> 
> (3) Yes, there are almost always factory-marked bad blocks.
> (2) yes
> (1) Yes, certainly! We have bad blocks from time to time, and they are 
> handled correctly.

That's great to know, especially (1), thanks!

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-13 12:59       ` Rutger Hofman
@ 2009-10-15  4:41         ` Jonathan Larmour
  2009-10-15 14:55           ` Rutger Hofman
  2009-10-19 10:53           ` Ross Younger
  0 siblings, 2 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-15  4:41 UTC (permalink / raw)
  To: Rutger Hofman
  Cc: Jürgen Lambrecht, Ross Younger, eCos developers, Deroo Stijn

Rutger Hofman wrote:
> Jonathan Larmour wrote:
> [snip]
> 
>>> We also prefer R's model of course because we started with R's model 
>>> and use it now.
>>
>>
>> You haven't done any profiling by any luck have you? Or code size 
>> analysis? Although I haven't got into the detail of R's version yet 
>> (since I was starting with dissecting E's), both the footprint and the 
>> cumulative function call and indirection time overhead are concerns of 
>> mine.
> 
> 
> As a first step in mitigating the 'footprint pressure', I have added CDL 
> options to configure in/out support for the various chip types, to wit: 
> - ONFI chips;
> - 'regular' large-page chips;
> - 'regular' small-page chips.
> It is in r678 on my download page 
> (http://www.cs.vu.nl/~rutger/software/ecos/nand-flash/). As I had 
> suggested before, this was a very small refactoring (although code has 
> moved about in io_nand_chip.c to save on the number of #ifdefs).

I'm sure that's useful.

> One more candidate for a reduction in code footprint: I can add a CDL 
> option to configure out support for heterogeneous controllers/chips. The 
> ANC layer will become paper-thin then. If this change would make any 
> difference, I will do it within, say, a week's time.

I wouldn't want you to spend time until the decision's made. I'll make a 
note that it would take a week to do. Admittedly, I'm not sure the savings 
would be enough to make it "paper-thin".

> As regards the concerns for (indirect) function call overhead: my 
> intuition is that the NAND operations themselves (page read, page write, 
> block erase) will dominate. It takes 200..500us only to transfer a page 
> over the data bus to the NAND chip; one recent data sheet mentions 
> program time 200us, erase time 1.5ms. I think only a very slow CPU would 
> show the overhead of less than 10 indirect function calls.

I think it's more the cumulative effect, primarily on reads. Especially as 
there's no asynchronous aspect - the control process is synchronous, so 
any delays between real underlying NAND operations only add up. Ross 
quoted an example of about 25us for a page read. Off the top of my head, 
for something like a 64MHz CPU with 4 clock ticks per instruction on 
average, that's 16 insns per us, so a page read is about equivalent to 400 
insns. At that sort of level I'm not sure overheads are lost in the noise.
Maybe I've messed up those guesstimates though.
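
The back-of-envelope estimate can be written out as a small helper. This is only the arithmetic from the paragraph above; the function name is made up for illustration.

```c
#include <assert.h>

/* Number of instructions a CPU retires in duration_us microseconds, given
 * its clock in MHz and an average cycles-per-instruction figure.
 * clock_mhz * duration_us is the number of clock cycles in the window. */
unsigned long insns_in_window(unsigned long clock_mhz,
                              unsigned long cycles_per_insn,
                              unsigned long duration_us)
{
    assert(cycles_per_insn != 0);   /* zero CPI would be a caller bug */
    return (clock_mhz * duration_us) / cycles_per_insn;
}
```

With the figures quoted, insns_in_window(64, 4, 1) gives 16 and insns_in_window(64, 4, 25) gives 400. For comparison, the 200us bus transfer mentioned earlier corresponds to 3200 instructions under the same assumptions.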

I wonder if Ross has any performance data for E he could contribute?

On a separate point, while I'm here, I think the use of printf via 
cyg_nand_global.pf would want tidied up a lot. Some of them seem to be 
there to mention errors to the user, but without any programmatic 
treatment of the errors, notably without reporting them to higher layers.

It should also be possible to eliminate the overheads of the printf. Right 
now there's quite a lot of them, involving function calls, allocation of 
const string data, and occasionally calculation of arguments, even if the 
pf function pointer is pointing to an empty null printf function. It 
should be possible to turn them off entirely, and not be any worse off for 
it (including error reporting back up to higher layers). It might not be 
so bad if the strings were a lot shorter, or the printf functions less 
frequently used, but being able to turn them off entirely would seem better.
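
One way to get diagnostics that cost nothing when disabled is a variadic macro that expands to an empty expression, paired with a real error return. The sketch below uses hypothetical names throughout (NAND_DEBUG_CHATTER, NAND_CHATTER, nand_check_status, the error codes) and assumes that bit 0 of the status byte flags a failed operation, as on common raw NAND parts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical build-time switch; in eCos this would be a CDL option. */
#ifndef NAND_DEBUG_CHATTER
#define NAND_DEBUG_CHATTER 0
#endif

#if NAND_DEBUG_CHATTER
#define NAND_CHATTER(...) ((void)printf(__VA_ARGS__))
#else
/* Expands to an empty expression: no call, no format string in the image,
 * and the arguments are never evaluated. */
#define NAND_CHATTER(...) ((void)0)
#endif

#define NAND_OK       0
#define NAND_ERR_IO (-1)

/* Translate a chip status byte into an error code for the caller.
 * The failure is reported upwards whether or not chatter is enabled. */
int nand_check_status(unsigned status_byte)
{
    assert(status_byte <= 0xFFu);   /* more than 8 bits is a caller bug */

    if (status_byte & 0x01u) {      /* assumed: bit 0 set means FAIL */
        NAND_CHATTER("nand: operation failed, status=0x%02x\n", status_byte);
        return NAND_ERR_IO;
    }
    return NAND_OK;
}
```

Compared with calling through a function pointer that may point at a null printf, this removes the call, the string constants and the argument evaluation from builds that do not want them.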

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-15  3:53         ` Jonathan Larmour
@ 2009-10-15 11:54           ` Jürgen Lambrecht
  0 siblings, 0 replies; 58+ messages in thread
From: Jürgen Lambrecht @ 2009-10-15 11:54 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Rutger Hofman, Ross Younger, eCos developers

Jonathan Larmour wrote:
> Jürgen Lambrecht wrote:
>   
>> Rutger Hofman wrote:
>>     
>>> Well, there is no way I can see into the future, but I definitely think
>>> that the wire command model for NAND chips is going to stay -- it is in
>>> ONFI, after all. Besides, all except the 1 or 2 most pioneering museum
>>> NAND chips use it too. There are chips that use a different interface,
>>> like SSD or MMC or OneNand, but then these chips come with on-chip bad
>>> block management, wear leveling of some kind, and are completely
>>> different in the way they must be handled. I'd say E's and R's
>>> implementations are concerned only with 'raw' NAND chips.
>>>       
>> Correct, only for raw NAND chips to be soldered on a board. The others
>> have an embedded controller and are already packaged.
>>     
>
> I don't think E's implementation would have the same problem with OneNAND
> as R's (see below). Yes it has a sort of controller, but it's not as
> advanced as an MMC or SSD one - instead it's there as logic to manage
> exchanges between its SRAM and the NAND array.
>
>   
>>>> One could say that makes it a more realistic emulation. But yes I can
>>>> see disadvantages with a somewhat rigid world view. Thinking out loud, I
>>>> wonder if Rutger's layer could work with something like Samsung OneNAND.
>>>>         
>>> See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says:
>>> "MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array
>>> using a NOR Flash interface."
>>>       
>> Indeed, a oneNAND is to be treated as a NOR flash, like a pseudoSRAM is
>> a DRAM with SRAM interface.
>> And SSD has a hard disk drive interface, just like MMC and SD card; they
>> mostly have a FAT file system on them but also UFS ...
>>     
>
> FAOD, I don't believe that's true. You only get a view into a small window
> of the NAND. That small window is memory mapped, but that's all. It's
> certainly not controlled like a NOR flash. It's a NAND, but not one with
> the wire command model that R's implementation assumes.
>   
I'm sorry Jifl, I did not check oneNAND very well. I read a bit about it 
some time ago when I was looking for a bigger NOR flash for our ARM7 board.
Jürgen
> Jifl
> --
> --["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
>   



* Re: NAND technical review
  2009-10-15  3:49       ` Jonathan Larmour
@ 2009-10-15 14:36         ` Rutger Hofman
  2009-10-16  1:32           ` Jonathan Larmour
  2009-10-15 15:43         ` Rutger Hofman
  1 sibling, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-15 14:36 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, eCos developers

Jonathan Larmour wrote:
> [ Sorry for getting back to this late - I wanted to continue with Ross 
> before he went on holiday ]
> 
> Rutger Hofman wrote:
>> Jonathan Larmour wrote:
>>> Does your implementation _require_ a BBT in its current 
>>> implementation? For simpler NAND usage, it may be overkill e.g. an 
>>> application where the number of rewrites is very small, so the 
>>> factory bad markers may be considered sufficient.

I had forgotten: there is a configuration option to bypass BBT and only 
use factory-bad markers (and caveat emptor).

>> This is a bit hairy in my opinion, and one reason is that there is no 
>> Standard Layout for the spare areas. One case where a BBT is forced: 
>> my BlackFin NFC can be used to boot from NAND, but it enforces a spare 
>> layout that is incompatible with MTD or anybody. It is even 
>> incompatible with most chips' specification that the first byte of 
>> spare in the first page of the block is the Bad Block Marker. 
>> BlackFin's boot layout uses this first byte in a way that suits it, 
>> and it may be 0 -- which would otherwise mean Bad Block.
> 
> I infer that your layer can cope with that? I didn't see the handling 
> for that in io_nand_chip_bad_block.c.

No (not yet). To use the NAND controller in this way, a different spare 
layout must be used for the chip. Although there are no obstacles to 
selecting different spare layouts, there is no support for that yet. It 
would require one extra parameter in the chip device struct 
'constructor' (e.g. with NULL for 'choose default = MTD compatible'). 
For the record: MTD/Blackfin/u-boot has support for this different 
layout, but it is build-static. MTD cannot hot-swap layouts (at the moment).
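
A sketch of what such an extra parameter could look like, with NULL selecting a default layout. The structures, field names and offsets here are invented for illustration; they are not the actual chip device struct or the real CYG_NAND_DRIVER_CHIP() signature, and the default values are not claimed to match MTD byte for byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of how a page's spare area is divided up. */
struct spare_layout {
    unsigned bad_marker_offset;   /* byte offset of the bad-block marker */
    unsigned ecc_offset;          /* byte offset of the first ECC byte */
    unsigned ecc_bytes;           /* number of ECC bytes */
};

/* Illustrative default for a 64-byte spare area. */
static const struct spare_layout default_layout = { 0, 40, 24 };

/* Hypothetical chip device record. */
struct chip_dev {
    const char *name;
    const struct spare_layout *layout;
};

/* Initialise a chip device. Passing layout == NULL means "use the default";
 * a controller with its own boot-mode layout would pass that instead. */
void chip_dev_init(struct chip_dev *dev, const char *name,
                   const struct spare_layout *layout)
{
    assert(dev != NULL);
    dev->name = name;
    dev->layout = (layout != NULL) ? layout : &default_layout;
}
```

Because the layout is referenced through a const pointer, a platform package could define its own layout table without any change to the generic layer.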

> Is your BBT compatible with Linux MTD? Including your use of a mirror?

Yes, I read MTD, and tried to copy their BBT handling as faithfully as 
possible without actually copying code. It is on my stack to check if 
the BBTs are indeed identical; as you may have noticed elsethread, my 
eCos application wants to share a YAFFS 'disk' with u-boot which has MTD.

>>>> (b) Dynamic memory allocation
>>>>
>>>> R's layer mandates the provision of malloc and free, or compatible
>>>> functions. These must be provided to the cyg_nand_init() call.
>>>
>>>
>>> That's unfortunate - that limits its use in smaller boot loaders - a 
>>> key application.
>>
>>
>> Well, it is certainly possible to calculate statically how much space 
>> R's NAND layer is going to use, to allocate that statically, and write 
>> a tiny function to hand it out piecemeal at the NAND layer's request. 
> 
> If you know what it's going to be (at most), it could just be allocated 
> statically and just used directly surely? That's got the lowest overheads.
> 
> E's implementation had a good idea of a CDL variable for the maximum 
> supported block size. Then individual HALs or driver packages can use a 
> CDL 'requires' to ensure it's >= the block size of the chips really in use.

I can follow Ross's example here. Together with a switch to constructor 
and a cleanup of printfs, that will take some days. If it matters in the 
decision, I will schedule this to be finished within one month.

>> Still, another possible implementation for this page buffer would be 
>> on the stack (not!), or in the controller struct. That would grow then 
>> by 8KB + spare.
> 
> Or a single one for all chips maybe (since chances of clashes seem 
> pretty small, so just protected with a mutex). And only if the program 
> verify option is enabled of course. As per above, the page buffer size 
> could be derived from the configuration, with appropriate CDL.

Right, I'll do that when the allocator dependency goes.

>> See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says: 
>> "MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash 
>> array using a NOR Flash interface."
> 
> OneNAND isn't like SSD or MMC which essentially provide a block 
> interface and an advanced controller hiding the details of NAND. It 
> isn't like NOR flash because you can't address the entire array - as 
> shown by the fact it only has a 16-bit address bus. Instead with OneNAND 
> you get an SRAM buffer as a "window" into the NAND array. There are 
> commands to load data from NAND pages into the SRAM buffers, or write 
> them back. It has onboard ECC logic, but it has a very different way of 
> controlling the NAND. You do get access to both data and spare areas too.
> 
> You can consider this the sort of thing I mean when I say that 
> manufacturers can come up with interesting things which break rigid 
> assumptions of how you talk to NAND chips. So my concern is not (just) 
> that your layer can't support OneNAND, but it couldn't support anything 
> which also had a different interface.

> Obviously you already support small versus large page, which require 
> different protocols, but they are still relatively similar in how 
> they're controlled. Would it even be possible to sensibly extend your 
> generic layer to support something like OneNAND? Without having a large 
> number of kludges?

I will take a better look at the OneNAND datasheet. You are right, it is 
software-wise as different from NOR as from 'raw' NAND. My guarded guess 
now is that integration into R would imply a replacement of the Common 
Controller code (by configuration or by 'object-oriented' indirect calls 
over a device struct). I will report on this later.
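
As a toy picture of the 'object-oriented' option, the sketch below puts the page read behind a function pointer in a device struct, with one backend that reads the array directly and one that goes through an on-chip buffer window. Everything is simulated in RAM and all names are hypothetical; it shows the shape of the indirection only, not how either access model is really driven.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 16    /* tiny pages keep the toy small */
#define NUM_PAGES 4

struct nand_dev;

/* Per-access-model operations, selected once per device. */
struct nand_ops {
    int (*read_page)(struct nand_dev *dev, unsigned page, unsigned char *dst);
};

struct nand_dev {
    const struct nand_ops *ops;
    unsigned char array[NUM_PAGES][PAGE_SIZE];  /* simulated NAND array */
    unsigned char window[PAGE_SIZE];            /* simulated SRAM buffer */
};

/* Wire-command style: data comes straight out of the addressed page. */
static int wire_read_page(struct nand_dev *dev, unsigned page,
                          unsigned char *dst)
{
    if (page >= NUM_PAGES)
        return -1;
    memcpy(dst, dev->array[page], PAGE_SIZE);
    return 0;
}

/* Window style: first "load" the page into the buffer, then read the buffer. */
static int window_read_page(struct nand_dev *dev, unsigned page,
                            unsigned char *dst)
{
    if (page >= NUM_PAGES)
        return -1;
    memcpy(dev->window, dev->array[page], PAGE_SIZE);  /* load command */
    memcpy(dst, dev->window, PAGE_SIZE);               /* host reads window */
    return 0;
}

const struct nand_ops wire_ops   = { wire_read_page };
const struct nand_ops window_ops = { window_read_page };

/* Generic entry point: one indirect call, regardless of access model. */
int nand_read_page(struct nand_dev *dev, unsigned page, unsigned char *dst)
{
    assert(dev != NULL && dev->ops != NULL && dst != NULL);
    return dev->ops->read_page(dev, page, dst);
}
```

The cost is the extra indirect call per operation, which is the overhead concern raised elsewhere in this thread.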

> Ross said that the Samsung K9 is pretty similar to ONFI, other than how 
> you read the device ID etc. Is your layer equally close?

Yes, absolutely. The reason is that the large-page chip that I tested is 
a 'regular' large-page chip, same as Samsung K9, and same as Jurgen 
Lambrecht's chip. All 'regular' large-page chips are about equally 
similar to ONFI. Small-page chips are not in ONFI though.

Rutger


* Re: NAND technical review
  2009-10-15  4:41         ` Jonathan Larmour
@ 2009-10-15 14:55           ` Rutger Hofman
  2009-10-16  1:45             ` Jonathan Larmour
  2009-10-19 10:53           ` Ross Younger
  1 sibling, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-15 14:55 UTC (permalink / raw)
  To: Jonathan Larmour
  Cc: Jürgen Lambrecht, Ross Younger, eCos developers, Deroo Stijn

Jonathan Larmour wrote:
> Rutger Hofman wrote:
>> Jonathan Larmour wrote:
> On a separate point, while I'm here, I think the use of printf via 
> cyg_nand_global.pf would want tidied up a lot. Some of them seem to be 
> there to mention errors to the user, but without any programmatic 
> treatment of the errors, notably without reporting them to higher layers.
> 
> It should also be possible to eliminate the overheads of the printf. 
> Right now there's quite a lot of them, involving function calls, 
> allocation of const string data, and occasionally calculation of 
> arguments, even if the pf function pointer is pointing to an empty null 
> printf function. It should be possible to turn them off entirely, and 
> not be any worse off for it (including error reporting back up to higher 
> layers). It might not be so bad if the strings were a lot shorter, or 
> the printf functions less frequently used, but being able to turn them 
> off entirely would seem better.

I agree. Many of the printfs are leftovers from debugging stages. They 
should go (and will go anyway at a next code cleanup), and an error 
should be reported upwards where that isn't done yet; or possibly 
asserts when they flag a programming error in this layer -- preferences? 
I will do this somewhere in the coming weeks.

When the dependency on a memory allocator is also gone (see other 
response), there is no practical obstacle left to switch from explicit 
initialisation to init-time constructor.

If this makes a difference in acceptance, I will convert from malloc and 
explicit initialisation somewhere within one month.

Rutger


* Re: NAND technical review
  2009-10-15  3:49       ` Jonathan Larmour
  2009-10-15 14:36         ` Rutger Hofman
@ 2009-10-15 15:43         ` Rutger Hofman
  1 sibling, 0 replies; 58+ messages in thread
From: Rutger Hofman @ 2009-10-15 15:43 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, eCos developers

Jonathan Larmour wrote:
>>> Does your implementation _require_ a BBT in its current 
>>> implementation? For simpler NAND usage, it may be overkill e.g. an 
>>> application where the number of rewrites is very small, so the 
>>> factory bad markers may be considered sufficient.
>>
>> This is a bit hairy in my opinion, and one reason is that there is no 
>> Standard Layout for the spare areas. One case where a BBT is forced: 
>> my BlackFin NFC can be used to boot from NAND, but it enforces a spare 
>> layout that is incompatible with MTD or anybody. It is even 
>> incompatible with most chips' specification that the first byte of 
>> spare in the first page of the block is the Bad Block Marker. 
>> BlackFin's boot layout uses this first byte in a way that suits it, 
>> and it may be 0 -- which would otherwise mean Bad Block.
> 
> I infer that your layer can cope with that? I didn't see the handling 
> for that in io_nand_chip_bad_block.c.

Well, I think I didn't answer this appropriately after all. This is not 
a chip issue but a controller issue (I am the needle stuck in the 
groove). The issue is that the BlackFin NFC (in its boot mode only!) 
enforces a deviant spare layout, which introduces an incompatibility 
between controller and any chips: the booting controller thinks it can 
arbitrarily use byte 0 of the first page's spare, and for chips that is 
usually the bad-block marker.

One observation: the BlackFin NFC boots from page 0 (on block 0) and 
NAND chips usually guarantee that block 0 is not bad. Chips may even 
have a specified higher write-count-before-errors for block 0 than the 
other blocks. I think the chip manufacturers' motivation is to 
facilitate special handling of block 0: boot code, FIS, BBT, anything...

Another observation: even though block 0 is not bad whatever the marker 
written by the boot code, it ought to be marked BAD_RESERVED in the BBT 
to avoid accidental erasure.
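
A minimal in-memory sketch of a table with a "reserved" state that blocks erasure. The two-bits-per-block packing, the state names and the functions are assumptions made for illustration; this is not the on-flash BBT format of MTD or of either implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BLOCKS 1024

/* Hypothetical block states, two bits each. */
enum bbt_state {
    BBT_GOOD        = 0,
    BBT_BAD_FACTORY = 1,
    BBT_BAD_WORN    = 2,
    BBT_RESERVED    = 3    /* not bad, but must never be erased casually */
};

/* Four blocks per byte. */
static uint8_t bbt[NUM_BLOCKS / 4];

void bbt_set(unsigned block, enum bbt_state s)
{
    unsigned shift = (block % 4) * 2;

    assert(block < NUM_BLOCKS);
    bbt[block / 4] = (uint8_t)((bbt[block / 4] & ~(3u << shift)) |
                               ((unsigned)s << shift));
}

enum bbt_state bbt_get(unsigned block)
{
    unsigned shift = (block % 4) * 2;

    assert(block < NUM_BLOCKS);
    return (enum bbt_state)((bbt[block / 4] >> shift) & 3u);
}

/* Only blocks in the GOOD state may be erased by ordinary users. */
int bbt_may_erase(unsigned block)
{
    return bbt_get(block) == BBT_GOOD;
}
```

Marking block 0 with bbt_set(0, BBT_RESERVED) then makes bbt_may_erase(0) return 0 regardless of what the boot code wrote into the marker byte.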

Rutger


* Re: NAND technical review
  2009-10-15 14:36         ` Rutger Hofman
@ 2009-10-16  1:32           ` Jonathan Larmour
  2009-10-19  9:56             ` Ross Younger
  2009-10-19 14:21             ` Rutger Hofman
  0 siblings, 2 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-16  1:32 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Ross Younger, eCos developers

Rutger Hofman wrote:
> Jonathan Larmour wrote:
>> Rutger Hofman wrote:
>>> Jonathan Larmour wrote:
>>>
>>>> Does your implementation _require_ a BBT in its current 
>>>> implementation? For simpler NAND usage, it may be overkill e.g. an 
>>>> application where the number of rewrites is very small, so the 
>>>> factory bad markers may be considered sufficient.
> 
> 
> I had forgotten: there is a configuration option to bypass BBT and only 
> use factory-bad markers (and caveat emptor).

Ah. Your documentation includes:
"Before the second initialization stage, some properties of the NAND 
system can be configured. These are en/disabling of ECC generation and 
en/disabling of a Bad Block Table (BBT). By default, ECC and BBT are enabled."

I hadn't noticed that these are not in fact CDL configuration options. 
They ought to be really.

>>> This is a bit hairy in my opinion, and one reason is that there is no 
>>> Standard Layout for the spare areas. One case where a BBT is forced: 
>>> my BlackFin NFC can be used to boot from NAND, but it enforces a 
>>> spare layout that is incompatible with MTD or anybody. It is even 
>>> incompatible with most chips' specification that the first byte of 
>>> spare in the first page of the block is the Bad Block Marker. 
>>> BlackFin's boot layout uses this first byte in a way that suits it, 
>>> and it may be 0 -- which would otherwise mean Bad Block.
>>
>>
>> I infer that your layer can cope with that? I didn't see the handling 
>> for that in io_nand_chip_bad_block.c.
> 
> No (not yet).

If it doesn't sound too silly, how were you able to test your layer on the 
bfin then?

> To use the NAND controller in this way, a different spare 
> layout must be used for the chip. Although there are no obstacles to 
> selecting different spare layouts, there is no support for that yet. It 
> would require one extra parameter in the chip device struct 
> 'constructor' (e.g. with NULL for 'choose default = MTD compatible'). 

You mean CYG_NAND_DRIVER_CHIP()? Presumably some way to pass a different 
layout?

I was wondering about the extensibility of the spare layouts given how 
much stuff is sort of hard coded - sure it may fit plenty of existing 
chips but more can come along. In the chip ID table in io_nand_chip.c I 
haven't worked out what the layout field is for - I can't find where it is 
used. I also can't see how a driver in a new port can add a new chip with 
a new layout. There's talk in read_id() of being able to do a custom chip 
device (does that mean also you can do a custom spare layout?), but it's 
not clear to me how a new port can add that to the table since read_id() 
only searches the table. Obviously it's not sensible to have to keep 
changing io/nand, and may be inappropriate for custom spare layouts.

NB the chip ID table can be const can't it?
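
To make the extensibility question concrete, here is one possible shape: a const built-in table plus an optional table supplied by the port, searched first. The structure, the function names and the ID values are all invented for illustration and are not taken from io_nand_chip.c or from real chips.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chip ID record. */
struct chip_id_entry {
    unsigned char maker;
    unsigned char device;
    unsigned page_size;
};

/* Built-in table; const, so it can live in ROM. IDs are made up. */
static const struct chip_id_entry builtin_chips[] = {
    { 0x01, 0x10,  512 },
    { 0x01, 0x20, 2048 },
};

static const struct chip_id_entry *
search(const struct chip_id_entry *tab, size_t n,
       unsigned char maker, unsigned char device)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tab[i].maker == maker && tab[i].device == device)
            return &tab[i];
    return NULL;
}

/* Look a chip up, consulting the port's own table (if any) before the
 * built-in one, so a new port can add or override entries without
 * touching the generic package. Returns NULL if the chip is unknown. */
const struct chip_id_entry *
chip_lookup(const struct chip_id_entry *port_tab, size_t port_n,
            unsigned char maker, unsigned char device)
{
    const struct chip_id_entry *e = NULL;

    if (port_tab != NULL)
        e = search(port_tab, port_n, maker, device);
    if (e == NULL)
        e = search(builtin_chips,
                   sizeof builtin_chips / sizeof builtin_chips[0],
                   maker, device);
    return e;
}
```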

> For the record: MTD/Blackfin/u-boot has support for this different 
> layout, but it is build-static. MTD cannot hot-swap layouts (at the 
> moment).

That's reasonable given it's associated with the controller.

>> Is your BBT compatible with Linux MTD? Including your use of a mirror?
> 
> 
> Yes, I read MTD, and tried to copy their BBT handling as faithfully as 
> possible without actually copying code. It is on my stack to check if 
> the BBTs are indeed identical; as you may have noticed elsethread, my 
> eCos application wants to share a YAFFS 'disk' with u-boot which has MTD.

Okay. Obviously compatibility is very important, especially for those 
using eCos/RedBoot just to load Linux.

But also, since R does not have partitioning, won't that potentially 
interfere with compatibility with MTD? A Linux booted image may not be 
using the whole chip as a single FS.

[memory use]
>> If you know what it's going to be (at most), it could just be 
>> allocated statically and just used directly surely? That's got the 
>> lowest overheads.
>>
>> E's implementation had a good idea of a CDL variable for the maximum 
>> supported block size. Then individual HALs or driver packages can use 
>> a CDL 'requires' to ensure it's >= the block size of the chips really 
>> in use.
> 
> I can follow Ross's example here. Together with a switch to constructor 
> and a cleanup of printfs, that will take some days. If it matters in the 
> decision, I will schedule this to be finished within one month.

I'll make a note about it - again I don't want you to be adjusting things 
before the decision has been made - I for one certainly haven't made up my 
mind either way. But thanks very much for your willingness to do so! I 
doubt either implementation will be checked in with zero further changes 
anyway.

[OneNAND (and things like it) fitting into R's model]
> I will take a better look at the OneNAND datasheet. You are right, it is 
> software-wise as different from NOR as from 'raw' NAND. My guarded guess 
> now is that integration into R would imply a replacement of the Common 
> Controller code (by configuration or by 'object-oriented' indirect calls 
> over a device struct). I will report on this later.

Thanks. Although of course adding more indirection may have its own 
disadvantages.

Ross, if you're reading, I would be interested to know whether something 
with a different access model, like OneNAND, would work with E's layer, in 
your opinion. While I think the answer is yes, I had better check.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-15 14:55           ` Rutger Hofman
@ 2009-10-16  1:45             ` Jonathan Larmour
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-16  1:45 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Ross Younger, eCos developers, Deroo Stijn

Rutger Hofman wrote:
> 
> I agree. Many of the printfs are leftovers from debugging stages. They 
> should go (and will go anyway at a next code cleanup), and an error 
> should be reported upwards where that isn't done yet; or possibly 
> asserts when they flag a programming error in this layer -- preferences? 
> I will do this somewhere in the coming weeks.

I think that's the way to do it - asserts for programming errors (things 
which should never ever happen), and errors for things which could maybe 
happen in the field, e.g. due to hardware errors.

If you prefer you could change the existing printfs into some sorts of 
macros which you'd only want to see if you're debugging NAND operation, 
and completely left out otherwise. Like CYG_NAND_CHATTER. Or perhaps some 
of them should be turned into CYG_NAND_CHATTER, it depends.

> When the dependency on a memory allocator is also gone (see other 
> response), there is no practical obstacle left to switch from explicit 
> initialisation to init-time constructor.
> 
> If this makes a difference in acceptance, I will convert from malloc and 
> explicit initialisation somewhere within one month.

If the decision is made to adopt your one, that's a change I think would 
be beneficial yes, and can be done then.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-13 13:35           ` Rutger Hofman
@ 2009-10-16  4:04             ` Jonathan Larmour
  2009-10-19 14:51               ` Rutger Hofman
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-16  4:04 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Ross Younger, ecos-devel

Rutger Hofman wrote:
> Jonathan Larmour wrote:
> 
>> Hmm, I guess the key thing here is that in E's implementation most of 
>> the complexity has been pushed into the lower layers; at least 
>> compared to R's. R's has a more consistent interface through the 
>> layers. Albeit at the expense of some rigidity and noticeable function 
>> overhead.
>>
>> It's not likely E's will be able to easily share controller code, 
>> given of course you don't know what chips, and so what chip driver 
>> APIs they'll be connected to. But OTOH, maybe this isn't a big deal 
>> since a lot of the controller-specific munging is likely to be 
>> platform-specific anyway due to characteristics of the attached NAND 
>> (e.g. timings etc.) and the only bits that would be sensibly shared 
>> would potentially happen in the processor HAL anyway at startup time. 
>> What's left may not be that much and isn't a problem in the platform 
>> HAL. However the likely exception to that is hardware-assisted ECC. A 
>> semi-formal API for that would be desirable.
> 
> 
> This is the largest difference in design philosophy between E and R. Is 
> it OK if I expand?

Sure.

> NAND chips are all identical in their wire setup. They all have a data 
> 'bus', and control lines to indicate whether what is on the bus is a 
> command, an address, or data.
> 
> NAND chips differ in how their command language works, but only so far. 
> What is on the market now is 'regular' large-page chips that all speak 
> the same command language, and small-page chips that have a somewhat 
> different command language. ONFI chips are large-page chips except in 
> interrogation at startup and in bad-block marking.

As I've already noted, it may be useful to think ahead to what may come 
into the market later, including things that don't fit into the known 
command languages (such as existing OneNAND) - a framework which can 
support wider implementations can have that advantage.

[snip example]
> These 2 languages are all the variation there is for NAND chips (plus, 
> at another level, 2 timing values for read cycle and write cycle)! The 
> wide-ranging differences for devices for NAND are in the controllers.
> 
> How controllers work, is that they accept input like 'write a command of 
> value 0x..', 'write an address of value 0x.....', etc, and do their job 
> on the NAND chip's wires. They cannot really operate at a higher level, 
> if only because they must support both small-page and large-page chips 
> (and ONFI), and this is the level of common protocol for the chips.
> 
> So controller code has to bridge between API calls like page_read and 
> the interface of the controller as described above. R's implementation 
> presumes that a lot of the code to make this translation is generic: a 
> large-page read translates to the controller steps as given above in the 
> running example, in any controller implementation.

That's true. At the same time, have a look at E's code in 
https://bugzilla.ecoscentric.com/show_bug.cgi?id=1000770
Specifically the Samsung K9 driver in 
devs/nand/samsung_k9/d20090826/include/k9fxx08x0x.inl - while you could 
argue the steps required are generic and can be made common (write this 
address, write that command, etc.), it seems E assumes that the steps may 
not really be complex enough to justify abstracting them out.
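
For concreteness, here is a rough sketch of the kind of generic large-page 
read being discussed. It is not taken from R's or E's code: the demo_* 
names and the hook structure are hypothetical, and the sequence (0x00, 
two column and three row address cycles, 0x30, wait, data) is the usual 
large-page command set, with the number of row cycles assumed to be three. 
A small recording backend is included only to show the cycle order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical controller hooks: one per kind of bus cycle. */
typedef struct demo_nand_ctl demo_nand_ctl;
struct demo_nand_ctl {
    void (*write_cmd)(demo_nand_ctl *c, uint8_t cmd);
    void (*write_addr)(demo_nand_ctl *c, uint8_t byte);
    void (*wait_ready)(demo_nand_ctl *c);
    void (*read_data)(demo_nand_ctl *c, uint8_t *buf, size_t len);
    void *priv;
};

/* Generic large-page read expressed purely in terms of the hooks. */
void demo_lp_page_read(demo_nand_ctl *c, uint32_t page, uint16_t column,
                       uint8_t *buf, size_t len)
{
    c->write_cmd(c, 0x00);                          /* READ, first cycle */
    c->write_addr(c, (uint8_t)(column & 0xff));     /* column, low byte  */
    c->write_addr(c, (uint8_t)(column >> 8));       /* column, high byte */
    c->write_addr(c, (uint8_t)(page & 0xff));       /* row address ...   */
    c->write_addr(c, (uint8_t)((page >> 8) & 0xff));
    c->write_addr(c, (uint8_t)((page >> 16) & 0xff));
    c->write_cmd(c, 0x30);                          /* READ, confirm     */
    c->wait_ready(c);                               /* chip fetches page */
    c->read_data(c, buf, len);                      /* clock data out    */
}

/* Recording backend: logs each cycle instead of touching hardware. */
#define DEMO_TRACE_MAX 16
typedef struct {
    char    kind[DEMO_TRACE_MAX];   /* 'C'md, 'A'ddr, 'W'ait, 'D'ata */
    uint8_t val[DEMO_TRACE_MAX];
    int     n;
} demo_trace;

static void trace_put(demo_nand_ctl *c, char kind, uint8_t val)
{
    demo_trace *t = c->priv;
    if (t->n < DEMO_TRACE_MAX) {
        t->kind[t->n] = kind;
        t->val[t->n] = val;
        t->n++;
    }
}
static void trace_cmd(demo_nand_ctl *c, uint8_t v)  { trace_put(c, 'C', v); }
static void trace_addr(demo_nand_ctl *c, uint8_t v) { trace_put(c, 'A', v); }
static void trace_wait(demo_nand_ctl *c)            { trace_put(c, 'W', 0); }
static void trace_read(demo_nand_ctl *c, uint8_t *buf, size_t len)
{
    trace_put(c, 'D', 0);
    memset(buf, 0xff, len);         /* pretend the page is erased */
}

/* Wire the recording backend into a controller struct. */
void demo_trace_attach(demo_nand_ctl *c, demo_trace *t)
{
    t->n = 0;
    c->write_cmd  = trace_cmd;
    c->write_addr = trace_addr;
    c->wait_ready = trace_wait;
    c->read_data  = trace_read;
    c->priv       = t;
}
```

Whether nine short calls like these are worth a shared layer, or are simple 
enough to repeat in each chip driver, is exactly the judgement call on which 
the two designs differ.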

I would certainly be interested in your perspective about what E's driver 
implementation lacks compared to R's. Lack of hardware ECC is one thing 
certainly.

> Moreover, the generic 
> code handles spare layout: where in the spare is the application's spare 
> data folded, where is the ECC, where is the bad-block mark. 

In E's implementation, the complexities of an abstracted spare layout seem 
to start disappearing as you know more about what chip you've got as a lot 
of the complexity has been pushed into the chip driver.

> OTOH, the 
> generic code has hooks for handling any ECC that the controller has 
> computed in hardware -- how ECC is supported in hardware varies across 
> controllers. But the way the ECC check is handled (case in point is 
> where a correctible bit error is flagged) is generic again.

In E's case, in the EA LPC2468 port example, they have the following in 
the platform HAL for a port (although it could be a package instead):

[various functions/macros defined which are used by k9fxx08x0x.inl]
#include <cyg/devs/nand/k9fxx08x0x.inl>
CYG_NAND_DEVICE(ea_nand, "onboard", &k9f8_funs, &_k9_ea_lpc2468_priv,
                 &linux_mtd_ecc, &nand_mtd_oob_64);

which succinctly brings together the chip driver, accessor functions, ECC 
algorithm, and OOB layout. It becomes easy for a board port to choose some 
different chips/layouts/ECC. There's flexibility for the future in that.
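
To illustrate the idea only, a composition macro of this general shape 
could look like the sketch below. This is not E's actual CYG_NAND_DEVICE 
definition, which I have not reproduced; every demo_* name, the struct 
fields, and the layout numbers are hypothetical.

```c
#include <stddef.h>

/* Hypothetical chip driver entry points. */
typedef struct {
    int (*read_page)(void *priv, unsigned page, void *buf);
} demo_chip_funs;

/* Hypothetical ECC algorithm descriptor. */
typedef struct {
    const char *name;
    unsigned    ecc_bytes_per_page;
} demo_ecc;

/* Hypothetical OOB layout: where ECC and application bytes live. */
typedef struct {
    unsigned ecc_offset, ecc_len;
    unsigned app_offset, app_len;
} demo_oob_layout;

/* One device = chip driver + board accessors + ECC + OOB layout. */
typedef struct {
    const char            *devname;
    const demo_chip_funs  *fns;
    void                  *priv;
    const demo_ecc        *ecc;
    const demo_oob_layout *oob;
} demo_nand_device;

#define DEMO_NAND_DEVICE(sym, name, fns_, priv_, ecc_, oob_) \
    demo_nand_device sym = { name, fns_, priv_, ecc_, oob_ }

/* Board port: pick one of each and bind them together. */
static int demo_k9_read_page(void *priv, unsigned page, void *buf)
{
    (void)priv; (void)page; (void)buf;
    return 0;                       /* placeholder, no hardware access */
}
const demo_chip_funs  demo_k9_funs = { demo_k9_read_page };
const demo_ecc        demo_sw_ecc  = { "software", 24 };
const demo_oob_layout demo_oob_64  = { 40, 24, 2, 38 };  /* illustrative */

DEMO_NAND_DEVICE(demo_board_nand, "onboard",
                 &demo_k9_funs, NULL, &demo_sw_ecc, &demo_oob_64);
```

Swapping the ECC algorithm or the OOB layout for a different board then 
means changing one argument rather than touching the chip driver.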

With R's implementation, there seems to be much more code involved. And I 
sort of see why there's more code, and I sort of don't. Not just in the 
generic layer, but in the drivers as well, at least looking at the bfin 
chip, and I don't think the differences are completely explained by the 
hardware properties of each NFC (but I'm very willing to be corrected!). 
Comparing E's k9_read_page() along with everything it calls, with R's 
bfin_nfc_data_read() along with everything it calls (and those call etc. 
not just in bfin_nfc.c but also nand_ez_kit_bf548.inc[1]) there's a huge 
difference. If nothing else from what I can tell this may then require a 
much larger porting effort, compared to E's.

I see that some of the reasons for larger code in R are due to run-time 
testing of hardware properties: 8 vs 16-bit bus width, SP vs LP vs ONFI. I 
also note that E's implementation doesn't do as much error checking as I 
think it ought to, especially in the Samsung K9 chip driver. But that's 
not all of the difference.

Anyway, I think I'm talking out loud here rather than asking anything 
specific about it. It may just be something we have to put down to the 
difference in design philosophy, rather than something which can be 
improved. There are still advantages with R in other ways.

Jifl

[1] which should really be .inl for consistency in eCos but that's a detail
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-02 15:51 NAND technical review Jonathan Larmour
  2009-10-06 13:51 ` Ross Younger
@ 2009-10-16  7:29 ` Simon Kallweit
  2009-10-16 13:53   ` Jonathan Larmour
  2009-10-19 15:02   ` Rutger Hofman
  1 sibling, 2 replies; 58+ messages in thread
From: Simon Kallweit @ 2009-10-16  7:29 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, eCos developers, Rutger Hofman

Jonathan Larmour wrote:
> But this is an open discussion, so I'd appreciate anyone's views. I'd 
> especially value Simon Kallweit's views as someone who has actually used 
> both code implementations which gives him a very good perspective. 
> Although if anyone wants to contribute, please keep it on topic, within 
> this thread, and technical.

I have been following the NAND discussion since it started two weeks 
ago. I actually don't have much to add, because I think most points have 
been brought up already. Also, I currently don't have an immediate 
necessity for NAND flash support in our products, so it's not of high 
priority to me at the moment.

I will still try to give a quick flashback of my work with both R's and E's 
implementations. I started out using R's implementation, trying to add a 
driver for synthetic NAND chips, which did not exist back then. In the 
meantime Rutger has implemented a synthetic chip, but in the form of a NAND 
controller and not, as I had tried, in the form of a NAND chip. In 
retrospect, this seems to be the better (and simpler) approach. What 
I dislike is how the synthetic chips have to be configured. R's 
implementation needs the user to assign a valid NAND chip device id, 
which will then be used through chip interrogation to determine the 
chip's geometry. I find it much more useful if you can directly define 
the chip's geometry in CDL, as it's more explicit. This brings me to my 
biggest concern with R's design. It's pretty rigid in terms of future 
chip implementations. Sure if everyone is going to make ONFI chips in 
the future, that's fine. Otherwise, parts of the layering could be 
rendered wrong/useless rather soon. I also dislike the generic 
determination of the chip geometry in io_nand_chip.c:read_id(). IMHO it 
is already a bit messy by mixing interrogation for small-page, 
large-page and ONFI chips. This is probably fine, until exceptions have 
to be implemented to support more exotic chips. I might be wrong, as I 
think MTD does chip interrogation in a similar way. E's model splits 
chip interrogation into the drivers, adding more flexibility in return for 
a bit of code duplication.

My work with E's framework involved writing basic drivers for the STM32 
evaluation board as well as doing a few tests with YAFFS1. This was a 
breeze. I only implemented the basics, adapting/copying the drivers from 
Ross. It occurred to me that a lot of the code could simply be 
copied. So there is a certain level of code duplication in E's 
framework, but as Jonathan pointed out, it's questionable if things like 
address/command writing etc. should be abstracted out, as they are so 
simple, and may need adjustment in the future for new chips. In general, 
I think E's code is more lightweight and quite a bit looser in coupling, 
which I think results in smaller code size, lower overhead but most 
importantly in more flexibility. Just the right thing for a platform 
where resources are scarce. My tests with YAFFS1 were promising, but I 
had to realize that YAFFS just needs too much memory for my current 
platform, so I abandoned it.

My current preference clearly is with E's framework. But I might be 
biased, as my current platform is quite low on resources, and I'm 
looking for a small, simple and lightweight framework.

I will gladly elaborate in more detail if there are further questions 
about my experience with both frameworks.


Simon


* Re: NAND technical review
  2009-10-16  7:29 ` Simon Kallweit
@ 2009-10-16 13:53   ` Jonathan Larmour
  2009-10-19 15:02   ` Rutger Hofman
  1 sibling, 0 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-16 13:53 UTC (permalink / raw)
  To: Simon Kallweit; +Cc: Ross Younger, eCos developers, Rutger Hofman

Simon Kallweit wrote:
> 
> I still try to give a quick flashback of my work with both R's and E's 
> implementations.

Thanks very much. Your insight is valuable, especially given your experience.

> I will gladly elaborate in more detail if there are further questions 
> about my experience with both frameworks.

You wouldn't happen to have any opinion on run-time overheads of E's 
versus R's? Just since we were discussing this in the thread (I asked the 
same of Jurgen) and it wasn't clear whether these add up to anything 
significant compared to the NAND accesses themselves.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-16  1:32           ` Jonathan Larmour
@ 2009-10-19  9:56             ` Ross Younger
  2009-10-19 14:21             ` Rutger Hofman
  1 sibling, 0 replies; 58+ messages in thread
From: Ross Younger @ 2009-10-19  9:56 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Rutger Hofman, eCos developers

Jonathan Larmour wrote:
> Ross, if you're reading, I would be interested to know whether something
> with a different access model, like OneNAND, would work with E's layer,
> in your opinion. While I think the answer is yes, I had better check.

( Slowly catching up on the EEC email mountain ;-) )

Yes, I see no reason why different access models wouldn't work. OneNAND
would be a new chip driver to write, but should be pretty straightforward.


Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com


* Re: NAND technical review
  2009-10-15  4:41         ` Jonathan Larmour
  2009-10-15 14:55           ` Rutger Hofman
@ 2009-10-19 10:53           ` Ross Younger
  2009-10-20  1:40             ` Jonathan Larmour
  1 sibling, 1 reply; 58+ messages in thread
From: Ross Younger @ 2009-10-19 10:53 UTC (permalink / raw)
  To: Jonathan Larmour
  Cc: Rutger Hofman, Jürgen Lambrecht, eCos developers, Deroo Stijn

Jonathan Larmour wrote:
> I wonder if Ross has any performance data for E he could contribute?

I have done a little benchmarking and so have _some_ numbers to hand, but
the goalposts are moving and my figures are a bit old and must be treated
with caution...

On the EA LPC2468 board (Samsung K9 NAND chip), with the state of my code on
July 8, compiling with -O2 and asserts off, my NAND benchmarker reported
average page read times[*] of 3578us per page, programming 2680us, and
erasing 1848us. These stack up against the fastest-possible raw chip times
(which I computed from the "typical" times on the datasheet) of 88.5, 363.5
and 2000us.

[*] full page (2k) plus OOB
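
As a worked check of these figures, the ratio of measured to 
datasheet-derived time comes out at roughly 40x for page reads 
(3578/88.5), roughly 7.4x for programming (2680/363.5), and roughly 0.92x 
for erasing (1848/2000), i.e. the measured erase is quicker than the 
"typical" figure. The helper names below are made up for illustration, and 
the throughput helper assumes 1 kB = 1024 bytes and counts only the 2k of 
page data, not the OOB.

```c
/* Ratio of measured time to best-case time; above 1.0 means slower. */
double demo_overhead_ratio(double measured_us, double ideal_us)
{
    return measured_us / ideal_us;
}

/* Ceiling on throughput in kB/s if pages of page_bytes were
 * transferred back to back at per_page_us each. */
double demo_kb_per_s(double page_bytes, double per_page_us)
{
    return (page_bytes / 1024.0) / (per_page_us / 1000000.0);
}
```

For example, demo_kb_per_s(2048, 3578) gives a raw read ceiling of about 
559 kB/s on this board, before any filesystem overhead.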

This led to a YAFFS throughput data rate, on a recently-erased NAND array,
of up to 480kB/s in reading and 578kB/s in writing. (Actual rates vary
depending on the size of chunk you pass to read() and write().)

It seems worthwhile to share numbers for the customer port I've been working
on; not because they're of direct use to the eCos project, but to show what
a difference the hardware makes.

The board is based on the Samsung S3C2410X ucontroller and carries the same
Samsung K9 NAND chip as on the EA LPC2468. Now, this CPU has a dedicated
NAND controller with hardware ECC... After I taught the library to use h/w
ECC I immediately saw a 46% speedup on reads and 38% on writes when compared
with software ECC. I've also added an option to do a partial loop unroll in
the read and write cycles which gives a further 4% boost on reads and 15% on
writes. The current (work-in-progress) numbers I have from the benchmarker
are 452us per page read, 623us per write and 1934us per erase; YAFFS
throughput is similarly impressive at 4690 kB/s in reads and 3432 kB/s in
writes. (Charles Manning has stated publicly several times that if you want
YAFFS to be fast, you should start by looking at the speed of your NAND driver.)

Of course, we're not comparing apples with apples here; the S3C2410X is an
ARM9 whose CPU clock runs at 200MHz, whereas the EA LPC2468 is an ARM7TDMI
running at just 48MHz. Even so, the speed-up given by hardware ECC
demonstrates that option to be a no-brainer.

BTW: Some profiling and souping up is on my todo list, and some more
benchmarking will probably happen at that time. When I implement hardware
ECC support on the STM3210E I intend to produce some before and after numbers.


Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com


* Re: NAND technical review
  2009-10-16  1:32           ` Jonathan Larmour
  2009-10-19  9:56             ` Ross Younger
@ 2009-10-19 14:21             ` Rutger Hofman
  2009-10-20  3:21               ` Jonathan Larmour
  1 sibling, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-19 14:21 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, eCos developers

Jonathan Larmour wrote:
> Rutger Hofman wrote:
>> Jonathan Larmour wrote:
>>> Rutger Hofman wrote:
>>>> Jonathan Larmour wrote:
> Ah. Your documentation includes:
> "Before the second initialization stage, some properties of the NAND 
> system can be configured. These are en/disabling of ECC generation and 
> en/disabling of a Bad Block Table (BBT). By default, ECC and BBT are 
> enabled."
> 
> I hadn't noticed that these are not in fact CDL configuration options. 
> They ought to be really.

OK. I'd say no-ECC is a property of the application/ANC and no-BBT is a 
property of a chip. I'll move these to CDL.

When I am done with this and a number of other small changes, I'll put 
up a new revision.

>>>> This is a bit hairy in my opinion, and one reason is that there is 
>>>> no Standard Layout for the spare areas. One case where a BBT is 
>>>> forced: my BlackFin NFC can be used to boot from NAND, but it 
>>>> enforces a spare layout that is incompatible with MTD or anybody. It 
>>>> is even incompatible with most chips' specification that the first 
>>>> byte of spare in the first page of the block is the Bad Block 
>>>> Marker. BlackFin's boot layout uses this first byte in a way that 
>>>> suits it, and it may be 0 -- which would otherwise mean Bad Block.
>>>
>>>
>>> I infer that your layer can cope with that? I didn't see the handling 
>>> for that in io_nand_chip_bad_block.c.
>>
>> No (not yet).
> 
> If it doesn't sound too silly, how were you able to test your layer on 
> the bfin then?

Well, this mode is *only* for booting a BlackFin from NAND, and for 
writing the boot blocks into NAND. For all other usage, one still uses 
the standard MTD spare layout.

>> To use the NAND controller in this way, a different spare layout must 
>> be used for the chip. Although there are no obstacles to selecting 
>> different spare layouts, there is no support for that yet. It would 
>> require one extra parameter in the chip device struct 'constructor' 
>> (e.g. with NULL for 'choose default = MTD compatible'). 
> 
> You mean CYG_NAND_DRIVER_CHIP()? Presumably some way to pass a different 
> layout?
> 
> I was wondering about the extensibility of the spare layouts given how 
> much stuff is sort of hard coded - sure it may fit plenty of existing 
> chips but more can come along. In the chip ID table in io_nand_chip.c I 
> haven't worked out what the layout field is for - I can't find where it 
> is used. I also can't see how a driver in a new port can add a new chip 
> with a new layout. There's talk in read_id() of being able to do a 
> custom chip device (does that mean also you can do a custom spare 
> layout?), but it's not clear to me how a new port can add that to the 
> table since read_id() only searches the table. Obviously it's not 
> sensible to have to keep changing io/nand, and may be inappropriate for 
> custom spare layouts.

OK, a custom spare layout now is a parameter to CYG_NAND_DRIVER_CHIP(). 
The layout is used in the common controller code, see calls to function 
spare_scatter_fill() and spare_scatter_extract(). These serve ECC and 
application spare slots in the same fashion.
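
To show the general idea of scattering into a spare layout, here is a 
simplified sketch. It is not the actual code of spare_scatter_fill() or 
spare_scatter_extract(): the demo_* names, the slot struct and the 
signatures are made up for illustration. A layout is assumed to be an 
ordered list of (offset, length) slots in the spare area, and contiguous 
ECC or application bytes are spread over those slots in order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One free region in the spare area (hypothetical representation). */
typedef struct {
    uint8_t offset;
    uint8_t length;
} demo_spare_slot;

/* Spread len bytes of src over the slots; returns bytes placed. */
size_t demo_spare_scatter_fill(uint8_t *spare,
                               const demo_spare_slot *slots, size_t nslots,
                               const uint8_t *src, size_t len)
{
    size_t done = 0;
    for (size_t i = 0; i < nslots && done < len; i++) {
        size_t n = slots[i].length;
        if (n > len - done) {
            n = len - done;             /* last slot only partly used */
        }
        memcpy(spare + slots[i].offset, src + done, n);
        done += n;
    }
    return done;
}

/* Gather up to len bytes back out of the slots; returns bytes read. */
size_t demo_spare_scatter_extract(const uint8_t *spare,
                                  const demo_spare_slot *slots, size_t nslots,
                                  uint8_t *dst, size_t len)
{
    size_t done = 0;
    for (size_t i = 0; i < nslots && done < len; i++) {
        size_t n = slots[i].length;
        if (n > len - done) {
            n = len - done;
        }
        memcpy(dst + done, spare + slots[i].offset, n);
        done += n;
    }
    return done;
}
```

Because the same pair of routines can serve both the ECC slots and the 
application slots, a different layout only means a different slot table.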

> NB the chip ID table can be const can't it?

Thanks, fixed.

>> For the record: MTD/Blackfin/u-boot has support for this different 
>> layout, but it is build-static. MTD cannot hot-swap layouts (at the 
>> moment).
> 
> That's reasonable given it's associated with the controller.

Well, not completely. The layout is only associated with the *boot 
mode*. For other use, there is no prescription of the layout, so it's 
typical to use the MTD layout - then one can share with Linux or u-boot. 
Rotten consequence: if one wants to program a new boot image into the 
first block(s) of the NAND, the boot-compatible layout must be used. 
OTOH, if anything else of the NAND is going to be used, the MTD layout 
is required. So hot-swapping is highly desirable.

> But also, since R does not have partitioning, won't that potentially 
> interfere with compatibility with MTD? A Linux booted image may not be 
> using the whole chip as a single FS.

Linux has no fixed approach to partitioning, as I learned during the 
discussions on this list.

> [OneNAND (and things like it) fitting into R's model]
>> I will take a better look at the OneNAND datasheet. You are right, it 
>> is software-wise as different from NOR as from 'raw' NAND. My guarded 
>> guess now is that integration into R would imply a replacement of the 
>> Common Controller code (by configuration or by 'object-oriented' 
>> indirect calls over a device struct). I will report on this later.
> 
> Thanks. Although of course adding more indirection may have its own 
> disadvantages.

That guarded guess was correct. OneNAND is no raw NAND chip so R's 
common controller code won't fit. The thing to do is make a driver that 
comes in place of the common controller, as suggested above. Once ANC 
supports a pluggable controller (which I will do if it makes a 
difference), adding a OneNAND will take the same amount of effort as for 
E's implementation.

Rutger


* Re: NAND technical review
  2009-10-16  4:04             ` Jonathan Larmour
@ 2009-10-19 14:51               ` Rutger Hofman
  2009-10-20  4:28                 ` Jonathan Larmour
  0 siblings, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-19 14:51 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, ecos-devel

Jonathan Larmour wrote:
> Rutger Hofman wrote:
>> Jonathan Larmour wrote:
[snip]
> In E's case, in the EA LPC2468 port example, they have the following in 
> the platform HAL for a port (although it could be a package instead):
> 
> [various functions/macros defined which are used by k9fxx08x0x.inl]
> #include <cyg/devs/nand/k9fxx08x0x.inl>
> CYG_NAND_DEVICE(ea_nand, "onboard", &k9f8_funs, &_k9_ea_lpc2468_priv,
>                 &linux_mtd_ecc, &nand_mtd_oob_64);
> 
> which succinctly brings together the chip driver, accessor functions, 
> ECC algorithm, and OOB layout. It becomes easy for a board port to 
> choose some different chips/layouts/ECC. There's flexibility for the 
> future in that.

Yes, in R that is all in the board's CDL. I am unsure what that means 
w.r.t. flexibility.

> With R's implementation, there seems to be much more code involved. And 
> I sort of see why there's more code, and I sort of don't. Not just in 
> the generic layer, but in the drivers as well, at least looking at the 
> bfin chip, and I don't think the differences are completely explained by 
> the hardware properties of each NFC (but I'm very willing to be 
> corrected!). Comparing E's k9_read_page() along with everything it 
> calls, with R's bfin_nfc_data_read() along with everything it calls (and 
> those call etc. not just in bfin_nfc.c but also 
> nand_ez_kit_bf548.inc[1]) there's a huge difference. If nothing else 
> from what I can tell this may then require a much larger porting effort, 
> compared to E's.

The BlackFin nfc reads/writes in small sub-pages so doing a 512B or 2KB 
read/write needs a loop to traverse the sub-pages. More complexity in 
bfin_nfc_data_read is added because there is support for random 
sub-small-page reads too - ultimately a consequence of the outer API 
capability to do random reads. And things are complicated because the 
BFin NFC wants a handshake with each data byte/word read.
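
As a simplified illustration of why that loop exists (this is not the 
bfin_nfc.c code), a random read that must be served in sub-page units can 
be sketched as below. The demo_* names are hypothetical, and the sub-page 
size is left as a parameter because it depends on the controller. A 
memory-backed reader stands in for the hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hook: read len bytes at offset, never crossing a
 * sub-page boundary. */
typedef void (*demo_subpage_read_fn)(void *ctx, size_t offset,
                                     uint8_t *dst, size_t len);

/* Serve an arbitrary (offset, len) read as a series of reads that
 * each stay inside one sub-page. */
void demo_read_range(demo_subpage_read_fn rd, void *ctx, size_t sub_size,
                     size_t offset, uint8_t *dst, size_t len)
{
    while (len > 0) {
        size_t in_sub = offset % sub_size;      /* position in sub-page */
        size_t n = sub_size - in_sub;           /* room left in it      */
        if (n > len) {
            n = len;
        }
        rd(ctx, offset, dst, n);
        offset += n;
        dst += n;
        len -= n;
    }
}

/* Memory-backed stand-in for the controller, counting accesses. */
typedef struct {
    const uint8_t *page;
    unsigned       calls;
} demo_mem_ctx;

void demo_mem_read(void *ctx, size_t offset, uint8_t *dst, size_t len)
{
    demo_mem_ctx *m = ctx;
    memcpy(dst, m->page + offset, len);
    m->calls++;
}
```

A full 2KB page with 256-byte sub-pages takes eight passes through the 
loop, while a 300-byte read starting at offset 100 takes two, since it 
straddles one boundary.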

The code in the .inc file (chip select) is more generic than it should 
be. It has support for multiple, possibly heterogeneous, chips, while 
there is only one NAND chip on the EZ-Kit. It figures out at run-time 
though that the only thing to do is handle the CHIP_ENABLE pin.

Shameless plug: I think you might also consider taking a look at the 
example GPIO controller driver that I bundled. That is intended to show 
how small a device-specific controller driver can actually be.

> I see that some of the reasons for larger code in R are due to run-time 
> testing of hardware properties: 8 vs 16-bit bus width, SP vs LP vs ONFI. 
> I also note that E's implementation doesn't do as much error checking as 
> I think it ought to, especially in the Samsung K9 chip driver. But 
> that's not all of the difference.

> [1] which should really be .inl for consistency in eCos but that's a detail

I'll fix that too.

Rutger


* Re: NAND technical review
  2009-10-16  7:29 ` Simon Kallweit
  2009-10-16 13:53   ` Jonathan Larmour
@ 2009-10-19 15:02   ` Rutger Hofman
  1 sibling, 0 replies; 58+ messages in thread
From: Rutger Hofman @ 2009-10-19 15:02 UTC (permalink / raw)
  To: Simon Kallweit; +Cc: Jonathan Larmour, Ross Younger, eCos developers

Simon Kallweit wrote:
> ... What 
> I dislike is how the synthetic chips have to be configured. R's 
> implementation needs the user to assign a valid NAND chip device id, 
> which will then be used through chip interrogation to determine the 
> chips geometry. I find it much more useful if you can directly define 
> the chips geometry in CDL, as it's more explicit.

This emulates the read_id command in the same way as the other commands. 
I saw no way around this because the generic code calls read_id and 
expects an answer in the prescribed way.

Well, this all (again) does mean that R's (my) common controller stuff 
is really tied to the raw NAND chips. See my other responses for what to 
do if anything else is needed.

Rutger


* Re: NAND technical review
  2009-10-19 10:53           ` Ross Younger
@ 2009-10-20  1:40             ` Jonathan Larmour
  2009-10-20 10:17               ` Ross Younger
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-20  1:40 UTC (permalink / raw)
  To: Ross Younger
  Cc: Rutger Hofman, Jürgen Lambrecht, eCos developers, Deroo Stijn

Ross Younger wrote:
> Jonathan Larmour wrote:
> 
>>I wonder if Ross has any performance data for E he could contribute?
>
> I have done a little benchmarking and so have _some_ numbers to hand, but
> the goalposts are moving and my figures are a bit old and must be treated
> with caution...
> 
> On the EA LPC2468 board (Samsung K9 NAND chip), with the state of my code on
> July 8, compiling with -O2 and asserts off, my NAND benchmarker reported
> average page read times[*] of 3578us per page, programming 2680us, and
> erasing 1848us. These stack up against the fastest-possible raw chip times
> (which I computed from the "typical" times on the datasheet) of 88.5, 363.5
> and 2000us.

To double check, you mean reading was slowest, programming was faster and 
erasing was fastest, even apparently faster than what may be the 
theoretical fastest time? (I use the term "fast" advisedly, mark).

Are you sure there isn't a problem with your driver to cause such figures? :-)

> This led to a YAFFS throughput data rate, on a recently-erased NAND array,
> of up to 480kB/s in reading and 578kB/s in writing. (Actual rates vary
> depending on the size of chunk you pass to read() and write().)

I wonder if Rutger has the ability to compare with his YAFFS throughput. 
OTOH, as you say, the controller plays a large part, and there's no common 
ground with R so it's entirely possible no comparison can be fair for 
either implementation.

> The board is based on the Samsung S3C2410X ucontroller and carries the same
> Samsung K9 NAND chip as on the EA LPC2468. Now, this CPU has a dedicated
> NAND controller with hardware ECC... After I taught the library to use h/w
> ECC I immediately saw a 46% speedup on reads and 38% on writes when compared
> with software ECC. I've also added an option to do a partial loop unroll in
> the read and write cycles which gives a further 4% boost on reads and 15% on
> writes.

Just to be sure, are the differences measured by these percentages purely 
in terms of overall data throughput per time?

I'm very interested in the fact that the software changes you made had such a 
relatively large effect on performance. If that's true, this seems to 
go against the possibility that waiting for hardware (the NAND chip) may 
have figured as the dominating component of the time (which would mean the 
software components of the overall time are lost in the noise). Instead 
the software latency required in setting up the next operation can be 
noticeable - which was my concern with R in my mail of 2009-10-15 which 
you're replying to.

> The current (work-in-progress) numbers I have from the benchmarker
> are 452us per page read, 623us per write and 1934us per erase; YAFFS
> throughput is similarly impressive at 4690 kB/s in reads and 3432 kB/s in
> writes. (Charles Manning has stated publicly several times that if you want
> YAFFS to be fast, you should start by looking at the speed of your NAND driver.)

Hmm, as opposed to what though? YAFFS itself isn't able to change much.

> Of course, we're not comparing apples with apples here; the S3C2410X is an
> ARM9 whose CPU clock runs at 200MHz, but the EA LPC2468 is an ARM7TDMI
> running at just 48MHz, but even so the speed-up given by hardware ECC
> demonstrates that option to be a no-brainer.

Hence my surprise at E not having support, even in principle, before! But 
clearly you're at the stage where stuff is nearly working. I look forward 
to a code drop, as the APIs would benefit from comparison with R's. It 
looks like R has considered a variety of interesting ECC hardware so it 
would be interesting to see if E's could cope.

> BTW: Some profiling and souping up is on my todo list, and some more
> benchmarking will probably happen at that time. When I implement hardware
> ECC support on the STM3210E I intend to produce some before and after numbers.

Just as an aside, you may find that improving eCos more generally to have 
e.g. assembler optimised implementation of memcpy/memmove/memset (and 
possibly others) may improve performance of these and other things across 
the board. GCC's intrinsics can only do so much. (FAOD, actual 
implementations to use (at least to start with) can be found in newlib.)

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-19 14:21             ` Rutger Hofman
@ 2009-10-20  3:21               ` Jonathan Larmour
  2009-10-20 12:19                 ` Rutger Hofman
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-20  3:21 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Ross Younger, eCos developers

Rutger Hofman wrote:
> Jonathan Larmour wrote:
>> Rutger Hofman wrote:
>>> Jonathan Larmour wrote:
>>>> Rutger Hofman wrote:
>>
>> Ah. Your documentation includes:
>> "Before the second initialization stage, some properties of the NAND 
>> system can be configured. These are en/disabling of ECC generation and 
>> en/disabling of a Bad Block Table (BBT). By default, ECC and BBT are 
>> enabled."
>>
>> I hadn't noticed that these are not in fact CDL configuration options. 
>> They ought to be really.
> 
> 
> OK. I'd say no-ECC is a property of the application/ANC and no-BBT is a 
> property of a chip. I'll move these to CDL.
>
> When I am done with this and a number of other small changes, I'll put 
> up a new revision.

Sure, thanks. I'm making a note that it will be done, so don't worry about 
getting it done.

>>>>> This is a bit hairy in my opinion, and one reason is that there is 
>>>>> no Standard Layout for the spare areas. One case where a BBT is 
>>>>> forced: my BlackFin NFC can be used to boot from NAND, but it 
>>>>> enforces a spare layout that is incompatible with MTD or anybody. 
>>>>> It is even incompatible with most chips' specification that the 
>>>>> first byte of spare in the first page of the block is the Bad Block 
>>>>> Marker. BlackFin's boot layout uses this first byte in a way that 
>>>>> suits it, and it may be 0 -- which would otherwise mean Bad Block.
>>> [ R's layer can't do this yet ]
>>
>> If it doesn't sound too silly, how were you able to test your layer on 
>> the bfin then?
> 
> Well, this mode is *only* for booting a BlackFin from NAND, and for 
> writing the boot blocks into NAND. For all other usage, one still uses 
> the standard MTD spare layout.

I understand that the bfin's boot loader oddities aren't really relevant 
in themselves (since we don't support bfin in the repository yet anyway :-)).

>> I was wondering about the extensibility of the spare layouts given how 
>> much stuff is sort of hard coded - sure it may fit plenty of existing 
>> chips but more can come along. In the chip ID table in io_nand_chip.c 
>> I haven't worked out what the layout field is for - I can't find where 
>> it is used. I also can't see how a driver in a new port can add a new 
>> chip with a new layout. There's talk in read_id() of being able to do 
>> a custom chip device (does that mean also you can do a custom spare 
>> layout?), but it's not clear to me how a new port can add that to the 
>> table since read_id() only searches the table. Obviously it's not 
>> sensible to have to keep changing io/nand, and may be inappropriate 
>> for custom spare layouts.
> 
> 
> OK, a custom spare layout now is a parameter to CYG_NAND_DRIVER_CHIP(). 

Ok, that should help.

> The layout is used in the common controller code, see calls to function 
> spare_scatter_fill() and spare_scatter_extract(). These serve ECC and 
> application spare slots in the same fashion.

I was referring to the "layout" field of cyg_nand_chip_id_t. Whereas 
spare_scatter_fill()/extract() use the spare_layout field of the 
cyg_nand_chip_info_t. I haven't yet found where the "layout" field of 
cyg_nand_chip_id_t is used.

>>> For the record: MTD/Blackfin/u-boot has support for this different 
>>> layout, but it is build-static. MTD cannot hot-swap layouts (at the 
>>> moment).
>>
>> That's reasonable given it's associated with the controller.
> 
> Well, not completely. The layout is only associated with the *boot 
> mode*. For other use, there is no prescription of the layout, so it's 
> typical to use the MTD layout - then one can share with Linux or u-boot. 
> Rotten consequence: if one wants to program a new boot image into the 
> first block(s) of the NAND, the boot-compatible layout must be used. 

So I believe that means if you want to rewrite the boot program on 
Blackfin _and_ use a normal layout, you wouldn't be able to use R as it 
stands at the moment. Ack, OK.

> OTOH, if anything else of the NAND is going to be used, the MTD layout 
> is required. So hot-swapping is highly desirable

Hot swapping sounds like a risky thing to do, with potentially multiple 
subsystems wanting access to flash, not just the application. Anyway, I 
think I'm getting myself bogged down in detail. I don't believe E could 
support this bootloader either without a BBT (without modification). So I 
don't think there's anything to distinguish the two implementations when 
it comes to the oddities of the blackfin boot loader.

>> But also, since R does not have partitioning, won't that potentially 
>> interfere with compatibility with MTD? A Linux booted image may not be 
>> using the whole chip as a single FS.
> 
> 
> Linux has no fixed approach to partitioning, as I learned during the 
> discussions on this list.

Yes, but that isn't the same as not supporting partitions. Just that they 
aren't fixed. They're specified on the kernel boot line, and so can be 
configured by the user arbitrarily.... it's just that you can't change 
them without rebooting. So I believe anyone who is using multiple 
partitions with MTD would have difficulties interoperating with R.

>> [OneNAND (and things like it) fitting into R's model]
>>
>>> My guarded 
>>> guess now is that integration into R would imply a replacement of the 
>>> Common Controller code (by configuration or by 'object-oriented' 
>>> indirect calls over a device struct). I will report on this later.
>>
>> Thanks. Although of course adding more indirection may have its own 
>> disadvantages.
> 
> That guarded guess was correct. OneNAND is no raw NAND chip so R's 
> common controller code won't fit. The thing to do is make a driver that 
> comes in place of the common controller, as suggested above. Once ANC 
> supports a pluggable controller (which I will do if it makes a 
> difference), adding a OneNAND will take the same amount of effort as for 
> E's implementation.

Would that not require a significant reworking and relayering of code? It 
seems to me that controller drivers and the chip drivers used by 
controller drivers under this system will still want to be able to access 
the infrastructure support for BBTs, ECCs and spare layout. From what I 
can see, preserving that without large amounts of indirection (imposing 
further performance and size hits) would pose quite some challenge. The 
result would be something really quite different to what R is like today.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-19 14:51               ` Rutger Hofman
@ 2009-10-20  4:28                 ` Jonathan Larmour
  0 siblings, 0 replies; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-20  4:28 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Ross Younger, ecos-devel

Rutger Hofman wrote:
> Jonathan Larmour wrote:
> 
>> In E's case, in the EA LPC2468 port example, they have the following 
>> in the platform HAL for a port (although it could be a package instead):
>>
>> [various functions/macros defined which are used by k9fxx08x0x.inl]
>> #include <cyg/devs/nand/k9fxx08x0x.inl>
>> CYG_NAND_DEVICE(ea_nand, "onboard", &k9f8_funs, &_k9_ea_lpc2468_priv,
>>                 &linux_mtd_ecc, &nand_mtd_oob_64);
>>
>> which succinctly brings together the chip driver, accessor functions, 
>> ECC algorithm, and OOB layout. It becomes easy for a board port to 
>> choose some different chips/layouts/ECC. There's flexibility for the 
>> future in that.
> 
> Yes, in R that is all in the board's CDL. I am unsure what that means 
> w.r.t. flexibility.

But in R that CDL doesn't allow for linking together rather more arbitrary 
implementations - instead it's one of a set of existing implementations 
provided by R. Although from what you said in the other mail, the spare 
layout at least has now been tackled by allowing a custom layout.

>> With R's implementation, there seems to be much more code involved. 
>> And I sort of see why there's more code, and I sort of don't. Not just 
>> in the generic layer, but in the drivers as well, at least looking at 
>> the bfin chip, and I don't think the differences are completely 
>> explained by the hardware properties of each NFC (but I'm very willing 
>> to be corrected!). Comparing E's k9_read_page() along with everything 
>> it calls, with R's bfin_nfc_data_read() along with everything it calls 
>> (and those call etc. not just in bfin_nfc.c but also 
>> nand_ez_kit_bf548.inc[1]) there's a huge difference. If nothing else 
>> from what I can tell this may then require a much larger porting 
>> effort, compared to E's.
> 
> The BlackFin nfc reads/writes in small sub-pages so doing a 512B or 2KB 
> read/write needs a loop to traverse the sub-pages.

Ok.

> More complexity in 
> bfin_nfc_data_read is added because there is support for random 
> sub-small-page reads too - ultimately a consequence of the outer API 
> capability to do random reads.

And it's true that E does not support partial reads, although it would be 
more reasonable to make partial read support an optional driver feature 
(and synthesise it otherwise, or make it a property the user can detect) - 
if nothing else I'd believe some NFCs won't support it.

> And things are complicated because the 
> BFin NFC wants a handshake with each data byte/word read.

Ok.

> The code in the .inc file (chip select) is more generic than it should 
> be. It has support for multiple, possibly heterogeneous, chips, while 
> there is only one NAND chip on the EZ-Kit. It figures out at run-time 
> though that the only thing to do is handle the CHIP_ENABLE pin.

I think it's bound up with the layering really - R's controller code has 
to be able to handle any chip. In E it can be bound to the platform and so 
it's known.

> Shameless plug: I think you might also consider to take a look at the 
> example GPIO controller driver that I bundled. That is intended to show 
> how small a device-specific controller driver can actually be.

That does take quite a lot of shortcuts, such as not ensuring lines are 
asserted for long enough, nor having timeouts. But yes if the bfin driver 
is odd, and this driver is more typical then this would make a device 
driver roughly the same size as E. But I'm not entirely sure how 
consistently true that would be in general for NFCs due to that layer not 
knowing about properties of the underlying chip driver, and instead having 
to determine things at runtime, e.g. 8 vs 16-bit access, etc. For example, 
E has a much simpler mechanism for what, in R, requires the NAND 
controller functions to supply both data_read_8() and data_read() 
methods.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-20  1:40             ` Jonathan Larmour
@ 2009-10-20 10:17               ` Ross Younger
  2009-10-21  2:06                 ` Jonathan Larmour
  0 siblings, 1 reply; 58+ messages in thread
From: Ross Younger @ 2009-10-20 10:17 UTC (permalink / raw)
  To: Jonathan Larmour
  Cc: Rutger Hofman, Jürgen Lambrecht, eCos developers, Deroo Stijn

Jonathan Larmour wrote:
> To double check, you mean reading was slowest, programming was faster
> and erasing was fastest, even apparently faster than what may be the
> theoretical fastest time? (I use the term "fast" advisedly, mark).
> 
> Are you sure there isn't a problem with your driver to cause such
> figures? :-)

Those are the raw numbers. Yes, I agree that they don't appear to make
sense. As I said, profiling - which will include figuring out what's going
on here - is languishing on the todo list ...


> I wonder if Rutger has the ability to compare with his YAFFS throughput.
> OTOH, as you say, the controller plays a large part, and there's no
> common ground with R so it's entirely possible no comparison can be fair
> for either implementation.

The YAFFS benchmarking is done by our yaffs5 test, which IIRC goes only
through fileio so ought to be trivially portable. It doesn't appear in my
last drop on the bz ticket, but will when I get round to freshening it.


>> After I taught the library to use h/w
>> ECC I immediately saw a 46% speedup on reads and 38% on writes when
>> compared with software ECC [...]
> 
> Just to be sure, are the differences measured by these percentages
> purely in terms of overall data throughput per time?

These are from my raw NAND benchmarks (tests/rwbenchmark.c) which measure
the end-to-end time taken for a whole cyg_nand_page_read() / write /
block_erase call to return.


> I'm very interested in the fact that software changes you made, had such
> a relatively large change to the performance. 


> [hardware ECC]
> Hence my surprise at E not having support, even in principle, before!
> But clearly you're at the stage where stuff is nearly working. 

I was surprised too; but then I had been operating under the general mantra
of "first make it work, then make it work fast" and the speed work is still
in progress ...

To be clear: hwecc _is_ working well, on this customer port, and getting it
going on the STM3210E is on the cards so I have something I can usefully
share publicly.


> Just as an aside, you may find that improving eCos more generally to
> have e.g. assembler optimised implementation of memcpy/memmove/memset
> (and possibly others) may improve performance of these and other things
> across the board. GCC's intrinsics can only do so much. (FAOD actual
> implementations to use (at least to start with) can be found in newlib.)

The speedups in my NAND driver on this board came from a straightforward
Duff's device 8-way-unroll of what had been HAL_{READ,WRITE}_UINT8_VECTOR;
16-way and 32-way unrolls seemed to add a smidgen more performance but
increased code size perhaps disproportionately. (Using the existing VECTOR
macro but with -funroll-loops gave a similar speed-up but more noticeable
code bloat across the board.)

The word copies in newlib's memcpy et al look like they would boost
performance generally, but I have attempted to avoid copying data around as
far as possible in my layer. I don't see them as helping at all with NAND
device access: you have to make a sequence of 8-bit or 16-bit writes to the
MMIO register, and that's that. This is pretty much the same situation as
Tom Duff found himself in ...
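
For illustration, an 8-way Duff's device unroll of a byte-vector read from a
single data register might look roughly like the sketch below. This is not the
code from the driver discussed here: the function name and the idea of passing
the register as a pointer are assumptions made for the example, standing in
for what HAL_READ_UINT8_VECTOR does with a fixed MMIO address.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `count` bytes from one 8-bit data register into `buf`, unrolled
 * 8 ways with Duff's device. `reg` is a hypothetical register address;
 * it is never incremented because every access hits the same register. */
static void read_uint8_vector_unrolled(volatile const uint8_t *reg,
                                       uint8_t *buf, size_t count)
{
    if (count == 0)
        return;                       /* Duff's device needs count > 0 */

    size_t passes = (count + 7) / 8;  /* trips round the unrolled loop */

    /* Jump into the middle of the loop body to handle the remainder,
     * then fall through and keep looping in full groups of eight. */
    switch (count % 8) {
    case 0: do { *buf++ = *reg;
    case 7:      *buf++ = *reg;
    case 6:      *buf++ = *reg;
    case 5:      *buf++ = *reg;
    case 4:      *buf++ = *reg;
    case 3:      *buf++ = *reg;
    case 2:      *buf++ = *reg;
    case 1:      *buf++ = *reg;
            } while (--passes > 0);
    }
}
```

The write direction is the mirror image (`*reg = *buf++`). The unroll factor
is the only thing that would need to become configurable for the 16-way and
32-way variants mentioned above.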

To try and fit with the eCos philosophy, I've left the localised unroll as a
CDL option in this driver, defaulting to off. I expect similar unrolls would
be profitable in other NAND drivers, but a more generalised solution might
be preferable: something like HAL_READ_UINT8_VECTOR_UNROLL, with options to
configure whether and how far it was unrolled?


Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-20  3:21               ` Jonathan Larmour
@ 2009-10-20 12:19                 ` Rutger Hofman
  2009-10-21  1:45                   ` Jonathan Larmour
  0 siblings, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-20 12:19 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, eCos developers

Jonathan Larmour wrote:
> Rutger Hofman wrote:
> I was referring to the "layout" field of cyg_nand_chip_id_t. Whereas 
> spare_scatter_fill()/extract() use the spare_layout field of the 
> cyg_nand_chip_info_t. I haven't yet found where the "layout" field of 
> cyg_nand_chip_id_t is used. 

This represents the fourth byte of the 'regular' read_ID response. 
(First is manufacturer, second is device type, third is often 0, but may 
be defined to give further architectural details.) The specs for this 
layout byte:

bit   meaning               description
0..1  1KB, 2KB, 4KB, 8KB    page size
2     8, 16                 spare bytes per 512 data bytes
4..5  64KB, 128KB, 256KB,   block size (without spare)
6     x8, x16               bus width

The other bits differ across manufacturers.

Reading and parsing this byte is still insufficient to fully access the 
chip: its total size is unknown. It can be in the third byte, in which 
case the chip table wouldn't need an entry for this chip because 
it is sufficiently self-descriptive.
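
As a rough sketch of how the bit fields in the table above could be decoded
(the struct and function names are made up for this example and are not taken
from either NAND layer):

```c
#include <stdint.h>

/* Geometry decoded from the fourth byte of the read_ID response.
 * Hypothetical type, for illustration only. */
struct nand_layout {
    uint32_t page_size;      /* data bytes per page */
    uint32_t spare_per_512;  /* spare bytes per 512 data bytes */
    uint32_t block_size;     /* bytes per block, excluding spare */
    uint32_t bus_width;      /* 8 or 16 */
};

static struct nand_layout decode_layout_byte(uint8_t b)
{
    struct nand_layout l;

    l.page_size     = 1024u << (b & 0x03);                 /* bits 0..1 */
    l.spare_per_512 = (b & 0x04) ? 16u : 8u;               /* bit 2     */
    /* bits 4..5: the table lists 64KB, 128KB and 256KB. Treating the
     * fourth encoding as 512KB is an assumption of this sketch. */
    l.block_size    = (64u * 1024u) << ((b >> 4) & 0x03);
    l.bus_width     = (b & 0x40) ? 16u : 8u;               /* bit 6     */
    return l;
}
```

For example, a layout byte of 0x15 decodes to a 2KB page, 16 spare bytes per
512 data bytes, a 128KB block and an x8 bus. The manufacturer-specific bits
are deliberately ignored.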

>>>> For the record: MTD/Blackfin/u-boot has support for this different 
>>>> layout, but it is build-static. MTD cannot hot-swap layouts (at the 
>>>> moment).
>>>
>>> That's reasonable given it's associated with the controller.
>>
>> Well, not completely. The layout is only associated with the *boot 
>> mode*. For other use, there is no prescription of the layout, so it's 
>> typical to use the MTD layout - then one can share with Linux or 
>> u-boot. Rotten consequence: if one wants to program a new boot image 
>> into the first block(s) of the NAND, the boot-compatible layout must 
>> be used. 
> 
> So I believe that means if you want to rewrite the boot program on 
> Blackfin _and_ use a normal layout, you wouldn't be able to use R as it 
> stands at the moment. Ack, OK.

Hot-swapping is necessary to pull this trick. It would be doable in my 
current svn version (which is coming up one of these days). It would 
imply crossing through all layers though, and it would indeed be 
horribly risky.

>> OTOH, if anything else of the NAND is going to be used, the MTD layout 
>> is required. So hot-swapping is highly desirable
> 
> Hot swapping sounds like a risky thing to do, with potentially multiple 
> subsystems wanting access to flash, not just the application.

[snip]

>>> But also, since R does not have partitioning, won't that potentially 
>>> interfere with compatibility with MTD? A Linux booted image may not 
>>> be using the whole chip as a single FS.
>>
>>
>> Linux has no fixed approach to partitioning, as I learned during the 
>> discussions on this list.
> 
> Yes, but that isn't the same as not supporting partitions. Just that 
> they aren't fixed. They're specified on the kernel boot line, and so can 
> be configured by the user arbitrarily.... it's just that you can't 
> change them without rebooting. So I believe anyone who is using multiple 
> partitions with MTD would have difficulties interoperating with R.

If this is so (I am not conversant enough yet with Linux internals) then 
it is a very useful addition. Using multiple YAFFS devices is 
straightforward right now (I did this and met no problems).

>>> [OneNAND (and things like it) fitting into R's model]
>>>
>>>> My guarded guess now is that integration into R would imply a 
>>>> replacement of the Common Controller code (by configuration or by 
>>>> 'object-oriented' indirect calls over a device struct). I will 
>>>> report on this later.
>>>
>>> Thanks. Although of course adding more indirection may have its own 
>>> disadvantages.
>>
>> That guarded guess was correct. OneNAND is no raw NAND chip so R's 
>> common controller code won't fit. The thing to do is make a driver 
>> that comes in place of the common controller, as suggested above. Once 
>> ANC supports a pluggable controller (which I will do if it makes a 
>> difference), adding a OneNAND will take the same amount of effort as 
>> for E's implementation.
> 
> Would that not require a significant reworking and relayering of code? 
> It seems to me that controller drivers and the chip drivers used by 
> controller drivers under this system will still want to be able to 
> access the infrastructure support for BBTs, ECCs and spare layout. From 
> what I can see, preserving that without large amounts of indirection 
> (imposing further performance and size hits) would pose quite some 
> challenge. The result would be something really quite different to what 
> R is like today.

The reworking would just be to replace the calls in the ANC of the 
controller-common API with indirect calls, and supporting this 
configurability in the CDL. OneNAND doesn't need ECC code because things 
are handled in hardware (the datasheet recommends not even checking the 
ECC status register for 2-bit failures because they are so rare). Maybe 
BBT is necessary too for OneNAND; I didn't think that through yet, but I 
would hope the BBT implementation would support different controllers 
without a lot of reworking - right now, accessing the chip already goes 
through controller calls. The indirections would be few: one for each 
top-level API call (unless the ANC must redistribute application pages 
to chip pages, which is only in case of heterogeneous chips).
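
To make the "one indirection per top-level API call" idea concrete, here is a
minimal sketch of a pluggable controller behind a function dispatch table. All
names are hypothetical and chosen for the example; they are not the actual
structures or signatures of R's ANC or controller-common code.

```c
#include <stddef.h>
#include <string.h>

struct nand_dev;

/* One entry per top-level API call; each controller class (raw NAND,
 * OneNAND-like, ...) would supply its own table. */
struct nand_ctl_ops {
    int (*page_read)(struct nand_dev *dev, unsigned page,
                     void *buf, size_t len);
    int (*block_erase)(struct nand_dev *dev, unsigned block);
};

struct nand_dev {
    const struct nand_ctl_ops *ops;  /* selected controller class */
    void *priv;                      /* controller-private state  */
};

/* Top-level calls: a single indirect call each. */
static int nand_page_read(struct nand_dev *dev, unsigned page,
                          void *buf, size_t len)
{
    return dev->ops->page_read(dev, page, buf, len);
}

static int nand_block_erase(struct nand_dev *dev, unsigned block)
{
    return dev->ops->block_erase(dev, block);
}

/* Toy controller: every page reads back as the fill byte kept in priv. */
static int toy_page_read(struct nand_dev *dev, unsigned page,
                         void *buf, size_t len)
{
    (void)page;
    memset(buf, *(const unsigned char *)dev->priv, len);
    return 0;
}

static int toy_block_erase(struct nand_dev *dev, unsigned block)
{
    (void)block;
    *(unsigned char *)dev->priv = 0xFF;  /* erased NAND reads as all ones */
    return 0;
}

static const struct nand_ctl_ops toy_ops = {
    .page_read   = toy_page_read,
    .block_erase = toy_block_erase,
};
```

The cost is one pointer dereference and one indirect call per API entry
point. Shared services such as BBT or spare handling would sit above this
table and reach the hardware only through it.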

In conclusion: a driver in R for a different class of chip than 'raw 
NAND' would indeed replace R's controller-common, any 
controller-device-specific, and any chip, because these are all code for 
raw NAND. I would hope that linker magic would remove the code as unused 
from the binary.

A driver for OneNAND would still need to be written obviously, and it 
would share no or little code with controller-common and raw chip. My 
guess is that would hold for E too - although I don't really know 
because I have had no time to review E's code.

Rutger

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-20 12:19                 ` Rutger Hofman
@ 2009-10-21  1:45                   ` Jonathan Larmour
  2009-10-21 12:15                     ` Rutger Hofman
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-21  1:45 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Ross Younger, eCos developers

Rutger Hofman wrote:
> Jonathan Larmour wrote:
> 
>> Rutger Hofman wrote:
>> I was referring to the "layout" field of cyg_nand_chip_id_t. Whereas 
>> spare_scatter_fill()/extract() use the spare_layout field of the 
>> cyg_nand_chip_info_t. I haven't yet found where the "layout" field of 
>> cyg_nand_chip_id_t is used. 
> 
> 
> This represents the fourth byte of the 'regular' read_ID response. 
> (First is manufacturer, second is device type, third is often 0, but may 
> be defined to give further architectural details.) The specs for this 
> layout byte:
> 
[snip]

Okay. So the fact the byte is in the table is for completeness, even 
though read_id reads it out and doesn't need to compare it. Got it, thanks.

>> So I believe that means if you want to rewrite the boot program on 
>> Blackfin _and_ use a normal layout, you wouldn't be able to use R as 
>> it stands at the moment. Ack, OK.
> 
> 
> Hot-swapping is necessary to pull this trick. It would be doable in my 
> current svn version (which is coming up one of these days). It would 
> imply crossing through all layers though, and it would indeed be 
> horribly risky.

Okay.

[snip]
>>> That guarded guess was correct. OneNAND is no raw NAND chip so R's 
>>> common controller code won't fit. The thing to do is make a driver 
>>> that comes in place of the common controller, as suggested above. 
>>> Once ANC supports a pluggable controller (which I will do if it makes 
>>> a difference), adding a OneNAND will take the same amount of effort 
>>> as for E's implementation.
>>
>> Would that not require a significant reworking and relayering of code? 
>> It seems to me that controller drivers and the chip drivers used by 
>> controller drivers under this system will still want to be able to 
>> access the infrastructure support for BBTs, ECCs and spare layout. 
>> From what I can see, preserving that without large amounts of 
>> indirection (imposing further performance and size hits) would pose 
>> quite some challenge. The result would be something really quite 
>> different to what R is like today.
> 
> The reworking would just be to replace the calls in the ANC of the 
> controller-common API with indirect calls, and supporting this 
> configurability in the CDL. OneNAND doesn't need ECC code because things 
> are handled in hardware (the datasheet recommends not even checking the 
> ECC status register for 2-bit failures because they are so rare).

As I mentioned when I brought up OneNAND, my concern was really more 
general: that the layer is at present only intended for a particular 
access model. OneNAND is only a current example of where this assumption 
breaks, there may be more in the future (or now).

> Maybe 
> BBT is necessary too for OneNAND; I didn't think that through yet,

I would have thought so personally.

> but I 
> would hope the BBT implementation would support different controllers 
> without a lot of reworking - right now, accessing the chip already goes 
> through controller calls. The indirections would be few: one for each 
> top-level API call (unless the ANC must redistribute application pages 
> to chip pages, which is only in case of heterogeneous chips).

It's not just indirecting the functions, but checking what may need 
changing for anything which accesses the controller data, e.g. the 
contents of struct CYG_NAND_CTL.

It's not clear how the stuff in src/chip/ would map onto a different 
controller model.

There still seem to me to be challenges in how spare layout is managed in 
the abstracted NFC case.

I'm not saying it can't be done - it's all software so it definitely can. 
But it seems to me there would be quite an upheaval to get from here to there.

> In conclusion: a driver in R for a different class of chip than 'raw 
> NAND' would indeed replace R's controller-common, any 
> controller-device-specific, and any chip, because these are all code for 
> raw NAND. I would hope that linker magic would remove the code as unused 
> from the binary.

It depends how you would envisage it is selected. What I would have 
thought would probably be a CDL option (which lives in CYGPKG_IO_NAND) 
which is required by either the platform NAND package or platform HAL 
package. Enabling this CDL option would build the existing high level 
controller layer (in CYGPKG_IO_NAND) into extras.o to force inclusion in 
the image (probably via a new HAL table). I doubt linker magic would help 
in particular. So when that option is not required by the platform it 
remains disabled and the code is not built.

> A driver for OneNAND would still need to be written obviously, and it 
> would share no or little code with controller-common and raw chip. My 
> guess is that would hold for E too - although I don't really know 
> because I have had no time to review E's code.

That's correct as I see it.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-20 10:17               ` Ross Younger
@ 2009-10-21  2:06                 ` Jonathan Larmour
  2009-10-22 10:05                   ` Ross Younger
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-21  2:06 UTC (permalink / raw)
  To: Ross Younger; +Cc: Rutger Hofman, eCos developers

Ross Younger wrote:
> Jonathan Larmour wrote:
> 
>>To double check, you mean reading was slowest, programming was faster
>>and erasing was fastest, even apparently faster than what may be the
>>theoretical fastest time? (I use the term "fast" advisedly, mark).
>>
>>Are you sure there isn't a problem with your driver to cause such
>>figures? :-)
> 
> 
> Those are the raw numbers. Yes, I agree that they don't appear to make
> sense. As I said, profiling - which will include figuring out what's going
> on here - is languishing on the todo list ...

Ok, although I think I may have to take those particular figures with a 
pinch of salt, given they are.... unexpected.

>>I wonder if Rutger has the ability to compare with his YAFFS throughput.
>>OTOH, as you say, the controller plays a large part, and there's no
>>common ground with R so it's entirely possible no comparison can be fair
>>for either implementation.
> 
> 
> The YAFFS benchmarking is done by our yaffs5 test, which IIRC goes only
> through fileio so ought to be trivially portable. It doesn't appear in my
> last drop on the bz ticket, but will when I get round to freshening it.

Ok. Although I'm not sure how long these discussions will continue for. 
Although not ideal, running it on both the synth targets may be the only 
way to compare.

> To be clear: hwecc _is_ working well, on this customer port, and getting it
> going on the STM3210E is on the cards so I have something I can usefully
> share publicly.

Can you at least shed light on the API changes (by cut'n'pasting relevant 
sections of headers/code even if not the whole thing)? I feel this is a 
key thing to get clarity on since I don't have a view on that yet and it's 
an important feature. I know doc etc. will want updating, but in reality 
we can probably get a good idea from an overview of the code, even if it's 
not a complete self-contained package drop. It may also save you time. I 
don't doubt it works on your port, but I think I need to get a view on how 
well it would fit with other hardware ECC systems which I know about and 
those which R goes to pains to support.

I also realise people have busy lives so no worries if you can't do it for 
a few days if there's more than a little effort involved; although I'd 
have thought it wouldn't be much.

>>Just as an aside, you may find that improving eCos more generally to
>>have e.g. assembler optimised implementation of memcpy/memmove/memset
>>(and possibly others) may improve performance of these and other things
>>across the board. GCC's intrinsics can only do so much. (FAOD actual
>>implementations to use (at least to start with) can be found in newlib.)
> 
> 
> The speedups in my NAND driver on this board came from a straightforward
> Duff's device 8-way-unroll of what had been HAL_{READ,WRITE}_UINT8_VECTOR;
> 16-way and 32-way unrolls seemed to add a smidgen more performance but
> increased code size perhaps disproportionately. (Using the existing VECTOR
> macro but with -funroll-loops gave a similar speed-up but more noticeable
> code bloat across the board.)

OOI you could add it to the per-package CFLAGS (unless you meant you 
already did and the bloat was noticeable even just there).

> The word copies in newlib's memcpy et al look like they would boost
> performance generally, but I have attempted to avoid copying data around as
> far as possible in my layer.

I was mostly thinking of YAFFS in fact (and fileio on top), although I 
haven't really looked at the extent they depend on bulk memory moves.

> To try and fit with the eCos philosophy, I've left the localised unroll as a
> CDL option in this driver, defaulting to off. I expect similar unrolls would
> be profitable in other NAND drivers, but a more generalised solution might
> be preferable: something like HAL_READ_UINT8_VECTOR_UNROLL, with options to
> configure whether and how far it was unrolled?

Possible, although there are probably bigger fish to fry.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: NAND technical review
  2009-10-21  1:45                   ` Jonathan Larmour
@ 2009-10-21 12:15                     ` Rutger Hofman
  2009-10-23 14:06                       ` Jonathan Larmour
  0 siblings, 1 reply; 58+ messages in thread
From: Rutger Hofman @ 2009-10-21 12:15 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, eCos developers

Jonathan Larmour wrote:
> Rutger Hofman wrote:
>> Jonathan Larmour wrote:
[How would different hardware, like OneNAND, fit into R's model?]
>>> Would that not require a significant reworking and relayering of 
>>> code? It seems to me that controller drivers and the chip drivers 
>>> used by controller drivers under this system will still want to be 
>>> able to access the infrastructure support for BBTs, ECCs and spare 
>>> layout. From what I can see, preserving that without large amounts of 
>>> indirection (imposing further performance and size hits) would pose 
>>> quite some challenge. The result would be something really quite 
>>> different to what R is like today.
>>
>> The reworking would just be to replace the calls in the ANC of the 
>> controller-common API with indirect calls, and supporting this 
>> configurability in the CDL. OneNAND doesn't need ECC code because 
>> things are handled in hardware (the datasheet recommends not even 
>> checking the ECC status register for 2-bit failures because they are 
>> so rare).
> 
> As I mentioned when I brought up OneNAND, my concern was really more 
> general: that the layer is at present only intended for a particular 
> access model. OneNAND is only a current example of where this assumption 
> breaks, there may be more in the future (or now).
> 
>> Maybe BBT is necessary too for OneNAND; I didn't think that through yet,
> 
> I would have thought so personally.
> 
>> but I would hope the BBT implementation would support different 
>> controllers without a lot of reworking - right now, accessing the chip 
>> already goes through controller calls. The indirections would be few: 
>> one for each top-level API call (unless the ANC must redistribute 
>> application pages to chip pages, which is only in case of 
>> heterogeneous chips).
> 
> It's not just indirecting the functions, but checking what may need 
> changing for anything which accesses the controller data, e.g. the 
> contents of struct CYG_NAND_CTL.

OK, to answer this question, the level of detail will go up considerably.

Lots of CYG_NAND_CTL is pointers to higher layers (anc) and lower layers 
(funs, priv, chip stuff). These would remain, although the function 
dispatch table would be reused (or unused, I don't know about OneNAND 
varieties). The ECC fields would remain too (if applicable, but nowadays 
that is a CDL option) and likewise mutex. I would guess that any state 
for the different class of controller/chip could be incorporated into priv.

So, (surprisingly to me because I didn't consider anything else than raw 
NAND), CYG_NAND_CTL seems generic enough to incorporate other types of 
NAND chip. I'd say the controller-common API must stay -- if it doesn't, 
I would be doubtful to fit it into a NAND harness. Reminder to self: ANC 
must call the controller over a function dispatcher.

CYG_NAND_CHIP would need to be split into a generic part that has page 
size, block size, num blocks, and type-specific stuff like timing and 
like the bucket-full of ONFI parameters.
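
As a minimal sketch of such a split (all names here are hypothetical and
not taken from the actual sources), the generic part could be embedded as
the first member of each type-specific struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the actual sources. */
enum nand_chip_type { NAND_CHIP_RAW, NAND_CHIP_ONENAND };

/* Generic part: geometry that every kind of NAND chip has. */
struct nand_chip_common {
    enum nand_chip_type type;
    size_t   page_size;    /* data bytes per page */
    size_t   block_size;   /* data bytes per erase block */
    uint32_t num_blocks;
};

/* Type-specific part for raw NAND: timing, ONFI parameters, ... */
struct nand_chip_raw {
    struct nand_chip_common common;  /* must stay the first member */
    uint32_t t_rc_ns;                /* illustrative timing parameter */
    uint8_t  onfi_version;           /* illustrative ONFI parameter */
};

/* Shared code (BBT, bad-block queries) needs only the generic part. */
static uint64_t nand_chip_capacity(const struct nand_chip_common *c)
{
    assert(c != NULL);
    return (uint64_t)c->block_size * c->num_blocks;
}

/* Type-specific code recovers its own view after checking the tag. */
static struct nand_chip_raw *nand_chip_as_raw(struct nand_chip_common *c)
{
    return (c != NULL && c->type == NAND_CHIP_RAW)
        ? (struct nand_chip_raw *)c : NULL;
}
```

Shared code would then take a pointer to the common part only, while a raw
NAND driver would call something like nand_chip_as_raw() before touching
timing or ONFI fields.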

Required code refactoring: a number of functions must just be lifted out 
for common usage. For spare layout, my guess is that just the 
scatter/gather functions from controller-common would be shared (small 
though they are). If ECC is desired: the configurable part of it is 
already separate; some more (cyg_nand_ctl_ecc_repair, debug stuff) can 
be usefully factored out. One more thing that should be common is the 
verification step; it had best move one level up, to ANC.

I think the rest of controller-common is tied to raw NAND, and would 
have no place in a driver for non-raw NAND hardware.

> It's not clear how the stuff in src/chip/ would map onto a different 
> controller model.

The interrogation parts would not fit at all, they would have no place 
in a driver for different hardware. Interrogation is the majority of the 
code. Common code would be BBT and bad-block queries.


If this all is going to happen, the code would be partitioned like:

top-level = anc (with verify and function dispatch)
controller = shared (ECC, spare), raw NAND, OneNAND, ...
chip = shared (BBT, bad-block query), raw NAND, OneNAND, ...
... and unmodified device-specific controller, platform stuff, ...
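
A minimal sketch of the dispatch step in this layout, assuming
hypothetical names (nand_ctl, nand_ctl_funs, anc_read_page) rather than
the real identifiers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; a sketch of the layering, not the actual code. */
struct nand_ctl;

/* One entry per top-level call: the only indirection the ANC adds. */
struct nand_ctl_funs {
    int (*read_page)(struct nand_ctl *ctl, unsigned page, void *buf, size_t len);
    int (*erase_block)(struct nand_ctl *ctl, unsigned block);
};

struct nand_ctl {
    const struct nand_ctl_funs *funs;  /* raw NAND, OneNAND, ... */
    void *priv;                        /* state of that controller class */
};

/* ANC entry point: checks its arguments, then dispatches. */
static int anc_read_page(struct nand_ctl *ctl, unsigned page, void *buf, size_t len)
{
    if (ctl == NULL || ctl->funs == NULL || ctl->funs->read_page == NULL)
        return -1;
    return ctl->funs->read_page(ctl, page, buf, len);
}

/* Stand-in controller class: fills the buffer with the page number. */
static int demo_read_page(struct nand_ctl *ctl, unsigned page, void *buf, size_t len)
{
    (void)ctl;
    assert(buf != NULL);
    memset(buf, (int)(page & 0xffu), len);
    return 0;
}

static const struct nand_ctl_funs demo_funs = { demo_read_page, NULL };
```

The cost per top-level call is one pointer load and one indirect call; the
controller class behind the table can be swapped without the caller
noticing.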

The CDL would piece together the desired components, like you suggested. 
I think that both the public API and the interface to device-specific 
drivers will remain intact, so clients won't really notice (except 
possibly in their CDL).

I guess that this refactoring will take something like one or a few 
days' work, including having ANC call the controller over a dispatch 
table. I'll be glad to do it (ETA: somewhere in the next 1 to 1.5 months).

Personal note: I am glad of this kind of detailed feedback. Still, I 
would have preferred to get it when I put up the NAND design for 
discussion, about a year ago.

Rutger


* Re: NAND technical review
  2009-10-21  2:06                 ` Jonathan Larmour
@ 2009-10-22 10:05                   ` Ross Younger
  2009-11-10  5:15                     ` Jonathan Larmour
  0 siblings, 1 reply; 58+ messages in thread
From: Ross Younger @ 2009-10-22 10:05 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Rutger Hofman, eCos developers

Jonathan Larmour wrote:
> Can you at least shed light on the API changes (by cut'n'pasting
> relevant sections of headers/code even if not the whole thing)?

I have changed the device interface (nand_device.h) by carving up the read
and write page functions into three. (The prototype changes are the same for
both, with only the obvious semantic differences when writing, so I'll only
paste in read for the sake of brevity.)

Reading a page used to be an all-in-one call:

int (*read_page) (cyg_nand_device *dev, cyg_nand_page_addr page,
          void * dest, size_t size, void * spare, size_t spare_size);

Now, the NAND layer calls "begin" once to set up the read:

    int (*read_begin)(cyg_nand_device *dev, cyg_nand_page_addr page);

... "stride" one or more times to actually transfer data

    int (*read_stride)(cyg_nand_device *dev, void * dest, size_t size);

... and then "finish" once to do the spare area and any finishing up that
may be necessary (e.g. send the "program confirm" command, unlock the device).

    int (*read_finish)(cyg_nand_device *dev,
                       void * spare, size_t spare_size);

The ECC interface (nand_ecc.h, not well documented) has also expanded
slightly. I had had just a 'calc' call, but have now added an 'init' call so
that any device-specific registers can be tweaked. The interaction is
perhaps best sketched out as pseudocode; here's what the NAND library looks
like in a call to read a page:

  dev->read_begin(page number);
  while (there are still bytes to read) {
    ecc->init(); // may be a no-op
    dev->read_stride(ecc->datasize bytes);
    if (ecc is hardware)
      ecc->calc(the block we've just read);
  }
  dev->read_finish(spare data);
  if (ecc is software)
    ecc->calc(the whole thing);
  ecc->repair(the whole block, looping as necessary, comparing the
              calculated ECC against what's in the spare area);
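
For illustration, the same flow as a compilable C sketch. Only the three
read_* prototypes are taken from the text above; the ECC signatures, the
struct fields and the demo_* toy device are assumptions made for the
example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only the three read_* prototypes follow the message above; the ECC
 * signatures, struct fields and demo_* helpers are assumptions. */
typedef unsigned char u8;
typedef unsigned cyg_nand_page_addr;
typedef struct cyg_nand_device cyg_nand_device;

typedef struct {
    int    hardware;  /* non-zero: hardware accumulates ECC per stride */
    size_t datasize;  /* data bytes covered by one ECC calculation */
    size_t eccsize;   /* ECC bytes produced per calculation */
    void (*init)(cyg_nand_device *dev);           /* NULL means no-op */
    void (*calc)(cyg_nand_device *dev, const u8 *data, u8 *ecc);
    /* Returns the number of bits corrected, or negative if uncorrectable. */
    int  (*repair)(cyg_nand_device *dev, u8 *data,
                   const u8 *stored, const u8 *calc);
} nand_ecc;

struct cyg_nand_device {
    int (*read_begin)(cyg_nand_device *dev, cyg_nand_page_addr page);
    int (*read_stride)(cyg_nand_device *dev, void *dest, size_t size);
    int (*read_finish)(cyg_nand_device *dev, void *spare, size_t spare_size);
    const nand_ecc *ecc;
    size_t page_size;
};

/* Assumes the stored ECC sits at the start of the spare area, one
 * eccsize-byte group per stride; real layouts are device-specific. */
static int nand_read_page(cyg_nand_device *dev, cyg_nand_page_addr page,
                          u8 *dest, u8 *spare, size_t spare_size)
{
    const nand_ecc *ecc = dev->ecc;
    u8 calc[64];
    size_t n = dev->page_size / ecc->datasize, i;
    int rv, fixed = 0;

    assert(n * ecc->eccsize <= sizeof calc && n * ecc->eccsize <= spare_size);
    if ((rv = dev->read_begin(dev, page)) != 0)
        return rv;
    for (i = 0; i < n; i++) {
        u8 *chunk = dest + i * ecc->datasize;
        if (ecc->init)
            ecc->init(dev);
        if ((rv = dev->read_stride(dev, chunk, ecc->datasize)) != 0)
            return rv;  /* a real layer would also release the device */
        if (ecc->hardware)
            ecc->calc(dev, chunk, calc + i * ecc->eccsize);
    }
    if ((rv = dev->read_finish(dev, spare, spare_size)) != 0)
        return rv;
    for (i = 0; i < n; i++) {
        u8 *chunk = dest + i * ecc->datasize;
        if (!ecc->hardware)
            ecc->calc(dev, chunk, calc + i * ecc->eccsize);
        rv = ecc->repair(dev, chunk, spare + i * ecc->eccsize,
                         calc + i * ecc->eccsize);
        if (rv < 0)
            return rv;
        fixed += rv;
    }
    return fixed;
}

/* Toy device for demonstration: 8-byte page, one XOR byte per 4 bytes. */
static const u8 demo_page[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
static size_t demo_pos;
static u8 demo_xor(const u8 *d) { return (u8)(d[0] ^ d[1] ^ d[2] ^ d[3]); }

static int demo_begin(cyg_nand_device *dev, cyg_nand_page_addr page)
{ (void)dev; (void)page; demo_pos = 0; return 0; }
static int demo_stride(cyg_nand_device *dev, void *dest, size_t size)
{
    (void)dev;
    memcpy(dest, demo_page + demo_pos, size);
    demo_pos += size;
    return 0;
}
static int demo_finish(cyg_nand_device *dev, void *spare, size_t spare_size)
{
    u8 *s = spare;
    (void)dev;
    memset(s, 0xff, spare_size);
    s[0] = demo_xor(demo_page);
    s[1] = demo_xor(demo_page + 4);
    return 0;
}
static void demo_calc(cyg_nand_device *dev, const u8 *data, u8 *ecc)
{ (void)dev; ecc[0] = demo_xor(data); }
static int demo_repair(cyg_nand_device *dev, u8 *data,
                       const u8 *stored, const u8 *calc)
{ (void)dev; (void)data; return stored[0] == calc[0] ? 0 : -1; }
static const nand_ecc demo_ecc = { 0, 4, 1, NULL, demo_calc, demo_repair };
```

The toy ECC only detects a mismatch and never corrects anything; it is
there so that both the hardware and the software branch of the loop can be
exercised.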


I have renamed the device interface struct and macros on the grounds of
"change the interface, change the name" (but not the ECC interface, because
nothing outside that package had used it before now).


>> [-funroll-loops]
> OOI you could add it to the per-package CFLAGS (unless you meant you
> already did and the bloat was noticeable even just there).

Good point, I forgot about that. Will try when I get back around to the
go-faster stripes.


Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com


* Re: NAND technical review
  2009-10-21 12:15                     ` Rutger Hofman
@ 2009-10-23 14:06                       ` Jonathan Larmour
  2009-10-23 15:25                         ` Rutger Hofman
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-10-23 14:06 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: Ross Younger, eCos developers

Rutger Hofman wrote:
> Jonathan Larmour wrote:
> 
>> Rutger Hofman wrote:
>>> but I would hope the BBT implementation would support different 
>>> controllers without a lot of reworking - right now, accessing the 
>>> chip already goes through controller calls. The indirections would be 
>>> few: one for each top-level API call (unless the ANC must 
>>> redistribute application pages to chip pages, which is only in case 
>>> of heterogeneous chips).
>>
>> It's not just indirecting the functions, but checking what may need 
>> changing for anything which accesses the controller data, e.g. the 
>> contents of struct CYG_NAND_CTL.
> 
> OK, to answer this question, the level of detail will go up considerably.
> 
> Lots of CYG_NAND_CTL is pointers to higher layers (anc) and lower layers 
> (funs, priv, chip stuff). These would remain, although the function 
> dispatch table would be reused (or unused, I don't know about OneNAND 
> varieties). The ECC fields would remain too (if applicable, but nowadays 
> that is a CDL option) and likewise mutex. I would guess that any state 
> for the different class of controller/chip could be incorporated into priv.

Ok. And associated code updates of course.

> So, (surprisingly to me because I didn't consider anything else than raw 
> NAND), CYG_NAND_CTL seems generic enough to incorporate other types of 
> NAND chip. I'd say the controller-common API must stay -- if it doesn't, 
> I would be doubtful to fit it into a NAND harness. Reminder to self: ANC 
> must call the controller over a function dispatcher.

Although there are other ways to test code than requiring it to have an 
abstract API in order to access internals. Having the anc/controller/chip 
layers as self-contained APIs necessarily incurs some overheads.

> CYG_NAND_CHIP would need to be split into a generic part that has page 
> size, block size, num blocks, and type-specific stuff like timing and 
> like the bucket-full of ONFI parameters.

Also looking at all the source files there are quite a few parts which go 
straight to the chip layer. Again I'm not saying it can't be done, but it 
looks like it requires a lot of unpicking, and making sure the right bits 
end up in the most appropriate places.

[snip]

Thanks for the various outlines.

> I guess that this refactoring will take something like one or a few 
> days' work, including having ANC call the controller over a dispatch 
> table. I'll be glad to do it (ETA: somewhere in the next 1 to 1.5 months).

I would be very surprised by a day!

> Personal note: I am glad with this kind of detailed feedback. Still, I 
> would have preferred to get it when I put up the NAND design for 
> discussion, about a year ago.

Obviously Andrew was able to advise on some aspects at the time. But also 
at that point in time, my own understanding of the requirements of a NAND 
layer wasn't as developed as it has now become so I wouldn't have been 
able to give you that level of feedback then anyway.

Although I'm reluctant to do so, I think it would probably prove valuable 
to the decision process if I could do some size and timing measurements. 
I'll have to look at that, but as I'll need to adapt E's rwbenchmark.c to 
R, it'll take me a little time.

Finally, are there any questions about E's layer that you think I should 
ask about which I haven't?

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-23 14:06                       ` Jonathan Larmour
@ 2009-10-23 15:25                         ` Rutger Hofman
  2009-10-23 18:03                           ` Rutger Hofman
                                             ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Rutger Hofman @ 2009-10-23 15:25 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ross Younger, eCos developers

Jonathan Larmour wrote:
> Rutger Hofman wrote:
>> Jonathan Larmour wrote:
>>
>>> Rutger Hofman wrote:
[on adding support for other NAND chips than raw NAND]
> 
>> I guess that this refactoring will take something like one or a few 
>> days' work, including having ANC call the controller over a dispatch 
>> table. I'll be glad to do it (ETA: somewhere in the next 1 to 1.5 
>> months).
> 
> I would be very surprised by a day!

Yesterday, there was an unexpected lull in the usual storm of work. 
Basically, the refactoring is done so R can support hardware other than 
raw NAND. I must still update the documentation, though. The structure 
is a bit different than I first thought; there is a package IO_NAND for 
the general stuff (anc, ecc, bbt), and a package IO_NAND_RAW for the raw 
NAND. So, if somebody wants NAND but not raw NAND, that package is left 
out and no raw NAND code is built.

I will put up a next release when the documentation is done. I am aware 
that changing the code while you are reviewing it is not very polite; 
maybe you prefer to stick with the code that you have right now, and 
just acknowledge any updates/refactoring I did.

An unusual problem has cropped up in my synth build. After having run my 
tests on the BlackFin, I built for synth and I am now meeting weird 
alignment behaviour in the HAL_TABLE device tabs. My gcc (gcc-4.2.real 
(GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu4)) for synth generates a directive 
.align 32 in the declaration of a device struct in the appropriate 
section, but the HAL_TABLE array has align=4. In consequence, the 
pointer to traverse the device table has the wrong stride. 'Proof': if I 
hand-align the device struct to be a multiple of 32, things work 
correctly in synth. Anybody any idea what I should do? Put in an 
align(32) directive on the device struct definitions? That is a hack to 
work around a compiler bug just for one platform, I'd say; OTOH it's bad 
that it might just bite *any* synth device driver table, not only mine.

> Finally, are there any questions about E's layer that you think I should 
> ask about which I haven't?

I am sorry, I have had no time to review E's layer. My boss tells me 
loudly that doing so is not on the critical path of our project...

Rutger


* Re: NAND technical review
  2009-10-23 15:25                         ` Rutger Hofman
@ 2009-10-23 18:03                           ` Rutger Hofman
  2009-10-27 20:02                           ` Rutger Hofman
  2009-11-10  7:03                           ` Jonathan Larmour
  2 siblings, 0 replies; 58+ messages in thread
From: Rutger Hofman @ 2009-10-23 18:03 UTC (permalink / raw)
  Cc: eCos developers

Rutger Hofman wrote:
> An unusual problem has cropped up in my synth build. After having run my 
> tests on the BlackFin, I built for synth and I am now meeting weird 
> alignment behaviour in the HAL_TABLE device tabs...

Please ignore. I had forgotten the CYG_HAL_TABLE_TYPE attribute for the 
chip driver.

Rutger


* Re: NAND technical review
  2009-10-23 15:25                         ` Rutger Hofman
  2009-10-23 18:03                           ` Rutger Hofman
@ 2009-10-27 20:02                           ` Rutger Hofman
  2009-11-10  7:03                           ` Jonathan Larmour
  2 siblings, 0 replies; 58+ messages in thread
From: Rutger Hofman @ 2009-10-27 20:02 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: eCos developers

Rutger Hofman wrote:
> Jonathan Larmour wrote:
>> Rutger Hofman wrote:
>>> Jonathan Larmour wrote:
>>>
>>>> Rutger Hofman wrote:
> [on adding support for other NAND chips than raw NAND]
>>
>>> I guess that this refactoring will take something like one or a few 
>>> days' work, including having ANC call the controller over a dispatch 
>>> table. I'll be glad to do it (ETA: somewhere in the next 1 to 1.5 
>>> months).
>>
>> I would be very surprised by a day!
> 
> Yesterday, there was an unexpected lull in the usual storm of work. 
> Basically, the refactoring is done so R can support hardware other than 
> raw NAND. I must still update the documentation, though. The structure 
> is a bit different than I first thought; there is a package IO_NAND for 
> the general stuff (anc, ecc, bbt), and a package IO_NAND_RAW for the raw 
> NAND. So, if somebody wants NAND but not raw NAND, that package isn't 
> included so no raw NAND code.
> 
> I will put up a next release when the documentation is done. I am aware 
> that changing the code while you are reviewing it is not very polite; 
> maybe you prefer to stick with the code that you have right now, and 
> just acknowledge any updates/refactoring I did.

The documentation has been overhauled, too. The new NAND layout, with a 
package for Common NAND besides a package for raw NAND, is published on 
http://www.cs.vu.nl/~rutger/software/ecos/nand-flash .
The Common NAND should be able to harbour other kinds of NAND than raw NAND.

I fully understand if you will not want to review the changed code 
again. But maybe you are willing to quickly browse through the 
documentation to see if it looks OK at first glance.

Rutger


* Re: NAND technical review
  2009-10-22 10:05                   ` Ross Younger
@ 2009-11-10  5:15                     ` Jonathan Larmour
  2009-11-10 10:38                       ` Ross Younger
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-11-10  5:15 UTC (permalink / raw)
  To: Ross Younger; +Cc: Rutger Hofman, eCos developers

[ Sorry all for the loss of momentum. Real life intervened for a while. ]

Ross Younger wrote:
> Jonathan Larmour wrote:
> 
>>Can you at least shed light on the API changes (by cut'n'pasting
>>relevant sections of headers/code even if not the whole thing)?
> 
> 
> I have changed the device interface (nand_device.h) by carving up the read
> and write page functions into three. (The prototype changes are the same for
> both, with only the obvious semantic differences when writing, so I'll only
> paste in read for the sake of brevity.)
> 
> Reading a page used to be an all-in-one call:
> 
> int (*read_page) (cyg_nand_device *dev, cyg_nand_page_addr page,
>           void * dest, size_t size, void * spare, size_t spare_size);
> 
> Now, the NAND layer calls "begin" once to set up the read:
> 
>     int (*read_begin)(cyg_nand_device *dev, cyg_nand_page_addr page);
> 
> ... "stride" one or more times to actually transfer data
> 
>     int (*read_stride)(cyg_nand_device *dev, void * dest, size_t size);
> 
> ... and then "finish" once to do the spare area and any finishing up that
> may be necessary (e.g. send the "program confirm" command, unlock the device).
> 
>     int (*read_finish)(cyg_nand_device *dev,
>                        void * spare, size_t spare_size);
> 
> The ECC interface (nand_ecc.h, not well documented) has also expanded
> slightly. I had had just a 'calc' call, but have now added an 'init' call so
> that any device-specific registers can be tweaked. The interaction is
> perhaps best sketched out as pseudocode; here's what the NAND library looks
> like in a call to read a page:
> 
>   dev->read_begin(page number);
>   while (there are still bytes to send) {
>     ecc->init(); // may be a no-op
>     dev->read_stride(ecc->datasize bytes);
>     if (ecc is hardware)
>       ecc->calc(the block we've just read);
>   }
>   dev->read_finish(spare data);
>   if (ecc is software)
>     ecc->calc(the whole thing);
>   ecc->repair(the whole block, looping as necessary, comparing the
> calculated ECC against what's in the spare area);
> 
> 
> I have renamed the device interface struct and macros on the grounds of
> "change the interface, change the name" (but not the ECC interface, because
> nothing outside that package had used it before now).

This seems reasonable at first glance. But to double check.... if you have 
an SoC with both built-in NFC and hardware ECC, it may be able to calc the 
ECC automatically as the pages are read/written through the NFC. What you 
can then get with reads is a direct result of whether the ECC has failed, 
whether it's recoverable or not. You don't have to actually look at the 
ECC value itself at any point. This is what Rutger calls ECC "syndrome" 
mode (which isn't a very descriptive term, but OTOH I can't think of 
anything much better either!). You can consider the Atmel SAM9260 to be a 
concrete example if that helps (although in that particular case you can 
see the actual computed ECC too - but (R) implies that that may not always 
be the case for reads).

So can you just confirm that (E) supports that form of hardware ECC 
implementation? Or does it actually require the ECC to be directly 
available (as opposed to potentially just being a token)? I also note with 
the SAM9260 that the hardware ECC registers get wiped if you start to 
read/write another page, so can you confirm or otherwise that no other 
NAND API user can cause that to happen in (E)? The SAM9260 also requires 
you to read the entire page data, followed immediately by the spare area 
locations where the ECC is stored. Is that also supportable by (E)?

It does seem that it is supported by (R), albeit complicated by the 
abstracted scatter-gather spare layout management.

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-10-23 15:25                         ` Rutger Hofman
  2009-10-23 18:03                           ` Rutger Hofman
  2009-10-27 20:02                           ` Rutger Hofman
@ 2009-11-10  7:03                           ` Jonathan Larmour
  2010-12-11 19:18                             ` John Dallaway
  2 siblings, 1 reply; 58+ messages in thread
From: Jonathan Larmour @ 2009-11-10  7:03 UTC (permalink / raw)
  To: Rutger Hofman; +Cc: eCos developers, Ross Younger

Rutger Hofman wrote:
> Jonathan Larmour wrote:
>> Rutger Hofman wrote:
>>> Jonathan Larmour wrote:
>>>> Rutger Hofman wrote:
> 
> [on adding support for other NAND chips than raw NAND]
>>
>>> I guess that this refactoring will take something like one or a few 
>>> days' work, including having ANC call the controller over a dispatch 
>>> table. I'll be glad to do it (ETA: somewhere in the next 1 to 1.5 
>>> months).
>>
>> I would be very surprised by a day!
> 
> Yesterday, there was an unexpected lull in the usual storm of work. 
> Basically, the refactoring is done so R can support hardware other than 
> raw NAND. I must still update the documentation, though. The structure 
> is a bit different than I first thought; there is a package IO_NAND for 
> the general stuff (anc, ecc, bbt), and a package IO_NAND_RAW for the raw 
> NAND. So, if somebody wants NAND but not raw NAND, that package isn't 
> included so no raw NAND code.

Wow! That's very interesting and despite what I said at the outset, I've 
got your updated code and will now be referring to it. I see a few rough 
edges but it appears you're already aware of some of them. It's a shame 
more of the spare layout code wasn't potentially shareable.

I think I really have to get some comparative measurements on code size 
and performance at least. It's tricky when there is no common hardware. 
Not even a common architecture (given Jurgen's SAM9260 port isn't public). 
And no common chip even then. I think that unless someone is willing to 
port one or other to a common piece of hardware, then the only recourse is 
the synthetic target.

I've now built both implementations and run all tests successfully on 
synth for both. Now I "just" need to finish porting rwbenchmark.c to (R).

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: NAND technical review
  2009-11-10  5:15                     ` Jonathan Larmour
@ 2009-11-10 10:38                       ` Ross Younger
  2009-11-10 11:28                         ` Ethernet over SPI driver for ENC424J600 Ilija Stanislevik
  2009-11-12 18:32                         ` NAND technical review Ross Younger
  0 siblings, 2 replies; 58+ messages in thread
From: Ross Younger @ 2009-11-10 10:38 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Rutger Hofman, eCos developers

> [syndrome]
> So can you just confirm that (E) supports that form of hardware ECC
> implementation?

I've just googled up the outline of syndrome mode (which seems,
incidentally, to be at least a de facto standard term). I don't see any
reason I couldn't fit it in given my current hooks.

> I also note
> with the SAM9260 that the hardware ECC registers get wiped if you start
> to read/write another page, so can you confirm or otherwise that no
> other NAND API user can cause that to happen in (E)? 

The standard per-device locking provided by (E) will take care of this. If
there is a risk of concurrent access to other devices causing problems it's
trivial to add a further board-level lock in the driver.
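
A rough sketch of that locking order, with pthread mutexes standing in
for eCos driver mutexes (cyg_drv_mutex_t) and all names hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch: pthread mutexes stand in for eCos driver mutexes. */
struct nand_board {
    pthread_mutex_t lock;      /* guards hardware shared between devices */
};

struct nand_dev {
    pthread_mutex_t lock;      /* per-device lock, held across begin..finish */
    struct nand_board *board;  /* NULL if the device shares nothing */
    int busy;                  /* demo only: set while a page op is in flight */
};

static void nand_dev_lock(struct nand_dev *d)
{
    /* Always take the board lock first so the order is consistent. */
    if (d->board != NULL)
        pthread_mutex_lock(&d->board->lock);
    pthread_mutex_lock(&d->lock);
    assert(!d->busy);
    d->busy = 1;
}

static void nand_dev_unlock(struct nand_dev *d)
{
    d->busy = 0;
    pthread_mutex_unlock(&d->lock);
    if (d->board != NULL)
        pthread_mutex_unlock(&d->board->lock);
}
```

Holding the device lock from "begin" to "finish" is what would keep a
second caller from starting another page operation and wiping the ECC
registers mid-transfer.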

> The SAM9260 also
> requires you to read the entire page data, followed immediately by the
> spare area locations where the ECC is stored. Is that also supportable
> by (E)?

It looks like that's standard for syndrome-type ECC. I think this is just a
funny OOB area layout, though as ever I'd have to try it and see to be sure.


Ross

(PS. BTW, I am still working on that fresh anon-based drop I promised the
other week, but am trying to find out why the benchmarks are so screwy on
the EA LPC2468 before I finish cutting the drop. And - speaking of ECC -
sitting in my pile of queued commits I have a major speed-up in software ECC
calculation...)

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com


* Ethernet over SPI driver for ENC424J600
  2009-11-10 10:38                       ` Ross Younger
@ 2009-11-10 11:28                         ` Ilija Stanislevik
  2009-11-10 12:16                           ` Chris Holgate
  2009-11-12 18:32                         ` NAND technical review Ross Younger
  1 sibling, 1 reply; 58+ messages in thread
From: Ilija Stanislevik @ 2009-11-10 11:28 UTC (permalink / raw)
  To: eCos developers

Hi all,

I am developing the driver on the STM32 platform, and my intention is to
make it platform-independent.

The driver uses an external interrupt (from one of the general-purpose
I/O pins in the case of the STM32) to receive interrupt requests from the
Ethernet chip. I've found that, in the eCos 3.0 implementation for STM32,
the cyg_drv_interrupt_... functions don't set up the AFIO_EXTICRx
register, which is necessary in order to connect a particular I/O pin to
the EXTI logic. The driver can always arrange for this outside the
standard functions, but such an approach produces a driver which is tied
not only to the Ethernet chip, but to the platform too.

I wish one of the cyg_drv_interrupt_...() functions could set up the
whole external interrupt signal path based solely on the interrupt
vector given to it. This is possible for STM32, since the interrupt
vectors carry unambiguous information about the assigned GPIO pin. Are
there plans to provide such functionality?

Another solution is an external function, provided by the application
via the private data structure, to be called from the driver's init
function. The driver remains platform-independent, and the application
programmer takes care of routing the interrupt.

Thoughts or suggestions?

Ilija Stanislevik


* Re: Ethernet over SPI driver for ENC424J600
  2009-11-10 11:28                         ` Ethernet over SPI driver for ENC424J600 Ilija Stanislevik
@ 2009-11-10 12:16                           ` Chris Holgate
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Holgate @ 2009-11-10 12:16 UTC (permalink / raw)
  To: Ilija Stanislevik; +Cc: eCos developers

Hi Ilija,

Ilija Stanislevik wrote:

> The driver makes use of external interrupt (from one of the general
> purpose I/O pins in case of STM32) to get interrupt request from
> Ethernet chip. I've found that, in eCos 3.0 implementation for STM32,
> the cyg_drv_interrupt_... functions don't set up the AFIO_EXTICRx
> register, which is necessary in order to connect particular I/O pin with
> the EXTI logic. The driver can always arrange for this outside the
> standard functions, but such an approach produces a driver which is tied
> not only to Ethernet chip, but to platform too.

I think that it's the general case that when using external interrupts
there will be some form of platform-specific setup, so it's a good idea to
isolate this anyway.

> Another solution is external function provided from application within
> the private data structure, to be called from drivers init function. The
> driver is still platform-independent and the application programmer
> should take care for marshaling of interrupt.

Rather than adding a callback in the private data structure, you can add
the external interrupt setup to your platform specific initialisation
code and then just include the interrupt vector ID in the private area
of your SPI device data structure.  Once the generic driver knows the
vector ID it can then take care of managing the interrupt itself.
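
Sketched in C, with hypothetical names (enc424j600_priv,
plf_enc424j600_init) and a plain unsigned standing in for cyg_vector_t,
that split might look like:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a sketch of the split, not an existing driver. */
typedef unsigned int vector_t;   /* stands in for cyg_vector_t */

/* Private area of the SPI device structure, filled in by platform code. */
struct enc424j600_priv {
    vector_t irq_vector;         /* external interrupt vector ID */
};

/* Platform-specific part: route the GPIO pin to the interrupt logic,
 * then record which vector the generic driver should attach to. */
static void plf_enc424j600_init(struct enc424j600_priv *priv, vector_t vec)
{
    assert(priv != NULL);
    /* ... on STM32, the AFIO_EXTICRx setup would go here ... */
    priv->irq_vector = vec;
}

/* Demo only: records the vector the generic driver attached to. */
static vector_t attached_vector;

/* Generic driver part: only ever sees the vector ID. */
static int enc424j600_attach_irq(const struct enc424j600_priv *priv)
{
    if (priv == NULL)
        return -1;
    /* A real driver would call cyg_drv_interrupt_create(),
     * cyg_drv_interrupt_attach() and cyg_drv_interrupt_unmask()
     * with priv->irq_vector here. */
    attached_vector = priv->irq_vector;
    return 0;
}
```

The generic driver never touches pin-routing registers, so the same source
can be reused on another platform that supplies its own init function.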

Chris.

-- 
Christopher J. Holgate

Thinking Of The Future @ Zynaptic Limited (www.zynaptic.com)


* Re: NAND technical review
  2009-11-10 10:38                       ` Ross Younger
  2009-11-10 11:28                         ` Ethernet over SPI driver for ENC424J600 Ilija Stanislevik
@ 2009-11-12 18:32                         ` Ross Younger
  1 sibling, 0 replies; 58+ messages in thread
From: Ross Younger @ 2009-11-12 18:32 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: eCos developers

> (PS. BTW, I am still working on that fresh anon-based drop I promised the
> other week ...)

It's a little later than I had hoped, but I've just cut another drop against
the anoncvs codebase.

Thanks to mercurial I can now do this a lot more easily, and of course you
get the ability to interactively browse the tree and changelogs via the web
for free. (Rather than convert my entire change history, which is a bit
messy, I started with the August 26 drop and have hg-imported just the
changes since then.)

I have exposed two repositories:

1. http://hg-pub.ecoscentric.com/nand-ecoscentric/
This is a clone of the "ecos" (anoncvs import) repo, with our NAND support
patched in. (Of course, if you already have a clone of the ecos repo and you
want to play with this, you should be able to pull just the changes from
nand-ecoscentric.)

2. http://hg-pub.ecoscentric.com/yaffs-ecoscentric/
This is the union of YAFFS upstream (imported wholesale from CVS), our
changes required to put it into the correct directory layout etc. and the
porting layer to bring it into eCos.
This is a separate repo because of the licence and so it's easy to keep
track of upstream's changes.

Note:
* I will only sync from ecos to nand-ecoscentric, from yaffs upstream to
yaffs-ecoscentric, and from my internal repos to both of these, from time to
time.
* eCosCentric may pull these repositories at any time.

====================================================================
Headline changes since Aug 26

NAND:
* v2 device interface created, and later rationalised. read_page and
write_page were split into three (begin, stride, finish); the optional
read_part_page was added.
* ECC interface expanded to better allow hardware assistance.
* Software ECC completely rewritten for major speedup (over 20x faster on my
workstation, a more modest 3x faster on the ea2468).

NAND devices: Updated to suit v2 interface.

ea2468 HAL, stm3210e HAL, synth nand: Various significant speed boosts.

====================================================================
Benchmark results

It occurred to me that I could run dhrystone while I was at it to try and
provide some degree of comparison between the hardware, but as you will see
the correlation isn't all that great.

Here are some results obtained today from the above repos. The times are in
microseconds, and I've omitted the part reads (rwbenchmark's first two
outputs) for brevity.

Target        Dhrystones/s   Page-read  Page-write  Block-erase
---------------------------------------------------------------
ea2468 16        14179.4       1853.92      1842.77     1978.27
stm3210e_eval     7252.9        712.27      1007.37     1921.03
synth(*)      13458950.2          2.21         4.00        9.33

(*) On my workstation - a Core2 Quad running at 2.4 GHz.

Bear in mind that the stm3210e has a small-page NAND chip while the ea2468
has a large-page one.
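
To put the page sizes into the picture, per-page times can be converted to
throughput. The page sizes are not stated above; 2048 bytes (large page)
and 512 bytes (small page) are assumed here as typical values:

```c
#include <assert.h>

/* Convert a per-page time in microseconds into bytes per second.
 * The page size is supplied by the caller; the values used in the
 * note below (2048 and 512 bytes) are assumptions. */
static double page_throughput(double page_bytes, double usec_per_page)
{
    assert(usec_per_page > 0.0);
    return page_bytes * 1e6 / usec_per_page;
}
```

Under those assumptions, page_throughput(2048, 1853.92) comes to roughly
1.10e6 bytes/s for the ea2468 page read, and page_throughput(512, 712.27)
to roughly 0.72e6 bytes/s for the stm3210e_eval.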


Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com


* Re: NAND technical review
  2009-11-10  7:03                           ` Jonathan Larmour
@ 2010-12-11 19:18                             ` John Dallaway
  2010-12-22 14:54                               ` Rutger Hofman
  0 siblings, 1 reply; 58+ messages in thread
From: John Dallaway @ 2010-12-11 19:18 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Rutger Hofman, eCos developers, Ross Younger

Hi Jifl

On Tue, 10 Nov 2009 07:03:02 +0000, Jonathan Larmour wrote:

> I think I really have to get some comparative measurements on code size
> and performance at least. It's tricky when there is no common hardware.
> Not even a common architecture (given Jurgen's SAM9260 port isn't
> public). And no common chip even then. I think that unless someone is
> willing to port one or other to a common piece of hardware, then the
> only recourse is the synthetic target.
> 
> I've now built both implementations and run all tests successfully on
> synth for both. Now I "just" need to finish porting rwbenchmark.c to (R).

Do you see any opportunity to complete this review in the near future?

John Dallaway


* Re: NAND technical review
  2010-12-11 19:18                             ` John Dallaway
@ 2010-12-22 14:54                               ` Rutger Hofman
  0 siblings, 0 replies; 58+ messages in thread
From: Rutger Hofman @ 2010-12-22 14:54 UTC (permalink / raw)
  To: John Dallaway; +Cc: Jonathan Larmour, eCos developers, Ross Younger

On 12/11/2010 08:18 PM, John Dallaway wrote:
> Hi Jifl
>
> On Tue, 10 Nov 2009 07:03:02 +0000, Jonathan Larmour wrote:
>
>> I think I really have to get some comparative measurements on code size
>> and performance at least. It's tricky when there is no common hardware.
>> Not even a common architecture (given Jurgen's SAM9260 port isn't
>> public). And no common chip even then. I think that unless someone is
>> willing to port one or other to a common piece of hardware, then the
>> only recourse is the synthetic target.
>>
>> I've now built both implementations and run all tests successfully on
>> synth for both. Now I "just" need to finish porting rwbenchmark.c to (R).
>
> Do you see any opportunity to complete this review in the near future?
>
> John Dallaway

Our project, RFID Guardian, has unexpectedly been cut off from funding. The 
project is now in the fridge until we find new funding, and that might 
take an arbitrarily long time. I am winding up loose ends in the 
software right now, and I expect I will be assigned as research 
programmer to some other project within the Computer Systems Group of 
the VU Amsterdam. I don't know for sure, but my expectation is that I 
will not be able to do more than bug fixes for my NAND package.

Well, this kind of breaks any promises I made regarding 
improvements/performance hacks to the package, and although I feel bad 
about that, there is some mitigation in the fact that more than a year 
has since passed in complete silence.

Rutger Hofman
VU Amsterdam


Thread overview: 58+ messages
2009-10-02 15:51 NAND technical review Jonathan Larmour
2009-10-06 13:51 ` Ross Younger
2009-10-07  3:12   ` Jonathan Larmour
2009-10-07 16:22     ` Rutger Hofman
2009-10-08  7:15       ` Jürgen Lambrecht
2009-10-15  3:53         ` Jonathan Larmour
2009-10-15 11:54           ` Jürgen Lambrecht
2009-10-15  3:49       ` Jonathan Larmour
2009-10-15 14:36         ` Rutger Hofman
2009-10-16  1:32           ` Jonathan Larmour
2009-10-19  9:56             ` Ross Younger
2009-10-19 14:21             ` Rutger Hofman
2009-10-20  3:21               ` Jonathan Larmour
2009-10-20 12:19                 ` Rutger Hofman
2009-10-21  1:45                   ` Jonathan Larmour
2009-10-21 12:15                     ` Rutger Hofman
2009-10-23 14:06                       ` Jonathan Larmour
2009-10-23 15:25                         ` Rutger Hofman
2009-10-23 18:03                           ` Rutger Hofman
2009-10-27 20:02                           ` Rutger Hofman
2009-11-10  7:03                           ` Jonathan Larmour
2010-12-11 19:18                             ` John Dallaway
2010-12-22 14:54                               ` Rutger Hofman
2009-10-15 15:43         ` Rutger Hofman
     [not found]     ` <4ACDF868.7050706@ecoscentric.com>
2009-10-09  8:27       ` Ross Younger
2009-10-13  2:21         ` Jonathan Larmour
2009-10-13 13:35           ` Rutger Hofman
2009-10-16  4:04             ` Jonathan Larmour
2009-10-19 14:51               ` Rutger Hofman
2009-10-20  4:28                 ` Jonathan Larmour
2009-10-07  9:40   ` Jürgen Lambrecht
2009-10-07 16:27     ` Rutger Hofman
2009-10-13  2:44     ` Jonathan Larmour
2009-10-13  6:35       ` Jürgen Lambrecht
2009-10-15  3:55         ` Jonathan Larmour
2009-10-13 12:59       ` Rutger Hofman
2009-10-15  4:41         ` Jonathan Larmour
2009-10-15 14:55           ` Rutger Hofman
2009-10-16  1:45             ` Jonathan Larmour
2009-10-19 10:53           ` Ross Younger
2009-10-20  1:40             ` Jonathan Larmour
2009-10-20 10:17               ` Ross Younger
2009-10-21  2:06                 ` Jonathan Larmour
2009-10-22 10:05                   ` Ross Younger
2009-11-10  5:15                     ` Jonathan Larmour
2009-11-10 10:38                       ` Ross Younger
2009-11-10 11:28                         ` Ethernet over SPI driver for ENC424J600 Ilija Stanislevik
2009-11-10 12:16                           ` Chris Holgate
2009-11-12 18:32                         ` NAND technical review Ross Younger
2009-10-13 14:19       ` Rutger Hofman
2009-10-13 19:58         ` Lambrecht Jürgen
2009-10-07 12:11   ` Rutger Hofman
2009-10-08 12:31     ` Ross Younger
2009-10-08  8:16   ` Jürgen Lambrecht
2009-10-12  1:13     ` Jonathan Larmour
2009-10-16  7:29 ` Simon Kallweit
2009-10-16 13:53   ` Jonathan Larmour
2009-10-19 15:02   ` Rutger Hofman
