From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Woodhouse <dwmw2@infradead.org>
To: ecos-discuss@sources.redhat.com
Subject: [ECOS] Re: Fw: [ECOS] Re: Simple flash filesystem? (fwd)
Date: Wed, 07 Feb 2001 04:14:00 -0000
Message-id: <7895.981548077@redhat.com>
X-SW-Source: 2001-02/msg00095.html

Eep. Paying attention where I send it this time...

--
dwmw2

------- Forwarded Message

From: David Woodhouse <dwmw2@infradead.org>
To: Kristian Otnes <kristian.otnes@tevero.no>
Cc: "Paul Beskeen" <paulb@cambridge.redhat.com>,
    ecos-maintainers@redhat.com
Subject: Re: Fw: [ECOS] Re: Simple flash filesystem? 
Date: Wed, 07 Feb 2001 11:22:50 +0000


kristian.otnes@tevero.no said:
> the reason for emulating a disk in a flash based filesystem is
> probably twofold:

> - It fits in with the normal disk-based approach

> - It breaks the larger blocks (typically 64KB or 128KB) into
>   virtual smaller blocks, so that other software is not
>   bothered by the problem of handling the large flash blocks
>   efficiently. In other words, it is a relatively simple
>   way of managing some of the harder parts of flash usage. 


Both of those are only an issue for people dealing with legacy operating
systems who are stuck with the existing block-based filesystem concept.

I can understand doing this under DOS, where you provide an INT13h handler
for your device to make it pretend to be a normal disc drive and you
don't want to get any more involved with the O/S than you have to.

Under real operating systems these days though, it's not really an issue.

When you emulate a 'normal' block device on flash, you basically end up 
with a kind of pseudo-filesystem to keep track of where the blocks are, 
etc. Obviously you need that to be a journalling pseudo-filesystem of some 
kind, to prevent corruption. 

On top of that emulated block device, you then need to put a 'normal' 
journalling filesystem. You've got two layers of filesystem and two layers 
of journalling. It's not wonderfully efficient.
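To make the 'pseudo-filesystem' underneath an emulated block device a bit
more concrete, here is a minimal C sketch of the bookkeeping it has to do.
It is not any particular product's translation layer: the geometry
(512-byte virtual sectors in 64 KiB erase blocks), the RAM-backed 'flash'
and all the names (`ftl_write`, `ftl_mount` and so on) are assumptions for
illustration. Every write goes to a fresh slot tagged with a sequence
number, so the sector map can be rebuilt by scanning after a crash. Garbage
collection and recovery from a half-written slot are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry: 512-byte virtual sectors packed into 64 KiB
 * erase blocks, i.e. 128 sector slots per erase block. */
#define SECTOR_SIZE     512
#define ERASE_SIZE      (64 * 1024)
#define NUM_ERASE       4
#define SLOTS_PER_BLOCK (ERASE_SIZE / SECTOR_SIZE)
#define NUM_SLOTS       (NUM_ERASE * SLOTS_PER_BLOCK)
#define NUM_VSECTORS    256            /* size of the emulated disc */
#define UNMAPPED        UINT32_MAX     /* also the erased-flash pattern */

/* Per-slot header recording which virtual sector the slot holds. */
struct slot_hdr {
    uint32_t vsector;
    uint32_t seq;      /* write sequence number: the highest one wins */
};

static struct slot_hdr hdr[NUM_SLOTS];
static uint8_t data[NUM_SLOTS][SECTOR_SIZE];
static uint32_t map[NUM_VSECTORS];    /* virtual sector -> physical slot */
static uint32_t next_slot, next_seq;

/* Rebuild the in-RAM map by scanning every slot header, as a mount
 * would after power loss; the newest copy of each sector wins. */
static void ftl_mount(void)
{
    uint32_t best_seq[NUM_VSECTORS];
    next_slot = next_seq = 0;
    for (uint32_t v = 0; v < NUM_VSECTORS; v++)
        map[v] = UNMAPPED;
    for (uint32_t s = 0; s < NUM_SLOTS; s++) {
        uint32_t v = hdr[s].vsector;
        if (v == UNMAPPED)
            continue;                     /* still erased */
        if (map[v] == UNMAPPED || hdr[s].seq > best_seq[v]) {
            map[v] = s;
            best_seq[v] = hdr[s].seq;
        }
        if (hdr[s].seq >= next_seq)
            next_seq = hdr[s].seq + 1;
        next_slot = s + 1;                /* slots are filled in order */
    }
}

/* Erase the whole (simulated) chip: every bit goes back to 1. */
static void ftl_format(void)
{
    memset(hdr, 0xFF, sizeof hdr);
    memset(data, 0xFF, sizeof data);
    ftl_mount();
}

/* Writes never overwrite in place: the new copy goes to the next free
 * slot and the old copy just becomes garbage. Reclaiming that garbage
 * is not shown. Returns -1 when the flash is full. */
static int ftl_write(uint32_t v, const uint8_t *buf)
{
    assert(v < NUM_VSECTORS);
    if (next_slot >= NUM_SLOTS)
        return -1;
    uint32_t s = next_slot++;
    memcpy(data[s], buf, SECTOR_SIZE);
    hdr[s].seq = next_seq++;
    hdr[s].vsector = v;   /* header written last: until then, mount skips it */
    map[v] = s;
    return 0;
}

static int ftl_read(uint32_t v, uint8_t *buf)
{
    assert(v < NUM_VSECTORS);
    if (map[v] == UNMAPPED)
        return -1;
    memcpy(buf, data[map[v]], SECTOR_SIZE);
    return 0;
}
```

A 'normal' filesystem mounted on top would call `ftl_read`/`ftl_write` as
if they were disc sectors, and would typically keep its own journal as
well. That is the second layer of journalling.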

I spent a long time dreaming of a filesystem which worked directly on flash 
chips without this problem. Eventually, the guys at Axis wrote it - JFFS 
runs directly on the flash chips. It's a log-structured filesystem. You 
just write nodes sequentially to the flash. 

Each node contains the current metadata for the file you're writing,
including stuff like name and parent inode number so the directory tree can
be built, and usually some data for a portion of that file. There's no 
wasted space, because each node comes immediately after the previous node. 
(Well, we do align them to 4 bytes.)
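A rough C sketch of what writing nodes sequentially means. The header is a
hypothetical layout built from the fields mentioned above (inode, parent
inode, a version, a file offset, name and data lengths); it is not the real
JFFS on-flash format. The name and data follow the header directly, and the
only gap between nodes is the rounding up to 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node header; NOT the real JFFS on-flash layout.
 * The file name and then the data follow the header directly. */
struct node_hdr {
    uint32_t ino;        /* inode this node belongs to */
    uint32_t parent_ino; /* lets the directory tree be rebuilt at mount */
    uint32_t version;    /* later versions supersede earlier ones */
    uint32_t offset;     /* file offset of the data in this node */
    uint32_t dsize;      /* bytes of file data carried */
    uint8_t  nsize;      /* length of the file name */
    uint8_t  pad[3];
};

#define FLASH_SIZE 4096
#define ALIGN4(x)  (((x) + 3u) & ~3u)

static uint8_t flash[FLASH_SIZE];
static uint32_t head;    /* next free byte: nodes are written in sequence */

/* Append one node at the head of the log and return its flash offset,
 * or UINT32_MAX if it does not fit. */
static uint32_t write_node(const struct node_hdr *h, const char *name,
                           const void *buf)
{
    assert(head % 4 == 0);               /* every node starts aligned */
    uint32_t len = sizeof *h + h->nsize + h->dsize;
    if (head + len > FLASH_SIZE)
        return UINT32_MAX;
    uint32_t at = head;
    memcpy(flash + at, h, sizeof *h);
    memcpy(flash + at + sizeof *h, name, h->nsize);
    memcpy(flash + at + sizeof *h + h->nsize, buf, h->dsize);
    head = ALIGN4(at + len);             /* pad to a 4-byte boundary */
    return at;
}
```

Writing version 2 of the same file region simply appends another node after
the first; nothing already on the flash is modified.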

The filesystem keeps a map of which bits of each file can be found at what 
location on the flash, and when you read from the file, it just copies the 
data out of the right node on the flash for you.
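The in-core map can be sketched as a sorted list of fragments per file, each
saying 'file bytes [ofs, ofs+len) are at this flash offset'. When a newer
node covers part of an older one, the older fragment is trimmed or split.
This is an illustration with made-up names (`map_add`, `file_read`) and
fixed-size arrays, not the actual JFFS data structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAGS 64

/* File bytes [ofs, ofs+len) live on flash starting at byte `flash`. */
struct frag {
    uint32_t ofs, len, flash;
};

static uint8_t flash[1024];           /* stand-in for the flash chip */
static struct frag frags[MAX_FRAGS];  /* sorted by ofs, non-overlapping */
static size_t nfrags;

static int cmp_frag(const void *a, const void *b)
{
    const struct frag *x = a, *y = b;
    return (x->ofs > y->ofs) - (x->ofs < y->ofs);
}

/* Record that a new node holds file bytes [ofs, ofs+len) at flash offset
 * `fl`. Older fragments it overlaps are trimmed or split, because the
 * newer node obsoletes that part of them. Returns -1 if the map is full. */
static int map_add(uint32_t ofs, uint32_t len, uint32_t fl)
{
    struct frag tmp[MAX_FRAGS];
    size_t n = 0;
    uint32_t end = ofs + len;
    assert(len > 0);
    for (size_t i = 0; i < nfrags; i++) {
        struct frag e = frags[i];
        uint32_t e_end = e.ofs + e.len;
        if (e_end <= ofs || e.ofs >= end) {     /* no overlap: keep */
            if (n == MAX_FRAGS) return -1;
            tmp[n++] = e;
            continue;
        }
        if (e.ofs < ofs) {                       /* keep left piece */
            if (n == MAX_FRAGS) return -1;
            tmp[n++] = (struct frag){ e.ofs, ofs - e.ofs, e.flash };
        }
        if (e_end > end) {                       /* keep right piece */
            if (n == MAX_FRAGS) return -1;
            tmp[n++] = (struct frag){ end, e_end - end,
                                      e.flash + (end - e.ofs) };
        }
    }
    if (n == MAX_FRAGS) return -1;
    tmp[n++] = (struct frag){ ofs, len, fl };
    qsort(tmp, n, sizeof tmp[0], cmp_frag);
    memcpy(frags, tmp, n * sizeof tmp[0]);
    nfrags = n;
    return 0;
}

/* Read file bytes [ofs, ofs+len) by copying straight out of the nodes on
 * flash; bytes that no node covers (holes) read back as zero. */
static void file_read(uint32_t ofs, uint32_t len, uint8_t *buf)
{
    uint32_t end = ofs + len;
    memset(buf, 0, len);
    for (size_t i = 0; i < nfrags; i++) {
        uint32_t f_end = frags[i].ofs + frags[i].len;
        uint32_t lo = frags[i].ofs > ofs ? frags[i].ofs : ofs;
        uint32_t hi = f_end < end ? f_end : end;
        if (lo < hi)
            memcpy(buf + (lo - ofs),
                   flash + frags[i].flash + (lo - frags[i].ofs), hi - lo);
    }
}
```

For example, an 8-byte node followed by a 2-byte node at file offset 3
leaves three fragments (0-3, 3-5, 5-8), and a read of the first 8 bytes
stitches them together. Holding a list like this in RAM for every file at
all times is the memory cost mentioned below.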

The interesting bit is when you get to the end of the flash chip(s) - you 
have to start again at the beginning. Generally, some of the nodes you 
wrote out right at the beginning have been obsoleted by later writes to the 
same offset in the same file. So taking each erase block one at a time from 
the beginning again, you copy the nodes that are still valid into the space 
you've got left, and then erase the block. Generally, you've made
yourself some more space by doing that.
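A toy model of that garbage-collection pass, in C. To keep it short it
assumes fixed-size nodes (4 per erase block, 4 blocks), tracks which nodes
are obsolete in RAM, and uses hypothetical names throughout; real JFFS nodes
are variable-length. It also refuses to run unless a block's worth of space
is free, so the still-valid nodes always have somewhere to go.

```c
#include <assert.h>

/* Toy geometry: 4 erase blocks of 4 fixed-size node slots each. */
#define BLOCKS    4
#define PER_BLOCK 4
#define NSLOTS    (BLOCKS * PER_BLOCK)

enum slot_state { FREE, VALID, OBSOLETE };

struct node {
    int ino, offset;        /* which part of which file */
    int value;              /* stand-in for the node's data */
    enum slot_state state;  /* obsolescence is tracked in RAM */
};

static struct node chip[NSLOTS];
static int head;               /* next slot to write; wraps round */
static int tail;               /* first slot of the oldest erase block */
static int free_slots = NSLOTS;

/* Append a node at the head of the log, obsoleting any older node for
 * the same (ino, offset). Returns -1 if there is no free slot. */
static int log_append(int ino, int offset, int value)
{
    if (free_slots == 0)
        return -1;
    for (int i = 0; i < NSLOTS; i++)
        if (chip[i].state == VALID && chip[i].ino == ino &&
            chip[i].offset == offset)
            chip[i].state = OBSOLETE;
    assert(chip[head].state == FREE);   /* flash must be erased first */
    chip[head] = (struct node){ ino, offset, value, VALID };
    head = (head + 1) % NSLOTS;
    free_slots--;
    return 0;
}

/* Collect the oldest erase block: copy its still-valid nodes to the head
 * of the log, then erase it. Returns the net slots gained (0 if the block
 * held no obsolete nodes), or -1 if it cannot run safely yet. */
static int gc_block(void)
{
    if (free_slots < PER_BLOCK)               /* room for all the copies */
        return -1;
    if (head / PER_BLOCK == tail / PER_BLOCK) /* oldest block not full */
        return -1;
    int before = free_slots;
    for (int i = tail; i < tail + PER_BLOCK; i++)
        if (chip[i].state == VALID)           /* cannot fail, see above */
            log_append(chip[i].ino, chip[i].offset, chip[i].value);
    for (int i = tail; i < tail + PER_BLOCK; i++)
        chip[i].state = FREE;                 /* erase the whole block */
    free_slots += PER_BLOCK;
    tail = (tail + PER_BLOCK) % NSLOTS;
    return free_slots - before;
}

/* Current value of (ino, offset), or -1 if no valid node holds it. */
static int lookup(int ino, int offset)
{
    for (int i = 0; i < NSLOTS; i++)
        if (chip[i].state == VALID && chip[i].ino == ino &&
            chip[i].offset == offset)
            return chip[i].value;
    return -1;
}
```

Writing four nodes, then overwriting two of them, and collecting the first
block nets two free slots. Collecting the next block, whose four nodes are
all still valid, copies all four and gains nothing.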


JFFS has some problems - it uses quite a lot of RAM because it keeps a
complete 'map' for each file in-core at all times, and it will
garbage-collect erase blocks strictly in order even if some of them don't
actually _have_ any obsoleted nodes, so it's just moving megabytes of data
from one location on the flash to another. We get _perfect_ wear
_levelling_, but it's hardly optimal.
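The difference between collecting strictly in order and collecting where the
garbage is can be shown with two victim-selection functions. The greedy
policy, with an occasional in-order pick for wear levelling, is only an
illustration of one way to attack the problem; it is not a description of
JFFS2's actual algorithm, and `struct eb_info`, `wl_interval` and the
function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-erase-block accounting. */
struct eb_info {
    unsigned valid_bytes;   /* still referenced: must be copied by GC */
    unsigned dirty_bytes;   /* obsoleted nodes: reclaimed by GC */
};

/* Strict in-order collection: the victim is always the next block,
 * whatever it contains. */
static size_t pick_in_order(size_t *cursor, size_t nblocks)
{
    size_t victim = *cursor;
    *cursor = (*cursor + 1) % nblocks;
    return victim;
}

/* Usually take the block with the most obsolete space, but on every
 * `wl_interval`-th call take the next block in order anyway, so blocks
 * full of static data still get moved and erased occasionally. */
static size_t pick_greedy(const struct eb_info *eb, size_t nblocks,
                          size_t *cursor, unsigned *calls,
                          unsigned wl_interval)
{
    assert(nblocks > 0);
    if (wl_interval && ++*calls % wl_interval == 0)
        return pick_in_order(cursor, nblocks);
    size_t best = 0;
    for (size_t i = 1; i < nblocks; i++)
        if (eb[i].dirty_bytes > eb[best].dirty_bytes)
            best = i;
    return best;
}
```

If block 0 holds 64 KiB of live data and block 3 holds nothing but obsolete
nodes, in-order collection starting at block 0 copies 64 KiB and frees
nothing, whereas the greedy pick chooses block 3, copying nothing and
freeing 64 KiB. The cost is that wear levelling is no longer automatic,
which is what the periodic in-order pick is there to restore.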

I'm currently working on a re-implementation of JFFS, imaginatively called
JFFS2. It extends the excellent ideas of Axis' original and fixes these 
problems, along with adding support for hard links and compression. It's 
turning out to re-use almost no code from the original (GPL'd) version. So 
the prospects for an eCos port look fairly good. 

Any volunteers to work on an eCos version once the Linux code has stabilised
a little and at least _compiles_ would be welcome :)

--
dwmw2


------- End of Forwarded Message