public inbox for ecos-discuss@sourceware.org
* [ECOS] JFFS2 bug with overwriting files bigger than 50% of the file system?
@ 2011-07-28 13:15 Gunnar Ruthenberg
  2011-07-28 13:36 ` Gunnar Ruthenberg
  2011-08-02  6:42 ` Lambrecht Jürgen
  0 siblings, 2 replies; 5+ messages in thread
From: Gunnar Ruthenberg @ 2011-07-28 13:15 UTC (permalink / raw)
  To: ecos-discuss

Hi,

I stumbled upon a problem with JFFS2.

When overwriting a relatively large file (580 kB in a 896 kB flash region),
the file system breaks.

Plenty of errors like these ensue when trying to read the new, overwritten file
after a reboot:

BUG() at ~/ecos-2.0.40/packages/fs/jffs2/v2_0_40/src/readinode.c 381
<5>JFFS2 notice:  read_unknown: node header CRC failed at %#08x. But it must have been OK earlier.
<4>Node totlen on flash (0xffffffff) != totlen from node ref (0x00000044)
<4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
<5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
<4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
<5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it

It does not matter whether the new file is identical to the old one.
Unlinking the file and then writing it again gives the same result.

If this is more or less a known problem, how likely is it that smaller files
(say, under 50% of the file system's storage space) cause this behaviour as
well?

I noticed that the current version in the eCos source tree lags a bit behind the
code in the Linux kernel. Maybe this issue has been fixed there already and only
requires a port of the current code to eCos.

I'd be grateful for any ideas about this problem, even if it is just pointing me
to the linux-mtd mailing list.

Thanks,

Gunnar Ruthenberg.


-- 
NEW: FreePhone - 0 ct/min budget mobile tariff with money-back guarantee!
Find out more: http://www.gmx.net/de/go/freephone

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


* Re: [ECOS] JFFS2 bug with overwriting files bigger than 50% of the file system?
  2011-07-28 13:15 [ECOS] JFFS2 bug with overwriting files bigger than 50% of the file system? Gunnar Ruthenberg
@ 2011-07-28 13:36 ` Gunnar Ruthenberg
  2011-08-02  6:42 ` Lambrecht Jürgen
  1 sibling, 0 replies; 5+ messages in thread
From: Gunnar Ruthenberg @ 2011-07-28 13:36 UTC (permalink / raw)
  To: ecos-discuss

Oh,

and please ignore the v2_0_40 version marker.

The JFFS2 code in that older eCos version was replaced with a copy from the current eCos source tree in the hope of fixing this issue, but that did not help.

Greetings,

Gunnar Ruthenberg.

> Hi,
> 
> I stumbled upon a problem with JFFS2.
> 
> When overwriting a relatively large file (580 kB in a 896 kB flash
> region),
> the file system breaks.
> 
> Plenty of errors like these ensue when trying to read the new, overwritten
> file
> after a reboot:
> 
> BUG() at ~/ecos-2.0.40/packages/fs/jffs2/v2_0_40/src/readinode.c 381
> <5>JFFS2 notice:  read_unknown: node header CRC failed at %#08x. But it must have been OK earlier.
> <4>Node totlen on flash (0xffffffff) != totlen from node ref (0x00000044)
> <4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
> <5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
> <4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
> <5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
> 
> It does not matter if the new file is identical with the old file.
> Unlinking and then writing the file again causes the same result.
> 
> If this is more or less a known problem, how likely is it that small files
> (say,
> being smaller than 50% of the file system's storage space) cause this
> behaviour
> as well?
> 
> I noticed that the current version in the eCos source tree lags a bit
> behind the
> code in the Linux kernel. Maybe this issue has been fixed there already
> and only
> requires a port of the current code to eCos.
> 
> I'd be grateful for any ideas about this problem, even if it is just
> pointing me
> to the linux-mtd mailing list.
> 
> Thanks,
> 
> Gunnar Ruthenberg.


* Re: [ECOS] JFFS2 bug with overwriting files bigger than 50% of the file system?
  2011-07-28 13:15 [ECOS] JFFS2 bug with overwriting files bigger than 50% of the file system? Gunnar Ruthenberg
  2011-07-28 13:36 ` Gunnar Ruthenberg
@ 2011-08-02  6:42 ` Lambrecht Jürgen
  2011-08-09 14:52   ` Gunnar Ruthenberg
  1 sibling, 1 reply; 5+ messages in thread
From: Lambrecht Jürgen @ 2011-08-02  6:42 UTC (permalink / raw)
  To: Gunnar Ruthenberg; +Cc: ecos-discuss

On 07/28/2011 03:15 PM, Gunnar Ruthenberg wrote:
>
> Hi,
>
> I stumbled upon a problem with JFFS2.
>
> When overwriting a relatively large file (580 kB in a 896 kB flash 
> region),
> the file system breaks.
>
What is your block size or, more precisely, the size of the erase blocks?
Often it is 128 kB.
JFFS2 needs some spare blocks for garbage collection; I don't remember
if the default is 3 or 5.
3 spare blocks would mean 384 kB, so only 512 kB of your 896 kB would be
usable. If you force the file system to use more than that anyway, it
_will_ cause problems.
>
>
> Plenty of errors like these ensue when trying to read the new, 
> overwritten file
> after a reboot:
>
> BUG() at ~/ecos-2.0.40/packages/fs/jffs2/v2_0_40/src/readinode.c 381
> <5>JFFS2 notice:  read_unknown: node header CRC failed at %#08x. But it must have been OK earlier.
> <4>Node totlen on flash (0xffffffff) != totlen from node ref (0x00000044)
> <4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
> <5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
> <4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
> <5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
>
> It does not matter if the new file is identical with the old file.
> Unlinking and then writing the file again causes the same result.
>
FYI: JFFS2 is a log-based file system; it is worth reading about the
concept if you are not familiar with it (I can give some pointers). It
means that you cannot overwrite a file in place: instead, the old file
is marked (in the file system metadata) for removal, and the new version
of the file is appended to the log. When the file system fills up,
garbage collection (GC) starts recovering blocks, using those spare
blocks to relocate live data.
GC can take some time, during which all file system accesses are blocked
(the TFTP server, for example), which can then cause a TFTP timeout in
the client. And if GC frees a block and the client then resends a TFTP
packet, the same data can end up stored twice... I have seen several
file system problems when stress testing it.
Also note that on flash you cannot delete individual bytes, only erase
whole erase blocks.

> If this is more or less a known problem, how likely is it that small 
> files (say,
> being smaller than 50% of the file system's storage space) cause this 
> behaviour
> as well?
>
> I noticed that the current version in the eCos source tree lags a bit 
> behind the
> code in the Linux kernel. Maybe this issue has been fixed there 
> already and only
> requires a port of the current code to eCos.
>
Indeed; a volunteer to update the eCos JFFS2 code has been needed for a
long time.

Regards,
Jürgen
>
>
> I'd be grateful for any ideas about this problem, even if it is just 
> pointing me
> to the linux-mtd mailing list.
>
> Thanks,
>
> Gunnar Ruthenberg.
>
>
>


-- 
Jürgen Lambrecht
R&D Associate
Tel: +32 (0)51 303045    Fax: +32 (0)51 310670
http://www.televic-rail.com
Televic Rail NV - Leo Bekaertlaan 1 - 8870 Izegem - Belgium
Company number 0825.539.581 - RPR Kortrijk


* Re: [ECOS] JFFS2 bug with overwriting files bigger than 50% of the file system?
  2011-08-02  6:42 ` Lambrecht Jürgen
@ 2011-08-09 14:52   ` Gunnar Ruthenberg
  2011-08-11  6:51     ` Lambrecht Jürgen
  0 siblings, 1 reply; 5+ messages in thread
From: Gunnar Ruthenberg @ 2011-08-09 14:52 UTC (permalink / raw)
  To: ecos-discuss


Hi,

> >
> > Hi,
> >
> > I stumbled upon a problem with JFFS2.
> >
> > When overwriting a relatively large file (580 kB in a 896 kB flash 
> > region),
> > the file system breaks.
> >
> What is your block size, or more precisely, the size of the erase blocks?
> Often it is 128kB.
> JFFS2 needs some spare blocks for garbage collection. I don't remember 
> if the default is 3 or 5.
> 3 means 384kB, so only 512kB is available. And if you still force to use 
> it, it _will_ give problems.
> >

The block size is 64 kB, and there are 14 blocks in total.

The file takes up 6 blocks (with compression), so 8 blocks are free.

These are the settings for the amount of reserved blocks:

resv_blocks_write = 3,
resv_blocks_deletion = 2,
resv_blocks_gctrigger = 4,
resv_blocks_gcbad = 0,
resv_blocks_gcmerge = 3.

If I understood you correctly, deleting and saving the file again should work
fine given these numbers.

I tried increasing the FS size by 8 blocks, but the problem persists.

After that I tried to run the GC from application code between deleting and
re-writing the file, but saw that this isn't possible without some ugly hacks,
so I decided to ask on this mailing list first before resorting to that
(depending on the answer).

Anyhow, if I understood things right, the GC should already run before
re-writing the file, so triggering it manually would probably not help either.

> >
> > Plenty of errors like these ensue when trying to read the new, 
> > overwritten file
> > after a reboot:
> >
> > BUG() at ~/ecos-2.0.40/packages/fs/jffs2/v2_0_40/src/readinode.c 381
> > <5>JFFS2 notice:  read_unknown: node header CRC failed at %#08x. But it must have been OK earlier.
> > <4>Node totlen on flash (0xffffffff) != totlen from node ref (0x00000044)
> > <4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
> > <5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
> > <4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
> > <5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
> >
> > It does not matter if the new file is identical with the old file.
> > Unlinking and then writing the file again causes the same result.
> >
> FYI: JFFS2 is a log based file system, better read about it if you don't 
> know that (I can give some pointers). It means that you cannot overwrite 
> a file: instead, the file is marked (in the file system meta data or 
> table of contents) for removal, and the new version of the file is 
> appended in the file system. If the file system is full, garbage 
> collection (GC) starts to recover blocks, using those spare blocks.
> And GC can take some time (and then all file system accesses are frozen, 
> e.g. the TFTP server), resulting e.g. in a TFTP timeout in the client. 
> And if GC can free a block, and then the client resends a TFTP packet, 
> the same data portion can be present in double... I have seen several 
> file system problems when stress testing it..
> Also mark that on a flash you cannot delete bytes, only per erase block.


Thanks, I had a vague concept of journalling or log-based filesystems in
mind, but never looked at the details.


> > If this is more or less a known problem, how likely is it that small 
> > files (say,
> > being smaller than 50% of the file system's storage space) cause this 
> > behaviour
> > as well?
> >
> > I noticed that the current version in the eCos source tree lags a bit 
> > behind the
> > code in the Linux kernel. Maybe this issue has been fixed there 
> > already and only
> > requires a port of the current code to eCos.
> >
> indeed, already for a long time a volunteer is needed to update the ecos 
> jffs2 code.
> 
> Regards,
> Jürgen
> >
> >
> > I'd be grateful for any ideas about this problem, even if it is just 
> > pointing me
> > to the linux-mtd mailing list.
> >
> > Thanks,
> >
> > Gunnar Ruthenberg.
> >
> 

Any more suggestions?

Thanks,

Gunnar Ruthenberg.



* Re: [ECOS] JFFS2 bug with overwriting files bigger than 50% of the file system?
  2011-08-09 14:52   ` Gunnar Ruthenberg
@ 2011-08-11  6:51     ` Lambrecht Jürgen
  0 siblings, 0 replies; 5+ messages in thread
From: Lambrecht Jürgen @ 2011-08-11  6:51 UTC (permalink / raw)
  To: Gunnar Ruthenberg; +Cc: ecos-discuss

On 08/09/2011 04:52 PM, Gunnar Ruthenberg wrote:
>
> Hi,
>
> > >
> > > Hi,
> > >
> > > I stumbled upon a problem with JFFS2.
> > >
> > > When overwriting a relatively large file (580 kB in a 896 kB flash
> > > region),
> > > the file system breaks.
> > >
> > What is your block size, or more precisely, the size of the erase 
> blocks?
> > Often it is 128kB.
> > JFFS2 needs some spare blocks for garbage collection. I don't remember
> > if the default is 3 or 5.
> > 3 means 384kB, so only 512kB is available. And if you still force to use
> > it, it _will_ give problems.
> > >
>
> The block size is 64 kB, and there are 14 blocks in total.
>
> The file takes up 6 blocks (with compression), so 8 blocks are free.
>
> These are the settings for the amount of reserved blocks:
>
> resv_blocks_write = 3,
> resv_blocks_deletion = 2,
> resv_blocks_gctrigger = 4,
> resv_blocks_gcbad = 0,
> resv_blocks_gcmerge = 3.
>
> If I understood you correctly, deleting and saving the file again 
> should work
> fine by these numbers.
>
indeed
>
>
> I tried increasing the FS size by 8 blocks, but the problem still 
> persists.
>
Then your problem is something else, maybe a small bug in the driver
(the low-level flash access functions that JFFS2 calls).

Note that JFFS2 needs a relatively large amount of RAM: count on about
2% of the FS size. But as your partition is so small, that should not be
the problem. Compression also requires extra RAM.

We have also had big problems with logging: it writes very small amounts
of data at a time, such that a file of 1 kB consumed 10 kB of RAM when
opened and 400 kB on flash (numbers are approximate). Even worse, this
thrashes the file system, and in the end a reformat was the only solution.
>
>
> After that I tried to run the GC from application code between 
> deleting and
> re-writing the file, but saw that this isn't possible without some 
> ugly hacks,
> so I decided to reply to this ML first before getting down to that 
> (depending
> on the answer).
>
I tried the same, and also stopped because of the ugly hacks :-).
And because GC is not thread-safe, I added a GC pass at every mount (by
adding the call in the jffs2 mount function).
Note also that a single GC pass does not do a complete collection;
several passes are needed.
>
>
> Anyhow, the GC should be run before re-writing the file anyways, if I 
> under-
> stood things right, so triggering it manually would probably not help 
> either.
>
> > >
> > > Plenty of errors like these ensue when trying to read the new,
> > > overwritten file
> > > after a reboot:
> > >
> > > BUG() at ~/ecos-2.0.40/packages/fs/jffs2/v2_0_40/src/readinode.c 381
> > > <5>JFFS2 notice:  read_unknown: node header CRC failed at %#08x. But it must have been OK earlier.
> > > <4>Node totlen on flash (0xffffffff) != totlen from node ref (0x00000044)
> > > <4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
> > > <5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
> > > <4>JFFS2 warning:  jffs2_do_read_inode_internal: no data nodes found for ino #3
> > > <5>JFFS2 notice:  jffs2_do_read_inode_internal: but it has children so we fake some modes for it
> > >
> > > It does not matter if the new file is identical with the old file.
> > > Unlinking and then writing the file again causes the same result.
> > >
> > FYI: JFFS2 is a log based file system, better read about it if you don't
> > know that (I can give some pointers). It means that you cannot overwrite
> > a file: instead, the file is marked (in the file system meta data or
> > table of contents) for removal, and the new version of the file is
> > appended in the file system. If the file system is full, garbage
> > collection (GC) starts to recover blocks, using those spare blocks.
> > And GC can take some time (and then all file system accesses are frozen,
> > e.g. the TFTP server), resulting e.g. in a TFTP timeout in the client.
> > And if GC can free a block, and then the client resends a TFTP packet,
> > the same data portion can be present in double... I have seen several
> > file system problems when stress testing it..
> > Also mark that on a flash you cannot delete bytes, only per erase block.
>
>
> Thanks, I had a vague concept of journalling or log-based filesystems in
> mind, but never looked at the details.
>
http://blog.datalight.com/jffs2-linux-flash-file-system
(see also the links at the bottom)

Jürgen
>
>
>
> > > If this is more or less a known problem, how likely is it that small
> > > files (say,
> > > being smaller than 50% of the file system's storage space) cause this
> > > behaviour
> > > as well?
> > >
> > > I noticed that the current version in the eCos source tree lags a bit
> > > behind the
> > > code in the Linux kernel. Maybe this issue has been fixed there
> > > already and only
> > > requires a port of the current code to eCos.
> > >
> > indeed, already for a long time a volunteer is needed to update the ecos
> > jffs2 code.
> >
> > Regards,
> > Jürgen
> > >
> > >
> > > I'd be grateful for any ideas about this problem, even if it is just
> > > pointing me
> > > to the linux-mtd mailing list.
> > >
> > > Thanks,
> > >
> > > Gunnar Ruthenberg.
> > >
> >
>
> Any more suggestions?
>
> Thanks,
>
> Gunnar Ruthenberg.
>
>


-- 
Jürgen Lambrecht
R&D Associate
Tel: +32 (0)51 303045    Fax: +32 (0)51 310670
http://www.televic-rail.com
Televic Rail NV - Leo Bekaertlaan 1 - 8870 Izegem - Belgium
Company number 0825.539.581 - RPR Kortrijk


end of thread, other threads:[~2011-08-11  6:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
2011-07-28 13:15 [ECOS] JFFS2 bug with overwriting files bigger than 50% of the file system? Gunnar Ruthenberg
2011-07-28 13:36 ` Gunnar Ruthenberg
2011-08-02  6:42 ` Lambrecht Jürgen
2011-08-09 14:52   ` Gunnar Ruthenberg
2011-08-11  6:51     ` Lambrecht Jürgen
