public inbox for ecos-discuss@sourceware.org
* [ECOS] High priority thread versus network
@ 2007-05-07 15:39 Wayne Visser
  2007-05-07 17:28 ` Robin Randhawa
  2007-08-14 13:18 ` [ECOS] High priority thread versus network - FOLLOW-UP Wayne Visser
  0 siblings, 2 replies; 4+ messages in thread
From: Wayne Visser @ 2007-05-07 15:39 UTC (permalink / raw)
  To: ecos-discuss

Hello all,

We're having a problem with an eCos app that has a relatively 
long-running, high priority thread (runs at priority 2 every 10 ms and 
takes about 4ms to complete).  Under high network loads, the app will 
crash with no asserts or panics.  If the high priority thread is 
disabled, the app will run fine for days without problem under high net 
loads.  Conversely, without any networking activity, the app runs fine 
for days.

We've stripped this down to a simple test app with two parts: (a) a 
high priority thread that does nothing but consume CPU time:

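#include <stdlib.h>            // rand()
#include <cyg/kernel/kapi.h>   // cyg_thread_delay(), cyg_addrword_t

static void
high_thread( cyg_addrword_t arg )
{
   int i, j;
   cyg_uint16 buf[1024];       // 2 KB scratch buffer on this thread's stack

   while (1)
   {
     // burn CPU: fill the buffer 90 times with pseudo-random values
     for ( j=0; j<90; ++j )
     {
       for ( i=0; i<1024; ++i)
         buf[i] = rand();
     }

     cyg_thread_delay(1);      // sleep for one system clock tick
   }
}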

and (b) several identical networking threads that do nothing but accept 
client connections and echo data sent to them.

If several clients connect to the eCos app, a crash will occur in as 
little as a few minutes (but sometimes hours).

So my question is this: Are there any known issues in running a high 
priority thread with a relatively long running time?  That is, this 
thread effectively blocks the network threads from running for up to 
4ms.  Will that create any known problems?
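
To put the 4ms figure in perspective, here is a rough back-of-the-envelope calculation of how much traffic can arrive while the network threads are held off. The 100 Mbit/s link speed and the 1538-byte on-wire size of a full Ethernet frame (1518-byte frame plus preamble and inter-frame gap) are assumptions for illustration; the actual link speed and frame sizes are not stated above, and both function names are made up.

```c
#include <stdint.h>

// Bytes that can arrive on the wire during a stall of stall_us
// microseconds at link_bps bits per second.
static uint64_t
bytes_during_stall(uint64_t link_bps, uint64_t stall_us)
{
    return (link_bps / 8u) * stall_us / 1000000u;
}

// Whole frames of frame_bytes (on-wire size) that fit in that window.
static uint64_t
frames_during_stall(uint64_t link_bps, uint64_t stall_us,
                    uint64_t frame_bytes)
{
    return bytes_during_stall(link_bps, stall_us) / frame_bytes;
}
```

Under these assumptions a 4ms stall corresponds to 50,000 bytes on the wire, i.e. up to 32 full-size frames, which would have to be absorbed below the thread level until the network threads run again.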

Thanks for any feedback.

   -- Wayne


PS: Our target is i386 and the problem is evident with both the 8139 and 
82559 ethernet drivers.  Curiously, the problem does NOT appear with the 
83816 ethernet driver.


-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


* Re: [ECOS] High priority thread versus network
From: Robin Randhawa @ 2007-05-07 17:28 UTC (permalink / raw)
  To: Wayne Visser; +Cc: ecos-discuss

Hi.

Just some shots in the dark:

1. Have you enabled all assertions? Checks for stack manipulation?

2. Are you using a separate interrupt stack?

3. Does changing the default stack size of the network thread make a
difference to either the observed behavior or the time before the
system hangs?

The problem you face seems to be a stack overflow but there really isn't
sufficient data to state that as a fact.

I would try the above just to reduce some of the possibilities.

Cheers,
Robin


* Re: [ECOS] High priority thread versus network
From: Wayne Visser @ 2007-05-09 19:13 UTC (permalink / raw)
  To: Robin Randhawa; +Cc: ecos-discuss


Thanks for the tips, Robin.  You may be shooting in the dark, but I *am* 
in the dark! ;-)

You're right, it does look like a stack overflow, but I have assertions 
enabled (CYGDBG_USE_ASSERTS) and CYGFUN_KERNEL_THREADS_STACK_CHECKING is 
also enabled.  I've seen no asserts or panics being raised to date.  Are 
there other assertions I've missed?
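
One generic way to look for a stack overflow that the built-in checks miss is to fill a thread's stack with a known pattern before the thread is created and scan it later for the high-water mark. The sketch below shows the idea on a plain byte array; the pattern value and both function names are made up for illustration, and this is not the eCos implementation. It assumes a stack that grows downward (as on i386), so unused space sits at the low addresses.

```c
#include <stddef.h>
#include <string.h>

#define STACK_FILL_PATTERN 0xA5   // arbitrary illustrative fill byte

// Fill the whole stack area with the pattern before the thread starts.
static void
stack_fill(unsigned char *base, size_t size)
{
    memset(base, STACK_FILL_PATTERN, size);
}

// Return the number of bytes that appear to have been used, assuming a
// downward-growing stack: count untouched pattern bytes from the low
// end, and treat everything above them as used.
static size_t
stack_used(const unsigned char *base, size_t size)
{
    size_t untouched = 0;

    while (untouched < size && base[untouched] == STACK_FILL_PATTERN)
        untouched++;

    return size - untouched;
}
```

A result equal to the full stack size means the pattern was overwritten all the way down to the base, which is a strong hint that the stack overflowed at some point.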

I'm using a separate interrupt stack (size = 4096).

I've gone ahead and increased the stack sizes of the NET_THREAD and 
NET_FAST_THREAD and am currently re-running the tests.

   -- Wayne


* Re: [ECOS] High priority thread versus network - FOLLOW-UP
From: Wayne Visser @ 2007-08-14 13:18 UTC (permalink / raw)
  To: ecos-discuss


Wayne Visser wrote:
> Hello all,
> 
> We're having a problem with an eCos app that has a relatively 
> long-running, high priority thread (runs at priority 2 every 10 ms and 
> takes about 4ms to complete).  Under high network loads, the app will 
> crash with no asserts or panics.  If the high priority thread is 
> disabled, the app will run fine for days without problem under high net 
> loads.  Conversely, without any networking activity, the app runs fine 
> for days.
> <snip>


Hello all,

This is a follow-up to some mysterious crashes we were seeing related to 
network activity.  Related posts are here:

https://bugzilla.ecoscentric.com/show_bug.cgi?id=1000403
http://sourceware.org/ml/ecos-discuss/2007-03/msg00024.html
http://sourceware.org/ml/ecos-discuss/2007-05/msg00046.html

This was seen on an i386 platform (Advantech PCM3370).  We noticed 
that when the AGP aperture was reduced, the problem apparently 
'disappeared', leading me to think we had some type of memory conflict 
between the aperture and the ethernet card.
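
A conflict of the kind suspected here would mean the aperture window and a device's memory region cover some of the same physical addresses. A minimal sketch of such an overlap test follows; the function name is hypothetical, and the addresses in the example afterwards are invented for illustration, not values read from the PCM3370.

```c
#include <stdbool.h>
#include <stdint.h>

// True if the half-open ranges [a_base, a_base + a_size) and
// [b_base, b_base + b_size) share at least one address.
// 64-bit arithmetic avoids wrap-around for ranges ending at 4 GB.
static bool
ranges_overlap(uint32_t a_base, uint32_t a_size,
               uint32_t b_base, uint32_t b_size)
{
    uint64_t a_end = (uint64_t)a_base + a_size;
    uint64_t b_end = (uint64_t)b_base + b_size;

    if (a_size == 0 || b_size == 0)
        return false;               // an empty range overlaps nothing

    return a_base < b_end && b_base < a_end;
}
```

For example, with an invented 64 MB aperture at 0xD0000000 and an invented 256-byte device region at 0xD1000000, `ranges_overlap(0xD0000000, 0x04000000, 0xD1000000, 0x100)` reports an overlap, while a zero-size aperture never does.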

Whether or not a high-priority thread was running and stealing time away 
from the networking threads turned out not to be causal: network activity 
on its own was enough to cause crashes.  No asserts or panics were 
raised at a crash (apart from me perhaps :-0).

It's not entirely clear why a change (specifically a reduction) in AGP 
aperture eliminates the assumed memory conflict.  We also observed that 
crashing was most frequent when the aperture was set to 1/2 of the main 
memory size.

The board uses a Via VT8606 Northbridge (ProSavage PN133T).  On our 
board's BIOS it's possible to reduce but not completely disable the 
aperture, so we did some research and found a way to disable it 
programmatically, which is probably better anyway.

Since this Northbridge is fairly common in PC104 boards, someone else 
may be seeing similar crashes, so here's how we ended up disabling the 
aperture.  NOTE:  This is not a totally satisfying fix, since we don't 
completely understand the problem, but in the 2 months since we started 
doing this we have not recorded a single crash on our test boards.


// *******************************************************************
#include <cyg/io/pci.h>
#include <cyg/infra/diag.h>     // diag_printf()

// <snip>
// ...
// ...
// ...

// *******************************************************************
#define DEBUG_VT8606_SETUP      0

// <snip>
// ...
// ...
// ...


// *******************************************************************
// device/vendor id matching function for VT8606 Northbridge
//
static cyg_bool
pci_find_match_func(cyg_uint16 v, cyg_uint16 d, cyg_uint32 c, void *p)
{
   // vendor ID for Via Technologies = 0x1106
   // device ID for VT8606 = 0x0605
   return ((v == 0x1106) && (d == 0x0605));
}


// *******************************************************************
// Disable graphics aperture feature in Via VT8606 Northbridge.  Call
// this function as soon after startup as possible (i.e. before setting
// up PCI devices).  This function is benign if no VT8606 exists.
//
static void
chipset_init(void)
{
   cyg_pci_device_id pci_device_id;
   cyg_pci_device    pci_device_info;
#if DEBUG_VT8606_SETUP > 0
   cyg_uint8         b;
#endif
   cyg_uint32        dw;

   pci_device_id = CYG_PCI_NULL_DEVID;
   cyg_pci_init();
   if ( cyg_pci_find_matching(&pci_find_match_func, NULL,
                                &pci_device_id))
   {
     cyg_pci_get_device_info(pci_device_id, &pci_device_info);

     if (cyg_pci_configure_device(&pci_device_info))
     {

       // read GA base
#if DEBUG_VT8606_SETUP > 0
       cyg_pci_read_config_uint32(pci_device_info.devid, 0x10, &dw);
       diag_printf(" GA BASE (0x10):    0x%08x\n", dw);
#endif

       // read TLB and disable aperture
       cyg_pci_read_config_uint32(pci_device_info.devid, 0x88, &dw);
#if DEBUG_VT8606_SETUP > 0
       diag_printf(" GA TLB (0x88):    0x%08x\n", dw);
#endif
       dw &= ~2;
       cyg_pci_write_config_uint32(pci_device_info.devid, 0x88, dw);
#if DEBUG_VT8606_SETUP > 0
       cyg_pci_read_config_uint32(pci_device_info.devid, 0x88, &dw);
       diag_printf(" GA TLB (after disabling aperture) (0x88): "
                   "0x%08x\n", dw);
#endif

       // read aperture size and set to 0
#if DEBUG_VT8606_SETUP > 0
       cyg_pci_read_config_uint8(pci_device_info.devid, 0x84, &b);
       diag_printf(" Aperture Size (0x84):    0x%02x\n", b);
#endif
       cyg_pci_write_config_uint8(pci_device_info.devid, 0x84, 0);
#if DEBUG_VT8606_SETUP > 0
       cyg_pci_read_config_uint8(pci_device_info.devid, 0x84, &b);
       diag_printf(" Aperture Size (after setting to 0) (0x84): "
                   "0x%02x\n", b);
#endif
     }
   }
}

// *******************************************************************
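
The register sequence above can be exercised without hardware by running the same read-modify-write against a mock configuration space. In this sketch the 256-byte array and the `mock_*` helpers are stand-ins invented for illustration, not part of the eCos PCI library; the offsets 0x88 and 0x84 and the bit mask come from the function above.

```c
#include <stdint.h>
#include <string.h>

static uint8_t mock_cfg[256];   // stand-in for one device's PCI config space

// Read a 32-bit value at byte offset off (off must be <= 252).
static uint32_t
mock_read32(uint8_t off)
{
    uint32_t v;
    memcpy(&v, &mock_cfg[off], sizeof(v));
    return v;
}

// Write a 32-bit value at byte offset off (off must be <= 252).
static void
mock_write32(uint8_t off, uint32_t v)
{
    memcpy(&mock_cfg[off], &v, sizeof(v));
}

// Same two steps as chipset_init(): clear bit 1 of the dword at 0x88,
// then zero the aperture-size byte at 0x84.
static void
mock_disable_aperture(void)
{
    uint32_t dw = mock_read32(0x88);

    dw &= ~(uint32_t)2;          // clear bit 1, leave other bits alone
    mock_write32(0x88, dw);

    mock_cfg[0x84] = 0;
}
```

Seeding the mock with all-ones at 0x88 and checking that only bit 1 changes is a quick way to confirm that the mask does not disturb neighbouring bits.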

--
Wayne Visser
LSZ PaperTech Inc.



