public inbox for ecos-discuss@sourceware.org
* [ECOS] Fedora Core 3 - Ecos synth crashes.
@ 2005-05-04 22:08 John Carter
  2005-05-04 22:45 ` Bart Veer
  0 siblings, 1 reply; 4+ messages in thread
From: John Carter @ 2005-05-04 22:08 UTC (permalink / raw)
  To: ecos-discuss

I'm seeing a case on Fedora Core 3 where...
  * exactly the same executable, whether built under Fedora or under Debian
    (both built using the eCos v2.0 gnutools i386-elf-gcc)
  * seg faults under Fedora (on two different machines) (linux-2.6.9)
  * works under Debian unstable / Mandrake 9.1 / Mandrake 10 (linux-2.4, 2.6.10 and 2.6.12)

My current guess, from following things around in the debugger, is that it
happens while running the __CTOR__ list, in particular while setting up the
sigactions.

Initially I thought it was due to a mismatch between the current Linux
sigaction struct and the cyg_hal_sys_sigaction struct. (Valgrind
certainly suggested that this may be the case.) I tried fixing that
but it didn't resolve the problem.

My current best guess is that things turn to custard when the SIGALRMs
start firing.

Looking at eCos CVS I note that 7 weeks ago Bart Veer was doing something
in relation to sigreturn, so I wonder if it is worth back-patching that,
and if so how much do I need? (Curiously enough this all works under Linux
2.6.12-rc3 and 2.6.10, but not under Fedora's 2.6.9.)


----------------------------------------------------------------------


Here is a description of the difference between the linux and the
cyg_hal_sys_sigaction structs....

http://ecos.sourceware.org/cgi-bin/cvsweb.cgi/ecos/packages/hal/synth/arch/current/include/hal_io.h?rev=1.13&content-type=text/x-cvsweb-markup&cvsroot=ecos
// The kernel sigaction structure has changed, to allow for >32
// signals. This is the old version, i.e. a struct old_sigaction, for
// use with the sigaction() system call rather than rt_sigaction(). It
// is preferred to the more modern version because gdb knows about
// rt_sigaction() and will start intercepting signals, but it seems to
// ignore sigaction().
struct cyg_hal_sys_sigaction {
     void        (*hal_handler)(int);
     long        hal_mask;
     int         hal_flags;
     void        (*hal_restorer)(void);
};

However, looking in /usr/include/bits/sigaction.h

struct sigaction
   {
     /* Signal handler.  */
#ifdef __USE_POSIX199309
     union
       {
 	/* Used if SA_SIGINFO is not set.  */
 	__sighandler_t sa_handler;
 	/* Used if SA_SIGINFO is set.  */
 	void (*sa_sigaction) (int, siginfo_t *, void *);
       }
     __sigaction_handler;
# define sa_handler	__sigaction_handler.sa_handler
# define sa_sigaction	__sigaction_handler.sa_sigaction
#else
     __sighandler_t sa_handler;
#endif

     /* Additional set of signals to be blocked.  */
     __sigset_t sa_mask;

     /* Special flags.  */
     int sa_flags;

     /* Restore handler.  */
     void (*sa_restorer) (void);
   };

Where, and this is the interesting bit...
/usr/include/bits/sigset.h says

# define _SIGSET_NWORDS	(1024 / (8 * sizeof (unsigned long int)))
typedef struct
   {
     unsigned long int __val[_SIGSET_NWORDS];
   } __sigset_t;


i.e. eCos thinks sa_mask is one int, Linux thinks it's 32 ints in a
row. If a sigaction is near the edge of available VM, a call to
sigaction crashes; otherwise it magically (sort of) works.





John Carter                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632
PO Box 1645 Christchurch                Email : john.carter@tait.co.nz
New Zealand


Somewhere on the edge of a Galaxy, one of literally billions of such
galaxies, is a sun, one of literally billions of suns in that
galaxy.

Orbiting that sun is a small rock 330000 times smaller than that
sun.

This rock is  covered by a very very thin scum  of life. (Think 6000km
of rock followed by a meter or so of biomass.)

Amongst the millions of species in that scum are many hundreds of
thousands of types beetle and a mere handful of primates.

Surprisingly enough, this email does not originate from a beetle.

It originates from just one of the 6 billion vastly outnumbered humans.

I trust you will keep this perspective and context in mind when
reacting to this email.

-- 
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


* Re: [ECOS] Fedora Core 3 - Ecos synth crashes.
  2005-05-04 22:08 [ECOS] Fedora Core 3 - Ecos synth crashes John Carter
@ 2005-05-04 22:45 ` Bart Veer
  2005-05-04 23:56   ` John Carter
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Veer @ 2005-05-04 22:45 UTC (permalink / raw)
  To: john.carter; +Cc: ecos-discuss

>>>>> "John" == John Carter <john.carter@tait.co.nz> writes:

    John> I'm seeing a case on fedora core 3 where...

    John>   * exactly the same executable whether built under fedora
    John>   or built under debian (both built using the ecosV2.0
    John>   gnutools i386-elf-gcc)
    John>   * seg faults under Fedora (on two different machines)
    John>   (linux-2.6.9)
    John>   * Works under debian unstable / Mandrake 9.1 / Mandrake 10
    John>   linux-2.4, 2.6.10&12

    <snip>

    John> My current best guess is that things turn to custard when
    John> the SIGALRMs start firing.

    John> Looking at eCos CVS I note that 7 weeks ago Bart Veer was
    John> doing something in relation to sigreturn, so I wonder if it
    John> is worth back-patching that, and if so how much do I need?
    John> (Curiously enough this all works under Linux 2.6.12-rc3
    John> and 2.6.10, but not under Fedora's 2.6.9.)

Yes. The Fedora folks appear to have done something strange to the
kernel. I did not try to figure out exactly which of their patches
causes the problem. Other distributions are not affected. When a
signal handler is invoked the Fedora kernel places a bogus return
address on the signal handler's stack so when that signal handler
returns things blow up. Most applications are not affected because
they go via glibc signal handling code, which triggers a slightly
different path through the kernel. What my patch does is basically
replicate the glibc behaviour and the problems go away.

This is not the only synthetic target problem fixed since the v2
release. There have been a couple of compiler-related issues which
had to be worked around. I suggest switching to anoncvs, at the very
least for everything below hal/synth.

    <snip>

    John> i.e. eCos thinks sa_mask is one int, Linux thinks it's 32
    John> ints in a row. If a sigaction is near the edge of available
    John> VM, a call to sigaction crashes; otherwise it magically
    John> (sort of) works.

What you are looking at there is a struct sigaction, not a struct
old_sigaction. Originally Linux only supported up to 32 signals, each
requiring one bit in the sa_mask, so in a struct old_sigaction the
sa_mask is just one integer. The synthetic target does not need any of
the newer real-time signal support.

Bart

-- 
Bart Veer                       eCos Configuration Architect
http://www.ecoscentric.com/     The eCos and RedBoot experts



* Re: [ECOS] Fedora Core 3 - Ecos synth crashes.
  2005-05-04 22:45 ` Bart Veer
@ 2005-05-04 23:56   ` John Carter
  2005-05-05 16:57     ` Bart Veer
  0 siblings, 1 reply; 4+ messages in thread
From: John Carter @ 2005-05-04 23:56 UTC (permalink / raw)
  To: Bart Veer; +Cc: ecos-discuss

On Wed, 4 May 2005, Bart Veer wrote:

> What you are looking at there is a struct sigaction, not a struct
> old_sigaction.

Hmm, I saw that in the code, but I'm not totally convinced.

As I read it, somewhere down the bottom of everything the linux sigaction 
system call is made and it is expecting the new type sigaction. Linux is 
expecting distributors to recompile #including the new headers which would 
result in it all just working. Unfortunately ecos copies and pastes the 
old struct definition hence remains out of step.

Valgrind seems to second my reading of this in that it whinges about 
uninitialized data being passed to sigaction without my mods, and is 
happy with them.



* Re: [ECOS] Fedora Core 3 - Ecos synth crashes.
  2005-05-04 23:56   ` John Carter
@ 2005-05-05 16:57     ` Bart Veer
  0 siblings, 0 replies; 4+ messages in thread
From: Bart Veer @ 2005-05-05 16:57 UTC (permalink / raw)
  To: john.carter; +Cc: ecos-discuss

>>>>> "John" == John Carter <john.carter@tait.co.nz> writes:

    John> On Wed, 4 May 2005, Bart Veer wrote:
    >> What you are looking at there is a struct sigaction, not a struct
    >> old_sigaction.

    John> Hmm, I saw that in the code, but I'm not totally convinced.

    John> As I read it, somewhere down the bottom of everything the
    John> linux sigaction system call is made and it is expecting the
    John> new type sigaction. Linux is expecting distributors to
    John> recompile #including the new headers which would result in
    John> it all just working. Unfortunately ecos copies and pastes
    John> the old struct definition hence remains out of step.

    John> Valgrind seems to second my reading of this in that it
    John> whinges about uninitialized data being passed to sigaction
    John> without my mods, and is happy with them.

The synthetic target makes the original sigaction call, system call
67, which takes a struct old_sigaction. There is also rt_sigaction,
system call 174, which takes the new version of the structure. These
days glibc and hence nearly all Linux applications will use the
latter. The Linux kernel supports both system calls and is likely to
continue to do so for the foreseeable future to preserve binary
compatibility. The synthetic target uses the old call in preference
because gdb tries to do some clever stuff for the new call which gets
in the way of the synthetic target operation.

For further confirmation, consider the structure layout:

struct cyg_hal_sys_sigaction {
    void        (*hal_handler)(int);
    long        hal_mask;
    int         hal_flags;
    void        (*hal_restorer)(void);
};

If the Linux kernel was expecting 32 ints for sa_mask instead of just
one then the supplied flags and restorer fields would be interpreted
as part of the sa_mask, and random data would be interpreted as the
flags and restorer. This is not what happens. The flags and restorer
fields are critical to the correct behaviour of the system, and work
as intended.

>>>>> "John" == John Carter <john.carter@tait.co.nz> writes:

    John> On Wed, 4 May 2005, Bart Veer wrote:

    >> There have been a couple of compiler-related issues which
    >> had to be worked around. I suggest switching to anoncvs, at the very
    >> least everything below hal/synth

    John> Ok, that worked. Except for one minor glitch in synth.ld

    John> We are still using the gnutools i386-elf-gcc version 3.2.1
    John> that came with ecos V2.0, which doesn't have libgcc_eh.a and
    John> libsupc++.a

The synthetic target is normally built with the native gcc, not with
i386-elf-gcc which is primarily a cross-compiler for real x86 embedded
targets. Using i386-elf-gcc is necessary only if you are building on a
Windows box and then running the application on a Linux box. Not a
common scenario.

Bart
