public inbox for gdb@sourceware.org
* Opteron Stack Woes
@ 2004-08-17 11:46 David Lecomber
  2004-08-17 13:11 ` Daniel Jacobowitz
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: David Lecomber @ 2004-08-17 11:46 UTC
  To: gdb

Chaps,

What's the best way to get enough info to you to help fix some ropey
stacks that we're seeing on Opteron (SuSE 9), even with the latest CVS?

Typically we see things like

(gdb) n
During symbol reading, Incomplete CFI data; unspecified registers at
0x000000000040d7ad.

and a stacktrace of thousands (and more?) lines

#0  main__ () at trees.f90:912
#1  0x00000000004188b8 in __f90_main ()
#2  0x0000000000418890 in main ()
#3  0x0000002a95dbbc9e in __libc_start_main () from /lib64/libc.so.6
#4  0x0000000000400f2a in _start () at ../sysdeps/x86_64/elf/start.S:96
#5  0x0000007fbffff2a8 in ?? ()
#6  0x0000000000000000 in ?? ()
#7  0x0000000000000001 in ?? ()
#8  0x0000007fbffff5b2 in ?? ()
#9  0x0000000000000000 in ?? ()
#10 0x0000007fbffff5f7 in ?? ()
#11 0x0000007fbffff610 in ?? ()
#12 0x0000007fbffff654 in ?? ()
#13 0x0000007fbffff686 in ?? ()
#14 0x0000007fbffff696 in ?? ()
#15 0x0000007fbffff6a7 in ?? ()
#16 0x0000007fbffff6ce in ?? ()
#17 0x0000007fbffff6de in ?? ()
 etc...

Would a readelf -w output assist?  I don't think I can reproduce the
error with any of the GNU compilers.

Cheers
David


* Re: Opteron Stack Woes
  2004-08-17 11:46 Opteron Stack Woes David Lecomber
@ 2004-08-17 13:11 ` Daniel Jacobowitz
  2004-08-17 13:29   ` David Lecomber
  2004-08-17 13:48   ` Michael Chastain
  2004-08-17 14:47 ` H. J. Lu
  2004-08-25 19:49 ` [RFC] Backtrace limit David Lecomber
  2 siblings, 2 replies; 7+ messages in thread
From: Daniel Jacobowitz @ 2004-08-17 13:11 UTC
  To: David Lecomber; +Cc: gdb

On Tue, Aug 17, 2004 at 01:01:51PM +0100, David Lecomber wrote:
> Chaps,
> 
> What's the best way to get enough info to you to help fix some ropey
> stacks that we're seeing on Opteron (SuSE 9), even with the latest CVS?
> 
> Typically we see things like
> 
> (gdb) n
> During symbol reading, Incomplete CFI data; unspecified registers at
> 0x000000000040d7ad.
> 
> and a stacktrace of thousands (and more?) lines
> 
> #0  main__ () at trees.f90:912
> #1  0x00000000004188b8 in __f90_main ()
> #2  0x0000000000418890 in main ()
> #3  0x0000002a95dbbc9e in __libc_start_main () from /lib64/libc.so.6
> #4  0x0000000000400f2a in _start () at ../sysdeps/x86_64/elf/start.S:96
> #5  0x0000007fbffff2a8 in ?? ()
> #6  0x0000000000000000 in ?? ()
> #7  0x0000000000000001 in ?? ()
> #8  0x0000007fbffff5b2 in ?? ()
> #9  0x0000000000000000 in ?? ()
> #10 0x0000007fbffff5f7 in ?? ()
> #11 0x0000007fbffff610 in ?? ()
> #12 0x0000007fbffff654 in ?? ()
> #13 0x0000007fbffff686 in ?? ()
> #14 0x0000007fbffff696 in ?? ()
> #15 0x0000007fbffff6a7 in ?? ()
> #16 0x0000007fbffff6ce in ?? ()
> #17 0x0000007fbffff6de in ?? ()
>  etc...
> 
> Would a readelf -w output assist?  I don't think I can reproduce the
> error with any of the GNU compilers.

readelf -wF is probably your best bet.  But is the only problem the
fact that the backtrace didn't stop at _start?

You might want to investigate why the backtrace didn't stop earlier, at
main or at a fortran entry point.  GDB may be confused about
main_name().

-- 
Daniel Jacobowitz


* Re: Opteron Stack Woes
  2004-08-17 13:11 ` Daniel Jacobowitz
@ 2004-08-17 13:29   ` David Lecomber
  2004-08-17 14:10     ` Daniel Jacobowitz
  2004-08-17 13:48   ` Michael Chastain
  1 sibling, 1 reply; 7+ messages in thread
From: David Lecomber @ 2004-08-17 13:29 UTC
  To: Daniel Jacobowitz; +Cc: gdb

[-- Attachment #1: Type: text/plain, Size: 705 bytes --]

Thanks Daniel,

> readelf -wF is probably your best bet.  But is the only problem the
> fact that the backtrace didn't stop at _start?

I've seen worse stack traces than this on the Opteron, particularly one
that had Lustre kernel patches applied -- but I don't know if that was
related..  I'll keep an eye out for further problems.

I figured that the "Incomplete CFI data" might be an issue.  I don't
know enough about this area yet to do anything..

> You might want to investigate why the backtrace didn't stop earlier, at
> main or at a fortran entry point.  GDB may be confused about
> main_name().

Attached is the readelf -wF info, can you see anything wrong with the
0x040d7ad entry?

ta,
David


[-- Attachment #2: ReadElf --]
[-- Type: text/plain, Size: 5136 bytes --]

The section .eh_frame contains:

00000000 00000014 00000000 CIE "zR" cf=1 df=-8 ra=16
   LOC   CFA      ra   
00000000 r7+8     c-8  

00000018 0000001c 0000001c FDE cie=00000000 pc=004188d0..00418926
   LOC   CFA      r3   r6   r12  ra   
004188d0 r7+8     u    u    u    c-8  
004188e3 r7+32    c-32 c-24 c-16 c-8  

00000038 0000001c 0000003c FDE cie=00000000 pc=00418930..0041897c
   LOC   CFA      r3   r6   ra   
00418930 r7+8     u    u    c-8  
00418931 r7+16    u    c-16 c-8  
00418940 r7+24    u    c-16 c-8  
0041894b r7+32    c-24 c-16 c-8  

00000058 ZERO terminator

The section .debug_frame contains:

00000000 00000010 ffffffff CIE "" cf=1 df=-8 ra=16
   LOC   CFA      ra   
00000000 r7+8     c-8  

00000014 00000028 00000000 FDE cie=00000000 pc=00400ff8..00401002
   LOC   CFA      r6   ra   
00400ff8 r7+8     u    c-8  
00400ff9 r7+16    c-16 c-8  
00400ffc r6+16    c-16 c-8  

00000040 00000028 00000000 FDE cie=00000000 pc=00401002..00401c8a
   LOC   CFA      r6   ra   
00401002 r7+8     u    c-8  
00401003 r7+16    c-16 c-8  
00401006 r6+16    c-16 c-8  

0000006c 00000028 00000000 FDE cie=00000000 pc=00401c8a..00401de8
   LOC   CFA      r6   ra   
00401c8a r7+8     u    c-8  
00401c8b r7+16    c-16 c-8  
00401c8e r6+16    c-16 c-8  

00000098 00000028 00000000 FDE cie=00000000 pc=00401de8..0040205e
   LOC   CFA      r6   ra   
00401de8 r7+8     u    c-8  
00401de9 r7+16    c-16 c-8  
00401dec r6+16    c-16 c-8  

000000c4 00000028 00000000 FDE cie=00000000 pc=0040205e..00402fc7
   LOC   CFA      r6   ra   
0040205e r7+8     u    c-8  
0040205f r7+16    c-16 c-8  
00402062 r6+16    c-16 c-8  

000000f0 00000028 00000000 FDE cie=00000000 pc=00402fc7..0040332a
   LOC   CFA      r6   ra   
00402fc7 r7+8     u    c-8  
00402fc8 r7+16    c-16 c-8  
00402fcb r6+16    c-16 c-8  

0000011c 00000028 00000000 FDE cie=00000000 pc=0040332a..004039d9
   LOC   CFA      r6   ra   
0040332a r7+8     u    c-8  
0040332b r7+16    c-16 c-8  
0040332e r6+16    c-16 c-8  

00000148 00000028 00000000 FDE cie=00000000 pc=004039d9..00403b68
   LOC   CFA      r6   ra   
004039d9 r7+8     u    c-8  
004039da r7+16    c-16 c-8  
004039dd r6+16    c-16 c-8  

00000174 00000028 00000000 FDE cie=00000000 pc=00403b68..00407136
   LOC   CFA      r6   ra   
00403b68 r7+8     u    c-8  
00403b69 r7+16    c-16 c-8  
00403b6c r6+16    c-16 c-8  

000001a0 00000028 00000000 FDE cie=00000000 pc=00407136..004079dd
   LOC   CFA      r6   ra   
00407136 r7+8     u    c-8  
00407137 r7+16    c-16 c-8  
0040713a r6+16    c-16 c-8  

000001cc 00000028 00000000 FDE cie=00000000 pc=004079dd..0040810b
   LOC   CFA      r6   ra   
004079dd r7+8     u    c-8  
004079de r7+16    c-16 c-8  
004079e1 r6+16    c-16 c-8  

000001f8 00000028 00000000 FDE cie=00000000 pc=0040810b..00408296
   LOC   CFA      r6   ra   
0040810b r7+8     u    c-8  
0040810c r7+16    c-16 c-8  
0040810f r6+16    c-16 c-8  

00000224 00000028 00000000 FDE cie=00000000 pc=00408296..00408f49
   LOC   CFA      r6   ra   
00408296 r7+8     u    c-8  
00408297 r7+16    c-16 c-8  
0040829a r6+16    c-16 c-8  

00000250 00000028 00000000 FDE cie=00000000 pc=00408f49..00409074
   LOC   CFA      r6   ra   
00408f49 r7+8     u    c-8  
00408f4a r7+16    c-16 c-8  
00408f4d r6+16    c-16 c-8  

0000027c 00000028 00000000 FDE cie=00000000 pc=00409074..004096bb
   LOC   CFA      r6   ra   
00409074 r7+8     u    c-8  
00409075 r7+16    c-16 c-8  
00409078 r6+16    c-16 c-8  

000002a8 00000028 00000000 FDE cie=00000000 pc=004096bb..0040c62a
   LOC   CFA      r6   ra   
004096bb r7+8     u    c-8  
004096bc r7+16    c-16 c-8  
004096bf r6+16    c-16 c-8  

000002d4 00000028 00000000 FDE cie=00000000 pc=0040c62a..0040cc12
   LOC   CFA      r6   ra   
0040c62a r7+8     u    c-8  
0040c62b r7+16    c-16 c-8  
0040c62e r6+16    c-16 c-8  

00000300 00000028 00000000 FDE cie=00000000 pc=0040cc12..0040cdf3
   LOC   CFA      r6   ra   
0040cc12 r7+8     u    c-8  
0040cc13 r7+16    c-16 c-8  
0040cc16 r6+16    c-16 c-8  

0000032c 00000028 00000000 FDE cie=00000000 pc=0040cdf3..0040d1bb
   LOC   CFA      r6   ra   
0040cdf3 r7+8     u    c-8  
0040cdf4 r7+16    c-16 c-8  
0040cdf7 r6+16    c-16 c-8  

00000358 00000028 00000000 FDE cie=00000000 pc=0040d1bb..0040d2fa
   LOC   CFA      r6   ra   
0040d1bb r7+8     u    c-8  
0040d1bc r7+16    c-16 c-8  
0040d1bf r6+16    c-16 c-8  

00000384 00000028 00000000 FDE cie=00000000 pc=0040d2fa..0040d640
   LOC   CFA      r6   ra   
0040d2fa r7+8     u    c-8  
0040d2fb r7+16    c-16 c-8  
0040d2fe r6+16    c-16 c-8  

000003b0 00000028 00000000 FDE cie=00000000 pc=0040d640..0040d79f
   LOC   CFA      r6   ra   
0040d640 r7+8     u    c-8  
0040d641 r7+16    c-16 c-8  
0040d644 r6+16    c-16 c-8  

000003dc 00000028 00000000 FDE cie=00000000 pc=0040d79f..0040d7a9
   LOC   CFA      r6   ra   
0040d79f r7+8     u    c-8  
0040d7a0 r7+16    c-16 c-8  
0040d7a3 r6+16    c-16 c-8  

00000408 00000028 00000000 FDE cie=00000000 pc=0040d7a9..00418751
   LOC   CFA      r6   ra   
0040d7a9 r7+8     u    c-8  
0040d7aa r7+16    c-16 c-8  
0040d7ad r6+16    c-16 c-8  

00000434 ZERO terminator



* Re: Opteron Stack Woes
  2004-08-17 13:11 ` Daniel Jacobowitz
  2004-08-17 13:29   ` David Lecomber
@ 2004-08-17 13:48   ` Michael Chastain
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Chastain @ 2004-08-17 13:48 UTC
  To: drow, david; +Cc: gdb

> You might want to investigate why the backtrace didn't stop earlier, at
> main or at a fortran entry point.  GDB may be confused about
> main_name().

This hit me too when I tried to write the first fortran program for the
test suite.  I haven't filed a PR yet, but gdb is clueless about
main_name for fortran programs compiled with g77 3.4.1, both dwarf-2 and
stabs+.  When I say 'start', gdb puts the breakpoint in function 'main'
in the fortran runtime library, not at the 'program' statement.

The only call to set_main_name in gdb is in dbxread.c
and it's not getting called.  I don't see any N_MAIN stab line
in hello.s, so g77 3.4.1 isn't doing what we want.

Note that David is using a different fortran compiler.
I don't know why David's stack trace blows right through main.


* Re: Opteron Stack Woes
  2004-08-17 13:29   ` David Lecomber
@ 2004-08-17 14:10     ` Daniel Jacobowitz
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Jacobowitz @ 2004-08-17 14:10 UTC
  To: David Lecomber; +Cc: gdb

On Tue, Aug 17, 2004 at 02:44:59PM +0100, David Lecomber wrote:
> Thanks Daniel,
> 
> > readelf -wF is probably your best bet.  But is the only problem the
> > fact that the backtrace didn't stop at _start?
> 
> I've seen worse stack traces than this on the Opteron, particularly one
> that had Lustre kernel patches applied -- but I don't know if that was
> related..  I'll keep an eye out for further problems.
> 
> I figured that the "Incomplete CFI data" might be an issue.  I don't
> know enough about this area yet to do anything..

It's not a problem - take a look in the archive.
> 
> > You might want to investigate why the backtrace didn't stop earlier, at
> > main or at a fortran entry point.  GDB may be confused about
> > main_name().
> 
> Attached is the readelf -wF info, can you see anything wrong with the
> 0x040d7ad entry?

No, it looks fine.



-- 
Daniel Jacobowitz


* Re: Opteron Stack Woes
  2004-08-17 11:46 Opteron Stack Woes David Lecomber
  2004-08-17 13:11 ` Daniel Jacobowitz
@ 2004-08-17 14:47 ` H. J. Lu
  2004-08-25 19:49 ` [RFC] Backtrace limit David Lecomber
  2 siblings, 0 replies; 7+ messages in thread
From: H. J. Lu @ 2004-08-17 14:47 UTC
  To: David Lecomber; +Cc: gdb

On Tue, Aug 17, 2004 at 01:01:51PM +0100, David Lecomber wrote:
> Chaps,
> 
> What's the best way to get enough info to you to help fix some ropey
> stacks that we're seeing on Opteron (SuSE 9), even with the latest CVS?
> 
> Typically we see things like
> 
> (gdb) n
> During symbol reading, Incomplete CFI data; unspecified registers at
> 0x000000000040d7ad.
> 
> and a stacktrace of thousands (and more?) lines
> 
> #0  main__ () at trees.f90:912
> #1  0x00000000004188b8 in __f90_main ()
> #2  0x0000000000418890 in main ()
> #3  0x0000002a95dbbc9e in __libc_start_main () from /lib64/libc.so.6
> #4  0x0000000000400f2a in _start () at ../sysdeps/x86_64/elf/start.S:96
> #5  0x0000007fbffff2a8 in ?? ()
> #6  0x0000000000000000 in ?? ()
> #7  0x0000000000000001 in ?? ()
> #8  0x0000007fbffff5b2 in ?? ()
> #9  0x0000000000000000 in ?? ()
> #10 0x0000007fbffff5f7 in ?? ()
> #11 0x0000007fbffff610 in ?? ()
> #12 0x0000007fbffff654 in ?? ()
> #13 0x0000007fbffff686 in ?? ()
> #14 0x0000007fbffff696 in ?? ()
> #15 0x0000007fbffff6a7 in ?? ()
> #16 0x0000007fbffff6ce in ?? ()
> #17 0x0000007fbffff6de in ?? ()
>  etc...
> 
> Would a readelf -w output assist?  I don't think I can reproduce the
> error with any of the GNU compilers.
> 

Try:

http://sources.redhat.com/ml/gdb-patches/2004-05/msg00763.html


H.J.


* [RFC] Backtrace limit
  2004-08-17 11:46 Opteron Stack Woes David Lecomber
  2004-08-17 13:11 ` Daniel Jacobowitz
  2004-08-17 14:47 ` H. J. Lu
@ 2004-08-25 19:49 ` David Lecomber
  2 siblings, 0 replies; 7+ messages in thread
From: David Lecomber @ 2004-08-25 19:49 UTC
  To: gdb


I've just filed a bug (1790) and done some investigation into the
backtrace limit variable.

Essentially the limiting doesn't work: set it to 10, and the "limit
exceeded" test always comes out true.  The culprit is that the type of
frame->level is int, and the type of backtrace_limit is unsigned int,
so the comparison is done in unsigned arithmetic.

This wouldn't be a problem except that there is always a frame with
level -1 (it seems to be the 0 level-frame when printed out), and -1
converts to a very large unsigned value.

So, the non-elegant fix is:

RCS file: /cvs/src/src/gdb/frame.c,v
retrieving revision 1.190
diff -c -p -r1.190 frame.c
*** frame.c     2 Aug 2004 03:36:24 -0000       1.190
--- frame.c     25 Aug 2004 19:48:30 -0000
*************** get_prev_frame (struct frame_info *this_
*** 1179,1185 ****
        return NULL;
      }

!   if (this_frame->level > backtrace_limit)
      {
        error ("Backtrace limit of %d exceeded", backtrace_limit);
      }
--- 1179,1186 ----
        return NULL;
      }

!   if (this_frame->level != -1 &&
!       (unsigned int) this_frame->level > backtrace_limit)
      {
        error ("Backtrace limit of %d exceeded", backtrace_limit);
      }


But I'm open to suggestions involving making backtrace_limit a signed
int.

Any comments?

Cheers
David
