timestamp problems

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

* timestamp problems
@ 2008-10-12 20:37 satya komaragiri
  2008-10-12 23:44 ` Frank Ch. Eigler
  0 siblings, 1 reply; 5+ messages in thread
From: satya komaragiri @ 2008-10-12 20:37 UTC (permalink / raw)
  To: systemtap

Hello

I was working on a few scripts[1] that together profile the boot-up
process using systemtap and render an svg image very similar to
bootchart.
I noticed this strange error in timestamps that used to scramble the
XML log generated.  I had initially thought something was wrong with
my scripts and had (after a bit of complaining on my blog) managed a
workaround that problem. Now that I upgraded to F9 and I see that
error still persisting, and considering that I do nothing apart from
just calling a gettimeofday_ms(), am quite convinced that it is not a
problem with my script and even if it is.. not an obvious error that
glares at your face.
( It might also be some boot time peculiarity as when I run the
scripts after boot-up there is no such problem.)

The error:  The timestamps after a sometime  jump back by a huge number.

on Fedora 8
The timestamps starts off with something like 1220226924420
and goes on till 1220226928771
(process executing rc.sysinitentry number 133, time in ms since  first
probe : 4351 ms )
then suddenly jumps back to 1220207130000
(process executing rc.sysinit, entry number 134 )
and then continues normally till the end of the boot process.
The jump is  229.15244213 days.

( the output can be found at [2].. F8 output)

on Fedora 9
The timestamps starts with 1223857645400
and goes till 1223857652873
(process executing udev, entry number 1089, time in ms since first
probe: 7473 ms)
then jumps back to 1223837854000
( process executing udev, entry number 1090)
and then continues normally.
The jump is 229.153622685 days.

( entire output uploaded at [2] F9 output 2 with XMLs )

(the exact numbers are from one particular case.. but the trend and
order of difference is similar)

The process name or the number of entries or time elapsed before such
an event takes place is not constant but the jump is ( ~229 days in
all cases).

My **temporary** workaround:
This solution just serves my purpose and does not solve the  main
issue in anyway.

As the boot process will never last for more than a few minutes,
taking a substring (the last 7 digits) also works for me ( and works
better as then I can use them as int type ) . Luckily, the substring
is always in order resulting in proper functioning of bootlimn. But I
know this is not a solution.

Was wondering if this is a systemtap problem. I'll be grateful if
someone checks it out.

Regards
Satya Komaragiri

[1] http://code.google.com/p/bootlimn/
[2] http://code.google.com/p/bootlimn/downloads/list
project blog: http://meworkstoo.blogspot.com/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: timestamp problems
  2008-10-12 20:37 timestamp problems satya komaragiri
@ 2008-10-12 23:44 ` Frank Ch. Eigler
  2008-10-13 10:28   ` satya komaragiri
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Ch. Eigler @ 2008-10-12 23:44 UTC (permalink / raw)
  To: satya komaragiri; +Cc: systemtap


Hi -

> [...] considering that I do nothing apart from
> just calling a gettimeofday_ms(), am quite convinced that it is not a
> problem with my script [...]

It seems like a systemtap (runtime) bug.

> The error:  The timestamps after a sometime  jump back by a huge number.
>
> on Fedora 8
> The timestamps starts off with something like 1220226924420
> and goes on till 1220226928771
> (process executing rc.sysinitentry number 133, time in ms since  first
> probe : 4351 ms )
> then suddenly jumps back to 1220207130000

That is "only" 0.229 days (20 million units if we're talking about
milliseconds), but of course still unacceptable.  (I'm assuming the
main system clock does not jump similarly.)

Can you try something like this:

probe timer.s(10) { now=gettimeofday_ns()
                    printf("systemtap %s %6d\n",ctime(now/1000000000),now%1000000000)
                    system("date -u +\"system    %a %b %d %H:%M:%S %Y %N\"")
                    system("grep -i mhz /proc/cpuinfo") }'

and watch for a divergence over time.  The timer.s(NN) parameter could
be changed to narrow in on a suspected trigger time.


- FChE

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: timestamp problems
  2008-10-12 23:44 ` Frank Ch. Eigler
@ 2008-10-13 10:28   ` satya komaragiri
  2008-10-13 12:11     ` Frank Ch. Eigler
  0 siblings, 1 reply; 5+ messages in thread
From: satya komaragiri @ 2008-10-13 10:28 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Hi

On Mon, Oct 13, 2008 at 5:13 AM, Frank Ch. Eigler <fche@redhat.com> wrote:

>
> That is "only" 0.229 days (20 million units if we're talking about
> milliseconds), but of course still unacceptable.  (I'm assuming the
> main system clock does not jump similarly.)
>
Sorry for that.

> Can you try something like this:
>
> probe timer.s(10) { now=gettimeofday_ns()
>                    printf("systemtap %s %6d\n",ctime(now/1000000000),now%1000000000)
>                    system("date -u +\"system    %a %b %d %H:%M:%S %Y %N\"")
>                    system("grep -i mhz /proc/cpuinfo") }'
>
> and watch for a divergence over time.  The timer.s(NN) parameter could
> be changed to narrow in on a suspected trigger time.

I ran the script several times for different intervals and the
timestamps are working fine at normal time
just that the logging is  weird at times.( not in the order that out
probe tells it to print)

snips from the outputs ( mainly showing the messed up logging at times.. )
cpu MHz		: 3065.653
system    Mon Oct 13 04:24:45 2008 408697424
systemtap Mon Oct 13 04:24:45 2008 389369690
systemtap Mon Oct 13 04:24:55 2008 389369783
system    Mon Oct 13 04:24:55 2008 395816927
cpu MHz		: 3065.653
systemtap Mon Oct 13 04:25:05 2008 389369404
system    Mon Oct 13 04:25:05 2008 395788064
cpu MHz		: 3065.653
systemtap Mon Oct 13 04:25:15 2008 389369407
system    Mon Oct 13 04:25:15 2008 395763673
cpu MHz		: 3065.653
.......

==============================
system    Mon Oct 13 04:30:23 2008 725795912
cpu MHz		: 3065.653
systemtap Mon Oct 13 04:30:23 2008 713991064
system    Mon Oct 13 04:30:33 2008 717822396
cpu MHz		: 3065.653
systemtap Mon Oct 13 04:30:33 2008 713992080
system    Mon Oct 13 04:30:43 2008 725775669
.......
systemtap Mon Oct 13 04:32:13 2008 713990711
systemtap Mon Oct 13 04:32:23 2008 713993789
system    Mon Oct 13 04:32:23 2008 729103889
cpu MHz		: 3065.653
systemtap Mon Oct 13 04:32:33 2008 713992105
system    Mon Oct 13 04:32:33 2008 725866561
cpu MHz		: 3065.65


But when I run it at boot time  (with some modifications)
i get


systemtap Mon Oct 13 14:42:16 2008 159315516
systemtap Mon Oct 13 14:42:17 2008 159315673
systemtap Mon Oct 13 14:42:18 2008 159315837
systemtap Mon Oct 13 14:42:19 2008 159316258
//(then jumps)
systemtap Mon Oct 13 09:12:20 2008 645426712
systemtap Mon Oct 13 09:12:21 2008 645426460
systemtap Mon Oct 13 09:12:22 2008 645426486
systemtap Mon Oct 13 09:12:23 2008 645426933
systemtap Mon Oct 13 09:12:24 2008 645426848
systemtap Mon Oct 13 09:12:25 2008 645426421


system    Mon Oct 13 14:42:16 2008 195582695
system    Mon Oct 13 14:42:17 2008 164907660
system    Mon Oct 13 14:42:18 2008 162838435
system    Mon Oct 13 14:42:19 2008 162789467
//(similar jump in system time too )
system    Mon Oct 13 09:12:20 2008 649037391
system    Mon Oct 13 09:12:21 2008 653586541
system    Mon Oct 13 09:12:22 2008 653056695
system    Mon Oct 13 09:12:23 2008 657430857
system    Mon Oct 13 09:12:24 2008 656989129
system    Mon Oct 13 09:12:25 2008 650496853

The time after the jump was the correct UTC time ( i had checked immediately)

I don't know if that's normal or is my clock messed up.

What I did:

Stp script:

probe timer.s(1)
{
  now=gettimeofday_ns()
  printf("systemtap %s %6d\n",ctime(now/1000000000),now%1000000000)
  system("date -u +\"system    %a %b %d %H:%M:%S %Y %N\" >> /debugout/sys.txt")
  system("grep -i mhz /proc/cpuinfo")
}
===========================

mkdir  /debugko
mkdir /debugout

chmod 777  /debugko
chmod 777  /debugout
 (we'll only run it once for trial.. so no harm)

stap -k -g -p4 -m debug ./debug.stp
cp debug.ko /debugko/debug.ko

=============================
have a script  in sbin
say /sbin/trial

contents of /sbin/trial
#!/bin/sh

# Use tmpfs to collect data, since root is still readonly
mount -n -t tmpfs -o size=5M none /debugout

# run as long until exit condition satisfied
# Start systemtap scriot & probe

/usr/bin/staprun /debugko/debug.ko -c 'sleep 10' -o /debugout/trial.txt &

# Give daemon time to start collecting...
sleep  3

# Hand off to real init
exec /sbin/init

===========================================

change the grub to include( copy pasting from my menu.lst)

title trial
	root (hd0,2)
	kernel /vmlinuz-2.6.25-14.fc9.i686 ro
root=UUID=be43b64a-bfa8-43fa-8c8d-cd9a036a3301 rhgb quiet
init=/sbin/trial
	initrd /initrd-2.6.25-14.fc9.i686.img

=============================================

reboot into 'trial'  open terminal to do a sleep 10 and the check in /debugout

Regards
Satya

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: timestamp problems
  2008-10-13 10:28   ` satya komaragiri
@ 2008-10-13 12:11     ` Frank Ch. Eigler
  2008-10-15 13:49       ` satya komaragiri
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Ch. Eigler @ 2008-10-13 12:11 UTC (permalink / raw)
  To: satya komaragiri; +Cc: systemtap

Hi -

On Mon, Oct 13, 2008 at 03:58:01PM +0530, satya komaragiri wrote:
> [...at boot time ...]
> systemtap Mon Oct 13 14:42:18 2008 159315837
> systemtap Mon Oct 13 14:42:19 2008 159316258
> //(then jumps)
> systemtap Mon Oct 13 09:12:20 2008 645426712
> systemtap Mon Oct 13 09:12:21 2008 645426460
> [...]
> system    Mon Oct 13 14:42:19 2008 162789467
> //(similar jump in system time too )
> system    Mon Oct 13 09:12:20 2008 649037391
> [...]
> The time after the jump was the correct UTC time ( i had checked immediately)
> I don't know if that's normal or is my clock messed up.

My guess is that your hardware RTC clock is set to some local time
zone, and the kernel can't recompute its system time in UTC until
userspace comes up and tells it what the RTC time zone was.  If so,
setting your hardware clock to UTC should fix this problem.


- FChE

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: timestamp problems
  2008-10-13 12:11     ` Frank Ch. Eigler
@ 2008-10-15 13:49       ` satya komaragiri
  0 siblings, 0 replies; 5+ messages in thread
From: satya komaragiri @ 2008-10-15 13:49 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Hello

> My guess is that your hardware RTC clock is set to some local time
> zone, and the kernel can't recompute its system time in UTC until
> userspace comes up and tells it what the RTC time zone was.  If so,
> setting your hardware clock to UTC should fix this problem.

Thank you. Sorry for the late response.


Regards
Satya

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-10-15 13:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-12 20:37 timestamp problems satya komaragiri
2008-10-12 23:44 ` Frank Ch. Eigler
2008-10-13 10:28   ` satya komaragiri
2008-10-13 12:11     ` Frank Ch. Eigler
2008-10-15 13:49       ` satya komaragiri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).