public inbox for glibc-bugs@sourceware.org
* [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later
@ 2008-12-28 20:52 hvengel at astound dot net
  2009-01-10  9:11 ` [Bug libc/9690] " vapier at gentoo dot org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: hvengel at astound dot net @ 2008-12-28 20:52 UTC (permalink / raw)
  To: glibc-bugs

Kernel 2.6.26 changed its time functionality from being microsecond based
(microkernel) to being nanosecond based (nanokernel).  As a result, changes
need to be made to sysdeps/unix/sysv/linux/sys/timex.h and also to
sysdeps/unix/sysv/linux/ntp_gettime.c to make glibc compatible with these newer
kernel versions.  If this is not done, ntp will not function correctly.  Please
see http://sourceware.org/ml/libc-alpha/2008-03/msg00076.html for a set of
patches based on glibc 2.7 that will correct this problem.

With the current code base, ntp builds for a microkernel, and its functions to
set the clock's rate will be off by 3 orders of magnitude since the kernel is
passing and expecting to get nanosecond values.  This results in the clock being
unstable.  This is most noticeable when using a high-quality reference clock
like a GPS, where it should be possible to get near-microsecond offsets with a
properly functioning clock, but with the current glibc code it is more typical to
see around 100 microsecond offsets, with the clock slewing between positive and
negative offsets and sometimes being off by as much as 500 microseconds.
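
For illustration only, here is a minimal sketch of the unit mismatch described
above.  It assumes the failure mode as reported (a value in nanoseconds being
read as if it were in microseconds); whether the kernel really hands nanosecond
values to unaware callers is questioned later in this thread.  The helper names
are hypothetical, and the STA_NANO fallback value is copied from the Linux
kernel headers for use with older sys/timex.h files.

```c
#include <sys/timex.h>

/* Older sys/timex.h headers lack STA_NANO; this fallback mirrors the
   value used by the Linux kernel and is only an assumption here. */
#ifndef STA_NANO
#define STA_NANO 0x2000
#endif

/* Convert a timex.offset value to nanoseconds, using the status word
   to decide which unit the value is expressed in. */
long long offset_to_ns(long offset, int status)
{
    if (status & STA_NANO)
        return (long long)offset;        /* already nanoseconds */
    return (long long)offset * 1000LL;   /* microseconds -> nanoseconds */
}

/* Hypothetical model of a program built without STA_NANO: it never
   looks at the flag and always assumes microseconds. */
long long offset_to_ns_legacy(long offset)
{
    return (long long)offset * 1000LL;
}
```

With the flag set, offset_to_ns_legacy(1500) yields 1500000 while
offset_to_ns(1500, STA_NANO) yields 1500, which is the factor of 1000
mentioned in the report.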

I have been using the patches from the above link for several months without
issue and these patches are being widely used by users of the LinuxPPS kernel
patch set.  The LinuxPPS patch set is now appearing in the daily mm kernel
snapshots and it is highly likely that kernel 2.6.29 will be shipped with the
LinuxPPS patches.  This will allow more users to attach reference clocks to
their machines and the issues with glibc and ntp and these newer kernels will
become painfully apparent when these users do not get the clock accuracy that
they were expecting.

-- 
           Summary: glibc time functionality broken with kernel 2.6.26 and
                    later
           Product: glibc
           Version: 2.8
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: hvengel at astound dot net
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=9690

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
@ 2009-01-10  9:11 ` vapier at gentoo dot org
  2009-02-07  5:13 ` drepper at redhat dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vapier at gentoo dot org @ 2009-01-10  9:11 UTC (permalink / raw)
  To: glibc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |toolchain at gentoo dot org



* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
  2009-01-10  9:11 ` [Bug libc/9690] " vapier at gentoo dot org
@ 2009-02-07  5:13 ` drepper at redhat dot com
  2009-02-07  6:31 ` hvengel at astound dot net
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: drepper at redhat dot com @ 2009-02-07  5:13 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2009-02-07 05:13 -------
The patch changes the ABI.  The size of ntptimeval cannot be changed.  Just
imagine what happens if an old application is used with such a new glibc.  The
assignment to ->tai in ntp_gettime would corrupt memory.

You have to create a different data structure and a new version of the
ntp_gettime function.
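
As a rough sketch of the concern above: the two struct definitions below are
hypothetical stand-ins for the layout before and after the patch (the real
definitions live in sys/timex.h and may differ).  They only show that a field
appended to the struct lands exactly past the end of a buffer that an old
binary sized with the old layout.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical layout an old application was compiled against. */
struct ntptimeval_old {
    struct timeval time;
    long maxerror;
    long esterror;
};

/* Hypothetical layout after the patch appends a tai field. */
struct ntptimeval_new {
    struct timeval time;
    long maxerror;
    long esterror;
    long tai;      /* field added by the patch */
};

/* Bytes a library using the new layout would touch beyond a buffer
   that the caller allocated with the old layout. */
size_t bytes_written_past_old_buffer(void)
{
    return sizeof(struct ntptimeval_new) - sizeof(struct ntptimeval_old);
}

/* Offset of the new field; it is the first byte after the old struct. */
size_t tai_offset(void)
{
    return offsetof(struct ntptimeval_new, tai);
}
```

Because tai_offset() equals sizeof(struct ntptimeval_old), a store to ->tai
through a pointer supplied by an old caller writes into memory the caller
never reserved.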

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING



* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
  2009-01-10  9:11 ` [Bug libc/9690] " vapier at gentoo dot org
  2009-02-07  5:13 ` drepper at redhat dot com
@ 2009-02-07  6:31 ` hvengel at astound dot net
  2009-04-26 23:45 ` samuel dot thibault at ens-lyon dot org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hvengel at astound dot net @ 2009-02-07  6:31 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From hvengel at astound dot net  2009-02-07 06:31 -------
I know it changes the ABI.  In any case the situation with current kernels,
glibc and ntp is clearly broken and should be fixed.

The patch in question was submitted to the glibc email list almost a year ago
and the kernel changes that the patch was designed to address were released as a
stable kernel about 5 months ago.  The current timex.h header is based on linux
version 2.2 and is now out of sync with recent kernels.  

The patch is not mine and I was under the impression that it was submitted by a
glibc developer but I could be wrong.  In any case the person who submitted the
patch to the mailing list asked about how the ABI issue should be addressed and
it appears that no one replied to him with any suggestions about how to handle
this issue.

Also I did a search on my system to see what was using ntp_gettime and only
found two things - ntptime and libc.  So this call is not very widely used. 
However I can understand your concern over maintaining ABI compatibility as an
old ntp installation would be broken by a glibc with this patch applied.  

Prior to discovering this patch I and other LinuxPPS users had been using a
modified version of timex.h that had the new *NANO* declarations added but that
did not change the data structure or use ->tai and this seemed to work OK.   So
it might be possible to only use the part of the patch that affects timex.h
since this would avoid the ABI issue but would fix the NANO related issues that
ntp would otherwise have. 


* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
                   ` (2 preceding siblings ...)
  2009-02-07  6:31 ` hvengel at astound dot net
@ 2009-04-26 23:45 ` samuel dot thibault at ens-lyon dot org
  2009-04-26 23:52 ` samuel dot thibault at ens-lyon dot org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: samuel dot thibault at ens-lyon dot org @ 2009-04-26 23:45 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From samuel dot thibault at ens-lyon dot org  2009-04-26 23:45 -------
Please do not mix two things:

- the kernel now exposes nanoseconds instead of microseconds.  That's a
kernel ABI break.  It is announced via a STA_NANO flag in timex.status,
but still, old applications are broken when started under kernels >=
2.6.26.  That's really a concern as it's not even easy to notice while
it can irritate users (unstable ntp time).
- the kernel now exposes a new tai field.  That's not a kernel ABI break
as it just takes a reserved room.  To expose it to applications we
however need to change the userland ABI.

I'd really much rather see a kernel fix for the first issue: the kernel
should report nanoseconds _only_ if userland requests it.  And the case
of a new application running with an old kernel _has_ to be taken care
of as well.

As for the second issue, see Ulrich's comment: just define a new
version. See for instance the sched_setaffinity() function that has
changed its ABI (and API too actually).

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |samuel dot thibault at ens-
                   |                            |lyon dot org



* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
                   ` (3 preceding siblings ...)
  2009-04-26 23:45 ` samuel dot thibault at ens-lyon dot org
@ 2009-04-26 23:52 ` samuel dot thibault at ens-lyon dot org
  2009-05-07 21:28 ` johnstul at us dot ibm dot com
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: samuel dot thibault at ens-lyon dot org @ 2009-04-26 23:52 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From samuel dot thibault at ens-lyon dot org  2009-04-26 23:52 -------
Subject: Re:  glibc time functionality broken with kernel 2.6.26 and later

Grah, I just hate these stupid web interfaces.  Hopefully this time it
doesn't trash my layout.

Please do not mix two things:

- the kernel now exposes nanoseconds instead of microseconds.  That's a
kernel ABI break.  It is announced via a STA_NANO flag in timex.status,
but still, old applications are broken when started under kernels >=
2.6.26.  That's really a concern as it's not even easy to notice while
it can irritate users (unstable ntp time).
- the kernel now exposes a new tai field.  That's not a kernel ABI break
as it just takes a reserved room.  To expose it to applications we
however need to change the userland ABI.

I'd really much rather see a kernel fix for the first issue: the kernel
should report nanoseconds _only_ if userland requests it.  And the case
of a new application running with an old kernel _has_ to be taken care
of as well.

As for the second issue, see Ulrich's comment: just define a new
version. See for instance the sched_setaffinity() function that has
changed its ABI (and API too actually).



* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
                   ` (4 preceding siblings ...)
  2009-04-26 23:52 ` samuel dot thibault at ens-lyon dot org
@ 2009-05-07 21:28 ` johnstul at us dot ibm dot com
  2009-05-07 22:59 ` samuel dot thibault at ens-lyon dot org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: johnstul at us dot ibm dot com @ 2009-05-07 21:28 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From johnstul at us dot ibm dot com  2009-05-07 21:28 -------
(In reply to comment #4)
> Please do not mix two things:
> 
> - the kernel now exposes nanoseconds instead of microseconds.  That's a
> kernel ABI break.  It is announced via a STA_NANO flag in timex.status,
> but still, old applications are broken when started under kernels >=
> 2.6.26.  That's really a concern as it's not even easy to notice while
> it can irritate users (unstable ntp time).

I'm not sure this is true. The kernel internally multiplies microseconds up to
nanoseconds if the STA_NANO bit is not set. So old applications should behave
properly.

> - the kernel now exposes a new tai field.  That's not a kernel ABI break
> as it just takes a reserved room.  To expose it to applications we
> however need to change the userland ABI.

This is my understanding as well.

> I'd really much rather see a kernel fix for the first issue: the kernel
> should report nanoseconds _only_ if userland requests it.  And the case
> of a new application running with an old kernel _has_ to be taken care
> of as well.

Please bring this up on lkml and CC me if you have evidence of problems here.
I'll be happy to look at it.

> As for the second issue, see Ulrich's comment: just define a new
> version. See for instance the sched_setaffinity() function that has
> changed its ABI (and API too actually).

Do we know if anyone is still working on this? Roman's patch was seemingly
ignored with no feedback. Additionally, he's not been around much recently, so
I'm not sure if he will be following up with fixes.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |johnstul at us dot ibm dot
                   |                            |com, mlichvar at redhat dot
                   |                            |com



* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
                   ` (5 preceding siblings ...)
  2009-05-07 21:28 ` johnstul at us dot ibm dot com
@ 2009-05-07 22:59 ` samuel dot thibault at ens-lyon dot org
  2009-05-07 23:10 ` hvengel at astound dot net
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: samuel dot thibault at ens-lyon dot org @ 2009-05-07 22:59 UTC (permalink / raw)
  To: glibc-bugs



------- Additional Comments From samuel dot thibault at ens-lyon dot org  2009-05-07 22:59 -------
Subject: Re:  glibc time functionality broken with kernel 2.6.26 and later

johnstul at us dot ibm dot com wrote on Thu 07 May 2009 21:28:25 -0000:
> > - the kernel now exposes nanoseconds instead of microseconds.  That's a
> > kernel ABI break.  It is announced via a STA_NANO flag in timex.status,
> > but still, old applications are broken when started under kernels >=
> > 2.6.26.  That's really a concern as it's not even easy to notice while
> > it can irritate users (unstable ntp time).
> 
> I'm not sure this is true. The kernel internally multiplies microseconds up to
> nanoseconds if the STA_NANO bit is not set. So old applications should behave
> properly.

Again, there are two issues:
- What the kernel takes as a parameter. As you say, there is indeed no
  problem: if the application hasn't set the STA_NANO flag, the kernel
  converts properly.
- What the kernel returns. Nanosecond values are advertised by the
  kernel through the STA_NANO flag. But old applications didn't even
  know about that flag, and thus cannot know that these are nanosecond
  values.

> > As for the second issue, see Ulrich's comment: just define a new
> > version. See for instance the sched_setaffinity() function that has
> > changed its ABI (and API too actually).
> 
> Do we know if anyone is still working this? Roman's patch was seemingly ignored
> with no feedback.

There was: "define a new version to avoid breaking the ABI".

Samuel



* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
                   ` (6 preceding siblings ...)
  2009-05-07 22:59 ` samuel dot thibault at ens-lyon dot org
@ 2009-05-07 23:10 ` hvengel at astound dot net
  2009-05-07 23:12 ` johnstul at us dot ibm dot com
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hvengel at astound dot net @ 2009-05-07 23:10 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From hvengel at astound dot net  2009-05-07 23:10 -------
(In reply to comment #5)
> (In reply to comment #4)
> > Please do not mix two things:
> > 
> > - the kernel now exposes nanoseconds instead of microseconds.  That's a
> > kernel ABI break.  It is announced via a STA_NANO flag in timex.status,
> > but still, old applications are broken when started under kernels >=
> > 2.6.26.  That's really a concern as it's not even easy to notice while
> > it can irritate users (unstable ntp time).
> 
> I'm not sure this is true. The kernel internally multiplies microseconds up to
> nanoseconds if the STA_NANO bit is not set. So old applications should behave
> properly.

The issue is that when ntp builds it looks in sys/timex.h (i.e., the glibc
timex.h) and if it does not see STA_NANO and friends it builds as a
microsecond-only app with the assumption that it is going to be running against
a microkernel.  When this happens it never asks the kernel if it is a nanokernel
and this results in things not working correctly.
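
The build-time detection described above can be sketched as follows.  This is
not ntp's actual code; the function names are hypothetical and the sketch only
shows that the decision is fixed by whatever sys/timex.h was present when the
program was compiled, not by the kernel it later runs on.

```c
#include <sys/timex.h>

/* Returns 1 when the header used at build time exposes STA_NANO and
   0 otherwise.  The answer is baked in at compile time. */
int built_with_nano_support(void)
{
#ifdef STA_NANO
    return 1;
#else
    return 0;
#endif
}

/* Human-readable name of the unit the program was built to assume. */
const char *build_mode_name(void)
{
    return built_with_nano_support() ? "nanosecond" : "microsecond";
}
```

A binary compiled against a header without STA_NANO keeps returning
"microsecond" even after the kernel underneath it is upgraded, which is why
rebuilding ntp against an updated header was needed.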

What LinuxPPS users noticed was that using the combination of a non-nano-aware
ntp with the newer nanokernels resulted in time keeping that was, by our
standards and expectations, unstable and we would see the offsets slewing
through a +-500 microsecond range.  This was almost two orders of magnitude more
than we were seeing before the switch to the nanokernel.  There was a long
string of emails on the linuxpps email list about this and it almost goes
without saying that we were not happy campers.  We were clueless about what the
cause was and it took a while for us to figure out that the missing STA_NANO &
friends stuff in sys/timex.h was a big part of what we were seeing.  We
discovered this because we have some users who also use FreeBSD and they were
able to point out that for some reason ntp was not detecting the nanokernel like
it should.

As soon as we added the STA_NANO & friends declarations to sys/timex.h and
rebuilt ntp these issues were greatly improved and offsets were reduced by an
order of magnitude.  The offsets were still higher than before the nanokernel so
there were other issues as well but this was a very big one.  I think John's new
convergence kernel patch may be the other piece of this puzzle.   So I don't
think it is that the kernel is giving out the time in a different format (in
fact it is not) but rather it has something to do with how ntp interacts with
the kernel to make frequency adjustments to the clock (that's my guess anyway).
It clearly does something different if it is in microsecond mode than it does in
nanosecond mode.


I should add that this appears to be a Linux only problem and ntp is not having
these issues on any other OS with a nanokernel.


* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
                   ` (7 preceding siblings ...)
  2009-05-07 23:10 ` hvengel at astound dot net
@ 2009-05-07 23:12 ` johnstul at us dot ibm dot com
  2009-05-08 21:07 ` drepper at redhat dot com
  2009-05-08 22:15 ` hvengel at astound dot net
  10 siblings, 0 replies; 12+ messages in thread
From: johnstul at us dot ibm dot com @ 2009-05-07 23:12 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From johnstul at us dot ibm dot com  2009-05-07 23:12 -------
(In reply to comment #6)
> Again, there are two issues:
> - What the kernel takes as parameter. As you say, there is no problem
>   indeed, if the application hasn't set the STA_NANO flag, the kernel
>   converts properly.
> - What the kernel returns. nanoseconds values are advertised by the
>   kernel through the STA_NANO flag. But old applications didn't even
>   know that flag, and thus can not know that these are nanosecond
>   values.

So the concern is only with running old applications after a new application has
set the STA_NANO flag?


* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
                   ` (8 preceding siblings ...)
  2009-05-07 23:12 ` johnstul at us dot ibm dot com
@ 2009-05-08 21:07 ` drepper at redhat dot com
  2009-05-08 22:15 ` hvengel at astound dot net
  10 siblings, 0 replies; 12+ messages in thread
From: drepper at redhat dot com @ 2009-05-08 21:07 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2009-05-08 21:06 -------
I have added the tai field and various macros to the header on April 21st. 
These changes match the kernel changes.  If this is not correct bring it up with
the kernel people.
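
As a hedged sketch of how an application might use the new field, assuming a
sys/timex.h that declares tai in struct timex: adjtimex() with modes set to 0
is a read-only query, and the helper utc_to_tai_seconds() is a hypothetical
name introduced here for illustration.

```c
#include <string.h>
#include <sys/timex.h>

/* Read-only query: modes == 0 asks the kernel to fill in the struct
   without adjusting anything.  Returns the clock state, or -1 on error. */
int read_clock_state(struct timex *tx)
{
    memset(tx, 0, sizeof *tx);
    tx->modes = 0;
    return adjtimex(tx);
}

/* Hypothetical helper: TAI runs ahead of UTC by tai_offset seconds. */
long long utc_to_tai_seconds(long long utc_seconds, int tai_offset)
{
    return utc_seconds + tai_offset;
}
```

After a successful read_clock_state(&tx), the value in tx.tai can be passed to
utc_to_tai_seconds(); a value of 0 usually means no TAI offset has been set on
the system.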

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED



* [Bug libc/9690] glibc time functionality broken with kernel 2.6.26 and later
  2008-12-28 20:52 [Bug libc/9690] New: glibc time functionality broken with kernel 2.6.26 and later hvengel at astound dot net
                   ` (9 preceding siblings ...)
  2009-05-08 21:07 ` drepper at redhat dot com
@ 2009-05-08 22:15 ` hvengel at astound dot net
  10 siblings, 0 replies; 12+ messages in thread
From: hvengel at astound dot net @ 2009-05-08 22:15 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From hvengel at astound dot net  2009-05-08 22:14 -------
Thank you for getting these changes in.
