public inbox for overseers@sourceware.org
* last days of htdig
@ 2003-09-27  7:18 Hans-Peter Nilsson
  2003-09-27 15:21 ` Christopher Faylor
  2003-09-28 22:40 ` Hans-Peter Nilsson
  0 siblings, 2 replies; 15+ messages in thread
From: Hans-Peter Nilsson @ 2003-09-27  7:18 UTC
  To: overseers

There's been no update since 2003-07-31; that's when some file
passed 2G and it all fell apart, save for the existing DB.  I'm
going to try to exclude parts of gcc.gnu.org from indexing,
probably some mailing lists.  Just so you know when the machine
slows down. :-)  If that doesn't work, I think I'm just going to
leave it.  No fun in that, and anyway it doesn't seem a critical
function anymore.

(Chris, what happened to the mnogosearch initiative?)

brgds, H-P

* Re: last days of htdig
  2003-09-27  7:18 last days of htdig Hans-Peter Nilsson
@ 2003-09-27 15:21 ` Christopher Faylor
  2003-09-27 15:54   ` Hans-Peter Nilsson
  2003-09-27 19:45   ` Matthew Galgoci
  2003-09-28 22:40 ` Hans-Peter Nilsson
  1 sibling, 2 replies; 15+ messages in thread
From: Christopher Faylor @ 2003-09-27 15:21 UTC
  To: Hans-Peter Nilsson; +Cc: overseers

On Sat, Sep 27, 2003 at 03:18:10AM -0400, Hans-Peter Nilsson wrote:
>There's been no update since 2003-07-31; that's when some file
>passed 2G and it all fell apart, save for the existing DB.  I'm
>going to try to exclude parts of gcc.gnu.org from indexing,
>probably some mailing lists.  Just so you know when the machine
>slows down. :-)  If that doesn't work, I think I'm just going to
>leave it.  No fun in that, and anyway it doesn't seem a critical
>function anymore.
>
>(Chris, what happened to the mnogosearch initiative?)

Matt, are you reading this?  Want to take a stab at moving to
mnogosearch?

I think we've already been over this ground, HP, but couldn't we
just recompile htdig on the new kernel to bypass the 2G limit?

cgf

* Re: last days of htdig
  2003-09-27 15:21 ` Christopher Faylor
@ 2003-09-27 15:54   ` Hans-Peter Nilsson
  2003-09-27 16:03     ` Christopher Faylor
  2003-09-27 16:06     ` Joseph S. Myers
  2003-09-27 19:45   ` Matthew Galgoci
  1 sibling, 2 replies; 15+ messages in thread
From: Hans-Peter Nilsson @ 2003-09-27 15:54 UTC
  To: Christopher Faylor; +Cc: overseers

On Sat, 27 Sep 2003, Christopher Faylor wrote:
> I think we've already been over this ground,

Nah, not really.  I think I mentioned a vague hunch that there
were some large-file-patches for the kernel, allowing >2G (or
better, 4G) files and that they would help.  But I think that
was far from reality, or at least reality of today(s kernel).

> HP, but couldn't we
> just recompile htdig on the new kernel to bypass the 2G limit?

Off the top of my head, I think gnu "sort", sizeof int and
sizeof long (or if we're lucky, just size_t :-) would be the
obvious open sores^Wissues.

Are we on 64-bit iron yet?  When, when, when...? :-) :-) :-)

brgds, H-P

* Re: last days of htdig
  2003-09-27 15:54   ` Hans-Peter Nilsson
@ 2003-09-27 16:03     ` Christopher Faylor
  2003-09-27 16:06     ` Joseph S. Myers
  1 sibling, 0 replies; 15+ messages in thread
From: Christopher Faylor @ 2003-09-27 16:03 UTC
  To: Hans-Peter Nilsson; +Cc: overseers

On Sat, Sep 27, 2003 at 11:54:23AM -0400, Hans-Peter Nilsson wrote:
>On Sat, 27 Sep 2003, Christopher Faylor wrote:
>> I think we've already been over this ground,
>
>Nah, not really.  I think I mentioned a vague hunch that there
>were some large-file-patches for the kernel, allowing >2G (or
>better, 4G) files and that they would help.  But I think that
>was far from reality, or at least reality of today(s kernel).
>
>> HP, but couldn't we
>> just recompile htdig on the new kernel to bypass the 2G limit?
>
>Off the top of my head, I think gnu "sort", sizeof int and
>sizeof long (or if we're lucky, just size_t :-) would be the
>obvious open sores^Wissues.

Ah, ok.  It's never easy.

>Are we on 64-bit iron yet?  When, when, when...? :-) :-) :-)

I know I'm counting my pennies for my home machine.  Athlon64.
Mmmmm...

cgf

* Re: last days of htdig
  2003-09-27 15:54   ` Hans-Peter Nilsson
  2003-09-27 16:03     ` Christopher Faylor
@ 2003-09-27 16:06     ` Joseph S. Myers
  2003-09-27 16:16       ` Hans-Peter Nilsson
  1 sibling, 1 reply; 15+ messages in thread
From: Joseph S. Myers @ 2003-09-27 16:06 UTC
  To: Hans-Peter Nilsson; +Cc: Christopher Faylor, overseers

On Sat, 27 Sep 2003, Hans-Peter Nilsson wrote:

> Off the top of my head, I think gnu "sort", sizeof int and
> sizeof long (or if we're lucky, just size_t :-) would be the
> obvious open sores^Wissues.

GNU utilities have been compiling by default with -D_FILE_OFFSET_BITS=64
for ages (though this doesn't help if they would need too much memory,
just with large files).  The problem would be whether htdig (and any
libraries it uses) is clean about using off_t where appropriate (including
not using library interfaces such as fseek and ftell, in their place
fseeko and ftello) and whether it needs too much memory.  And the large
files support in many of the GNU utilities probably isn't much exercised
beyond their testsuites.

Glibc hides the question of what kernel you're running on from the program
- a program built to use large-files interfaces doesn't itself need to be
built with a >= 2.4 kernel, it just needs to be run under one with a glibc
built against headers from one, and if either of the conditions fails at
runtime there will just be errors when a file would get too large.
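
[Editor's sketch, not part of the original message.]  The point about
off_t, fseeko and ftello can be illustrated in C.  This is a minimal
sketch, assuming a glibc-style system; the helper name seek_and_tell is
made up for illustration, and htdig's real I/O code is not shown anywhere
in this thread.

```c
/* Both macros must precede every #include to take effect. */
#define _FILE_OFFSET_BITS 64      /* ask for a 64-bit off_t on 32-bit targets */
#define _POSIX_C_SOURCE 200809L   /* declare fseeko/ftello under strict -std= modes */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: seek to an absolute byte offset and report the
   resulting position, or -1 if the offset cannot be represented in
   off_t or the seek fails. */
long long seek_and_tell(FILE *fp, long long offset)
{
    assert(fp != NULL);

    /* With a 32-bit off_t this round trip fails for offsets >= 2 GiB. */
    if ((long long)(off_t)offset != offset)
        return -1;

    if (fseeko(fp, (off_t)offset, SEEK_SET) != 0)
        return -1;

    return (long long)ftello(fp);
}
```

Called on a stream from tmpfile() with an offset of 2147483648 (2 GiB),
the helper should report that same position when off_t is 64 bits wide.
With fseek and ftell the offset would have to pass through a long, which
is 32 bits on the 32-bit systems discussed here.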

-- 
Joseph S. Myers
jsm@polyomino.org.uk

* Re: last days of htdig
  2003-09-27 16:06     ` Joseph S. Myers
@ 2003-09-27 16:16       ` Hans-Peter Nilsson
  2003-09-27 16:24         ` Joseph S. Myers
  0 siblings, 1 reply; 15+ messages in thread
From: Hans-Peter Nilsson @ 2003-09-27 16:16 UTC
  To: Joseph S. Myers; +Cc: Christopher Faylor, overseers

On Sat, 27 Sep 2003, Joseph S. Myers wrote:
> On Sat, 27 Sep 2003, Hans-Peter Nilsson wrote:
>
> > Off the top of my head, I think gnu "sort", sizeof int and
> > sizeof long (or if we're lucky, just size_t :-) would be the
> > obvious open sores^Wissues.
>
> GNU utilities have been compiling by default with -D_FILE_OFFSET_BITS=64
> for ages (though this doesn't help if they would need too much memory,
> just with large files).

That's nice to know.  sort --version:
sort (textutils) 2.0.21

>  The problem would be whether htdig (and any
> libraries it uses) is clean about using off_t where appropriate (including
> not using library interfaces such as fseek and ftell, in their place
> fseeko and ftello)

I doubt it does, but that's just me.  Maybe it eventually boils
down to a libstdc++ issue.  Though htdig uses a nice mixture of
stdio and streams IIRC.

> and whether it needs too much memory.

And don't forget the Sleepycat Berkeley DB code.  Though maybe,
just maybe they already had all their chickens in a row.

brgds, H-P

* Re: last days of htdig
  2003-09-27 16:16       ` Hans-Peter Nilsson
@ 2003-09-27 16:24         ` Joseph S. Myers
  0 siblings, 0 replies; 15+ messages in thread
From: Joseph S. Myers @ 2003-09-27 16:24 UTC
  To: Hans-Peter Nilsson; +Cc: Christopher Faylor, overseers

On Sat, 27 Sep 2003, Hans-Peter Nilsson wrote:

> I doubt it does, but that's just me.  Maybe it eventually boils
> down to a libstdc++ issue.  Though htdig uses a nice mixture of
> stdio and streams IIRC.

libstdc++ is the sort of external library there are likely to be problems
with - we don't build multiple versions with/without _FILE_OFFSET_BITS=64
(and there'd be the problem of selecting the correct one at link-time,
even if g++ were to force _FILE_OFFSET_BITS=64 like it forces _GNU_SOURCE
for other reasons).  zlib is another library that can often cause problems
in this way.

Libraries where you're expected to use `foolib-config --cflags` (or in
more modern versions `pkg-config --cflags library-name`) by contrast could
avoid that problem - if their authors had taken the initiative to force
large-files mode for that library and its users.

There's a lot to be said for NetBSD's choice of making off_t 64 bits
unconditionally regardless of whether on a 32-bit or 64-bit system.

-- 
Joseph S. Myers
jsm@polyomino.org.uk

* Re: last days of htdig
  2003-09-27 15:21 ` Christopher Faylor
  2003-09-27 15:54   ` Hans-Peter Nilsson
@ 2003-09-27 19:45   ` Matthew Galgoci
  2003-09-27 21:50     ` Christopher Faylor
  1 sibling, 1 reply; 15+ messages in thread
From: Matthew Galgoci @ 2003-09-27 19:45 UTC
  To: Christopher Faylor; +Cc: Hans-Peter Nilsson, overseers

On Sat, 27 Sep 2003, Christopher Faylor wrote:

> On Sat, Sep 27, 2003 at 03:18:10AM -0400, Hans-Peter Nilsson wrote:
> >There's been no update since 2003-07-31; that's when some file
> >passed 2G and it all fell apart, save for the existing DB.  I'm
> >going to try to exclude parts of gcc.gnu.org from indexing,
> >probably some mailing lists.  Just so you know when the machine
> >slows down. :-)  If that doesn't work, I think I'm just going to
> >leave it.  No fun in that, and anyway it doesn't seem a critical
> >function anymore.
> >
> >(Chris, what happened to the mnogosearch initiative?)
> 
> Matt, are you reading this?  Want to take a stab at moving to
> mnogosearch?

Sure. I think we can do that.

> I think we've already been over this ground, HP, but couldn't we
> just recompile htdig on the new kernel to bypass the 2G limit?

Ewwwwww....

-- 

Matthew Galgoci		"If you were a woman I'd kiss you right now."
System Administrator
Red Hat, Inc
919.754.3700 x44155

* Re: last days of htdig
  2003-09-27 19:45   ` Matthew Galgoci
@ 2003-09-27 21:50     ` Christopher Faylor
  2003-09-27 22:05       ` Ulrich Drepper
  0 siblings, 1 reply; 15+ messages in thread
From: Christopher Faylor @ 2003-09-27 21:50 UTC
  To: Matthew Galgoci; +Cc: Hans-Peter Nilsson, overseers

On Sat, Sep 27, 2003 at 03:45:26PM -0400, Matthew Galgoci wrote:
>On Sat, 27 Sep 2003, Christopher Faylor wrote:
>
>> On Sat, Sep 27, 2003 at 03:18:10AM -0400, Hans-Peter Nilsson wrote:
>> >There's been no update since 2003-07-31; that's when some file
>> >passed 2G and it all fell apart, save for the existing DB.  I'm
>> >going to try to exclude parts of gcc.gnu.org from indexing,
>> >probably some mailing lists.  Just so you know when the machine
>> >slows down. :-)  If that doesn't work, I think I'm just going to
>> >leave it.  No fun in that, and anyway it doesn't seem a critical
>> >function anymore.
>> >
>> >(Chris, what happened to the mnogosearch initiative?)
>> 
>> Matt, are you reading this?  Want to take a stab at moving to
>> mnogosearch?
>
>Sure. I think we can do that.

Well, that's a relief.

>> I think we've already been over this ground, HP, but couldn't we
>> just recompile htdig on the new kernel to bypass the 2G limit?
>
>Ewwwwww....

?  I said "kernel" when I really meant glibc but it could have been a
real simple way to get things working quickly.

cgf

* Re: last days of htdig
  2003-09-27 21:50     ` Christopher Faylor
@ 2003-09-27 22:05       ` Ulrich Drepper
  2003-09-27 22:21         ` Christopher Faylor
  0 siblings, 1 reply; 15+ messages in thread
From: Ulrich Drepper @ 2003-09-27 22:05 UTC
  To: Christopher Faylor; +Cc: Matthew Galgoci, Hans-Peter Nilsson, overseers

Christopher Faylor wrote:

>>>I think we've already been over this ground, HP, but couldn't we
>>>just recompile htdig on the new kernel to bypass the 2G limit?
>>
>>Ewwwwww....
> 
> 
> ?  I said "kernel" when I really meant glibc but it could have been a
> real simple way to get things working quickly.

???  glibc doesn't have a 2GB limit.  You have to use the right
interfaces to get beyond 2GB (e.g. -D_FILE_OFFSET_BITS=64), though.
Recompiling glibc cannot make this the default.  You'd change the size
of all kinds of data types after which I doubt you can reboot the system.

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------

* Re: last days of htdig
  2003-09-27 22:05       ` Ulrich Drepper
@ 2003-09-27 22:21         ` Christopher Faylor
  2003-09-27 22:29           ` Ulrich Drepper
  2003-09-27 22:36           ` Hans-Peter Nilsson
  0 siblings, 2 replies; 15+ messages in thread
From: Christopher Faylor @ 2003-09-27 22:21 UTC
  To: Ulrich Drepper; +Cc: Matthew Galgoci, Hans-Peter Nilsson, overseers

On Sat, Sep 27, 2003 at 03:05:28PM -0700, Ulrich Drepper wrote:
>Christopher Faylor wrote:
>
>>>>I think we've already been over this ground, HP, but couldn't we
>>>>just recompile htdig on the new kernel to bypass the 2G limit?
>>>
>>>Ewwwwww....
>> 
>> ?  I said "kernel" when I really meant glibc but it could have been a
>> real simple way to get things working quickly.
>
>???  glibc doesn't have a 2GB limit.  You have to use the right
>interfaces to get beyond 2GB (e.g. -D_FILE_OFFSET_BITS=64), though.
>Recompiling glibc cannot make this the default.  You'd change the size
>of all kinds of data types after which I doubt you can reboot the system.

cgf said: I think we've already been over this ground, HP, but couldn't we
just recompile htdig on the new kernel to bypass the 2G limit?

then he said: I said "kernel" when I really meant glibc but it could
have been a real simple way to get things working quickly.

Correcting the sentence would give us: I think we've already been over
this ground, HP, but couldn't we just recompile htdig on the new glibc
to bypass the 2G limit?

The "new glibc" in this case would be a glibc (and kernel) from 2003,
when the new sourceware was brought online, rather than one from 1999,
or whenever Jason and company put the old sourceware together.  I wasn't
suggesting that a rebuild of glibc would be necessary.  That would be
counter to my attempts to standardize sourceware so that I can just
run 'up2date' and not worry about anything breaking.

I guess I was assuming that the newer glibc would deal better with
larger file sizes, which may have been an incorrect assumption.  Perhaps
the old glibc worked fine, too.  Regardless, I certainly know that it is
possible to manipulate files over 2GB on linux without rebuilding glibc.

cgf

* Re: last days of htdig
  2003-09-27 22:21         ` Christopher Faylor
@ 2003-09-27 22:29           ` Ulrich Drepper
  2003-09-27 22:36           ` Hans-Peter Nilsson
  1 sibling, 0 replies; 15+ messages in thread
From: Ulrich Drepper @ 2003-09-27 22:29 UTC
  To: Christopher Faylor; +Cc: Matthew Galgoci, Hans-Peter Nilsson, overseers

Christopher Faylor wrote:

> I guess I was assuming that the newer glibc would deal better with
> larger file sizes, which may have been an incorrect assumption.  Perhaps
> the old glibc worked fine, too.  Regardless, I certainly know that it is
> possible to manipulate files over 2GB on linux without rebuilding glibc.

I think the LFS support predates 1999.  The problem probably is that the
application isn't compiled with -D_FILE_OFFSET_BITS=64 and it also
doesn't use the *64 functions (e.g., fopen64 instead of fopen).  The
latter is an alternative.  If you want to replace something I'd put the
highest priority on the program itself.  Maybe look at the build process
and add, if necessary, -D_FILE_OFFSET_BITS=64.
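
[Editor's sketch, not part of the original message.]  One hedged way to
confirm that the flag actually reached the application build is a
compile-time guard in one of its source files.  The sketch below is in C;
the helper name fits_in_signed_32bit is invented for illustration and
nothing here is taken from htdig.

```c
#include <assert.h>     /* static_assert (C11) */
#include <limits.h>     /* CHAR_BIT */
#include <stdint.h>     /* INT32_MAX */
#include <sys/types.h>  /* off_t */

/* Stop the build if off_t is narrower than 64 bits, as happens on a
   32-bit Linux target compiled without -D_FILE_OFFSET_BITS=64. */
static_assert(sizeof(off_t) * CHAR_BIT >= 64,
              "off_t is not 64-bit; build with -D_FILE_OFFSET_BITS=64");

/* Illustrative helper: can this byte count be held in a signed 32-bit
   offset?  INT32_MAX is 2^31 - 1, one byte short of 2 GiB. */
int fits_in_signed_32bit(long long size)
{
    return size >= 0 && size <= INT32_MAX;
}
```

A file of 2147483647 bytes still fits a signed 32-bit offset; one more
byte does not, which matches the "some file passed 2G" failure that
started this thread.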

-- 
--------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------

* Re: last days of htdig
  2003-09-27 22:21         ` Christopher Faylor
  2003-09-27 22:29           ` Ulrich Drepper
@ 2003-09-27 22:36           ` Hans-Peter Nilsson
  1 sibling, 0 replies; 15+ messages in thread
From: Hans-Peter Nilsson @ 2003-09-27 22:36 UTC
  To: Christopher Faylor; +Cc: Ulrich Drepper, Matthew Galgoci, overseers

On Sat, 27 Sep 2003, Christopher Faylor wrote:
> Correcting the sentence would give us: I think we've already been over
> this ground, HP, but couldn't we just recompile htdig on the new glibc
> to bypass the 2G limit?

Yeah, but I'm (still) pessimistic, for one due to libstdc++.

(Heh, I too know about -D_FILE_OFFSET_BITS=64 and friends.
I have an unreviewed glibc patch, that I came to think of...)

Yay mnogosearch.  Might it be the silver bullet!

brgds, H-P

* Re: last days of htdig
  2003-09-27  7:18 last days of htdig Hans-Peter Nilsson
  2003-09-27 15:21 ` Christopher Faylor
@ 2003-09-28 22:40 ` Hans-Peter Nilsson
  2003-09-29 22:04   ` Gerald Pfeifer
  1 sibling, 1 reply; 15+ messages in thread
From: Hans-Peter Nilsson @ 2003-09-28 22:40 UTC
  To: overseers

On Sat, 27 Sep 2003, Hans-Peter Nilsson wrote:
> There's been no update since 2003-07-31; that's when some file
> passed 2G and it all fell apart, save for the existing DB.  I'm
> going to try to exclude parts of gcc.gnu.org from indexing,
> probably some mailing lists.

By excluding /ml/gccadmin, documentation for released versions
matching /onlinedocs/gcc-, and adding some words (see the new
file gcc_bad_words in the htdig-conf dir) that appear in all or
half the messages, like "gcc", "gnu", "org", "patches", "from",
abbrev. day of month, day of week -- except "sun" :-) etc. to
those not indexed, the gcc htdig setup seems to be up and
indexing again.  For a few months that is.
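
[Editor's sketch, not part of the original message.]  In ht://Dig terms
the change described above would plausibly look something like the
fragment below.  exclude_urls and bad_word_list are standard ht://Dig
configuration attributes; the directory path is a placeholder, and the
actual sourceware configuration is not shown in this thread.

```
# Sketch only: the path is a placeholder, not the real sourceware layout.
exclude_urls:   /ml/gccadmin /onlinedocs/gcc-
bad_word_list:  /path/to/htdig-conf/gcc_bad_words
```

The file named by bad_word_list holds one word per line, so entries such
as "gcc", "gnu", "org", "patches" and "from" would each sit on their own
line.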

brgds, H-P

* Re: last days of htdig
  2003-09-28 22:40 ` Hans-Peter Nilsson
@ 2003-09-29 22:04   ` Gerald Pfeifer
  0 siblings, 0 replies; 15+ messages in thread
From: Gerald Pfeifer @ 2003-09-29 22:04 UTC
  To: Joseph S. Myers, Hans-Peter Nilsson; +Cc: overseers

On Sat, 27 Sep 2003, Joseph S. Myers wrote:
> There's a lot to be said for NetBSD's choice of making off_t 64 bits
> unconditionally regardless of whether on a 32-bit or 64-bit system.

FreeBSD also has 64 bit off_t. ;-)

On Sun, 28 Sep 2003, Hans-Peter Nilsson wrote:
> By excluding /ml/gccadmin, documentation for released versions
> matching /onlinedocs/gcc-, and adding some words (see the new
> file gcc_bad_words in the htdig-conf dir) that appear in all or
> half the messages, like "gcc", "gnu", "org", "patches", "from",
> abbrev. day of month, day of week -- except "sun" :-)

Cute. Well spotted! :-)

> etc. to those not indexed, the gcc htdig setup seems to be up and
> indexing again.  For a few months that is.

Thanks a lot.

Gerald
-- 
Gerald "Jerry"   pfeifer@dbai.tuwien.ac.at   http://www.pfeifer.com/gerald/
