public inbox for overseers@sourceware.org
* Just use google for archive searching?
@ 2002-12-19 21:41 Christopher Faylor
  2002-12-19 21:52 ` Hans-Peter Nilsson
  2002-12-21 21:29 ` Zack Weinberg
  0 siblings, 2 replies; 14+ messages in thread
From: Christopher Faylor @ 2002-12-19 21:41 UTC (permalink / raw)
  To: overseers

It seems like we might be able to just stop doing htdig and let google
do our archiving for us.

Something like:

site:gcc.gnu.org "Index Nav:"  search term here

might be enough.
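As a rough illustration of the query above, here is a minimal Python sketch that turns it into a Google search URL. It assumes Google's `https://www.google.com/search?q=...` URL form and uses "Index Nav:" only because that is the marker string suggested above for archive pages. `build_archive_query` is a hypothetical helper, not part of any existing tool.

```python
from urllib.parse import urlencode

def build_archive_query(terms: str, site: str = "gcc.gnu.org") -> str:
    # Restrict hits to one site and require the "Index Nav:" string that is
    # assumed to appear on archived list messages, then append the user's terms.
    query = f'site:{site} "Index Nav:" {terms}'
    # urlencode percent-escapes ':' and '"' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `build_archive_query("htdig")` yields a URL whose `q` parameter decodes back to `site:gcc.gnu.org "Index Nav:" htdig`.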

Is there any reason why we can't just let google do our work for us?
It's already hitting the web server archiving messages so why not let it
deal with archive searches, too?

cgf

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Just use google for archive searching?
  2002-12-19 21:41 Just use google for archive searching? Christopher Faylor
@ 2002-12-19 21:52 ` Hans-Peter Nilsson
  2002-12-19 23:46   ` Hans-Peter Nilsson
  2002-12-21 21:29 ` Zack Weinberg
  1 sibling, 1 reply; 14+ messages in thread
From: Hans-Peter Nilsson @ 2002-12-19 21:52 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: overseers

On Thu, 19 Dec 2002, Christopher Faylor wrote:
> Is there any reason why we can't just let google do our work for us?
> It's already hitting the web server archiving messages so why not let it
> deal with archive searches, too?

For the sourceware side, indeed.  Though google has an update
frequency which (IIRC and rumors) is in the order of "weekly".
Maybe it isn't an issue, but I recall a report that assumed
updates were less than the current 48h for the htdig index.

For the gcc side, I think, as I've mentioned before, that there
are political issues with using a non-free (as in source-code)
resource.  Replacing a free-software indexer with a non-free one
would be a no-no.

brgds, H-P

* Re: Just use google for archive searching?
  2002-12-19 21:52 ` Hans-Peter Nilsson
@ 2002-12-19 23:46   ` Hans-Peter Nilsson
  2002-12-20  2:27     ` Zack Weinberg
  0 siblings, 1 reply; 14+ messages in thread
From: Hans-Peter Nilsson @ 2002-12-19 23:46 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: overseers

On Fri, 20 Dec 2002, Hans-Peter Nilsson wrote:
> For the gcc side, I think, as I've mentioned before, that there
> are political issues with using a non-free (as in source-code)
> resource.  Replacing a free-software indexer with a non-free one
> would be a no-no.

I forgot to insert a word: "Replacing a *working* free-software
indexer".  That might very soon make a difference.  I don't plan
to tweak htdig no more once it really gets over the 1<<31
file-size limit.
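For scale, 1<<31 is 2,147,483,648 bytes (2 GiB). Assuming the limit comes from a signed 32-bit file offset, as on systems built without large-file support (the thread does not say exactly where htdig's limit originates), the largest representable offset is 2^31 - 1, and one byte more wraps to a negative value on typical two's-complement platforms. A small Python sketch of that arithmetic, with hypothetical helper names:

```python
LIMIT = 1 << 31  # 2147483648 bytes, i.e. 2 GiB

def fits_in_signed_32bit_offset(size: int) -> bool:
    # A signed 32-bit offset can hold non-negative values 0 .. 2**31 - 1 only.
    return 0 <= size <= (1 << 31) - 1

def as_signed_32bit(value: int) -> int:
    # Reinterpret the low 32 bits of value as a two's-complement signed integer,
    # roughly what a 32-bit signed off_t would end up holding after overflow.
    return ((value + (1 << 31)) % (1 << 32)) - (1 << 31)

print(LIMIT, LIMIT / 2**30)                    # 2147483648 2.0
print(fits_in_signed_32bit_offset(LIMIT - 1))  # True
print(fits_in_signed_32bit_offset(LIMIT))      # False
print(as_signed_32bit(LIMIT))                  # -2147483648
```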

brgds, H-P

* Re: Just use google for archive searching?
  2002-12-19 23:46   ` Hans-Peter Nilsson
@ 2002-12-20  2:27     ` Zack Weinberg
  2002-12-20 11:50       ` Hans-Peter Nilsson
  0 siblings, 1 reply; 14+ messages in thread
From: Zack Weinberg @ 2002-12-20  2:27 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: Christopher Faylor, overseers

Hans-Peter Nilsson <hp@bitrange.com> writes:

> On Fri, 20 Dec 2002, Hans-Peter Nilsson wrote:
>> For the gcc side, I think, as I've mentioned before, that there
>> are political issues with using a non-free (as in source-code)
>> resource.  Replacing a free-software indexer with a non-free one
>> would be a no-no.
>
> I forgot to insert a word: "Replacing a *working* free-software
> indexer".  That might very soon make a difference.  I don't plan
> to tweak htdig no more once it really gets over the 1<<31
> file-size limit.

What happened to the idea of using mnogosearch or other alternative?

zw

* Re: Just use google for archive searching?
  2002-12-20  2:27     ` Zack Weinberg
@ 2002-12-20 11:50       ` Hans-Peter Nilsson
  2002-12-20 12:31         ` Christopher Faylor
  0 siblings, 1 reply; 14+ messages in thread
From: Hans-Peter Nilsson @ 2002-12-20 11:50 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Christopher Faylor, overseers

On Thu, 19 Dec 2002, Zack Weinberg wrote:
> Hans-Peter Nilsson <hp@bitrange.com> writes:
> What happened to the idea of using mnogosearch or other alternative?

I guess that question is related to the advent of the new
machine.  I haven't looked into it myself.

brgds, H-P

* Re: Just use google for archive searching?
  2002-12-20 11:50       ` Hans-Peter Nilsson
@ 2002-12-20 12:31         ` Christopher Faylor
  2002-12-20 14:15           ` Hans-Peter Nilsson
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Faylor @ 2002-12-20 12:31 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: Zack Weinberg, overseers

On Fri, Dec 20, 2002 at 05:26:48AM -0500, Hans-Peter Nilsson wrote:
>On Thu, 19 Dec 2002, Zack Weinberg wrote:
>>Hans-Peter Nilsson <hp@bitrange.com> writes:
>>What happened to the idea of using mnogosearch or other alternative?
>
>I guess that question is related to the advent of the new machine.  I
>haven't looked into it myself.

I ran an alternative on the new machine and was surprised by how long it
took to index.  I've since researched other methods that can be used for
incremental updating so the overall effect won't be as noticeable.
However, it's hard to compare "won't be as noticeable" with "almost no
impact".
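One common way to get incremental updates, not necessarily the method referred to here, is to reindex only the files whose modification time changed since the previous run. A minimal Python sketch, assuming the indexer can be handed an explicit list of files and that mtimes are remembered in a JSON state file; `changed_since_last_run` and the state-file layout are hypothetical.

```python
import json
from pathlib import Path

def changed_since_last_run(root, state_file):
    """Return archive files under root that are new or modified since the
    previous call, and record the current mtimes for the next call.
    The state file should live outside root so it is not indexed itself."""
    state_path = Path(state_file)
    seen = json.loads(state_path.read_text()) if state_path.exists() else {}
    changed, current = [], {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime_ns
        current[str(path)] = mtime
        # New files have no recorded mtime; modified files have a different one.
        if seen.get(str(path)) != mtime:
            changed.append(path)
    state_path.write_text(json.dumps(current))
    return changed
```

Only the returned files would then be passed to the indexer, so the cost of each run scales with the day's new mail rather than the whole archive.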

I'm not sure I understand the political issues involved in using google.
Maybe this is an obvious statement but taking the political agenda to
the point of dictating what archiving method can be used is ridiculous.

I suppose I'll have another political battle very soon when we switch
the IP address...

Btw, my reason for mentioning this is because of the previously mentioned
problems with htdig and gcc.

cgf

* Re: Just use google for archive searching?
  2002-12-20 12:31         ` Christopher Faylor
@ 2002-12-20 14:15           ` Hans-Peter Nilsson
  2002-12-20 21:33             ` Joseph S. Myers
  2002-12-21  5:11             ` Christopher Faylor
  0 siblings, 2 replies; 14+ messages in thread
From: Hans-Peter Nilsson @ 2002-12-20 14:15 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: Zack Weinberg, overseers

On Fri, 20 Dec 2002, Christopher Faylor wrote:
> On Fri, Dec 20, 2002 at 05:26:48AM -0500, Hans-Peter Nilsson wrote:
> >On Thu, 19 Dec 2002, Zack Weinberg wrote:
> >>Hans-Peter Nilsson <hp@bitrange.com> writes:
> >>What happened to the idea of using mnogosearch or other alternative?
> >
> >I guess that question is related to the advent of the new machine.  I
> >haven't looked into it myself.
>
> I ran an alternative on the new machine and was surprised by how long it
> took to index.

*How* long?  Hours?  Days?  Today it's about ten hours from
scratch with htdig-3.1.5.  (I may be wrong about that, looking
at failure logs; I'll follow up with figures from a *successful*
run.)  With htdig-3.2.x it was *days* before I stopped it.

>  I've since researched other methods that can be used for
> incremental updating so the overall effect won't be as noticeable.
> However, it's hard to compare "won't be as noticeable" with "almost no
> impact".
>
> I'm not sure I understand the political issues involved in using google.

It's an external resource that you don't have the source for
(with a GNU/free license).

> Maybe this is an obvious statement but taking the political agenda to
> the point of dictating what archiving method can be used is ridiculous.

Whatever.  Using non-free (as in code) resources for GNU
flagship projects has been a no-no in the past...

> I suppose I'll have another political battle very soon when we switch
> the IP address...

I don't understand what significance the number would have.
Maybe best that way.  Bliss! :-)

> Btw, my reason for mentioning this is because of the previously mentioned
> problems with htdig and gcc.

Thanks.  The idea has come up in the past, and I think there was
the non-free argument then.

brgds, H-P

* Re: Just use google for archive searching?
  2002-12-20 14:15           ` Hans-Peter Nilsson
@ 2002-12-20 21:33             ` Joseph S. Myers
  2002-12-21  9:31               ` Jonathan Larmour
  2002-12-21  5:11             ` Christopher Faylor
  1 sibling, 1 reply; 14+ messages in thread
From: Joseph S. Myers @ 2002-12-20 21:33 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: overseers

On Fri, 20 Dec 2002, Hans-Peter Nilsson wrote:

> > I'm not sure I understand the political issues involved in using google.
> 
> It's an external resource that you don't have the source for
> (with a GNU/free license).

If google have any software patents (that haven't been generally licensed
for at least GPL software) on any of their search algorithms, that would
be another problem.

-- 
Joseph S. Myers
jsm28@cam.ac.uk

* Re: Just use google for archive searching?
  2002-12-20 14:15           ` Hans-Peter Nilsson
  2002-12-20 21:33             ` Joseph S. Myers
@ 2002-12-21  5:11             ` Christopher Faylor
  1 sibling, 0 replies; 14+ messages in thread
From: Christopher Faylor @ 2002-12-21  5:11 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: Zack Weinberg, overseers

On Fri, Dec 20, 2002 at 02:49:46PM -0500, Hans-Peter Nilsson wrote:
>> Btw, my reason for mentioning this is because of the previously mentioned
>> problems with htdig and gcc.
>
>Thanks.  The idea has come up in the past, and I think there was
>the non-free argument then.

Probably.  I was probably even dead-set against the idea last time.  :-)

cgf

* Re: Just use google for archive searching?
  2002-12-20 21:33             ` Joseph S. Myers
@ 2002-12-21  9:31               ` Jonathan Larmour
  2002-12-21 16:21                 ` Joseph S. Myers
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Larmour @ 2002-12-21  9:31 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: overseers

Joseph S. Myers wrote:
 > On Fri, 20 Dec 2002, Hans-Peter Nilsson wrote:
 >
 >
 >>> I'm not sure I understand the political issues involved in using
 >>> google.
 >>
 >> It's an external resource that you don't have the source for (with a
 >> GNU/free license).
 >
 >
 > If google have any software patents (that haven't been generally
 > licensed for at least GPL software) on any of their search algorithms,
 > that would be another problem.

Okay, maybe this is none of my business. But Joseph, who made the machine
you are using? Is it a PC? Got the BIOS source? What about the software
on the (physical) hard disk device controller? I certainly know my mouse 
has patents.

I can't see the problem with using, or rather, _exploiting_ an external
resource. Does the FSF insist on using an ISP that exclusively uses Free
software and with no hardware covered by patents? What's different about
that external resource?

I think as a temporary solution it seems fine. There are definitely
sufficient drawbacks, primarily update frequency, to ensure it would
indeed only be temporary until a replacement for htdig is in place. I 
certainly wouldn't want an archive searcher that wouldn't know about 
messages up to a week old. But I'd like something that worked, and htdig 
suffers from a lot of hiccups despite the efforts of H-P, Chris and others.

Jifl
-- 
--[ "You can complain because roses have thorns, or you ]--
--[  can rejoice because thorns have roses." -Lincoln   ]-- Opinions==mine

* Re: Just use google for archive searching?
  2002-12-21  9:31               ` Jonathan Larmour
@ 2002-12-21 16:21                 ` Joseph S. Myers
  2002-12-21 16:34                   ` Jonathan Larmour
  0 siblings, 1 reply; 14+ messages in thread
From: Joseph S. Myers @ 2002-12-21 16:21 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: overseers

On Sat, 21 Dec 2002, Jonathan Larmour wrote:

> Okay, maybe this is none of my business. But Joseph, who made the machine
> you are using? Is it a PC? Got the BIOS source? What about the software
> on the (physical) hard disk device controller? I certainly know my mouse 
> has patents.

I'm not aware of an FSF objection to hardware patents, but there's an FSF
call for a boycott of Amazon on software patent grounds.  The problem of
nonfree BIOSes is being worked on.

-- 
Joseph S. Myers
jsm28@cam.ac.uk

* Re: Just use google for archive searching?
  2002-12-21 16:21                 ` Joseph S. Myers
@ 2002-12-21 16:34                   ` Jonathan Larmour
  2002-12-21 20:21                     ` Joseph S. Myers
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Larmour @ 2002-12-21 16:34 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: overseers

Joseph S. Myers wrote:
> On Sat, 21 Dec 2002, Jonathan Larmour wrote:
> 
> 
>>Okay, maybe this is none of my business. But Joseph, who made the machine
>>you are using? Is it a PC? Got the BIOS source? What about the software
>>on the (physical) hard disk device controller? I certainly know my mouse 
>>has patents.
> 
> I'm not aware of an FSF objection to hardware patents, but there's an FSF
> call for a boycott of Amazon on software patent grounds.

But from reading the appropriate web page on this, and in particular the 
last 4 paragraphs of it, it is still pretty clear that the reason for that 
boycott in particular is that the idea is in fact obvious, and is an abuse 
of the patent system. And Amazon are dumb enough to try and enforce it.

Whereas Google has patented (and it is indeed patented) the PageRank 
algorithm which is genuinely novel and non-obvious.

Must the Free Software community *boycott* every single company that holds 
a software patent? And does that boycott even extend to services the 
company provides completely free of charge?

> The problem of
> nonfree BIOSes is being worked on.

What about the rest of my post, particularly re ISPs?

Do you use google? Do you think RMS never ever has?

Jifl
-- 
--[ "You can complain because roses have thorns, or you ]--
--[  can rejoice because thorns have roses." -Lincoln   ]-- Opinions==mine

* Re: Just use google for archive searching?
  2002-12-21 16:34                   ` Jonathan Larmour
@ 2002-12-21 20:21                     ` Joseph S. Myers
  0 siblings, 0 replies; 14+ messages in thread
From: Joseph S. Myers @ 2002-12-21 20:21 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: overseers

On Sat, 21 Dec 2002, Jonathan Larmour wrote:

> But from reading the appropriate web page on this, and in particular the 
> last 4 paragraphs of it, it is still pretty clear that the reason for that 
> boycott in particular is that the idea is in fact obvious, and is an abuse 
> of the patent system. And Amazon are dumb enough to try and enforce it.
> 
> Whereas Google has patented (and it is indeed patented) the PageRank 
> algorithm which is genuinely novel and non-obvious.
> 
> Must the Free Software community *boycott* every single company that holds 
> a software patent? And does that boycott even extend to services the 
> company provides completely free of charge?

The FSF object in general to aggressive use of software patents (i.e. 
threatening or suing with them otherwise than in reply to another software 
patent threat).  I don't know whether google has done that or just been 
silent on whether it would.

The question of what should be boycotted (which the FSF can only suggest,
not mandate) in any case is separate from that of what can be linked to
from GNU web pages, which has much more stringent and sometimes
undocumented requirements; for example, links to generic pages of
companies that also promote nonfree software aren't permitted
<http://gcc.gnu.org/ml/gcc-patches/2002-07/msg01189.html>.  I'm not
claiming to agree with all these restrictions, simply that there are such
restrictions for GNU work and the use of software patents could be one
issue of controversy meaning such a link should be cleared with RMS in
advance; agreement or disagreement with such restrictions is irrelevant to
the question of what issues there might be with using a particular link.

> What about the rest of my post, particularly re ISPs?

I am not familiar with the FSF's ISP selection policies, but I'd be
surprised if they don't use free software based ISPs.

-- 
Joseph S. Myers
jsm28@cam.ac.uk

* Re: Just use google for archive searching?
  2002-12-19 21:41 Just use google for archive searching? Christopher Faylor
  2002-12-19 21:52 ` Hans-Peter Nilsson
@ 2002-12-21 21:29 ` Zack Weinberg
  1 sibling, 0 replies; 14+ messages in thread
From: Zack Weinberg @ 2002-12-21 21:29 UTC (permalink / raw)
  To: overseers

Christopher Faylor <cgf@redhat.com> writes:

> It seems like we might be able to just stop doing htdig and let google
> do our archiving for us.

I have thought of a technical reason why this is not a good idea.

When I do site searches, it's always the mailing lists I'm searching,
it's almost always for an exact string, and I want to see the results
sorted by date.  Google doesn't provide a convenient way to limit the
search to just the mailing lists, nor to sort the results by date, and
I'm not sure what their rules are regarding characters that can appear
in strings.  (The search I want boils down to recursive 'grep' --
regular expressions and all -- which htdig doesn't give me either, of
course.)
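The search described above can be sketched in a few lines of Python, assuming the list archives are available locally as a tree of mbox files (a hypothetical layout; the archive's on-disk format is not described in this thread). `grep_archive` and its helpers are hypothetical: they apply a regular expression to each message's text parts and return the matches sorted by the Date header, which is the grep-plus-date-order behaviour that neither Google nor htdig provides.

```python
import mailbox
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from pathlib import Path

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def message_text(msg):
    # Concatenate all text/* parts; decode leniently since list mail mixes charsets.
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)

def message_date(msg):
    # Parse the Date header; return None if it is missing or malformed.
    raw = msg["Date"]
    if raw is None:
        return None
    try:
        dt = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None
    # "-0000" zones parse as naive datetimes; treat them as UTC so all dates compare.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

def grep_archive(root, pattern):
    """Return (date, subject, mbox path) for each message whose text matches
    the regular expression pattern, oldest first; undated messages sort last."""
    rx = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*.mbox")):
        box = mailbox.mbox(str(path), create=False)
        try:
            for msg in box:
                if rx.search(message_text(msg)):
                    hits.append((message_date(msg), msg["Subject"], str(path)))
        finally:
            box.close()
    hits.sort(key=lambda hit: (hit[0] is None, hit[0] or EPOCH))
    return hits
```

Scanning every message on each query is slow for large archives, but it gives exact regular-expression matching, which a word-based index cannot.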

zw

end of thread, other threads:[~2002-12-22  0:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2002-12-19 21:41 Just use google for archive searching? Christopher Faylor
2002-12-19 21:52 ` Hans-Peter Nilsson
2002-12-19 23:46   ` Hans-Peter Nilsson
2002-12-20  2:27     ` Zack Weinberg
2002-12-20 11:50       ` Hans-Peter Nilsson
2002-12-20 12:31         ` Christopher Faylor
2002-12-20 14:15           ` Hans-Peter Nilsson
2002-12-20 21:33             ` Joseph S. Myers
2002-12-21  9:31               ` Jonathan Larmour
2002-12-21 16:21                 ` Joseph S. Myers
2002-12-21 16:34                   ` Jonathan Larmour
2002-12-21 20:21                     ` Joseph S. Myers
2002-12-21  5:11             ` Christopher Faylor
2002-12-21 21:29 ` Zack Weinberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).