public inbox for overseers@sourceware.org
* sourceware on search engines
@ 2010-11-10  7:16 Jonathan Larmour
  2010-11-10 14:21 ` Frank Ch. Eigler
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Larmour @ 2010-11-10  7:16 UTC (permalink / raw)
  To: overseers

I'm sure someone must have provided some rationale at some point which I
missed, but why aren't the sourceware web pages (particularly the mailing
list archives) indexed by search engines? Relying on sourceware's internal
search only seems very limiting.

If it's a general site-wide principle, can I opt my project out so that it
can be indexed?

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

* Re: sourceware on search engines
  2010-11-10  7:16 sourceware on search engines Jonathan Larmour
@ 2010-11-10 14:21 ` Frank Ch. Eigler
  2010-11-11 17:59   ` Tom Tromey
  0 siblings, 1 reply; 7+ messages in thread
From: Frank Ch. Eigler @ 2010-11-10 14:21 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: overseers

Hi -

> I'm sure someone must have provided some rationale at some point which I
> missed, but why aren't the sourceware web pages (particularly the mailing
> list archives) indexed by search engines? [...]

They generally are indexed (not included in robots.txt).  Can you give
an example of what you see missing?

- FChE

* Re: sourceware on search engines
  2010-11-10 14:21 ` Frank Ch. Eigler
@ 2010-11-11 17:59   ` Tom Tromey
  2010-11-11 18:06     ` Per Bothner
  2010-11-11 18:09     ` Frank Ch. Eigler
  0 siblings, 2 replies; 7+ messages in thread
From: Tom Tromey @ 2010-11-11 17:59 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Jonathan Larmour, overseers

>>>>> "Frank" == Frank Ch Eigler <fche@redhat.com> writes:

Frank> They generally are indexed (not included in robots.txt).  Can you give
Frank> an example of what you see missing?

I've seen this too.
Almost any search that I would expect to hit on sourceware.org instead
pulls up results from elsewhere, often cygwin.ru.

E.g., search for "systemtap signedness roland" on Google.
This shows cygwin.ru, nabble.com, but not sourceware.
Now add "site:sourceware.org" -- I see no hits.

Tom

* Re: sourceware on search engines
  2010-11-11 17:59   ` Tom Tromey
@ 2010-11-11 18:06     ` Per Bothner
  2010-11-11 18:09     ` Frank Ch. Eigler
  1 sibling, 0 replies; 7+ messages in thread
From: Per Bothner @ 2010-11-11 18:06 UTC (permalink / raw)
  To: overseers

On 11/11/2010 10:03 AM, Tom Tromey wrote:
>>>>>> "Frank" == Frank Ch Eigler<fche@redhat.com>  writes:
>
> Frank>  They generally are indexed (not included in robots.txt).  Can you give
> Frank>  an example of what you see missing?
>
> I've seen this too.
> Almost any search that I would expect to hit on sourceware.org instead
> pulls up results from elsewhere, often cygwin.ru.
>
> E.g., search for "systemtap signedness roland" on Google.
> This shows cygwin.ru, nabble.com, but not sourceware.
> Now add "site:sourceware.org" -- I see no hits.

I Googled for "kawa mailing list" and http://www.cygwin.com/ml/kawa/
came up, rather than sourceware.org.  So it *is* being indexed - just under
the wrong hostname.
-- 
	--Per Bothner
per@bothner.com   http://per.bothner.com/

* Re: sourceware on search engines
  2010-11-11 17:59   ` Tom Tromey
  2010-11-11 18:06     ` Per Bothner
@ 2010-11-11 18:09     ` Frank Ch. Eigler
  2010-11-11 19:50       ` Jonathan Larmour
  1 sibling, 1 reply; 7+ messages in thread
From: Frank Ch. Eigler @ 2010-11-11 18:09 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Jonathan Larmour, overseers

Hi -

> Frank> They generally are indexed (not included in robots.txt).  Can you give
> Frank> an example of what you see missing?
> 
> I've seen this too.
> Almost any search that I would expect to hit on sourceware.org instead
> pulls up results from elsewhere, often cygwin.ru.

google has funny heuristics about which copy of a mirror to present for
any given query.


> E.g., search for "systemtap signedness roland" on Google.
> This shows cygwin.ru, nabble.com, but not sourceware.
> Now add "site:sourceware.org" -- I see no hits.

OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
we get hits just fine.  So whatever the problem is, it's not as simple
as it being blocked.  It's more about freshness or crawling rate or
something.

- FChE

* Re: sourceware on search engines
  2010-11-11 18:09     ` Frank Ch. Eigler
@ 2010-11-11 19:50       ` Jonathan Larmour
  2010-11-11 20:28         ` Christopher Faylor
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Larmour @ 2010-11-11 19:50 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: overseers

On 11/11/10 18:09, Frank Ch. Eigler wrote:
> Hi -
> 
>> Frank> They generally are indexed (not included in robots.txt).  Can you give
>> Frank> an example of what you see missing?
>>
>> I've seen this too.
>> Almost any search that I would expect to hit on sourceware.org instead
>> pulls up results from elsewhere, often cygwin.ru.
> 
> google has funny heuristics about which copy of a mirror to present for
> any given query.

It seems like this is indeed the origin of my belief too. If I look much
later in search results I do eventually see sourceware aliases crop up.
For example a google for "gdb cortex registers" does eventually show a
result straight from sourceware, but only on p.10.

>> E.g., search for "systemtap signedness roland" on Google.
>> This shows cygwin.ru, nabble.com, but not sourceware.
>> Now add "site:sourceware.org" -- I see no hits.
> 
> OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
> we get hits just fine.  So whatever the problem is, it's not as simple
> as it being blocked.  It's more about freshness or crawling rate or
> something.

Perhaps it's the presence of all the site aliases? I vaguely recall that
google lowers the rank of sites that have aliases pointing to the same
pages - that's a ploy that people have done in the past to try and improve
their search ranking. Maybe cygwin.com should only have links to the
cygwin* mailing lists, ecos.sourceware.org to the ecos* lists, and so on?

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

* Re: sourceware on search engines
  2010-11-11 19:50       ` Jonathan Larmour
@ 2010-11-11 20:28         ` Christopher Faylor
  0 siblings, 0 replies; 7+ messages in thread
From: Christopher Faylor @ 2010-11-11 20:28 UTC (permalink / raw)
  To: Frank Ch. Eigler, overseers, Jonathan Larmour

On Thu, Nov 11, 2010 at 07:50:34PM +0000, Jonathan Larmour wrote:
>On 11/11/10 18:09, Frank Ch. Eigler wrote:
>> Hi -
>> 
>>> Frank> They generally are indexed (not included in robots.txt).  Can you give
>>> Frank> an example of what you see missing?
>>>
>>> I've seen this too.
>>> Almost any search that I would expect to hit on sourceware.org instead
>>> pulls up results from elsewhere, often cygwin.ru.
>> 
>> google has funny heuristics about which copy of a mirror to present for
>> any given query.
>
>It seems like this is indeed the origin of my belief too. If I look much
>later in search results I do eventually see sourceware aliases crop up.
>For example a google for "gdb cortex registers" does eventually show a
>result straight from sourceware, but only on p.10.
>
>>> E.g., search for "systemtap signedness roland" on Google.
>>> This shows cygwin.ru, nabble.com, but not sourceware.
>>> Now add "site:sourceware.org" -- I see no hits.
>> 
>> OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
>> we get hits just fine.  So whatever the problem is, it's not as simple
>> as it being blocked.  It's more about freshness or crawling rate or
>> something.
>
>Perhaps it's the presence of all the site aliases? I vaguely recall that
>google lowers the rank of sites that have aliases pointing to the same
>pages - that's a ploy that people have done in the past to try and improve
>their search ranking. Maybe cygwin.com should only have links to the
>cygwin* mailing lists, ecos.sourceware.org to the ecos* lists, and so on?

And don't forget: sources.redhat.com.

http://google.com/search?q=systemtap+signedness+roland+site:sources.redhat.com

cgf
