* sourceware on search engines
From: Jonathan Larmour @ 2010-11-10 7:16 UTC (permalink / raw)
To: overseers
I'm sure someone must have provided some rationale at some point which I
missed, but why aren't the sourceware web pages (particularly the mailing
list archives) indexed by search engines? Relying on sourceware's internal
search only seems very limiting.
If it's a general site-wide principle, can I opt my project out so that it
can be indexed?
Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
* Re: sourceware on search engines
From: Frank Ch. Eigler @ 2010-11-10 14:21 UTC (permalink / raw)
To: Jonathan Larmour; +Cc: overseers
Hi -
> I'm sure someone must have provided some rationale at some point which I
> missed, but why aren't the sourceware web pages (particularly the mailing
> list archives) indexed by search engines? [...]
They generally are indexed (not included in robots.txt). Can you give
an example of what you see missing?
- FChE
* Re: sourceware on search engines
From: Tom Tromey @ 2010-11-11 17:59 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: Jonathan Larmour, overseers
>>>>> "Frank" == Frank Ch Eigler <fche@redhat.com> writes:
Frank> They generally are indexed (not included in robots.txt). Can you give
Frank> an example of what you see missing?
I've seen this too.
Almost any search that I would expect to hit on sourceware.org instead
pulls up results from elsewhere, often cygwin.ru.
E.g., search for "systemtap signedness roland" on Google.
This shows cygwin.ru, nabble.com, but not sourceware.
Now add "site:sourceware.org" -- I see no hits.
Tom
* Re: sourceware on search engines
From: Per Bothner @ 2010-11-11 18:06 UTC (permalink / raw)
To: overseers
On 11/11/2010 10:03 AM, Tom Tromey wrote:
>>>>>> "Frank" == Frank Ch Eigler<fche@redhat.com> writes:
>
> Frank> They generally are indexed (not included in robots.txt). Can you give
> Frank> an example of what you see missing?
>
> I've seen this too.
> Almost any search that I would expect to hit on sourceware.org instead
> pulls up results from elsewhere, often cygwin.ru.
>
> E.g., search for "systemtap signedness roland" on Google.
> This shows cygwin.ru, nabble.com, but not sourceware.
> Now add "site:sourceware.org" -- I see no hits.
I Googled for "kawa mailing list" and http://www.cygwin.com/ml/kawa/
came up, rather than sourceware.org. So it *is* being indexed - just under
the wrong hostname.
--
--Per Bothner
per@bothner.com http://per.bothner.com/
* Re: sourceware on search engines
From: Frank Ch. Eigler @ 2010-11-11 18:09 UTC (permalink / raw)
To: Tom Tromey; +Cc: Jonathan Larmour, overseers
Hi -
> Frank> They generally are indexed (not included in robots.txt). Can you give
> Frank> an example of what you see missing?
>
> I've seen this too.
> Almost any search that I would expect to hit on sourceware.org instead
> pulls up results from elsewhere, often cygwin.ru.
google has funny heuristics about which copy of a mirror to present for
any given query.
> E.g., search for "systemtap signedness roland" on Google.
> This shows cygwin.ru, nabble.com, but not sourceware.
> Now add "site:sourceware.org" -- I see no hits.
OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
we get hits just fine. So whatever the problem is, it's not as simple
as it being blocked. It's more about freshness or crawling rate or
something.
- FChE
* Re: sourceware on search engines
From: Jonathan Larmour @ 2010-11-11 19:50 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: overseers
On 11/11/10 18:09, Frank Ch. Eigler wrote:
> Hi -
>
>> Frank> They generally are indexed (not included in robots.txt). Can you give
>> Frank> an example of what you see missing?
>>
>> I've seen this too.
>> Almost any search that I would expect to hit on sourceware.org instead
>> pulls up results from elsewhere, often cygwin.ru.
>
> google has funny heuristics about which copy of a mirror to present for
> any given query.
It seems like this is indeed the origin of my belief too. If I look much
later in search results I do eventually see sourceware aliases crop up.
For example a google for "gdb cortex registers" does eventually show a
result straight from sourceware, but only on p.10.
>> E.g., search for "systemtap signedness roland" on Google.
>> This shows cygwin.ru, nabble.com, but not sourceware.
>> Now add "site:sourceware.org" -- I see no hits.
>
> OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
> we get hits just fine. So whatever the problem is, it's not as simple
> as it being blocked. It's more about freshness or crawling rate or
> something.
Perhaps it's the presence of all the site aliases? I vaguely recall that
google lowers the rank of sites that have aliases pointing to the same
pages - that's a ploy people have used in the past to try to improve
their search ranking. Maybe cygwin.com should only have links to the
cygwin* mailing lists, ecos.sourceware.org to the ecos* lists, and so on?
Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
* Re: sourceware on search engines
From: Christopher Faylor @ 2010-11-11 20:28 UTC (permalink / raw)
To: Frank Ch. Eigler, overseers, Jonathan Larmour
On Thu, Nov 11, 2010 at 07:50:34PM +0000, Jonathan Larmour wrote:
>On 11/11/10 18:09, Frank Ch. Eigler wrote:
>> Hi -
>>
>>> Frank> They generally are indexed (not included in robots.txt). Can you give
>>> Frank> an example of what you see missing?
>>>
>>> I've seen this too.
>>> Almost any search that I would expect to hit on sourceware.org instead
>>> pulls up results from elsewhere, often cygwin.ru.
>>
>> google has funny heuristics about which copy of a mirror to present for
>> any given query.
>
>It seems like this is indeed the origin of my belief too. If I look much
>later in search results I do eventually see sourceware aliases crop up.
>For example a google for "gdb cortex registers" does eventually show a
>result straight from sourceware, but only on p.10.
>
>>> E.g., search for "systemtap signedness roland" on Google.
>>> This shows cygwin.ru, nabble.com, but not sourceware.
>>> Now add "site:sourceware.org" -- I see no hits.
>>
>> OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
>> we get hits just fine. So whatever the problem is, it's not as simple
>> as it being blocked. It's more about freshness or crawling rate or
>> something.
>
>Perhaps it's the presence of all the site aliases? I vaguely recall that
>google lowers the rank of sites that have aliases pointing to the same
>pages - that's a ploy that people have done in the past to try and improve
>their search ranking. Maybe cygwin.com should only have links to the
>cygwin* mailing lists, ecos.sourceware.org to the ecos* lists, and so on?
And don't forget: sources.redhat.com.
http://google.com/search?q=systemtap+signedness+roland+site:sources.redhat.com
cgf