From: Christopher Faylor <cgf-use-the-mailinglist-please@sourceware.org>
To: "Frank Ch. Eigler" <fche@redhat.com>,
overseers@sourceware.org, Jonathan Larmour <jifl@jifvik.org>
Subject: Re: sourceware on search engines
Date: Thu, 11 Nov 2010 20:28:00 -0000
Message-ID: <20101111202737.GA26142@ednor.casa.cgf.cx>
In-Reply-To: <4CDC490A.8080901@jifvik.org>
On Thu, Nov 11, 2010 at 07:50:34PM +0000, Jonathan Larmour wrote:
>On 11/11/10 18:09, Frank Ch. Eigler wrote:
>> Hi -
>>
>>> Frank> They generally are indexed (not included in robots.txt). Can you give
>>> Frank> an example of what you see missing?
>>>
>>> I've seen this too.
>>> Almost any search that I would expect to hit on sourceware.org instead
>>> pulls up results from elsewhere, often cygwin.ru.
>>
>> google has funny heuristics about which copy of a mirror to present for
>> any given query.
>
>It seems like this is indeed the origin of my belief too. If I look much
>later in search results I do eventually see sourceware aliases crop up.
>For example a google for "gdb cortex registers" does eventually show a
>result straight from sourceware, but only on p.10.
>
>>> E.g., search for "systemtap signedness roland" on Google.
>>> This shows cygwin.ru, nabble.com, but not sourceware.
>>> Now add "site:sourceware.org" -- I see no hits.
>>
>> OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
>> we get hits just fine. So whatever the problem is, it's not as simple
>> as it being blocked. It's more about freshness or crawling rate or
>> something.
>
>Perhaps it's the presence of all the site aliases? I vaguely recall that
>google lowers the rank of sites that have aliases pointing to the same
>pages - that's a ploy that people have done in the past to try and improve
>their search ranking. Maybe cygwin.com should only have links to the
>cygwin* mailing lists, ecos.sourceware.org to the ecos* lists, and so on?

And don't forget: sources.redhat.com.

http://google.com/search?q=systemtap+signedness+roland+site:sources.redhat.com

cgf
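
To repeat the same check against each hostname mentioned in this thread, a minimal sketch along these lines could generate the per-host queries. The ALIASES list and the site_query_url helper are hypothetical names introduced here for illustration. Whether every listed host really serves the same archive pages is an assumption, not something the thread confirms.

```python
from urllib.parse import urlencode

# Hostnames mentioned in this thread. Treating them all as aliases for the
# same list archives is an assumption made for illustration only.
ALIASES = [
    "sourceware.org",
    "sources.redhat.com",
    "cygwin.com",
    "ecos.sourceware.org",
]


def site_query_url(terms, host):
    """Build a Google search URL that restricts `terms` to `host` via site:."""
    query = " ".join(list(terms) + [f"site:{host}"])
    # safe=":" leaves the colon in "site:" unescaped, matching the URL
    # form used in the message above.
    return "http://google.com/search?" + urlencode({"q": query}, safe=":")


if __name__ == "__main__":
    for host in ALIASES:
        print(site_query_url(["systemtap", "signedness", "roland"], host))
```

Opening each printed URL in turn shows which hostname, if any, Google has chosen to present for that query.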
Thread overview: 7+ messages
2010-11-10 7:16 Jonathan Larmour
2010-11-10 14:21 ` Frank Ch. Eigler
2010-11-11 17:59 ` Tom Tromey
2010-11-11 18:06 ` Per Bothner
2010-11-11 18:09 ` Frank Ch. Eigler
2010-11-11 19:50 ` Jonathan Larmour
2010-11-11 20:28 ` Christopher Faylor [this message]