From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 12148 invoked by alias); 11 Nov 2010 20:28:06 -0000
Received: (qmail 11361 invoked by uid 22791); 11 Nov 2010 20:27:45 -0000
X-Spam-Check-By: sourceware.org
Received: from pool-173-76-56-137.bstnma.fios.verizon.net (HELO cgf.cx) (173.76.56.137) by sourceware.org (qpsmtpd/0.83/v0.83-20-g38e4449) with ESMTP; Thu, 11 Nov 2010 20:27:40 +0000
Received: from ednor.cgf.cx (ednor.casa.cgf.cx [192.168.187.5]) by cgf.cx (Postfix) with ESMTP id 10F7C13C061; Thu, 11 Nov 2010 15:27:38 -0500 (EST)
Received: by ednor.cgf.cx (Postfix, from userid 201) id 054C02B352; Thu, 11 Nov 2010 15:27:37 -0500 (EST)
Date: Thu, 11 Nov 2010 20:28:00 -0000
From: Christopher Faylor
To: "Frank Ch. Eigler", overseers@sourceware.org, Jonathan Larmour
Subject: Re: sourceware on search engines
Message-ID: <20101111202737.GA26142@ednor.casa.cgf.cx>
Mail-Followup-To: "Frank Ch. Eigler", overseers@sourceware.org, Jonathan Larmour
References: <4CDA46DC.50709@jifvik.org> <20101110142102.GF26790@redhat.com> <20101111180921.GL26790@redhat.com> <4CDC490A.8080901@jifvik.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4CDC490A.8080901@jifvik.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact overseers-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: overseers-owner@sourceware.org
X-SW-Source: 2010-q4/txt/msg00040.txt.bz2

On Thu, Nov 11, 2010 at 07:50:34PM +0000, Jonathan Larmour wrote:
>On 11/11/10 18:09, Frank Ch. Eigler wrote:
>> Hi -
>>
>>> Frank> They generally are indexed (not included in robots.txt). Can you give
>>> Frank> an example of what you see missing?
>>>
>>> I've seen this too.
>>> Almost any search that I would expect to hit on sourceware.org instead
>>> pulls up results from elsewhere, often cygwin.ru.
>>
>> google has funny heuristics about which copy of a mirror to present for
>> any given query.
>
>It seems like this is indeed the origin of my belief too. If I look much
>later in search results I do eventually see sourceware aliases crop up.
>For example a google for "gdb cortex registers" does eventually show a
>result straight from sourceware, but only on p.10.
>
>>> E.g., search for "systemtap signedness roland" on Google.
>>> This shows cygwin.ru, nabble.com, but not sourceware.
>>> Now add "site:sourceware.org" -- I see no hits.
>>
>> OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
>> we get hits just fine. So whatever the problem is, it's not as simple
>> as it being blocked. It's more about freshness or crawling rate or
>> something.
>
>Perhaps it's the presence of all the site aliases? I vaguely recall that
>google lowers the rank of sites that have aliases pointing to the same
>pages - that's a ploy that people have done in the past to try and improve
>their search ranking. Maybe cygwin.com should only have links to the
>cygwin* mailing lists, ecos.sourceware.org to the ecos* lists, and so on?

And don't forget: sources.redhat.com.

http://google.com/search?q=systemtap+signedness+roland+site:sources.redhat.com

cgf