From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 30119 invoked by alias); 11 Nov 2010 19:50:59 -0000
Received: (qmail 30110 invoked by uid 22791); 11 Nov 2010 19:50:58 -0000
X-SWARE-Spam-Status: No, hits=-1.9 required=5.0 tests=AWL,BAYES_00,FSL_RU_URL
X-Spam-Check-By: sourceware.org
Received: from virtual.bogons.net (HELO virtual.bogons.net) (193.178.223.136)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 11 Nov 2010 19:50:52 +0000
Received: from jifvik.dyndns.org (jifvik.dyndns.org [85.158.45.40])
    by virtual.bogons.net (8.10.2+Sun/8.11.2) with ESMTP id oABJobW14946;
    Thu, 11 Nov 2010 19:50:37 GMT
Received: from [192.168.7.9] (unknown [78.32.57.111])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by jifvik.dyndns.org (Postfix) with ESMTP id 631603FE1;
    Thu, 11 Nov 2010 19:50:36 +0000 (GMT)
Message-ID: <4CDC490A.8080901@jifvik.org>
Date: Thu, 11 Nov 2010 19:50:00 -0000
From: Jonathan Larmour
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Fedora/3.0.10-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.10
MIME-Version: 1.0
To: "Frank Ch. Eigler"
Cc: overseers@sourceware.org
Subject: Re: sourceware on search engines
References: <4CDA46DC.50709@jifvik.org> <20101110142102.GF26790@redhat.com> <20101111180921.GL26790@redhat.com>
In-Reply-To: <20101111180921.GL26790@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Mailing-List: contact overseers-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help: ,
Sender: overseers-owner@sourceware.org
X-SW-Source: 2010-q4/txt/msg00039.txt.bz2

On 11/11/10 18:09, Frank Ch. Eigler wrote:
> Hi -
>
>> Frank> They generally are indexed (not included in robots.txt). Can you give
>> Frank> an example of what you see missing?
>>
>> I've seen this too.
>> Almost any search that I would expect to hit on sourceware.org instead
>> pulls up results from elsewhere, often cygwin.ru.
>
> google has funny heuristics about which copy of a mirror to present for
> any given query.

It seems like this is indeed the origin of my belief too. If I look much
later in search results I do eventually see sourceware aliases crop up.
For example a google for "gdb cortex registers" does eventually show a
result straight from sourceware, but only on p.10.

>> E.g., search for "systemtap signedness roland" on Google.
>> This shows cygwin.ru, nabble.com, but not sourceware.
>> Now add "site:sourceware.org" -- I see no hits.
>
> OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
> we get hits just fine. So whatever the problem is, it's not as simple
> as it being blocked. It's more about freshness or crawling rate or
> something.

Perhaps it's the presence of all the site aliases? I vaguely recall that
google lowers the rank of sites that have aliases pointing to the same
pages - that's a ploy that people have done in the past to try and
improve their search ranking.

Maybe cygwin.com should only have links to the cygwin* mailing lists,
ecos.sourceware.org to the ecos* lists, and so on?

Jifl
-- 
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
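
A minimal sketch of the alias split suggested above, in case it helps make the idea concrete. It assumes archive pages live under /ml/<list-name>/... (guessed from the sourceware.org/ml/* pattern in the thread). The prefix table and the helper names are hypothetical, not anything sourceware actually runs. Each list archive URL is mapped to the single host that would carry it, so no page is reachable under more than one alias.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table: list-name prefix -> the one alias that would serve it.
# The hosts are taken from the message; the prefixes are assumptions.
PREFIX_TO_HOST = {
    "cygwin": "cygwin.com",
    "ecos": "ecos.sourceware.org",
}
DEFAULT_HOST = "sourceware.org"


def host_for_list(list_name):
    """Pick the single host that should serve a given list's archive."""
    for prefix, host in PREFIX_TO_HOST.items():
        if list_name.startswith(prefix):
            return host
    return DEFAULT_HOST


def canonical_url(url):
    """Rewrite a list-archive URL to its one preferred host.

    URLs that are not under /ml/<list-name>/ are returned unchanged.
    """
    parts = urlsplit(url)
    # "/ml/ecos-discuss/x.html" splits into ["", "ml", "ecos-discuss", "x.html"]
    segments = parts.path.split("/")
    if len(segments) < 3 or segments[1] != "ml" or not segments[2]:
        return url
    return urlunsplit(parts._replace(netloc=host_for_list(segments[2])))
```

With a mapping like this, each alias could either link only to its own lists or redirect everything else to the preferred host, which is one way to avoid the same page showing up under several names.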