From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 21214 invoked by alias); 11 Nov 2010 18:09:39 -0000
Received: (qmail 21106 invoked by uid 22791); 11 Nov 2010 18:09:38 -0000
X-SWARE-Spam-Status: No, hits=-5.8 required=5.0 tests=AWL,BAYES_00,FSL_RU_URL,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 11 Nov 2010 18:09:34 +0000
Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id oABI9Mq0007631 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 11 Nov 2010 13:09:22 -0500
Received: from fche.csb (vpn-229-149.phx2.redhat.com [10.3.229.149]) by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id oABI9LkD021054; Thu, 11 Nov 2010 13:09:22 -0500
Received: by fche.csb (Postfix, from userid 2569) id 4E28F58548; Thu, 11 Nov 2010 13:09:21 -0500 (EST)
Date: Thu, 11 Nov 2010 18:09:00 -0000
From: "Frank Ch. Eigler"
To: Tom Tromey
Cc: Jonathan Larmour , overseers@sourceware.org
Subject: Re: sourceware on search engines
Message-ID: <20101111180921.GL26790@redhat.com>
References: <4CDA46DC.50709@jifvik.org> <20101110142102.GF26790@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.2i
Mailing-List: contact overseers-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help: ,
Sender: overseers-owner@sourceware.org
X-SW-Source: 2010-q4/txt/msg00038.txt.bz2

Hi -

> Frank> They generally are indexed (not included in robots.txt).  Can you give
> Frank> an example of what you see missing?
>
> I've seen this too.
> Almost any search that I would expect to hit on sourceware.org instead
> pulls up results from elsewhere, often cygwin.ru.

google has funny heuristics about which copy of a mirror to present
for any given query.

> E.g., search for "systemtap signedness roland" on Google.
> This shows cygwin.ru, nabble.com, but not sourceware.
> Now add "site:sourceware.org" -- I see no hits.

OTOH, sourceware.org/ml/* is indexed by google, and for other queries,
we get hits just fine.  So whatever the problem is, it's not as simple
as it being blocked.  It's more about freshness or crawling rate or
something.

- FChE