Date: Thu, 06 Sep 2018 06:01:00 -0000
From: frederik@ofb.net
To: "Frank Ch. Eigler"
Cc: overseers@sourceware.org, joseph@codesourcery.com
Subject: Re: bugs not showing up on Google
Message-ID: <20180906060117.GJ27595@ofb.net>
Reply-To: frederik@ofb.net
References: <20180829173439.GU8901@ofb.net> <20180829234015.GC2249929@elastic.org>
In-Reply-To: <20180829234015.GC2249929@elastic.org>

> > joseph@codesourcery.com suggested that I email you about my
> > observation that most of your bugs are not showing up on Google.
> > [...]
>
> I don't know about "most"; undoubtedly many appear and some do not.
> It may be relevant that we have had to throttle googlebot from
> full access to the sourceware web servers because it was repeatedly
> found ignoring robots.txt and saturating the server with traffic.
> So we have reluctantly slowed its access down. I expect it to
> get around to all the bugzilla entries over time, just maybe not as
> fast as you expect.

Thanks Frank for your reply.

The entry I was looking at was over a year old. I don't know what you
mean by "over time" but I would consider that too long. Also I don't
think it would take that long for even a throttled Googlebot to crawl
your site. I'm not sure how a crawler is supposed to see all the bugs,
is there a way of listing them all without going through a search
form?

Apparently there are ways to enforce robots.txt using mod_rewrite: as
long as Googlebot doesn't change its user agent, I think you can more
or less easily prevent it from accessing a given URL:

https://perishablepress.com/eight-ways-to-blacklist-with-apaches-mod_rewrite/comment-page-4/

That seems easier to me than QoS tuning. Even better would be if we
could report bugs to Google but ... yeah. For me it's always been a
Wall of Silence.

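To make the user-agent idea above concrete, here is a minimal sketch in Python rather than mod_rewrite. It only illustrates the logic (refuse a request when the crawler's own robots.txt rules disallow the path); it is not Apache configuration and not what sourceware runs. The robots.txt contents, the `buglist.cgi` rule, and the helper names `build_parser` and `should_refuse` are all assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not sourceware's real robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /bugzilla/buglist.cgi

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def should_refuse(parser: RobotFileParser, user_agent_header: str, path: str) -> bool:
    """Return True if the server should refuse this request.

    robotparser only looks at the text before the first "/" in the agent
    string, so a full header like "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"
    is first reduced to the crawler token by a substring match, much as a
    user-agent condition in a rewrite rule would do.
    """
    if "googlebot" in user_agent_header.lower():
        agent = "Googlebot"
    else:
        agent = user_agent_header
    return not parser.can_fetch(agent, path)
```

A server hook could call `should_refuse(parser, header, path)` and answer with a 403 when it returns True. As noted above, this only works as long as the crawler keeps identifying itself honestly in its user agent.
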
By the way, I couldn't find a public archive of this mailing list,
should we be discussing this on Bugzilla in case other Bugzilla
maintainers want to benefit from your experience?

https://sourceware.org/bugzilla/show_bug.cgi?id=23581

Maybe I can paste these messages into a comment on that bug and then
add overseers to the Cc list? Or am I tripping and no one cares?

Thanks,

Frederick