* Conf changes to htdig sourceware side
From: Hans-Peter Nilsson @ 2005-02-13 20:19 UTC
  To: overseers

Properly marking sourceware.org as the canonical name, with
sources.redhat.com and www.sourceware.org as aliases.
Committed.

This should avoid excluding pages that are referenced only as
"sourceware.org", and also avoid HTTP redirects for the parts
that are reached only over HTTP (as opposed to local files).

(Whether or not sourceware.org is actually supposed to be the
canonical name is here of less importance than what DNS thinks.)
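
As a rough illustration of the idea (a Python sketch, not ht://Dig's
actual code): the scope check is applied to the URL as found, and the
alias map then rewrites the host to the canonical name. The function
names are made up for this example, the prefix test is a simplifying
assumption about how limit_urls_to matches, and the extra URLs read from
noindex-follow-urls are left out.

```python
from urllib.parse import urlsplit, urlunsplit

# Host mappings taken from the server_aliases line in the patch.
SERVER_ALIASES = {
    "sourceware.cygnus.com": "sourceware.org",
    "sources.redhat.com": "sourceware.org",
    "www.sourceware.org": "sourceware.org",
}

# Prefixes taken from the limit_urls_to line in the patch.
LIMIT_URLS_TO = (
    "http://sourceware.org/",
    "http://sourceware.cygnus.com/",
    "http://sources.redhat.com/",
    "http://www.sourceware.org/",
)


def in_scope(url):
    """Check the raw URL, before any alias rewriting (assumed prefix match)."""
    return url.startswith(LIMIT_URLS_TO)


def canonicalize(url):
    """Rewrite an aliased host to its canonical name; other hosts pass through."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    canonical = SERVER_ALIASES.get(host, host)
    netloc = canonical if parts.port is None else f"{canonical}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

Because the scope check sees the URL before rewriting, every alias host
has to appear in limit_urls_to as well as in server_aliases, which is why
the patch touches both attributes.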

Index: sourceware.conf
===================================================================
RCS file: /cvs/sourceware/infra/htdig-conf/sourceware.conf,v
retrieving revision 1.19
diff -p -c -u -p -r1.19 sourceware.conf
cvs diff: conflicting specifications of output style
--- sourceware.conf	20 Oct 2003 20:25:01 -0000	1.19
+++ sourceware.conf	11 Feb 2005 21:11:29 -0000
@@ -33,12 +33,13 @@ exclude_too_long_words: true
 # start_url:	       `${common_dir}/start.url`
 #
 # Keep the included file-path in sync with what's in htupdate-sourceware.sh.
-start_url:		http://sources.redhat.com/ \
+start_url:		http://sourceware.org/ \
            `${database_dir}/noindex-follow-urls`

 # The old hostname (left side) is here changed to the canonical hostname
 # (right side), to avoid a loop of redirects.
-server_aliases:		sourceware.cygnus.com=sources.redhat.com
+server_aliases:		sourceware.cygnus.com=sourceware.org \
+	sources.redhat.com=sourceware.org www.sourceware.org=sourceware.org

 #
 # This attribute limits the scope of the indexing process.  The default is to
@@ -52,8 +53,8 @@ server_aliases:		sourceware.cygnus.com=s
 #
 # Unless we set "limit_normalized", we need this to include all hosts
 # that may be canonicalized into those we are interested in.
-limit_urls_to:		${start_url} http://sourceware.cygnus.com/
-
+limit_urls_to:		${start_url} http://sourceware.cygnus.com/ \
+	http://sources.redhat.com/ http://www.sourceware.org/

 #
 # If there are particular pages that you definately do NOT want to index, you
@@ -131,15 +132,15 @@ exclude_urls:		${site__exclude_urls} ${r
 # This is parsed by generate-htdig-include-list.sh.  Make sure it stays
 # parseable: do not refer to variables and make sure trailing slashes are
 # in place.
-local_urls: http://sources.redhat.com/ml/=/www/sourceware/ml/ \
-            http://sources.redhat.com/=/www/sourceware/htdocs/
+local_urls: http://sourceware.org/ml/=/www/sourceware/ml/ \
+            http://sourceware.org/=/www/sourceware/htdocs/

 # Include one instance of the old base, sourceware.cygnus.com.  Quite a
 # few URLs contain it.  If you change this without indexing from scratch,
 # the tokens for the old parts will be mapped to the new parts.
-common_url_parts: http:// http://sources.redhat.com/ml \
-                  http://sources.redhat.com ftp://sources.redhat.com/pub\
-                  http://sources.redhat.com/ml/cygwin/199 \
+common_url_parts: http:// http://sourceware.org/ml \
+                  http://sourceware.org ftp://sourceware.org/pub\
+                  http://sourceware.org/ml/cygwin/199 \
                   http://www. ftp:// ftp://ftp. /pub/ .html .png .jpg .jpeg \
                   /index.html /index.htm .com/ .com mailto: \
                   sourceware.cygnus.com

brgds, H-P

* Re: Conf changes to htdig sourceware side
From: Jonathan Larmour @ 2005-02-13 22:20 UTC
  To: Hans-Peter Nilsson; +Cc: overseers

Hans-Peter Nilsson wrote:
> Properly marking sourceware.org as the canonical name, with
> sources.redhat.com and www.sourceware.org as aliases.

What happens about ecos.sourceware.org and www.cygwin.com? Sorry to be a 
pain :-)

Jifl
-- 
eCosCentric    http://www.eCosCentric.com/    The eCos and RedBoot experts
Visit us at Embedded World 2005, Nürnberg, Germany, 22-24 Feb, Stand 11-124
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

* Re: Conf changes to htdig sourceware side
From: Hans-Peter Nilsson @ 2005-02-14  3:21 UTC
  To: Jonathan Larmour; +Cc: overseers

On Sat, 12 Feb 2005, Jonathan Larmour wrote:

> Hans-Peter Nilsson wrote:
> > Properly marking sourceware.org as the canonical name, with
> > sources.redhat.com and www.sourceware.org as aliases.
>
> What happens about ecos.sourceware.org and www.cygwin.com? Sorry to be a
> pain :-)

(and cygwin.com; without the www subdomain)

Such URLs aren't followed, just like before.  This matters for
scoring, and for pages that no other link leads to (relative
links that end up rooted at sourceware.org/ work, of course).

To fix this better than adding domain names from people's
memory, I'd like a list of all the DNS aliases or a pointer to
such a list.  It doesn't count if the contents of paths are
different (if I can't access /ml/binutils then it's not an alias
in this sense) so e.g. gcc.gnu.org isn't an alias of
sourceware.org in this sense.
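
The "alias in this sense" test could be sketched roughly as below
(Python, purely illustrative). A host counts only if every probed path
returns the same content as on the canonical host. The function names,
the probe list and the FAKE_SITES table are all hypothetical; the table
stands in for real HTTP fetches and its contents are made up.

```python
# Hypothetical stand-in for real HTTP fetches: host -> {path: content}.
# The host names come from this thread; the page contents are invented.
FAKE_SITES = {
    "sourceware.org": {
        "/": "sourceware home",
        "/ml/binutils/": "binutils list",
    },
    "sources.redhat.com": {
        "/": "sourceware home",
        "/ml/binutils/": "binutils list",
    },
    "gcc.gnu.org": {
        "/": "gcc home",
    },
}


def fake_fetch(host, path):
    """Return the page content for host+path, or None if it is not served."""
    return FAKE_SITES.get(host, {}).get(path)


def is_alias(candidate, canonical, probe_paths, fetch):
    """True only if every probe path gives identical content on both hosts."""
    for path in probe_paths:
        expected = fetch(canonical, path)
        if expected is None or fetch(candidate, path) != expected:
            return False
    return True


PROBES = ["/", "/ml/binutils/"]
```

With a real fetch function in place of fake_fetch, a list of candidate
DNS names could be filtered down to the ones worth adding to
server_aliases.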

This seems to matter for mnogosearch as well.  IIUC it doesn't
have the concept of a server alias unfortunately; but no doubt
fixable with a small amount of hacking. ;-)

brgds, H-P

* Re: Conf changes to htdig sourceware side
From: Jonathan Larmour @ 2005-02-14  5:01 UTC
  To: Hans-Peter Nilsson; +Cc: overseers

Hans-Peter Nilsson wrote:
>
> To fix this better than adding domain names from people's
> memory, I'd like a list of all the DNS aliases or a pointer to
> such a list.  It doesn't count if the contents of paths are
> different (if I can't access /ml/binutils then it's not an alias
> in this sense) so e.g. gcc.gnu.org isn't an alias of
> sourceware.org in this sense.

I don't know about any others, but in that case ecos.sourceware.org doesn't 
count either (/ml/* works, but no others would).

Jifl
-- 
eCosCentric    http://www.eCosCentric.com/    The eCos and RedBoot experts
Visit us at Embedded World 2005, Nürnberg, Germany, 22-24 Feb, Stand 11-124
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
