From: Lars Henriksen
To: help-gnats@gnu.org
Cc: Yngve Svendsen, Mel Hatzis, Milan Zamazal
Subject: Subject header matching--once again
Date: Sun, 03 Nov 2002 21:40:00 -0000

Hello,

This is a story in four parts about how to recognize a PR in the
Subject: header of an email sent to GNATS.

PART I
======

Back in December 2001 there was a thread discussing Subject: matching:
why it was so restrictive, and how to loosen it and make it more useful.
This is the mail that started the discussion:

http://mail.gnu.org/pipermail/help-gnats/2001-December/002617.html

Milan concluded:

> Well, to summarize, I suggest the following:
>
> - Not to try to match "re", "fw", etc. and simply look for a substring
>   as stated below.
>
> - Accept "\ PR NUMBER is present in CATEGORY.
>
> - Accept "\ corresponds to an existing PR number.

The second requirement was changed to:

DB> Accept
DB> "\ NUMBER is a valid PR number (in any category).

It was also debated how to handle a Subject: with several matches (the
infamous Subject: containing OS/2). Milan ended the thread:

> The matching was adjusted according to our consensus, except that I
> didn't bother with the matching cycle, only the first possible match is
> considered. I haven't tested it, would you like to do it?

PART II
=======

In February, someone noticed that Subject: matching had stopped working:

http://mail.gnu.org/pipermail/help-gnats/2002-February/002808.html

Andrew Gray submitted a patch:

http://mail.gnu.org/pipermail/help-gnats/2002-March/002831.html

which fixed two bugs and made Subject: matching work again. But the fix
was not committed and went unnoticed for a time (see PART III).

PART III
========

In May, Mel Hatzis noticed the same problem and submitted a fix. After
some email exchanges, Andrew Gray's fix from March was committed, so all
should have been well. But it wasn't (and it still isn't).

For one thing, the agreement from December is not in the GNATS
documentation. This may be just as well, because the code does not
fully implement it.

The regular expression now used for matching a PR is:

  \<((PR[ \t/])\|([-a-z0-9_+.]+)/)([0-9]+)

Here regex groups two and three are not used, and the expression may be
simplified to:

  \<(PR[ \t/]\|[-a-z0-9_+.]+/)([0-9]+)

The check for upper-case PR is by appearance only, because Subject:
matching always ignores case (notice that the check for the category
name has no upper-case letters).
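The claims above (the unused groups, the case-insensitive "PR" check,
and the rejection of PR1234) can be checked with a small Python sketch.
It is a hand translation, not the GNATS code: Python's re module stands
in for the GNU regex library, GNU's \< is written as the lookaround
(?<!\w)(?=\w), \| becomes |, and the helper pr_number and the sample
subjects are made up for illustration.

```python
import re

# Hand translation of the GNU regexes quoted above into Python's re syntax.
# GNU "\<" (start of word) becomes the lookaround "(?<!\w)(?=\w)" and the
# alternation "\|" becomes "|". IGNORECASE mirrors GNATS, which always
# ignores case when matching the Subject: header.
SOW = r"(?<!\w)(?=\w)"
FLAGS = re.IGNORECASE | re.ASCII

# \<((PR[ \t/])\|([-a-z0-9_+.]+)/)([0-9]+)   (PR number in group 4)
CURRENT = re.compile(SOW + r"((PR[ \t/])|([-a-z0-9_+.]+)/)([0-9]+)", FLAGS)
# \<(PR[ \t/]\|[-a-z0-9_+.]+/)([0-9]+)       (PR number in group 2)
SIMPLIFIED = re.compile(SOW + r"(PR[ \t/]|[-a-z0-9_+.]+/)([0-9]+)", FLAGS)


def pr_number(pattern, group, subject):
    """Hypothetical helper: PR number from the first match, or None."""
    m = pattern.search(subject)
    return m.group(group) if m else None


for subject in ["Re: gnats/123: crash", "pr 42 still broken",
                "PR1234 again", "OS/2 problem"]:
    print(repr(subject), pr_number(CURRENT, 4, subject),
          pr_number(SIMPLIFIED, 2, subject))
```

Both expressions return the same number for every subject: "pr 42" is
accepted despite the lower case, PR1234 finds nothing, and OS/2 still
produces a candidate PR number, 2, which is why the category check
matters.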
This point has not been raised before, but I think it should be. I
believe that Subject: matching should be case sensitive, not just to be
able to check for PR, but simply because it is useful (see PART IV).

Furthermore, PR1234 is not accepted, although this form was explicitly
mentioned in December as desirable. I would like to add PR#1234 and
exclude PR/1234.

My proposal is that the regular expression be changed to

  \<(PR[ \t#]?\|[-\w+.]+/)([0-9]+)

and that the regex search be made case sensitive. This will pick up the
first appearance of any of PR 1234, PR#1234, PR1234, category/1234 and
Category/1234 anywhere in the Subject: header.

PART IV
=======

During the discussion in May, Mel Hatzis suggested that the regular
expression be made configurable via dbconfig:

http://mail.gnu.org/pipermail/help-gnats/2002-May/002901.html

Everyone who expressed an opinion was in favour, and so am I. Mel
submitted a patch that is still pending:

http://mail.gnu.org/pipermail/help-gnats/2002-May/002930.html

> You can now (optionally) include the following in the database-info
> section of the dbconfig file:
>
>    # The regular expression used to determine whether a PR is referenced
>    # on an email subject line
>    subject-matching {
>        "\\<((PR[ \t/])\\|([-a-z0-9_+.]+/))([0-9]+)"
>        capture-group "4"
>    }
>
> (The above example is exactly analogous to the built-in default)

The regex group identified by capture-group must capture the PR number.

The proposal is fine, but it has a drawback that is worth discussing.
With the built-in default (the one discussed in parts I-III), both the
category, if present, and the PR number are checked for validity. With a
subject-matching entry in dbconfig, it is impossible to check the
validity of a category, since only the PR number is captured. Hence, it
is not true that the example above, as stated, is exactly analogous to
the built-in default. At least not by design.
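The drawback can be made concrete with a Python sketch (again a hand
translation using Python's re, not the GNATS implementation; the helper
capture_pr and the sample subjects are hypothetical). It runs Mel's
example entry with capture-group 4, and the case-sensitive expression
proposed in PART III with the PR number in group 2, through the same
capture-only extraction.

```python
import re

SOW = r"(?<!\w)(?=\w)"  # GNU "\<" (start of word), as a Python lookaround

# Mel's example entry, capture-group "4". Compiled case-insensitively here,
# assuming the patch keeps the built-in default's case folding.
MEL_EXAMPLE = re.compile(SOW + r"((PR[ \t/])|([-a-z0-9_+.]+/))([0-9]+)",
                         re.IGNORECASE | re.ASCII)

# The case-sensitive proposal from PART III; the PR number is group 2.
# Assumption: "\w" inside the brackets means "word character", as in Python.
PROPOSED = re.compile(SOW + r"(PR[ \t#]?|[-\w+.]+/)([0-9]+)", re.ASCII)


def capture_pr(pattern, capture_group, subject):
    """Hypothetical capture-only extraction: return just the PR number.

    Whatever the other groups matched (e.g. a category) is thrown away,
    so nothing is left to validate except the number itself.
    """
    m = pattern.search(subject)
    return m.group(capture_group) if m else None


for s in ["PR 1234", "PR#1234", "PR1234", "gnats/1234", "Gnats/1234",
          "pr 1234", "nosuchcat/12"]:
    print(repr(s), capture_pr(MEL_EXAMPLE, 4, s), capture_pr(PROPOSED, 2, s))
```

Both report 12 for nosuchcat/12: the category text is matched (group 3
of Mel's example holds "nosuchcat/"), but a single capture group never
hands it to a validity check. The run also shows the intended
differences: the proposed expression accepts PR#1234 and PR1234 and,
being case sensitive, rejects "pr 1234", while Mel's example does the
opposite on those three subjects.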
The last remark in the previous paragraph alludes to a bug in Mel's
patch. The code extracts the PR number from the matching substring, but
unconditionally checks for a preceding "PR" or category. The patch also
makes gnatsd dump core if no subject-matching entry is present in
dbconfig, but that is an easy fix.

My proposal is to extend Mel's design by allowing the category name to
be captured optionally and checked for validity. The dbconfig entry
syntax could be:

   subject-matching {
       "regular_expression_with_groups"
       pr-group "integer"
       category-group "integer"
   }

The first integer is the regex group containing the PR number; the
second is the regex group containing the category name, or 0 (zero) if
it is not used.

The following entry is equivalent to the built-in default (with my
amendments):

   subject-matching {
       "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)"
       pr-group "3"
       category-group "2"
   }

An entry that does not use category is:

   subject-matching {
       "\\