From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 13409 invoked from network); 4 Nov 2002 05:40:10 -0000
Received: from unknown (HELO monty-python.gnu.org) (199.232.76.173)
  by sources.redhat.com with SMTP; 4 Nov 2002 05:40:10 -0000
Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org)
  by monty-python.gnu.org with esmtp (Exim 4.10)
  id 188ZrV-0001IZ-00; Mon, 04 Nov 2002 00:33:21 -0500
Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10)
  id 188ZeT-0002gL-00
  for help-gnats@gnu.org; Mon, 04 Nov 2002 00:19:53 -0500
Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10)
  id 188ZeQ-0002fy-00
  for help-gnats@gnu.org; Mon, 04 Nov 2002 00:19:53 -0500
Received: from natint.juniper.net ([207.17.136.129] helo=merlot.juniper.net)
  by monty-python.gnu.org with esmtp (Exim 4.10)
  id 188ZeQ-0002fo-00
  for help-gnats@gnu.org; Mon, 04 Nov 2002 00:19:50 -0500
Received: from juniper.net (ssh.juniper.net [207.17.136.39])
  by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id gA45JZm84750;
  Sun, 3 Nov 2002 21:19:35 -0800 (PST)
  (envelope-from hatzis@juniper.net)
Message-ID: <3DC602D4.1070706@juniper.net>
From: Mel Hatzis
Reply-To: hatzis@juniper.net
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Lars Henriksen
CC: help-gnats@gnu.org, Yngve Svendsen , Milan Zamazal
Subject: Re: Subject header matching--once again
References: <20021102213505.GB646077@cluster1.netman.dk>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: help-gnats-admin@gnu.org
Errors-To: help-gnats-admin@gnu.org
X-BeenThere: help-gnats@gnu.org
X-Mailman-Version: 2.0.11
Precedence: bulk
List-Help:
List-Post:
List-Subscribe: ,
List-Id: General discussion about GNU GNATS
List-Archive:
Date: Mon, 04 Nov 2002 11:31:00 -0000
X-SW-Source: 2002-q4/txt/msg00039.txt.bz2

Nice summary....more comments below.

On 11/02/2002 01:35 PM, Lars Henriksen submitted:

>Hello,
>
>This is a story in four parts about how to recognize a PR in the Subject: header
>of an email sent to GNATS.
>
>PART I
>======
>Back in December 2001 there was a thread discussing Subject: matching, why it
>was so restrictive, how to loosen it and make it more useful. This is the mail
>that started the discussion:
>
>http://mail.gnu.org/pipermail/help-gnats/2001-December/002617.html
>
>Milan concluded:
>
>>Well, to summarize, I suggest the following:
>>
>>- Not to try to match "re", "fw", etc. and simply look for a substring
>>  as stated below.
>>
>>- Accept "\> PR NUMBER is present in CATEGORY.
>>
>>- Accept "\> corresponds to an existing PR number.
>>
>
>The second requirement was changed to:
>
>DB> Accept
>DB> "\DB> NUMBER is a valid PR number (in any category).
>
>It was also debated how to handle a Subject: with several matches (the
>infamous Subject: containing OS/2). Milan ended the thread:
>
>>The matching was adjusted according to our consensus, except that I
>>didn't bother with the matching cycle, only the first possible match is
>>considered. I haven't tested it, would you like to do it?
>>
>
>PART II
>=======
>In February, someone noticed that Subject: matching had stopped working.
>
>http://mail.gnu.org/pipermail/help-gnats/2002-February/002808.html
>
>Andrew Gray submitted a patch:
>
>http://mail.gnu.org/pipermail/help-gnats/2002-March/002831.html
>
>which fixed two bugs and made Subject: matching work again. But the fix was
>not committed and went unnoticed for a time (see PART III).
>
>PART III
>========
>In May, Mel Hatzis noticed the same problem and submitted a fix. After some
>email exchanges Andrew Gray's fix from March was committed so all should be
>well. But it wasn't (and it isn't).
>
>For one thing, the agreement from December is not in the GNATS documentation.
>This may be just as well because the code does not implement it (fully).
>The regular expression now used for matching a PR is:
>
>    \<((PR[ \t/])\|([-a-z0-9_+.]+)/)([0-9]+)
>
>Here regex groups two and three are not used, and the expression may be
>simplified to:
>
>    \<(PR[ \t/]\|[-a-z0-9_+.]+/)([0-9]+)
>
>The check for upper case PR is by appearance only because Subject: matching
>always ignores case (notice that the check for category name has no upper
>case letters). This point has never been raised before, but I think it should
>be. I believe that Subject: matching should be case sensitive. Not just to be
>able to check for PR, but simply because it is useful (see PART IV).
>
>Furthermore, PR1234 is not accepted. This form was explicitly mentioned in
>December as desirable. I would like to add PR#1234 and exclude PR/1234.
>
>My proposal is that the regular expression be changed to
>
>    \<(PR[ \t#]?\|[-\w+.]+/)([0-9]+)
>
>and that the regex search be made case sensitive.
>
>This will pick up the first appearance of any of PR 1234, PR#1234, PR1234,
>category/1234 and Category/1234 anywhere in the Subject: header.
>
>PART IV
>=======
>During the discussion in May, Mel Hatzis suggested that the regular expression
>be made configurable via dbconfig:
>
>http://mail.gnu.org/pipermail/help-gnats/2002-May/002901.html
>
>Everyone who uttered an opinion was in favour, and so am I. Mel submitted a
>patch that is still pending:
>
>http://mail.gnu.org/pipermail/help-gnats/2002-May/002930.html
>
>>You can now (optionally) include the following in the database-info
>>section of the dbconfig file:
>>
>>    # The regular expression used to determine whether a PR is referenced
>>    # on an email subject line
>>    subject-matching {
>>        "\\<((PR[ \t/])\\|([-a-z0-9_+.]+/))([0-9]+)"
>>        capture-group "4"
>>    }
>>
>>(The above example is exactly analogous to the built-in default)
>>
>
>The regex group identified by capture-group must capture the PR number.
>
>The proposal is fine, but has a drawback that is worth a discussion.
>With the built-in default (the one discussed in parts I-III) both category,
>if present, and PR number are checked for validity. With a subject-matching
>entry in dbconfig, it is impossible to check the validity of a category: there
>is no way of checking the category since only the PR number is captured. Hence,
>it is not true that the example above, as stated, is exactly analogous to the
>built-in default. At least not by design.
>

Fair enough...your point is well taken.

>
>The last remark in the previous paragraph alludes to a bug in Mel's patch. The
>code extracts the PR number from the matching substring, but unconditionally
>checks for a preceding "PR" or category. The patch also makes gnatsd dump core
>if no subject-matching entry is present in dbconfig, but that is an easy fix.
>

Regarding the unconditional check for a preceding "PR" or category, this
is what is currently implemented - I still assert that the example I
provided is analogous to the current built-in default (except of course,
that the category is not verified as you state above).

>
>My proposal is to extend Mel's design by allowing the category name to be
>captured optionally and checked for validity.
>
>The dbconfig entry syntax could be:
>
>    subject-matching {
>        "regular_expression_with_groups"
>        pr-group "integer"
>        category-group "integer"
>    }
>
>The first integer is the regex group containing the PR number, the second the
>regex group containing the category name or 0 (zero) if not used. The following
>entry is equivalent to the built-in default (with my amendments):
>
>    subject-matching {
>        "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)"
>        pr-group "3"
>        category-group "2"
>    }
>
>An entry that does not use category is:
>
>    subject-matching {
>        "\\ pr-group "1"
>        category-group "0"
>    }
>

Building on your proposal, I suggest that it'd be even better if an
array of capture groups could be specified, each associated with a
field name.
This would allow for fields other than 'category' on the subject line.
This could take the following form:

    subject-matching {
        "\\

>It should also be decided which syntax bits to use for Subject: matching.
>At present only RE_NO_BK_PARENS is set, but why? Setting e.g. RE_NO_BK_VBAR
>would avoid the need to escape the alternation operator. Milan suggested
>using the same syntax bits as the rest of gnats:
>
>    (RE_SYNTAX_POSIX_EXTENDED | RE_BK_PLUS_QM) & ~RE_DOT_NEWLINE
>
>but these are an issue in their own right, and this email is already becoming
>too long.
>

I agree with Milan that we should be consistent....though like you, I find
these syntax flags questionable.

I was going to test my patch against these syntax flags, and resubmit,
but I'll hold off a while and see what comes of this email thread.

--
Mel Hatzis
Juniper Networks, Inc.

_______________________________________________
Help-gnats mailing list
Help-gnats@gnu.org
http://mail.gnu.org/mailman/listinfo/help-gnats
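
For readers following the thread, here is a rough sketch of how the
pr-group / category-group idea quoted above might behave. It is only an
illustration under stated assumptions, not the GNATS code (GNATS is
written in C and uses GNU regex). Python's re module has no \< operator,
so \b stands in for it as an approximation, and the names find_pr,
valid_prs and valid_categories are hypothetical. As described in the
thread, only the first match is considered and the search is case
sensitive.

```python
import re
from typing import Optional, Set

# Lars's proposed expression, translated to Python syntax.
# Assumption: \b approximates the GNU regex \< (start of word).
SUBJECT_RE = re.compile(r"\b(PR[ \t#]?|([-\w+.]+)/)([0-9]+)")
PR_GROUP = 3        # regex group holding the PR number
CATEGORY_GROUP = 2  # regex group holding the category, 0 if unused


def find_pr(subject: str,
            valid_prs: Set[int],
            valid_categories: Set[str],
            pattern: "re.Pattern[str]" = SUBJECT_RE,
            pr_group: int = PR_GROUP,
            category_group: int = CATEGORY_GROUP) -> Optional[int]:
    """Return the PR number referenced in subject, or None.

    Hypothetical helper: only the first match is examined, and both
    the PR number and the category (if captured) must be valid.
    """
    match = pattern.search(subject)  # case sensitive by default
    if match is None:
        return None
    if category_group:
        category = match.group(category_group)
        # The category group is empty when the "PR" alternative matched.
        if category is not None and category not in valid_categories:
            return None
    number = int(match.group(pr_group))
    return number if number in valid_prs else None


if __name__ == "__main__":
    prs, cats = {2, 1234}, {"gnats"}
    for subject in ("Re: PR 1234 is back", "PR#1234", "PR1234",
                    "Fwd: gnats/1234 crash", "pr 1234",
                    "Problem on OS/2"):
        print(repr(subject), "->", find_pr(subject, prs, cats))
```

In this sketch the "OS/2" subject is rejected only because "OS" is not
in the set of valid categories, which is the kind of check that is lost
when the category is not captured.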