From: Mel Hatzis <hatzis@juniper.net>
To: Lars Henriksen <Lars.Henriksen@netman.dk>
Cc: help-gnats@gnu.org, Yngve Svendsen <yngve.svendsen@sun.com>,
Milan Zamazal <pdm@zamazal.org>
Subject: Re: Subject header matching--once again
Date: Mon, 04 Nov 2002 11:31:00 -0000
Message-ID: <3DC602D4.1070706@juniper.net>
In-Reply-To: <20021102213505.GB646077@cluster1.netman.dk>
Nice summary... more comments below.
On 11/02/2002 01:35 PM, Lars Henriksen submitted:
>Hello,
>
>This is a story in four parts about how to recognize a PR in the Subject: header
>of an email sent to GNATS.
>
>PART I
>======
>Back in December 2001 there was a thread discussing Subject: matching, why it
>was so restrictive, how to loosen it and make it more useful. This is the mail
>that started the discussion:
>
>http://mail.gnu.org/pipermail/help-gnats/2001-December/002617.html
>
>Milan concluded:
>
>>Well, to summarize, I suggest the following:
>>
>>- Not to try to match "re", "fw", etc. and simply look for a substring
>> as stated below.
>>
>>- Accept "\<CATEGORY/NUMBER" where CATEGORY is a valid category name and
>> PR NUMBER is present in CATEGORY.
>>
>>- Accept "\<PR[ \t/]*NUMBER" where "PR" must be all capitals and NUMBER
>> corresponds to an existing PR number.
>>
>>
>
>The second requirement was changed to:
>
>DB> Accept
>DB> "\<CATEGORY/NUMBER" where CATEGORY is a valid category name and
>DB> NUMBER is a valid PR number (in any category).
>
>It was also debated how to handle a Subject: with several matches (the
>infamous Subject: containing OS/2). Milan ended the thread:
>
>>The matching was adjusted according to our consensus, except that I
>>didn't bother with the matching cycle, only the first possible match is
>>considered. I haven't tested it, would you like to do it?
>>
>>
>
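To make the "only the first possible match" limitation concrete, here is a small Python sketch (not GNATS code). It renders the built-in pattern in Python's re syntax, assuming "\b" is an adequate stand-in for GNU "\<", and uses a hypothetical set of categories and PR numbers. A subject such as "OS/2 build fails, see gnats/17" shows the difference between stopping at the first match and cycling through the matches.

```python
import re

# Built-in pattern in Python syntax: GNU "\<" is approximated by "\b" and
# "\|" becomes "|". re.IGNORECASE mirrors GNATS's case-insensitive search.
PR_RE = re.compile(r"\b(?:PR[ \t/]|([-a-z0-9_+.]+)/)([0-9]+)", re.IGNORECASE)

# Hypothetical database contents, for illustration only.
CATEGORIES = {"gnats", "doc"}
PR_NUMBERS = {17, 333}


def is_valid(m):
    """A category (if present) must exist, and so must the PR number."""
    category, number = m.group(1), int(m.group(2))
    if category is not None and category not in CATEGORIES:
        return False
    return number in PR_NUMBERS


def first_match_only(subject):
    """Consider only the first syntactic match, as Milan's change does."""
    m = PR_RE.search(subject)
    return int(m.group(2)) if m and is_valid(m) else None


def matching_cycle(subject):
    """Try successive non-overlapping matches until one is valid."""
    for m in PR_RE.finditer(subject):
        if is_valid(m):
            return int(m.group(2))
    return None
```

With first-match-only, "OS/2" is tried, "OS" is rejected as a category, and the later reference to gnats/17 is never seen; the cycle finds it.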
>PART II
>=======
>In February, someone noticed that Subject: matching had stopped working.
>
>http://mail.gnu.org/pipermail/help-gnats/2002-February/002808.html
>
>Andrew Gray submitted a patch:
>
>http://mail.gnu.org/pipermail/help-gnats/2002-March/002831.html
>
>which fixed two bugs and made Subject: matching work again. But the fix was
>not committed and went unnoticed for a time (see PART III).
>
>PART III
>========
>In May, Mel Hatzis noticed the same problem and submitted a fix. After some
>email exchanges Andrew Gray's fix from March was committed so all should be
>well. But it wasn't (and it isn't).
>
>For one thing, the agreement from December is not in the GNATS documentation.
>This may be just as well because the code does not implement it (fully). The
>regular expression now used for matching a PR is:
>
> \<((PR[ \t/])\|([-a-z0-9_+.]+)/)([0-9]+)
>
>Here regex groups two and three are not used, and the expression may be
>simplified to:
>
> \<(PR[ \t/]\|[-a-z0-9_+.]+/)([0-9]+)
>
>The check for an upper-case PR is cosmetic only, because Subject: matching
>always ignores case (notice that the character class for category names has
>no upper-case letters). This point has never been raised before, but I think
>it should be. I believe that Subject: matching should be case-sensitive, not
>just so that PR can be checked, but simply because it is useful (see PART IV).
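A quick Python illustration of why the upper-case "PR" in the current pattern is cosmetic under a case-insensitive search. This is a sketch, not the GNATS implementation; it assumes "\b" approximates GNU "\<".

```python
import re

# Current built-in pattern, simplified, in Python syntax.
CURRENT = r"\b(?:PR[ \t/]|[-a-z0-9_+.]+/)([0-9]+)"

insensitive = re.compile(CURRENT, re.IGNORECASE)  # how GNATS searches today
sensitive = re.compile(CURRENT)                   # the case-sensitive proposal


def pr_number(pattern, subject):
    """Return the PR number from the first match, or None."""
    m = pattern.search(subject)
    return int(m.group(1)) if m else None
```

With the case-insensitive search, "pr 42 is fixed" is accepted just like "PR 42 is fixed"; the case-sensitive search accepts only the latter.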
>
>Furthermore, PR1234 is not accepted. This form was explicitly mentioned in
>December as desirable. I would like to add PR#1234 and exclude PR/1234.
>
>My proposal is that the regular expression be changed to
>
> \<(PR[ \t#]?\|[-\w+.]+/)([0-9]+)
>
>and that the regex search be made case sensitive.
>
>This will pick up the first appearance of any of PR 1234, PR#1234, PR1234,
>category/1234 and Category/1234 anywhere in the Subject: header.
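A minimal Python sketch of the proposed expression with a case-sensitive search, checked against the forms listed above. It is an approximation: "\b" stands in for GNU "\<", and Python's "\w" also matches non-ASCII word characters.

```python
import re

# Proposed pattern in Python syntax; no re.IGNORECASE, so "PR" must be upper case.
PROPOSED = re.compile(r"\b(?:PR[ \t#]?|[-\w+.]+/)([0-9]+)")


def first_pr_reference(subject):
    """Return the PR number of the first match in the subject, or None."""
    m = PROPOSED.search(subject)
    return int(m.group(1)) if m else None


for subject in ["Re: PR 1234", "Re: PR#1234", "Re: PR1234",
                "Re: category/1234", "Re: Category/1234", "Re: pr 1234"]:
    print(f"{subject!r:25} -> {first_pr_reference(subject)}")
```

The first five subjects yield 1234; the lower-case "pr 1234" is no longer accepted.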
>
>PART IV
>=======
>During the discussion in May, Mel Hatzis suggested that the regular expression
>be made configurable via dbconfig:
>
>http://mail.gnu.org/pipermail/help-gnats/2002-May/002901.html
>
>Everyone who uttered an opinion was in favour, and so am I. Mel submitted a
>patch that is still pending:
>
>http://mail.gnu.org/pipermail/help-gnats/2002-May/002930.html
>
>>You can now (optionally) include the following in the database-info
>>section of the dbconfig file:
>>
>> # The regular expression used to determine whether a PR is referenced
>> # on an email subject line
>> subject-matching {
>> "\\<((PR[ \t/])\\|([-a-z0-9_+.]+/))([0-9]+)"
>> capture-group "4"
>> }
>>
>>(The above example is exactly analogous to the built-in default)
>>
>>
>
>The regex group identified by capture-group must capture the PR number.
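As a rough model of the capture-group design, here is a Python sketch. The dict passed to the hypothetical make_matcher() stands in for a parsed subject-matching clause; it is not the patch's actual data structure. When no clause is present, the sketch falls back to the built-in default rather than touching a missing entry. It also shows the limitation: only the PR number reaches the validity check.

```python
import re

# Built-in default from the quoted dbconfig example, in Python syntax
# ("\\<" becomes "\b", "\\|" becomes "|"); group 4 holds the PR number.
DEFAULT_PATTERN = r"\b((PR[ \t/])|([-a-z0-9_+.]+/))([0-9]+)"
DEFAULT_GROUP = 4


def make_matcher(entry=None):
    """Build a subject matcher from a parsed subject-matching clause.

    `entry` is a hypothetical dict such as
    {"pattern": ..., "capture-group": "4"}, or None when dbconfig has
    no subject-matching clause.
    """
    if entry is None:
        # Fall back to the built-in default rather than using a missing entry.
        pattern, group = DEFAULT_PATTERN, DEFAULT_GROUP
    else:
        pattern, group = entry["pattern"], int(entry["capture-group"])
    regex = re.compile(pattern, re.IGNORECASE)

    def match(subject, existing_prs):
        m = regex.search(subject)
        if m is None:
            return None
        number = int(m.group(group))
        # Only the PR number was captured, so any category text before it
        # cannot be checked against the list of valid categories.
        return number if number in existing_prs else None

    return match
```

For example, make_matcher()("Re: nosuchcategory/12", {12}) returns 12 even though the category does not exist.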
>
>The proposal is fine, but it has a drawback that is worth discussing. With the
>built-in default (the one discussed in parts I-III), both the category, if
>present, and the PR number are checked for validity. With a subject-matching
>entry in dbconfig, the category cannot be checked, because only the PR number
>is captured. Hence, it is not true that the example above is exactly analogous
>to the built-in default. At least not by design.
>
Fair enough... your point is well taken.
>
>The last remark in the previous paragraph alludes to a bug in Mel's patch. The
>code extracts the PR number from the matching substring, but unconditionally
>checks for a preceding "PR" or category. The patch also makes gnatsd dump core
>if no subject-matching entry is present in dbconfig, but that is an easy fix.
>
Regarding the unconditional check for a preceding "PR" or category:
this is what is currently implemented, so I still maintain that the
example I provided is analogous to the current built-in default (except,
of course, that the category is not verified, as you state above).
>
>My proposal is to extend Mel's design by allowing the category name to be
>captured optionally and checked for validity.
>
>The dbconfig entry syntax could be:
>
> subject-matching {
> "regular_expression_with_groups"
> pr-group "integer"
> category-group "integer"
> }
>
>The first integer is the regex group containing the PR number, the second the
>regex group containing the category name or 0 (zero) if not used. The following
>entry is equivalent to the built-in default (with my amendments):
>
> subject-matching {
> "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)"
> pr-group "3"
> category-group "2"
> }
>
>An entry that does not use category is:
>
> subject-matching {
> "\\<PR[ \t#]?([0-9]+)"
> pr-group "1"
> category-group "0"
> }
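A Python sketch of how the pr-group / category-group proposal might be evaluated. The SubjectMatching class and find_pr() are hypothetical names for illustration, and the patterns are the two example entries rewritten in Python syntax (single backslashes, "\b" for "\<").

```python
import re
from dataclasses import dataclass


@dataclass
class SubjectMatching:
    """Hypothetical parsed form of the proposed subject-matching clause."""
    pattern: str
    pr_group: int
    category_group: int = 0  # 0 means no category is captured


def find_pr(cfg, subject, categories, existing_prs):
    """Return the referenced PR number, or None if the match is invalid."""
    m = re.search(cfg.pattern, subject)  # case-sensitive, as proposed
    if m is None:
        return None
    number = int(m.group(cfg.pr_group))
    if number not in existing_prs:
        return None
    if cfg.category_group:
        # The category group does not participate when the "PR" branch
        # matched; m.group() then returns None and nothing is checked.
        category = m.group(cfg.category_group)
        if category is not None and category not in categories:
            return None
    return number


# The two example entries in Python syntax ("\\<" -> "\b", "\\|" -> "|").
WITH_CATEGORY = SubjectMatching(r"\b(PR[ \t#]?|([-\w+.]+)/)([0-9]+)",
                                pr_group=3, category_group=2)
PR_ONLY = SubjectMatching(r"\bPR[ \t#]?([0-9]+)", pr_group=1)
```

Note that "PR/1234" still matches WITH_CATEGORY syntactically, through the category branch with category "PR"; it is rejected only because "PR" is not a valid category.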
>
Building on your proposal, I suggest that it'd be even better if an
array of capture groups could be specified, each associated with a
field name. This would allow for fields other than 'category' on the
subject line.
This could take the following form:
 subject-matching {
  "\\<PR[ \t#/]?([0-9]+)[ \t]?:(.*)"
  captured-fields {
   "Number" "Synopsis"
  }
 }
The example above would match subject lines of the form:
 "PR 333 : missing subject-matching clause causes gnatsd to dump core"
and would accept such a subject as a reference to PR 333 only after
verifying that the captured synopsis matches the Synopsis of PR 333.
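A Python sketch of the captured-fields idea: each capture group is paired, in order, with a field name, and every captured field other than Number must agree with the stored PR. The PRS dict is a hypothetical stand-in for the database, and stripping whitespace plus exact comparison are assumptions; a real implementation would have to decide how strictly to compare.

```python
import re

# Hypothetical stand-in for the PR database: number -> field values.
PRS = {333: {"Synopsis": "missing subject-matching clause causes gnatsd to dump core"}}

# The example pattern in Python syntax; "\b" approximates GNU "\<".
PATTERN = re.compile(r"\bPR[ \t#/]?([0-9]+)[ \t]?:(.*)")
CAPTURED_FIELDS = ["Number", "Synopsis"]  # one field name per capture group


def match_subject(subject):
    """Return the PR number if every captured field agrees, else None."""
    m = PATTERN.search(subject)
    if m is None:
        return None
    # Pair each capture group with its field name, trimming whitespace.
    fields = {name: value.strip()
              for name, value in zip(CAPTURED_FIELDS, m.groups())}
    number = int(fields.pop("Number"))
    record = PRS.get(number)
    if record is None:
        return None
    for name, value in fields.items():
        if record.get(name) != value:
            return None
    return number
```

Exact matching means a subject with a truncated or edited synopsis would not be accepted, which may or may not be the desired behaviour.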
>It should also be decided which syntax bits to use for Subject: matching.
>At present only RE_NO_BK_PARENS is set, but why? Setting e.g. RE_NO_BK_VBAR
>would avoid the need to escape the alternation operator. Milan suggested
>using the same syntax bits as the rest of gnats:
>
> (RE_SYNTAX_POSIX_EXTENDED | RE_BK_PLUS_QM) & ~RE_DOT_NEWLINE
>
>but these are an issue in their own right, and this email is already becoming
>too long.
>
I agree with Milan that we should be consistent, though, like you, I find
these syntax flags questionable. I was going to test my patch against these
syntax flags and resubmit it, but I'll hold off for a while and see what
comes of this email thread.
--
Mel Hatzis
Juniper Networks, Inc.
_______________________________________________
Help-gnats mailing list
Help-gnats@gnu.org
http://mail.gnu.org/mailman/listinfo/help-gnats