* Subject header matching--once again
@ 2002-11-03 21:40 Lars Henriksen
  2002-11-04 11:31 ` Mel Hatzis
  0 siblings, 1 reply; 12+ messages in thread
From: Lars Henriksen @ 2002-11-03 21:40 UTC (permalink / raw)
  To: help-gnats; +Cc: Yngve Svendsen, Mel Hatzis, Milan Zamazal

Hello,

This is a story in four parts about how to recognize a PR in the Subject:
header of an email sent to GNATS.

PART I
======
Back in December 2001 there was a thread discussing Subject: matching, why it
was so restrictive, how to loosen it and make it more useful. This is the mail
that started the discussion:

http://mail.gnu.org/pipermail/help-gnats/2001-December/002617.html

Milan concluded:

> Well, to summarize, I suggest the following:
>
> - Not to try to match "re", "fw", etc. and simply look for a substring
>   as stated below.
>
> - Accept "\<CATEGORY/NUMBER" where CATEGORY is a valid category name and
>   PR NUMBER is present in CATEGORY.
>
> - Accept "\<PR[ \t/]*NUMBER" where "PR" must be all capitals and NUMBER
>   corresponds to an existing PR number.

The second requirement was changed to:

DB> Accept
DB> "\<CATEGORY/NUMBER" where CATEGORY is a valid category name and
DB> NUMBER is a valid PR number (in any category).

It was also debated how to handle a Subject: with several matches (the
infamous Subject: containing OS/2). Milan ended the thread:

> The matching was adjusted according to our consensus, except that I
> didn't bother with the matching cycle, only the first possible match is
> considered. I haven't tested it, would you like to do it?

PART II
=======
In February, someone noticed that Subject: matching had stopped working.

http://mail.gnu.org/pipermail/help-gnats/2002-February/002808.html

Andrew Gray submitted a patch:

http://mail.gnu.org/pipermail/help-gnats/2002-March/002831.html

which fixed two bugs and made Subject: matching work again. But the fix was
not committed and went unnoticed for a time (see PART III).

PART III
========
In May, Mel Hatzis noticed the same problem and submitted a fix. After some
email exchanges Andrew Gray's fix from March was committed so all should be
well. But it wasn't (and it isn't).

For one thing, the agreement from December is not in the GNATS documentation.
This may be just as well because the code does not implement it (fully). The
regular expression now used for matching a PR is:

    \<((PR[ \t/])\|([-a-z0-9_+.]+)/)([0-9]+)

Here regex groups two and three are not used, and the expression may be
simplified to:

    \<(PR[ \t/]\|[-a-z0-9_+.]+/)([0-9]+)

The check for upper-case PR is in appearance only, because Subject: matching
always ignores case (note that the category pattern contains no upper-case
letters). This point has never been raised before, but I think it should be.
I believe that Subject: matching should be case sensitive. Not just to be
able to check for PR, but simply because it is useful (see PART IV).

Furthermore, PR1234 is not accepted. This form was explicitly mentioned in
December as desirable. I would like to add PR#1234 and exclude PR/1234.

My proposal is that the regular expression be changed to

    \<(PR[ \t#]?\|[-\w+.]+/)([0-9]+)

and that the regex search be made case sensitive.

This will pick up the first appearance of any of PR 1234, PR#1234, PR1234,
category/1234 and Category/1234 anywhere in the Subject: header.

PART IV
=======
During the discussion in May, Mel Hatzis suggested that the regular expression
be made configurable via dbconfig:

http://mail.gnu.org/pipermail/help-gnats/2002-May/002901.html

Everyone who voiced an opinion was in favour, and so am I. Mel submitted a
patch that is still pending:

http://mail.gnu.org/pipermail/help-gnats/2002-May/002930.html

> You can now (optionally) include the following in the database-info
> section of the dbconfig file:
>
>   # The regular expression used to determine whether a PR is referenced
>   # on an email subject line
>   subject-matching {
>     "\\<((PR[ \t/])\\|([-a-z0-9_+.]+/))([0-9]+)"
>     capture-group "4"
>   }
>
> (The above example is exactly analogous to the built-in default)

The regex group identified by capture-group must capture the PR number.

The proposal is fine, but has a drawback that is worth discussing. With the
built-in default (the one discussed in parts I-III) both category, if present,
and PR number are checked for validity. With a subject-matching entry in
dbconfig, it is impossible to check the validity of a category, since only the
PR number is captured. Hence, it is not true that the example above, as
stated, is exactly analogous to the built-in default. At least not by design.

The last remark in the previous paragraph alludes to a bug in Mel's patch. The
code extracts the PR number from the matching substring, but unconditionally
checks for a preceding "PR" or category. The patch also makes gnatsd dump core
if no subject-matching entry is present in dbconfig, but that is an easy fix.

My proposal is to extend Mel's design by allowing the category name to be
captured optionally and checked for validity.

The dbconfig entry syntax could be:

    subject-matching {
      "regular_expression_with_groups"
      pr-group "integer"
      category-group "integer"
    }

The first integer is the regex group containing the PR number, the second the
regex group containing the category name or 0 (zero) if not used. The following
entry is equivalent to the built-in default (with my amendments):

    subject-matching {
      "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)"
      pr-group "3"
      category-group "2"
    }

An entry that does not use category is:

    subject-matching {
      "\\<PR[ \t#]?([0-9]+)"
      pr-group "1"
      category-group "0"
    }

It should also be decided which syntax bits to use for Subject: matching.
At present only RE_NO_BK_PARENS is set, but why? Setting e.g. RE_NO_BK_VBAR
would avoid the need to escape the alternation operator. Milan suggested
using the same syntax bits as the rest of gnats:

    (RE_SYNTAX_POSIX_EXTENDED | RE_BK_PLUS_QM) & ~RE_DOT_NEWLINE

but these are an issue in their own right, and this email is already becoming
too long.

Lars Henriksen

_______________________________________________
Help-gnats mailing list
Help-gnats@gnu.org
http://mail.gnu.org/mailman/listinfo/help-gnats

^ permalink raw reply	[flat|nested] 12+ messages in thread
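To make the behaviour of the regular expression proposed in PART III easier to check, here is a minimal Python sketch. It is not GNATS code: GNATS uses the GNU regex library from C, and the translation below is an assumption. GNU's `\<` (start of a word) is approximated by `(?<!\w)(?=\w)`, `\|` becomes `|`, and the helper name `find_pr_reference` is made up for illustration.

```python
import re

# Python approximation of the proposed GNU pattern
#   \<(PR[ \t#]?\|[-\w+.]+/)([0-9]+)
# Assumption: "(?<!\w)(?=\w)" stands in for GNU's "\<" (start of a word).
PROPOSED = r"(?<!\w)(?=\w)(PR[ \t#]?|[-\w+.]+/)([0-9]+)"


def find_pr_reference(subject, ignore_case=False):
    """Return (prefix, number) for the first match in subject, or None.

    find_pr_reference is a hypothetical helper, not a GNATS function.
    """
    flags = re.IGNORECASE if ignore_case else 0
    m = re.search(PROPOSED, subject, flags)
    if m is None:
        return None
    # group(1) is the "PR" prefix or "category/", group(2) the PR number.
    return m.group(1), m.group(2)
```

Under these assumptions, `PR 1234`, `PR#1234`, `PR1234` and `Category/1234` are all found, and lower-case `pr 1234` is found only when case is ignored. Two results are worth noting: `PR/1234` still matches, but through the category alternative (prefix `PR/`), so excluding it would depend on the category check rejecting `PR`; and `OS/2` matches the same way, which is why the category has to be validated.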
* Re: Subject header matching--once again 2002-11-03 21:40 Subject header matching--once again Lars Henriksen @ 2002-11-04 11:31 ` Mel Hatzis 2002-11-04 15:41 ` Lars Henriksen 0 siblings, 1 reply; 12+ messages in thread From: Mel Hatzis @ 2002-11-04 11:31 UTC (permalink / raw) To: Lars Henriksen; +Cc: help-gnats, Yngve Svendsen, Milan Zamazal Nice summary....more comments below. On 11/02/2002 01:35 PM, Lars Henriksen submitted: >Hello, > >This is a story in four parts about how to recognize a PR in the Subject: header >of an email sent to GNATS. > >PART I >====== >Back in December 2001 there was a thread discussing Subject: matching, why it >was so restrictive, how to loosen it and make it more useful. This is the mail >that started the discussion: > >http://mail.gnu.org/pipermail/help-gnats/2001-December/002617.html > >Milan concluded: > > > >>Well, to summarize, I suggest the following: >> >>- Not to try to match "re", "fw", etc. and simply look for a substring >> as stated below. >> >>- Accept "\<CATEGORY/NUMBER" where CATEGORY is a valid category name and >> PR NUMBER is present in CATEGORY. >> >>- Accept "\<PR[ \t/]*NUMBER" where "PR" must be all capitals and NUMBER >> corresponds to an existing PR number. >> >> > >The second requirement was changed to: > >DB> Accept >DB> "\<CATEGORY/NUMBER" where CATEGORY is a valid category name and >DB> NUMBER is a valid PR number (in any category). > >It was also debated how to handle a Subject: with several matches (the >infamous Subject: containing OS/2). Milan ended the thread: > > > >>The matching was adjusted according to our consensus, except that I >>didn't bother with the matching cycle, only the first possible match is >>considered. I haven't tested it, would you like to do it? >> >> > >PART II >======= >In February, someone noticed that Subject: matching had stopped working. 
>
>http://mail.gnu.org/pipermail/help-gnats/2002-February/002808.html
>
>Andrew Gray submitted a patch:
>
>http://mail.gnu.org/pipermail/help-gnats/2002-March/002831.html
>
>which fixed two bugs and made Subject: matching work again. But the fix was
>not committed and went unnoticed for a time (see PART III).
>
>PART III
>========
>In May, Mel Hatzis noticed the same problem and submitted a fix. After some
>email exchanges Andrew Gray's fix from March was committed so all should be
>well. But it wasn't (and it isn't).
>
>For one thing, the agreement from December is not in the GNATS documentation.
>This may be just as well because the code does not implement it (fully). The
>regular expression now used for matching a PR is:
>
>   \<((PR[ \t/])\|([-a-z0-9_+.]+)/)([0-9]+)
>
>Here regex groups two and three are not used, and the expression may be
>simplified to:
>
>   \<(PR[ \t/]\|[-a-z0-9_+.]+/)([0-9]+)
>
>The check for upper case PR is by appearance only because Subject: matching
>always ignores case (notice that the check for category name has no upper
>case letters). This point has never been raised before, but I think it should
>be. I believe that Subject: matching should be case sensitive. Not just to be
>able to check for PR, but simply because it is useful (see PART IV).
>
>Furthermore, PR1234 is not accepted. This form was explicitly mentioned in
>December as desirable. I would like to add PR#1234 and exclude PR/1234.
>
>My proposal is that the regular expression be changed to
>
>   \<(PR[ \t#]?\|[-\w+.]+/)([0-9]+)
>
>and that the regex search be made case sensitive.
>
>This will pick up the first appearance of any of PR 1234, PR#1234, PR1234,
>category/1234 and Category/1234 anywhere in the Subject: header.
> >PART IV >======= >During the discussion in May, Mel Hatzis suggested that the regular expression >be made configurable via dbconfig: > >http://mail.gnu.org/pipermail/help-gnats/2002-May/002901.html > >Everyone who uttered an opinion was in favour, and so am I. Mel submitted a >patch that is still pending: > >http://mail.gnu.org/pipermail/help-gnats/2002-May/002930.html > > > >>You can now (optionally) include the following in the database-info >>section of the dbconfig file: >> >> # The regular expression used to determine whether a PR is referenced >> # on an email subject line >> subject-matching { >> "\\<((PR[ \t/])\\|([-a-z0-9_+.]+/))([0-9]+)" >> capture-group "4" >> } >> >>(The above example is exactly analogous to the built-in default) >> >> > >The regex group identified by capture-group must capture the PR number. > >The proposal is fine, but has a drawback that is worth a discussion. With the >built-in default (the one discussed in parts I-III) both category, if present, >and PR number are checked for validity. With a subject-matching entry in >dbconfig, it is impossible to check the validity of a category: there is no way >of checking the category since only the PR number is captured. Hence, it >is not true that the example above, as stated, is exactly analogous to the >built-in default. At least not by design. > Fair enough...your point is well taken. > >The last remark in the previous paragraph alludes to a bug in Mel's patch. The >code extracts the PR number from the matching substring, but unconditionally >checks for a preceding "PR" or category. The patch also makes gnatsd dump core >if no subject-matching entry is present in dbconfig, but that is an easy fix. > Regarding the unconditional check for a preceding "PR" or category, this is what is currently implemented - I still assert that the example I provided is analogous to the current built-in default (except of course, that the category is not verified as you state above). 
> >My proposal is to extend Mel's design by allowing the category name to be >captured optionally and checked for validity. > >The dbconfig entry syntax could be: > > subject-matching { > "regular_expression_with_groups" > pr-group "integer" > category-group "integer" > } > >The first integer is the regex group containing the PR number, the second the >regex group containing the category name or 0 (zero) if not used. The following >entry is equivalent to the built-in default (with my amendments): > > subject-matching { > "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)" > pr-group "3" > category-group "2" > } > >An entry that does not use category is: > > subject-matching { > "\\<PR[ \t#]?([0-9]+)" > pr-group "1" > category-group "0" > } > Building on your proposal, I suggest that it'd be even better if an array of capture groups could be specified, each associated with a field name. This would allow for fields other than 'category' on the subject line. This could take the following form: subject-matching { "\\<PR[ \t#/]?([0-9]+)[ \t]?:(.*)" captured-fields { "Number" "Synopsis" } } The example above would match subject lines of the form: "PR 333 : missing subject-matching clause causes gnatsd to dump core" (verifying that the synopsis matched the PR number before accepting it as a reference to PR 333) >It should also be decided which syntax bits to use for Subject: matching. >At present only RE_NO_BK_PARENS is set, but why? Setting e.g. RE_NO_BK_VBAR >would avoid the need to escape the alternation operator. Milan suggested >using the same syntax bits as the rest of gnats: > > (RE_SYNTAX_POSIX_EXTENDED | RE_BK_PLUS_QM) & ~RE_DOT_NEWLINE > >but these are an issue in their own right, and this email is already becoming >too long. > I agree with Milan that we should be consistent....though like you, I find these syntax flags questionable. I was going to test my patch against these syntax flags, and resubmit, but I'll hold off a while and see what comes of this email thread. 
-- 
Mel Hatzis
Juniper Networks, Inc.
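The `captured-fields` idea can be sketched as follows. This is only an illustration in Python, not the pending patch: the in-memory `PRS` table, the function `match_subject`, and the policy of comparing every captured field with the stored value are all assumptions, and GNU's `\<` is approximated by `(?<!\w)(?=\w)`.

```python
import re

# Hypothetical in-memory stand-in for the PR database; real GNATS looks
# fields up through its own C code.
PRS = {
    "333": {
        "Number": "333",
        "Synopsis": "missing subject-matching clause causes gnatsd to dump core",
    },
}

# Python rendering of the example pattern from the message above.
PATTERN = r"(?<!\w)(?=\w)PR[ \t#/]?([0-9]+)[ \t]?:(.*)"
# The i-th name labels the i-th regex group; "" means "ignore this group".
CAPTURED_FIELDS = ["Number", "Synopsis"]


def match_subject(subject, pattern=PATTERN, fields=CAPTURED_FIELDS, prs=PRS):
    """Return the PR number if every captured field agrees with the stored PR."""
    m = re.search(pattern, subject)
    if m is None:
        return None
    # Collect the groups that have a field name and took part in the match.
    captured = {
        name: m.group(i).strip()
        for i, name in enumerate(fields, start=1)
        if name and m.group(i) is not None
    }
    pr = prs.get(captured.get("Number", ""))
    if pr is None:
        return None
    # Assumed policy: each captured value must equal the stored field value.
    if any(pr.get(name) != value for name, value in captured.items()):
        return None
    return pr["Number"]
```

Note that the second group captures the text after the colon including its leading blank, so the sketch strips whitespace before comparing. A strict equality test is only one possible policy; a `Category` field, for instance, would need an "exists in the categories file" check rather than a comparison with the PR's own category.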
* Re: Subject header matching--once again 2002-11-04 11:31 ` Mel Hatzis @ 2002-11-04 15:41 ` Lars Henriksen 2002-11-06 21:43 ` Lars Henriksen 2002-11-09 3:26 ` Mel Hatzis 0 siblings, 2 replies; 12+ messages in thread From: Lars Henriksen @ 2002-11-04 15:41 UTC (permalink / raw) To: Mel Hatzis; +Cc: help-gnats, Yngve Svendsen, Milan Zamazal On Sun, Nov 03, 2002 at 09:17:08PM -0800, Mel Hatzis wrote: > (snip) > > Building on your proposal, I suggest that it'd be even better if an > array of capture groups could be specified, each associated with a > field name. This would allow for fields other than 'category' on the > subject line. > > This could take the following form: > > subject-matching { > "\\<PR[ \t#/]?([0-9]+)[ \t]?:(.*)" > captured-fields { > "Number" "Synopsis" > } > } I like that, a nice, concise way of identifying the groups and their purpose, The built-in default would then become subject-matching { "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)" captured-fields { "" "Category" "Number" } } > The example above would match subject lines of the form: > > "PR 333 : missing subject-matching clause causes gnatsd to dump core" > > (verifying that the synopsis matched the PR number before accepting it > as a reference to PR 333) Would you allow any field name to appear in the list or just an exquisite selection? I am not well versed in the intricacies of gnatsd. Does the existing code allow a check of Synopsis as you suggest? There will, of course, have to be validity checks for each of the field types in the list. > >It should also be decided which syntax bits to use for Subject: matching. > >At present only RE_NO_BK_PARENS is set, but why? Setting e.g. RE_NO_BK_VBAR > >would avoid the need to escape the alternation operator. Milan suggested > >using the same syntax bits as the rest of gnats: > > > > (RE_SYNTAX_POSIX_EXTENDED | RE_BK_PLUS_QM) & ~RE_DOT_NEWLINE > > > >but these are an issue in their own right, and this email is already > >becoming > >too long. 
> >
> I agree with Milan that we should be consistent

Consistency is fine, but not just for its own sake. Subject matching is one
thing, query-expressions another. If both are served by the same syntax, then
by all means. To give two examples: should '.' match a newline? Should
character classes be allowed?

Lars Henriksen
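The question of whether '.' should match a newline can be made concrete with a small Python sketch. Python's `re.DOTALL` flag is used here as an analogue of GNU's RE_DOT_NEWLINE syntax bit, not as the same thing, and the folded Subject value is an invented example; whether GNATS unfolds the header before matching is not stated in this thread.

```python
import re

# Invented example of a folded Subject header: the continuation line
# starts with whitespace after the newline.
FOLDED = "PR 333 : missing subject-matching clause\n causes gnatsd to dump core"

# The Number/Synopsis pattern from earlier in the thread, without "\<".
PATTERN = r"PR[ \t#/]?([0-9]+)[ \t]?:(.*)"


def synopsis(subject, dot_matches_newline):
    """Return the text captured by the second group, or None."""
    flags = re.DOTALL if dot_matches_newline else 0
    m = re.search(PATTERN, subject, flags)
    return m.group(2) if m else None
```

Without the flag the captured synopsis stops at the end of the first line; with it, the capture runs across the newline to the end of the header. Any field check built on `(.*)` would therefore behave differently depending on this one syntax bit.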
* Re: Subject header matching--once again 2002-11-04 15:41 ` Lars Henriksen @ 2002-11-06 21:43 ` Lars Henriksen 2002-11-09 3:26 ` Mel Hatzis 1 sibling, 0 replies; 12+ messages in thread From: Lars Henriksen @ 2002-11-06 21:43 UTC (permalink / raw) To: Mel Hatzis; +Cc: help-gnats, Yngve Svendsen, Milan Zamazal On Sun, Nov 03, 2002 at 09:17:08PM -0800, Mel Hatzis wrote: > (snip) > > Building on your proposal, I suggest that it'd be even better if an > array of capture groups could be specified, each associated with a > field name. This would allow for fields other than 'category' on the > subject line. > > This could take the following form: > > subject-matching { > "\\<PR[ \t#/]?([0-9]+)[ \t]?:(.*)" > captured-fields { > "Number" "Synopsis" > } > } I like that, a nice, concise way of identifying the groups and their purpose, The built-in default would then become subject-matching { "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)" captured-fields { "" "Category" "Number" } } > The example above would match subject lines of the form: > > "PR 333 : missing subject-matching clause causes gnatsd to dump core" > > (verifying that the synopsis matched the PR number before accepting it > as a reference to PR 333) Would you allow any field name to appear in the list or just an exquisite selection? I am not well versed in the intricacies of gnatsd. Does the existing code allow a check of Synopsis as you suggest? There will, of course, have to be validity checks for each of the field types in the list. > >It should also be decided which syntax bits to use for Subject: matching. > >At present only RE_NO_BK_PARENS is set, but why? Setting e.g. RE_NO_BK_VBAR > >would avoid the need to escape the alternation operator. Milan suggested > >using the same syntax bits as the rest of gnats: > > > > (RE_SYNTAX_POSIX_EXTENDED | RE_BK_PLUS_QM) & ~RE_DOT_NEWLINE > > > >but these are an issue in their own right, and this email is already > >becoming > >too long. 
> >
> I agree with Milan that we should be consistent

Consistency is fine, but not just for its own sake. Subject matching is one
thing, query-expressions another. If both are served by the same syntax, then
by all means. To give two examples: should '.' match a newline? Should
character classes be allowed?

Lars Henriksen
* Re: Subject header matching--once again 2002-11-04 15:41 ` Lars Henriksen 2002-11-06 21:43 ` Lars Henriksen @ 2002-11-09 3:26 ` Mel Hatzis 2002-12-02 14:45 ` Lars Henriksen 1 sibling, 1 reply; 12+ messages in thread From: Mel Hatzis @ 2002-11-09 3:26 UTC (permalink / raw) To: Lars Henriksen; +Cc: help-gnats, Yngve Svendsen, Milan Zamazal On 11/04/2002 11:11 AM, Lars Henriksen wrote: > On Sun, Nov 03, 2002 at 09:17:08PM -0800, Mel Hatzis wrote: > >>(snip) >> >>Building on your proposal, I suggest that it'd be even better if an >>array of capture groups could be specified, each associated with a >>field name. This would allow for fields other than 'category' on the >>subject line. >> >>This could take the following form: >> >> subject-matching { >> "\\<PR[ \t#/]?([0-9]+)[ \t]?:(.*)" >> captured-fields { >> "Number" "Synopsis" >> } >> } > > > I like that, a nice, concise way of identifying the groups and their purpose, > The built-in default would then become > > subject-matching { > "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)" > captured-fields { > "" "Category" "Number" > } > } > > >>The example above would match subject lines of the form: >> >> "PR 333 : missing subject-matching clause causes gnatsd to dump core" >> >>(verifying that the synopsis matched the PR number before accepting it >>as a reference to PR 333) > > > Would you allow any field name to appear in the list or just an exquisite > selection? I am not well versed in the intricacies of gnatsd. Does the existing > code allow a check of Synopsis as you suggest? There will, of course, have to > be validity checks for each of the field types in the list. The existing code would need to be modified to do the checking. Currently, the code does a category search for the captured stuff preceding the pr number - so it's fairly hardwired. This could be addressed using 'find_field_index' and 'field_value' for the fields in question. 
One would also need to be careful to account for changes - for example, if PR 333 is associated with category 'A' and someone updates it to category 'B', a subject line of 'Re: A/333' should probably still match PR 333 - this is currently what happens because the check for category is independent of PR number. This poses a problem for setups like my example which used 'Synopsis' since, if the synopsis for PR 333 is modified, there is no way to verify subject lines that reference the old synopsis. However, this problem also applies to category, if one considers the possibility that a category is deleted from the categories file. Perhaps this is OK - if the category is deleted, it shouldn't surprise anyone if subject lines with the old category no longer match. This is a tougher argument to apply to a modified synopsis - since you're not deleting the synopsis value, you're merely modifying it. With a category modification, you can still do the verification, but with a synopsis modification, you can't. It all comes down to gnats-admin policy - the patch gives you the flexibility to define your own policy. It won't protect against ill-defined policy. -Mel _______________________________________________ Help-gnats mailing list Help-gnats@gnu.org http://mail.gnu.org/mailman/listinfo/help-gnats ^ permalink raw reply [flat|nested] 12+ messages in thread
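The policy described above, where the category check is independent of the PR number, can be sketched in Python. The data (`CATEGORIES`, `PR_CATEGORY`) and the function `check_if_reply` are hypothetical stand-ins, and the pattern is a Python approximation of the built-in default discussed in this thread, with `(?<!\w)(?=\w)` in place of GNU's `\<`.

```python
import re

# Hypothetical data: the categories file and each PR's *current* category.
CATEGORIES = {"A", "B"}
PR_CATEGORY = {"333": "B"}  # PR 333 was filed under A and later moved to B

PATTERN = r"(?<!\w)(?=\w)(PR[ \t#]?|([-\w+.]+)/)([0-9]+)"


def check_if_reply(subject, categories=CATEGORIES, prs=PR_CATEGORY):
    """Return the PR number referenced by subject, or None."""
    # Only the first possible match in the Subject is considered.
    m = re.search(PATTERN, subject)
    if m is None:
        return None
    category, number = m.group(2), m.group(3)
    # The category, if captured, only has to exist; it is not compared
    # with the PR's current category.
    if category is not None and category not in categories:
        return None
    return number if number in prs else None
```

With this model, `Re: A/333` still resolves to PR 333 after the move to category B, but stops matching if category A is removed from the categories file. Because only the first match is considered, a Subject such as `OS/2 problem, see PR 333` is rejected outright when `OS` is not a category.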
* Re: Subject header matching--once again 2002-11-09 3:26 ` Mel Hatzis @ 2002-12-02 14:45 ` Lars Henriksen 2002-12-17 6:38 ` Yngve Svendsen 2003-03-02 11:57 ` Andrew J. Gray 0 siblings, 2 replies; 12+ messages in thread From: Lars Henriksen @ 2002-12-02 14:45 UTC (permalink / raw) To: Mel Hatzis; +Cc: help-gnats, Yngve Svendsen, Milan Zamazal On Mon, Nov 04, 2002 at 03:16:45PM -0800, Mel Hatzis wrote: > On 11/04/2002 11:11 AM, Lars Henriksen wrote: > >On Sun, Nov 03, 2002 at 09:17:08PM -0800, Mel Hatzis wrote: (snip) > >> > >>Building on your proposal, I suggest that it'd be even better if an > >>array of capture groups could be specified, each associated with a > >>field name. This would allow for fields other than 'category' on the > >>subject line. > >> > >>This could take the following form: > >> > >>subject-matching { > >> "\\<PR[ \t#/]?([0-9]+)[ \t]?:(.*)" > >> captured-fields { > >> "Number" "Synopsis" > >> } > >>} > > > > > >I like that, a nice, concise way of identifying the groups and their > >purpose, > >The built-in default would then become > > > > subject-matching { > > "\\<(PR[ \t#]?\\|([-\\w+.]+)/)([0-9]+)" > > captured-fields { > > "" "Category" "Number" > > } > > } No one made any comments on this thread, and silence gives consent. Mel, I hope that you are going to replace your pending patch for dbconfig configurable Subject matching with one as discussed, at least in the reduced form with Number and Category. In the meantime the existing Subject match code should be fixed to reflect the agreement reached a year back, see my first mail in this thread: http://mail.gnu.org/pipermail/help-gnats/2002-November/003185.html A patch follows that includes an update to the documentation. The feature is mentioned a couple of times in passing in 'Keeping Track'. I think it deserves a (sub)section of its own and have inserted one called 'Following up via direct email' in the 'Editing existing Problem Reports' section of 'The GNATS User Tools' chapter. 
I have also corrected a couple of minor errors that I ran across.

The regular expression used for matching the Subject line appears in the code
as

    \\<(PR[ \t#/]?|([-A-Za-z0-9_+.]+)/)([0-9]+)

whereas the documentation has

    \<(PR[ \t#/]?|[-\w+.]+/)[0-9]+

I couldn't get the GNU match-word-constituent operator (\w) to work inside the
bracket expression and am uncertain as to whether it is allowed there. Perl
has it. The parentheses which are in the code, but missing from the manual,
do not affect the matching; they are there only to capture Category and Number.

I haven't aligned the regular expression syntax with the rest of GNATS as
suggested by Milan. This is a non-issue as long as the regular expression is
hard-coded and not exposed for users to modify. The regex searching is also
case sensitive now.

The patch is in production use in the GNATS installation that I am responsible
for. I hope it can make it into GNATS 4.0-beta2?

Lars Henriksen

Index: file-pr.c
===================================================================
RCS file: /cvsroot/gnats/gnats/gnats/file-pr.c,v
retrieving revision 1.51
diff -u -r1.51 file-pr.c
--- file-pr.c 1 Nov 2002 11:37:51 -0000 1.51
+++ file-pr.c 1 Dec 2002 22:04:21 -0000
@@ -595,14 +595,13 @@
 static char *
 checkIfReply (PR *pr, ErrorDesc *err)
 {
-  char *prID = NULL;
+  char *prID;
+  char *prCat;
   AdmEntry *cat;
   const char *headerValue;
   struct re_pattern_buffer regex;
   struct re_registers regs;
-  int i, start, end, idstart;
-  char case_fold[256];
-  char *possiblePrNum;
+  int i, len;
   reg_syntax_t old_syntax;

   headerValue = header_value (pr, SUBJECT);
@@ -612,22 +611,15 @@
       return NULL;
     }

-  old_syntax = re_set_syntax (RE_NO_BK_PARENS);
+  old_syntax = re_set_syntax (RE_NO_BK_PARENS | RE_NO_BK_VBAR);
   memset ((void *) &regex, 0, sizeof (regex));
-  for (i=0; i<256; i++)
-    {
-      case_fold[i] = tolower(i);
-    }
-  regex.translate = case_fold;
-
   {
-    const char *const PAT = "\\<((PR[ \t/])\\|([-a-z0-9_+.]+)/)([0-9]+)";
+    const char *const PAT = "\\<(PR[ \t#/]?|([-A-Za-z0-9_+.]+)/)([0-9]+)";
     re_compile_pattern (PAT, strlen (PAT), &regex);
   }
   i = re_search (&regex, headerValue, strlen (headerValue), 0,
                  strlen (headerValue), &regs);
-  regex.translate = NULL;
   regfree (&regex);
   re_set_syntax (old_syntax);

@@ -636,43 +628,39 @@
       return NULL;
     }

-  start = regs.start[0];
-  end = regs.end[0];
-  idstart = regs.start[4] - start;
-
-  free (regs.start);
-  free (regs.end);
-
-  possiblePrNum = xmalloc (end - start + 1);
-  memcpy (possiblePrNum, headerValue + start, end - start);
-  possiblePrNum[end - start] = '\0';
-
-  *(possiblePrNum + idstart - 1) = '\0';
-
-  /* See if the category exists: */
-  cat = get_adm_record (CATEGORY (pr->database), possiblePrNum);
-
-  /* If no such category, then this is not a reply to a valid
-   * problem report. This situtation can arise, for example, when
-   * someone has the string "OS/2" in their Subject header.
-   */
-  /* Folks often send in "pr/1234" instead of a valid category */
-  if ((cat != NULL) || (strcasecmp (possiblePrNum, "pr") == 0))
+  /* Check the category if there is one. */
+  if (regs.start[2] > -1)
     {
-      /* We only needed res, never cat, so free cat. */
-      free_adm_entry (cat);
-      prID = xstrdup (possiblePrNum + idstart);
+      len = regs.end[2] - regs.start[2];
+      prCat = xmalloc (len + 1);
+      memcpy (prCat, headerValue + regs.start[2], len);
+      prCat[len] = '\0';
+
+      /* See if the category exists: */
+      cat = get_adm_record (CATEGORY (pr->database), prCat);
+      free_adm_entry(cat);
+      free (prCat);
+      if (cat == NULL)
+        {
+          free (regs.start);
+          free (regs.end);
+          return NULL;
+        }
     }

-  free (possiblePrNum);
+  /* Check the PR number. */
+  len = regs.end[3] - regs.start[3];
+  prID = xmalloc (len + 1);
+  memcpy (prID, headerValue + regs.start[3], len);
+  prID[len] = '\0';
+
+  free (regs.start);
+  free (regs.end);

-  if (prID != NULL)
+  if (! prExists (pr->database, prID, err))
     {
-      if (! prExists (pr->database, prID, err))
-        {
-          free (prID);
-          prID = NULL;
-        }
+      free (prID);
+      prID = NULL;
     }

   return prID;
Index: fields.texi
===================================================================
RCS file: /cvsroot/gnats/gnats/doc/fields.texi,v
retrieving revision 1.7
diff -u -r1.7 fields.texi
--- fields.texi 24 Oct 2002 20:30:54 -0000 1.7
+++ fields.texi 1 Dec 2002 22:07:19 -0000
@@ -196,8 +196,6 @@

 @c FIXME - this node is loooooooooooooooong...

-@subheading Field descriptions
-
 In a standard @sc{gnats} installation, certain fields will always be
 present in a Problem Report. If a PR arrives without one or more of
 these fields, @sc{gnats} will add them, and if they have default
@@ -503,16 +501,15 @@
 The reason for the change.
 @end table

+@cindex follow-up via email
 @cindex subsequent mail
-@cindex other mail
-@cindex appending PRs
-@cindex saving related mail
 @cindex related mail
 @noindent
 The @code{Audit-Trail} field also contains any mail messages received by
 @sc{gnats} related to this PR, in the order received. @sc{gnats} needs
-to find a @var{category}/@var{number} at the beginning of the Subject
-field of received e-mail in order to be able to file it correctly.
+to find a reference to the PR in the Subject field of received email in
+order to be able to file it correctly, see @ref{follow-up via email,,
+Following up via direct email}.

 @cindex @code{Unformatted} field
 @item Unformatted
Index: p-usage.texi
===================================================================
RCS file: /cvsroot/gnats/gnats/doc/p-usage.texi,v
retrieving revision 1.10
diff -u -r1.10 p-usage.texi
--- p-usage.texi 24 Oct 2002 11:38:25 -0000 1.10
+++ p-usage.texi 1 Dec 2002 22:08:07 -0000
@@ -116,8 +116,13 @@
 entries will also cause a copy of the new @samp{Audit-Trail} message to
 be sent.
+Mail received at the PR submission email address and recognized by
+@sc{gnats} as relating to an existing PR is also appended to the
+@samp{Audit-Trail} field, see @ref{follow-up via email}.
+
 @menu
 * edit-pr from the shell:: Invoking @code{edit-pr} from the shell
+* follow-up via email:: Following up via direct email
 @end menu

 @node edit-pr from the shell
@@ -184,6 +189,58 @@
 information. When you exit the editor, @code{edit-pr} prompts you on
 standard input for a reason if you have changed a field that requires
 specifying a reason for the change.
+
+@node follow-up via email
+@subsection Following up via direct email
+@cindex follow-up via email
+@cindex subsequent mail
+@cindex related mail
+
+If you have some additional information for a PR and for some reason
+do not want to (or cannot) edit the PR directly, you may append
+the information to the Audit-Trail field by mailing it to the PR
+submission address.
+
+In order for GNATS to be able to recognize the mail as pertaining to an
+existing PR (as opposed to a new PR, see @ref{Submitting via e-mail,,}),
+the Subject mail header field must contain a reference to the PR.
+GNATS matches the Subject header against the regular expression
+
+@smallexample
+\<(PR[ \t#/]?|[-\w+.]+/)[0-9]+
+@end smallexample
+
+@noindent
+to determine whether such a reference is present. Any text may precede
+or follow the reference in the Subject header. If more than one reference
+is present, the first is used and the rest ignored.
+
+A PR reference matching the regular expression above has two parts. The
+second is the PR number (one or more digits). The first is either the
+capital letters 'PR' optionally followed by a separator character (blank,
+tab, hash mark or forward slash) or the category name followed by a
+forward slash. Following are some examples which match the regular
+expression:
+
+@smallexample
+PR 123 PR4567 PR#890 gnats/4711
+@end smallexample
+
+The PR number and the category (if present) are checked for existence,
+and if the outcome is positive, the mail is appended to the Audit-Trail
+field of the PR. Note that the PR need not belong to the category because
+PRs may move between categories.
+
+Outgoing emails sent by GNATS itself may be configured to have a Subject
+header field that refers to the PR in question:
+
+@smallexample
+Subject: Re: PR @var{category}/@var{gnats-id}: @var{original message subject}
+@end smallexample
+
+This makes it extremely easy to follow up on a PR by replying to such an
+email, see @ref{dbconfig file,,The @code{dbconfig} file} and the sample,
+default @code{dbconfig} file installed by @code{mkdb}.

 @c ---------------------------------------------------------------
 @node query-pr
Index: s-usage.texi
===================================================================
RCS file: /cvsroot/gnats/gnats/doc/s-usage.texi,v
retrieving revision 1.7
diff -u -r1.7 s-usage.texi
--- s-usage.texi 24 Oct 2002 21:42:31 -0000 1.7
+++ s-usage.texi 1 Dec 2002 22:08:41 -0000
@@ -3,7 +3,7 @@
 submitting,,Submitting Problem Reports from Emacs}.)

 @menu
-* using send-pr:: Creating new Problem Reports
+* PR template:: The Problem Report template
 * send-pr in Emacs:: Using send-pr from within Emacs
 * send-pr from the shell:: Invoking send-pr from the shell
 * Submitting via e-mail:: Submitting a Problem Report via direct e-mail
@@ -98,7 +98,7 @@
 described in more detail in a separate section, @xref{Emacs,,The Emacs
 interface to @sc{gnats}}.

-@node using send-pr
+@node PR template
 @section The Problem Report template

 Invoking @code{send-pr} presents a PR @dfn{template} with a number of
@@ -247,24 +247,6 @@
 running @code{send-pr} from Emacs, the Problem Report is placed in the
 buffer @w{@samp{*gnats-send*}}; you can edit this file and then submit
 it with @kbd{C-c C-c}.
- -@cindex subsequent mail -@cindex other mail -@cindex appending PRs -@cindex saving related mail -@cindex related mail -Any further mail concerning this Problem Report should be carbon-copied -to the @sc{gnats} mailing address as well, with the category and -identification number in the @code{Subject} line of the message. - -@smallexample -Subject: Re: PR @var{category}/@var{gnats-id}: @var{original message subject} -@end smallexample - -@noindent -Messages which arrive with @code{Subject} lines of this form are -automatically appended to the Problem Report in the @code{Audit-Trail} -field in the order they are received. @c ------------------------------------------------------------------------- @node Submitting via e-mail _______________________________________________ Help-gnats mailing list Help-gnats@gnu.org http://mail.gnu.org/mailman/listinfo/help-gnats ^ permalink raw reply [flat|nested] 12+ messages in thread
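
The matching that the patched documentation describes can be sketched roughly as follows. This is an illustration in Python, not GNATS code. Python's `re` module has no `\<` operator, so the start-of-word anchor is emulated with a lookbehind/lookahead pair, and the category class is spelled out for ASCII. The names `find_pr_reference`, `is_follow_up`, `known_prs` and `known_categories` are hypothetical. Python also picks the first alternative that matches rather than the longest, so a subject such as `PR12/34` may be treated differently by GNU regex.

```python
import re

# Emulate GNU regex's \< (start of word): the previous character is not a
# word character and the next one is. This is an assumption made for the
# sketch; GNATS itself uses GNU regex.
PR_REFERENCE = re.compile(
    r"(?<!\w)(?=\w)(?:PR[ \t#/]?|([-A-Za-z0-9_+.]+)/)([0-9]+)"
)


def find_pr_reference(subject):
    """Return (category or None, PR number) for the first reference, else None."""
    match = PR_REFERENCE.search(subject)  # case sensitive; first match only
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_follow_up(subject, known_prs, known_categories):
    """Apply the existence checks described in the documentation patch.

    The PR need not belong to the named category; both only have to exist.
    """
    reference = find_pr_reference(subject)
    if reference is None:
        return False
    category, number = reference
    if category is not None and category not in known_categories:
        return False
    return number in known_prs
```

Because only the first reference is considered, a subject such as `OS/2 crash, see PR 17` yields the reference `OS/2` in this sketch, and the mail would not be treated as a follow-up unless `OS` is an existing category and PR 2 exists.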
* Re: Subject header matching--once again
From: Yngve Svendsen @ 2002-12-17 6:38 UTC
To: Lars Henriksen, Mel Hatzis; +Cc: help-gnats

At 23:19 01.12.2002 +0100, Lars Henriksen wrote:
>A patch follows that includes an update to the documentation. The feature
>is mentioned a couple of times in passing in 'Keeping Track'. I think it
>deserves a (sub)section of its own and have inserted one called 'Following up
>via direct email' in the 'Editing existing Problem Reports' section of 'The
>GNATS User Tools' chapter. I have also corrected a couple of minor errors
>that I ran across.
>
>The regular expression used for matching the Subject line appears in the code
>as
>
>    \\<(PR[ \t#/]?|([-A-Za-z0-9_+.]+)/)([0-9]+)
>
>whereas the documentation has
>
>    \<(PR[ \t#/]?|[-\w+.]+/)[0-9]+

I have committed the documentation changes. I kept the second regexp form
quoted above. That isn't consistent with the code as it is now, but I trust
that your patch fixing matching behaviour gets committed soon (I'll leave it
to Andrew to look it over), and I also trust that Mel's patch gets committed
in some form or another soon. Meeeeeeeeel?

- Yngve

* Re: Subject header matching--once again
From: Andrew J. Gray @ 2003-03-02 11:57 UTC
To: Lars.Henriksen; +Cc: hatzis, help-gnats, yngve.svendsen, pdm

> In the meantime the existing Subject match code should be fixed to reflect
> the agreement reached a year back, see my first mail in this thread:
> http://mail.gnu.org/pipermail/help-gnats/2002-November/003185.html
>
> A patch follows that includes an update to the documentation. The feature
> is mentioned a couple of times in passing in 'Keeping Track'. I think it
> deserves a (sub)section of its own and have inserted one called 'Following up
> via direct email' in the 'Editing existing Problem Reports' section of 'The
> GNATS User Tools' chapter. I have also corrected a couple of minor errors
> that I ran across.

Thanks for that patch, I am sorry it has taken me so long to get to
it.

> The regular expression used for matching the Subject line appears in the code
> as
>
>     \\<(PR[ \t#/]?|([-A-Za-z0-9_+.]+)/)([0-9]+)
>
> whereas the documentation has
>
>     \<(PR[ \t#/]?|[-\w+.]+/)[0-9]+
>
> I couldn't get the GNU match-word-constituent operator (\w) to work inside
> the bracket expression and am uncertain as to whether it is allowed there.
> Perl has it. The parentheses which are in the code, but missing from the
> manual, do not affect the matching; they are there only to capture Category
> and Number.

As I understand it the match-word-constituent operator (\w) is not
meant to work inside matching lists. I am looking at the "info"
documentation included with the regex 0.12 (available from
http://ftp.gnu.org/pub/gnu/regex/regex-0.12.tar.gz). In the "List
Operators" node it says most characters lose any special meaning
inside a list.

I think the closest equivalent that works in a list is the alnum
character class. Using this the regular expression would become:

    \\<(PR[ \t#/]?|([-[:alnum:]_+.]+)/)([0-9]+)

Do you think this is a satisfactory replacement for \w?

> I haven't aligned the regular expression syntax with the rest of GNATS as
> suggested by Milan. This is a non-issue as long as the regular expression
> is hard-coded and not exposed for users to modify. The regex searching is also
> case sensitive now.

OK.

> The patch is in production use in the GNATS installation that I am
> responsible for. I hope it can make it into GNATS 4.0-beta2?

Sorry that the patch missed the beta 2. Once we have decided whether
or not to use the alnum character class I will commit the patch.

--
Andrew J. Gray

* Re: Subject header matching--once again
From: Mark D. Baushke @ 2003-03-02 20:47 UTC
To: Andrew J. Gray; +Cc: Lars.Henriksen, help-gnats, yngve.svendsen, hatzis, pdm

Andrew J. Gray <andrewg@gnu.org> writes:

> > In the meantime the existing Subject match code should be fixed to reflect
> > the agreement reached a year back, see my first mail in this thread:
> > http://mail.gnu.org/pipermail/help-gnats/2002-November/003185.html
> >
> > A patch follows that includes an update to the documentation. The
> > feature is mentioned a couple of times in passing in 'Keeping
> > Track'. I think it deserves a (sub)section of its own and have
> > inserted one called 'Following up via direct email' in the 'Editing
> > existing Problem Reports' section of 'The GNATS User Tools' chapter.
> > I have also corrected a couple of minor errors that I ran across.
>
> Thanks for that patch, I am sorry it has taken me so long to get to
> it.
>
> > The regular expression used for matching the Subject line appears in
> > the code as
> >
> >     \\<(PR[ \t#/]?|([-A-Za-z0-9_+.]+)/)([0-9]+)
> >
> > whereas the documentation has
> >
> >     \<(PR[ \t#/]?|[-\w+.]+/)[0-9]+
> >
> > I couldn't get the GNU match-word-constituent operator (\w) to work inside
> > the bracket expression and am uncertain as to whether it is allowed there.
> > Perl has it. The parentheses which are in the code, but missing from the
> > manual, do not affect the matching; they are there only to capture Category
> > and Number.
>
> As I understand it the match-word-constituent operator (\w) is not
> meant to work inside matching lists. I am looking at the "info"
> documentation included with the regex 0.12 (available from
> http://ftp.gnu.org/pub/gnu/regex/regex-0.12.tar.gz). In the "List
> Operators" node it says most characters lose any special meaning
> inside a list.
>
> I think the closest equivalent that works in a list is the alnum
> character class. Using this the regular expression would become:
>
>     \\<(PR[ \t#/]?|([-[:alnum:]_+.]+)/)([0-9]+)
>
> Do you think this is a satisfactory replacement for \w?

\w is the same as [:alnum:]_ in a list; it does not include "-", "."
or "+". That said, using

    ([-:[:alnum:]_+.]+)

in the above would seem to match a category name properly.

> > I haven't aligned the regular expression syntax with the rest of
> > GNATS as suggested by Milan. This is a non-issue as long as the
> > regular expression is hard-coded and not exposed for users to
> > modify. The regex searching is also case sensitive now.
>
> OK.
>
> > The patch is in production use in the GNATS installation that I am
> > responsible for. I hope it can make it into GNATS 4.0-beta2?
>
> Sorry that the patch missed the beta 2. Once we have decided whether
> or not to use the alnum character class I will commit the patch.
>
> --
> Andrew J. Gray

	Enjoy!
	-- Mark
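
The claim that `\w` stands for the alphanumerics plus underscore can be checked with a small sketch. Note the assumption: this uses Python's `re` module, which, unlike GNU regex 0.12 as described above, does accept `\w` inside a bracket expression. It has no `[:alnum:]` class, so the ASCII ranges are written out. The variable names are illustrative only.

```python
import re

# Category class as first written in the documentation (\w inside the list).
# re.ASCII restricts \w to [A-Za-z0-9_], mirroring [:alnum:]_ in the C locale.
with_w = re.compile(r"[-\w+.]", re.ASCII)

# The same class with \w spelled out, as in the code's regular expression.
spelled_out = re.compile(r"[-A-Za-z0-9_+.]")

# Collect every ASCII character on which the two classes disagree.
differences = [
    chr(code)
    for code in range(128)
    if bool(with_w.fullmatch(chr(code))) != bool(spelled_out.fullmatch(chr(code)))
]
```

Under these assumptions `differences` comes out empty, which is consistent with treating `[-[:alnum:]_+.]` as a drop-in replacement for the `[-\w+.]` that was intended.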

* Re: Subject header matching--once again
From: Lars Henriksen @ 2003-03-03 20:22 UTC
To: Mark D. Baushke; +Cc: help-gnats, Andrew J. Gray, yngve.svendsen, hatzis, pdm

On Sun, Mar 02, 2003 at 12:46:47PM -0800, Mark D. Baushke wrote:
> Andrew J. Gray <andrewg@gnu.org> writes:
> > I think the closest equivalent that works in a list is the alnum
> > character class. Using this the regular expression would become:
> >
> >     \\<(PR[ \t#/]?|([-[:alnum:]_+.]+)/)([0-9]+)
> >
> > Do you think this is a satisfactory replacement for \w?
>
> \w is the same as [:alnum:]_ in a list; it does not include "-", "."
> or "+". That said, using
>
>     ([-:[:alnum:]_+.]+)
>
> in the above would seem to match a category name properly.

Hmm. Category names definitely cannot contain a colon (the field separator
in the categories file).

This made me wonder: which characters _are_ really forbidden/allowed?
The manual (section 4.4.1) has a long list of characters that cannot
appear in a category name, and is a bit woolly about comma. Forward slash
is not mentioned, but ruled out since a category name is also a directory
name. The following are implicitly allowed: # % = ? @ \ ^ |, but why?

Probably the manual (and the code) should have a list of allowed characters,
viz. those allowed by the regex above.

Lars Henriksen
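
The suggestion of an explicit list of allowed characters could look something like the sketch below. It is purely illustrative: the helper name `uses_only_matchable_chars` is hypothetical, the check is character-level only, and it is not what GNATS does when validating the categories file.

```python
import re

# Characters accepted by the category part of the Subject regex discussed
# above, with [:alnum:] written out as ASCII ranges for Python.
ALLOWED_CATEGORY_CHARS = re.compile(r"[-A-Za-z0-9_+.]+")


def uses_only_matchable_chars(name):
    """True if every character of `name` could appear in a matched category."""
    return ALLOWED_CATEGORY_CHARS.fullmatch(name) is not None
```

With this check a name such as `gnats` or `c++` passes, while names containing a colon, a forward slash, or one of the implicitly allowed characters listed above (for example `?`) are rejected.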

* Re: Subject header matching--once again
From: Lars Henriksen @ 2003-03-03 19:51 UTC
To: Andrew J. Gray; +Cc: hatzis, help-gnats, yngve.svendsen, pdm

On Sun, Mar 02, 2003 at 09:53:53PM +1100, Andrew J. Gray wrote:
> As I understand it the match-word-constituent operator (\w) is not
> meant to work inside matching lists. I am looking at the "info"
> documentation included with the regex 0.12 (available from
> http://ftp.gnu.org/pub/gnu/regex/regex-0.12.tar.gz). In the "List
> Operators" node it says most characters lose any special meaning
> inside a list.
>
> I think the closest equivalent that works in a list is the alnum
> character class. Using this the regular expression would become:
>
>     \\<(PR[ \t#/]?|([-[:alnum:]_+.]+)/)([0-9]+)
>
> Do you think this is a satisfactory replacement for \w?

Absolutely. I have made a rudimentary test that worked OK.

The documentation should then have

    \<(PR[ \t#/]?|[-[:alnum:]+.]+/)[0-9]+

in file p-usage.texi. I think that the text-capturing parentheses should
be left out for clarity.

Are you aware that Yngve has committed the documentation changes from
my patch already?

Regards
Lars

* Re: Subject header matching--once again
From: Andrew J. Gray @ 2003-03-09 2:33 UTC
To: Lars.Henriksen; +Cc: hatzis, help-gnats, yngve.svendsen, pdm

> > I think the closest equivalent that works in a list is the alnum
> > character class. Using this the regular expression would become:
> >
> >     \\<(PR[ \t#/]?|([-[:alnum:]_+.]+)/)([0-9]+)
> >
> > Do you think this is a satisfactory replacement for \w?
>
> Absolutely. I have made a rudimentary test that worked OK.

Good. I have committed your patch with the change to use [:alnum:].

> The documentation should then have
>
>     \<(PR[ \t#/]?|[-[:alnum:]+.]+/)[0-9]+
>
> in file p-usage.texi. I think that the text-capturing parentheses should
> be left out for clarity.

Agreed. I have committed this change to p-usage.texi.

> Are you aware that Yngve has committed the documentation changes from
> my patch already?

Yes, thanks for the reminder.

--
Andrew J. Gray
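
One detail worth checking when comparing the two forms quoted in this message: as written, the code's expression lists an underscore in the category class while the form proposed for p-usage.texi does not. The Python sketch below, an emulation and not GNATS code, shows where the two would disagree. The `WORD_START` lookaround stands in for `\<`, ASCII ranges stand in for `[:alnum:]`, and all names are illustrative.

```python
import re

# Assumed emulation of GNU regex's \< (start of word) for Python.
WORD_START = r"(?<!\w)(?=\w)"

# Form quoted from the code: underscore is part of the category class.
CODE_FORM = re.compile(
    WORD_START + r"(PR[ \t#/]?|([-A-Za-z0-9_+.]+)/)([0-9]+)"
)

# Form quoted for the documentation: no underscore, no capturing groups
# around category and number.
DOC_FORM = re.compile(
    WORD_START + r"(PR[ \t#/]?|[-A-Za-z0-9+.]+/)[0-9]+"
)


def matched_text(pattern, subject):
    """Return the text of the first match of `pattern`, or None."""
    match = pattern.search(subject)
    return match.group(0) if match else None
```

In this emulation both forms find `gnats/4711`, but only the code form finds a reference in a subject such as `my_cat/12`, so the documented expression is slightly narrower than what the code accepts.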

Thread overview: 12+ messages
2002-11-03 21:40 Subject header matching--once again Lars Henriksen
2002-11-04 11:31 ` Mel Hatzis
2002-11-04 15:41 ` Lars Henriksen
2002-11-06 21:43 ` Lars Henriksen
2002-11-09 3:26 ` Mel Hatzis
2002-12-02 14:45 ` Lars Henriksen
2002-12-17 6:38 ` Yngve Svendsen
2003-03-02 11:57 ` Andrew J. Gray
2003-03-02 20:47 ` Mark D. Baushke
2003-03-03 20:22 ` Lars Henriksen
2003-03-03 19:51 ` Lars Henriksen
2003-03-09 2:33 ` Andrew J. Gray