On 19/11/2019 14:56, Jason Merrill wrote: > On Mon, Nov 18, 2019 at 4:38 PM Richard Earnshaw (lists) < > Richard.Earnshaw@arm.com> wrote: > >> On 18/11/2019 20:53, Jason Merrill wrote: >>> On Mon, Nov 18, 2019 at 2:51 PM Segher Boessenkool < >>> segher@kernel.crashing.org> wrote: >>> >>>> On Mon, Nov 18, 2019 at 07:21:22PM +0000, Richard Earnshaw (lists) >> wrote: >>>>> On 18/11/2019 18:53, Segher Boessenkool wrote: >>>>>> PR target/92140: clang vs gcc optimizing with adc/sbb >>>>>> PR fortran/91926: assumed rank optional >>>>>> PR tree-optimization/91532: [SVE] Redundant predicated store in >>>> gcc.target/aarch64/fmla_2.c >>>>>> PR tree-optimization/92161: ICE in vect_get_vec_def_for_stmt_copy, at >>>> tree-vect-stmts.c:1687 >>>>>> PR tree-optimization/92162: ICE in vect_create_epilog_for_reduction, >>>> at tree-vect-loop.c:4252 >>>>>> PR c++/92015: internal compiler error: in cxx_eval_array_reference, at >>>> cp/constexpr.c:2568 >>>>>> PR tree-optimization/92173: ICE in optab_for_tree_code, at >>>> optabs-tree.c:81 >>>>>> PR tree-optimization/92173: ICE in optab_for_tree_code, at >>>> optabs-tree.c:81 >>>>>> PR fortran/92174: runtime error: index 15 out of bounds for type >>>> 'gfc_expr *[15] >>>>>> >>>>>> Most of these aren't helpful at all, and none of these are good commit >>>>>> summaries. The PR92173 one actually has identical commit messages >> btw, >>>>>> huh. Ah, the second one (r277288) has the wrong changelog, but in the >>>>>> actual changelog file as well, not something any tool could fix up (or >>>>>> have we reached the singularity?) >>>>> >>>>> Identical commits are normally from where the same commit is made to >>>>> multiple branches. It's not uncommon to see this when bugs are fixed. >>>> >>>> This is an actual mistake. The commits are not identical at all, just >>>> the commit messages are (and the changelog entries, too). Not something >>>> that happens to ften, but of course I hit it in the first random thing I >>>> pick :-) >>>> >>>>> Ultimately the question here is whether something like the above is >> more >>>>> or less useful than what we have today, which is summary lines of the >>>> form: >>>>> >>>>> >>>> >>>> I already said I would prefer things like >>>> Patch related to PR323 >>>> as the patch subject lines. No one argues that the current state of >>>> affairs is good. I argue that replacing this with often wrong and >>>> irrelevant information isn't the best we can do. >>>> >>> >>> How about using the first line that isn't a ChangeLog date/author line, >>> without trying to rewrite/augment it? >>> >>> Jason >>> >> >> It would certainly be another way of doing it. Sometimes it would >> produce almost the same as an unadulterated PR; sometimes it would >> produce something more meaningful and sometimes it would be pretty >> useless. It probably would hit more cases than my current script in >> that it wouldn't require the commit to mention a PR in it. >> >> The main problem is that the first line is often incomplete, and much of >> it is also wasted with elements like the full path to a file that is >> quite deep in the tree. Lets take a quick example (the first I found in >> the dump I have). >> >> 1998-12-17 Vladimir N. Makarov >> * config/i60/i960.md (extendqihi2): Fix typo (usage ',' instead of >> ';'). >> 1998-12-17 Michael Tiemann >> * i960.md (extend*, zero_extend*): Don't generate rtl that looks >> like (subreg:SI (reg:SI N) 0), because it's wrong, and it hides >> optimizations from the combiner. >> >> Firstly, this example misses a blank line between the author and the >> change message itself, which makes distinguishing between this and the >> multiple authors case harder. Secondly, the entry really spans two >> lines and cutting it off at the end of the first line would be, well a >> bit odd. We could try to piece things together more, by combining lines >> until we find a sentence end ( \.$ or \.\s\s ), and we could also strip >> off more of the path components to just leave the name of the file >> committed. For example, >> >> i960.md (extendqihi2): Fix typo (usage ',' instead of ';'). >> >> That might work better, but obviously it's harder to handle and thus >> more likely to create erroneous summaries. >> > > Yep. I don't think we need to worry about getting optimal one-line > summaries for ancient commits; something reasonably unique should be plenty. > Attached is the latest version of my script. I used (very nearly) this to produce a conversion over the weekend and I've uploaded that here: https://gitlab.com/rearnsha/gcc-trial-20191130 Note, that I might blow this away at any time. IT IS NOT A FINAL CONVERSION. Some other things to note: - there are a number of known issues with the version of reposurgeon used for this that are being worked on - emptycommit-* tags - my control script was out-of-date - *deleted* branches - this is being worked on - weird dependencies around merges - this is being worked on - author attributions are sometimes incorrect - reported The main difference between the attached script and the one I used for this conversion is that ChangeLog change that contain :: inside a function list is now handled correctly, resulting in a number of cases that were previously being missed now being correctly handled. Choices I made: - When a PR is used to derive the summary, I prefix this with 're' (as in the Latin 'in re'. - long change hunks produce poor summaries. To reduce the overhead: - path names are removed, leaving just the final file name - multiple files are replaced with [...] after the first filename - similarly, multiple function names are replaced with [...] - very long comments are truncated, preferably at the strongest punctuation mark, but sometimes after key words, such as 'if', 'when', 'unless' and a few more. Ultimately, if the line is still too long, we just break after an arbitrary space. - Where possible useful summary lines that appear after an author, attribution are hoisted as a summary. - certain key words in otherwise not very useful summary lines are also spotted and used to add [revert] or [backport] annotations to the summary. No changes are made to the main commit log, if we add a new summary line, the entire original text is kept. An example of a summary produced by this is for the commit to r278572, where the original log entry is: Backported from mainline 2019-08-02 Jakub Jelinek * quadmath.h (M_Eq, M_LOG2Eq, M_LOG10Eq, M_LN2q, M_LN10q, M_PIq, M_PI_2q, M_PI_4q, M_1_PIq, M_2_PIq, M_2_SQRTPIq, M_SQRT2q, M_SQRT1_2q): Use two more decimal places. And the script then generates: [backport] quadmath.h (M_Eq, [...]): Use two more decimal places. as the summary. R.