Proposal for the transition timetable for the move to GIT

public inbox for gcc@gcc.gnu.org

* Proposal for the transition timetable for the move to GIT
@ 2019-09-17 12:02 Richard Earnshaw (lists)
  2019-09-17 12:24 ` Richard Biener
                   ` (6 more replies)
  0 siblings, 7 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-17 12:02 UTC (permalink / raw)
  To: gcc

At the Cauldron this weekend, the overwhelming view finally expressed was
that we should move to GIT soon.

But we never discussed when and we didn't really decide which conversion 
we would use from the three that we currently have on the table.

So during the cauldron dinner I discussed with several people a possible 
timetable and route to making the final decision.  This seemed to be
broadly acceptable to everyone I discussed it with (though obviously
that was by no means everyone there).

So here's my proposal, and a few implications that follow from this.

We should switch to GIT at the end of GCC-10 development Stage 3.  Ie 
on, or shortly after the 1st of January 2020.  This gives us time to get 
all the major work that might have been prepared based on SVN committed, 
but also gives us time to get used to using the new repo before we reach 
the GCC 10 branch date.

If we're to meet that date I think that means we need to make a final
decision as to which conversion to use about a fortnight before that date
(let's say Monday 16th December).  The decision should be based on the best
conversion that we have at that time.  My proposed ranking criteria would be:

- Most branches converted (eg, can it handle the SVN 
branches/<vendor>/<branch> layout that we have in addition to the 
engineering branches).
- tweaked committer history (email ids etc - nice to have)
- fixes for accidental trunk/branch deletions/restores (preferred)
- correctness around branch points (nice to have)

The conversion that meets all/most of these at the cut off date will be 
the one chosen.  Additionally, a candidate can only be chosen if it is 
correct at all (converted) branch heads.

The need to make the cut off a couple of weeks before the final 
conversion is to allow time for some trial conversions and to allow us 
time to validate that the commit hooks we want are all in place.

There should be NO CHANGE to the other processes and policies that we 
have, eg patch reviews, ChangeLog policies etc at this time.  Adding 
requirements for this will just slow down the transition by 
over-complicating things.

So after the 16th, I would expect a trial conversion to be done and the 
hooks to be installed on a version that we make available on the web 
site, but which is not the final conversion.  We can still allow commits 
to it for testing purposes, etc and to allow users to check that they 
are integrating properly with the new repo, but any such changes will be 
lost when the final conversion is done at the switch time.

So in summary my proposed timetable would be:

Monday 16th December 2019 - cut off date for picking which git 
conversion to use

Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.

Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes 
on line for live commits.

Doing this over the new year holiday period has both advantages and 
disadvantages.  On the one hand the traffic is light, so the impact to 
most developers will be quite low; on the other, it is a holiday period, 
so getting the right key folk to help might be difficult.  I won't 
object strongly if others feel that slipping a few days (but not weeks) 
would make things significantly easier.

It would be good if we could get agreement on this soon, as there are 
probably some additional logistical things to sort out.  I can think of 
a number of these, but this mail is already long enough for now.

R.

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:02 Proposal for the transition timetable for the move to GIT Richard Earnshaw (lists)
@ 2019-09-17 12:24 ` Richard Biener
  2019-09-17 13:50   ` Richard Earnshaw (lists)
  2019-09-17 16:35   ` Joseph Myers
  2019-09-17 16:33 ` Joseph Myers
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 198+ messages in thread
From: Richard Biener @ 2019-09-17 12:24 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: GCC Development

On Tue, Sep 17, 2019 at 2:02 PM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> At the Cauldron this weekend, the overwhelming view finally expressed was
> that we should move to GIT soon.
>
> But we never discussed when and we didn't really decide which conversion
> we would use from the three that we currently have on the table.
>
> So during the cauldron dinner I discussed with several people a possible
> timetable and route to making the final decision.  This seemed to be
> broadly acceptable to everyone I discussed it with (though obviously
> that was by no means everyone there).
>
> So here's my proposal, and a few implications that follow from this.
>
> We should switch to GIT at the end of GCC-10 development Stage 3.  Ie
> on, or shortly after the 1st of January 2020.  This gives us time to get
> all the major work that might have been prepared based on SVN committed,
> but also gives us time to get used to using the new repo before we reach
> the GCC 10 branch date.
>
> If we're to meet that date I think that means we need to make a final
> decision as to which conversion to use about a fortnight before that date
> (let's say Monday 16th December).  The decision should be based on the best
> conversion that we have at that time.  My proposed ranking criteria would be:
>
> - Most branches converted (eg, can it handle the SVN
> branches/<vendor>/<branch> layout that we have in addition to the
> engineering branches).
> - tweaked committer history (email ids etc - nice to have)
> - fixes for accidental trunk/branch deletions/restores (preferred)
> - correctness around branch points (nice to have)
>
> The conversion that meets all/most of these at the cut off date will be
> the one chosen.  Additionally, a candidate can only be chosen if it is
> correct at all (converted) branch heads.
>
> The need to make the cut off a couple of weeks before the final
> conversion is to allow time for some trial conversions and to allow us
> time to validate that the commit hooks we want are all in place.
>
> There should be NO CHANGE to the other processes and policies that we
> have, eg patch reviews, ChangeLog policies etc at this time.  Adding
> requirements for this will just slow down the transition by
> over-complicating things.
>
> So after the 16th, I would expect a trial conversion to be done and the
> hooks to be installed on a version that we make available on the web
> site, but which is not the final conversion.  We can still allow commits
> to it for testing purposes, etc and to allow users to check that they
> are integrating properly with the new repo, but any such changes will be
> lost when the final conversion is done at the switch time.
>
> So in summary my proposed timetable would be:
>
> Monday 16th December 2019 - cut off date for picking which git
> conversion to use
>
> Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
>
> Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes
> on line for live commits.

I think that's fine if the repository state from Dec 16th is kept up-to-date
so that effectively two weeks can be used to verify its integrity.  Doing
the update to the Dec 31st state should be relatively easy?

If stage3 ends on Dec 31st then stage1 ends Oct 31st to have a two-month
stage3.  That's about two weeks earlier than in the past.

> Doing this over the new year holiday period has both advantages and
> disadvantages.  On the one hand the traffic is light, so the impact to
> most developers will be quite low; on the other, it is a holiday period,
> so getting the right key folk to help might be difficult.  I won't
> object strongly if others feel that slipping a few days (but not weeks)
> would make things significantly easier.
>
> It would be good if we could get agreement on this soon, as there are
> probably some additional logistical things to sort out.  I can think of
> a number of these, but this mail is already long enough for now.

+1

Richard.

> R.

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:24 ` Richard Biener
@ 2019-09-17 13:50   ` Richard Earnshaw (lists)
  2019-09-17 16:35   ` Joseph Myers
  1 sibling, 0 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-17 13:50 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

On 17/09/2019 13:24, Richard Biener wrote:
> On Tue, Sep 17, 2019 at 2:02 PM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>>
>> At the Cauldron this weekend, the overwhelming view finally expressed was
>> that we should move to GIT soon.
>>
>> But we never discussed when and we didn't really decide which conversion
>> we would use from the three that we currently have on the table.
>>
>> So during the cauldron dinner I discussed with several people a possible
>> timetable and route to making the final decision.  This seemed to be
>> broadly acceptable to everyone I discussed it with (though obviously
>> that was by no means everyone there).
>>
>> So here's my proposal, and a few implications that follow from this.
>>
>> We should switch to GIT at the end of GCC-10 development Stage 3.  Ie
>> on, or shortly after the 1st of January 2020.  This gives us time to get
>> all the major work that might have been prepared based on SVN committed,
>> but also gives us time to get used to using the new repo before we reach
>> the GCC 10 branch date.
>>
>> If we're to meet that date I think that means we need to make a final
>> decision as to which conversion to use about a fortnight before that date
>> (let's say Monday 16th December).  The decision should be based on the best
>> conversion that we have at that time.  My proposed ranking criteria would be:
>>
>> - Most branches converted (eg, can it handle the SVN
>> branches/<vendor>/<branch> layout that we have in addition to the
>> engineering branches).
>> - tweaked committer history (email ids etc - nice to have)
>> - fixes for accidental trunk/branch deletions/restores (preferred)
>> - correctness around branch points (nice to have)
>>
>> The conversion that meets all/most of these at the cut off date will be
>> the one chosen.  Additionally, a candidate can only be chosen if it is
>> correct at all (converted) branch heads.
>>
>> The need to make the cut off a couple of weeks before the final
>> conversion is to allow time for some trial conversions and to allow us
>> time to validate that the commit hooks we want are all in place.
>>
>> There should be NO CHANGE to the other processes and policies that we
>> have, eg patch reviews, ChangeLog policies etc at this time.  Adding
>> requirements for this will just slow down the transition by
>> over-complicating things.
>>
>> So after the 16th, I would expect a trial conversion to be done and the
>> hooks to be installed on a version that we make available on the web
>> site, but which is not the final conversion.  We can still allow commits
>> to it for testing purposes, etc and to allow users to check that they
>> are integrating properly with the new repo, but any such changes will be
>> lost when the final conversion is done at the switch time.
>>
>> So in summary my proposed timetable would be:
>>
>> Monday 16th December 2019 - cut off date for picking which git
>> conversion to use
>>
>> Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
>>
>> Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes
>> on line for live commits.
> 
> I think that's fine if the repository state from Dec 16th is kept up-to-date
> so that effectively two weeks can be used to verify its integrity.  Doing
> the update to the Dec 31st state should be relatively easy?

I was expecting that we would allow some trial commits to the initial 
conversion in order to test that the hooks were working correctly.  At 
the end of the trial that conversion would be discarded entirely and 
replaced by the final conversion.

Of course, depending on which conversion we chose we could simply clone 
it and then back out any test commits if that was easier than redoing 
the final conversion.
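
As a minimal sketch of the "clone it and then back out any test commits"
option, assuming a throwaway local repository and a hypothetical tag name
(conversion-point) that marks the state before trial commits began.  None
of these names come from the actual conversion, and the sketch covers only
the backing-out step, not picking up later SVN commits:

```shell
#!/bin/sh
# Sketch only: throwaway repositories under a temp dir; all names are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# "trial" stands in for the trial conversion that accepts test commits.
git init -q "$work/trial"
cd "$work/trial"
git symbolic-ref HEAD refs/heads/trunk
echo converted > file.txt
git add file.txt
git commit -q -m "converted history"
git tag conversion-point            # state before any trial commits

# A trial commit made only to exercise the hooks.
echo test > hook-test.txt
git add hook-test.txt
git commit -q -m "trial commit to exercise hooks"

# Clone the trial repository and back the test commit out again.
git clone -q "$work/trial" "$work/final"
cd "$work/final"
git reset -q --hard conversion-point

final_head=$(git rev-parse HEAD)
expected=$(git rev-parse conversion-point)
echo "final HEAD: $final_head"
```

After the reset, the clone's trunk points at the tagged commit again and
the test file is gone from the working tree.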


> 
> If stage3 ends on Dec 31st then stage1 ends Oct 31st to have a two-month
> stage3.  That's about two weeks earlier than in the past.
> 
>> Doing this over the new year holiday period has both advantages and
>> disadvantages.  On the one hand the traffic is light, so the impact to
>> most developers will be quite low; on the other, it is a holiday period,
>> so getting the right key folk to help might be difficult.  I won't
>> object strongly if others feel that slipping a few days (but not weeks)
>> would make things significantly easier.
>>
>> It would be good if we could get agreement on this soon, as there are
>> probably some additional logistical things to sort out.  I can think of
>> a number of these, but this mail is already long enough for now.
> 
> +1
> 
> Richard.
> 
>> R.

I've created a page on the wiki to track this and any other work that 
needs doing to achieve the transition.

	https://gcc.gnu.org/wiki/GitConversion

R.

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:02 Proposal for the transition timetable for the move to GIT Richard Earnshaw (lists)
  2019-09-17 12:24 ` Richard Biener
@ 2019-09-17 16:33 ` Joseph Myers
  2019-09-19 12:04 ` Janne Blomqvist
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-09-17 16:33 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc

On Tue, 17 Sep 2019, Richard Earnshaw (lists) wrote:

> Doing this over the new year holiday period has both advantages and
> disadvantages.  On the one hand the traffic is light, so the impact to most
> developers will be quite low; on the other, it is a holiday period, so getting
> the right key folk to help might be difficult.  I won't object strongly if
> others feel that slipping a few days (but not weeks) would make things
> significantly easier.

I think slipping it one week is better so more people are around to check 
things at the transition time.

I'd also like to add: for any approach using a fresh, clean conversion, we 
should ensure the existing commit ids and git-only branches remain 
available somewhere.  There are two possible approaches:

1. Keep the existing git-svn repository available, read-only, in some 
(renamed) public location.

2. Put the objects and (renamed) refs into the new repository alongside 
the cleanly converted history.  Because most blob and tree objects should 
be identical between the conversions (with it mainly being commit and tag 
objects that aren't shared between the versions of the history), that 
shouldn't enlarge the repository that much.  I've previously suggested a 
git command (untested) along the following lines (plus repacking the 
repository afterwards) to do this:

git fetch git://gcc.gnu.org/git/gcc.git \
    'refs/heads/*:refs/heads/git-old/*' \
    'refs/remotes/*:refs/heads/git-svn-old/*' \
    'refs/tags/*:refs/tags/git-old/*'

(It is of course possible to do (1) and then decide at a later point to do 
(2) as well.)
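
The refspec renaming in approach (2) can be tried out on throwaway
repositories first.  A minimal sketch, assuming two local repositories
("old" standing in for the existing git-svn mirror, "new" for the clean
conversion); the paths, branch and tag names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: local stand-ins for the old mirror and the new conversion.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# "old" plays the role of the existing repository with its commit ids.
git init -q "$work/old"
git -C "$work/old" symbolic-ref HEAD refs/heads/master
echo hello > "$work/old/file"
git -C "$work/old" add file
git -C "$work/old" commit -q -m "old history"
git -C "$work/old" tag v1

# "new" plays the role of the cleanly converted repository.
git init -q "$work/new"

# Fetch old branches and tags into renamed namespaces.
git -C "$work/new" fetch -q "$work/old" \
    'refs/heads/*:refs/heads/git-old/*' \
    'refs/tags/*:refs/tags/git-old/*'

old_head=$(git -C "$work/old" rev-parse refs/heads/master)
new_ref=$(git -C "$work/new" rev-parse refs/heads/git-old/master)
new_tag=$(git -C "$work/new" rev-parse refs/tags/git-old/v1)
echo "$old_head $new_ref $new_tag"
```

The old commit id is unchanged in the new repository; only the ref names
that point at it are different.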

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:24 ` Richard Biener
  2019-09-17 13:50   ` Richard Earnshaw (lists)
@ 2019-09-17 16:35   ` Joseph Myers
  2019-09-17 17:51     ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-09-17 16:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Earnshaw (lists), GCC Development

On Tue, 17 Sep 2019, Richard Biener wrote:

> If stage3 ends on Dec 31st then stage1 ends Oct 31st to have a two-month
> stage3.  That's about two weeks earlier than in the past.

I don't think the repository conversion should constrain the timing of the 
end of stage 1 or stage 3; it should be OK to end stage 3 either before or 
after the repository conversion if that proves convenient.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 16:35   ` Joseph Myers
@ 2019-09-17 17:51     ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-17 17:51 UTC (permalink / raw)
  To: Joseph Myers, Richard Biener; +Cc: GCC Development

On 17/09/2019 17:35, Joseph Myers wrote:
> On Tue, 17 Sep 2019, Richard Biener wrote:
> 
>> If stage3 ends on Dec 31st then stage1 ends Oct 31st to have a two-month
>> stage3.  That's about two weeks earlier than in the past.
> 
> I don't think the repository conversion should constrain the timing of the 
> end of stage 1 or stage 3; it should be OK to end stage 3 either before or 
> after the repository conversion if that proves convenient.
> 

It's not the repository conversion that's the problem; it's about not
making it harder for folk to get things in before the end of the stage.
So I don't mind slipping the switch-over back a few days, but I don't
think we should pull it forwards unless that stage has already finished.
Immediately after the end of stage 3 seemed to be the best time to do
this while still giving plenty of time before we consider branching.

R.

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:02 Proposal for the transition timetable for the move to GIT Richard Earnshaw (lists)
  2019-09-17 12:24 ` Richard Biener
  2019-09-17 16:33 ` Joseph Myers
@ 2019-09-19 12:04 ` Janne Blomqvist
  2019-09-19 14:43   ` Damian Rouson
  2019-09-19 15:30   ` Richard Earnshaw (lists)
  2019-09-19 15:35 ` Maxim Kuvyrkov
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 198+ messages in thread
From: Janne Blomqvist @ 2019-09-19 12:04 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc mailing list

On Tue, Sep 17, 2019 at 3:02 PM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
> There should be NO CHANGE to the other processes and policies that we
> have, eg patch reviews, ChangeLog policies etc at this time.  Adding
> requirements for this will just slow down the transition by
> over-complicating things.

A little aside; I fully support the above, let's change one thing at a
time. But it would be nice to have some short documentation about the git
workflow that we'll start with (which, presumably, at least initially
shouldn't differ too much from the svn workflow many are familiar with
for the reasons you mention above), particularly for those who are not
that familiar with git, or have only used git together with github or such.

One thing that's unclear to me is how should I actually make my stuff
appear in the public repo? Say I want to work on some particular
thing:

1. git checkout -b pr1234-foo   # A private branch based on latest trunk
2. Then when I'm happy, I send out a patch for review, either manually
or with git format-patch + send-email.
3. Patch goes through a few revisions, and is approved.
4. Now what?
4a) Do I merge my private branch to master (err, trunk?), then commit and push?
4b) Or do I first rebase my branch on top of the latest master, to
produce a slightly less branchy history?
4c) Or do I (manually?) apply my patch on master, to create a linear history?
4d) Something else entirely?

Thanks,
-- 
Janne Blomqvist

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-19 12:04 ` Janne Blomqvist
@ 2019-09-19 14:43   ` Damian Rouson
  2019-09-19 15:30     ` Janne Blomqvist
  2019-10-25 14:10     ` Richard Earnshaw (lists)
  2019-09-19 15:30   ` Richard Earnshaw (lists)
  1 sibling, 2 replies; 198+ messages in thread
From: Damian Rouson @ 2019-09-19 14:43 UTC (permalink / raw)
  To: Janne Blomqvist; +Cc: Richard Earnshaw (lists), gcc mailing list

On Thu, Sep 19, 2019 at 5:04 AM Janne Blomqvist <blomqvist.janne@gmail.com>
wrote:

>
> One thing that's unclear to me is how should I actually make my stuff
> appear in the public repo? Say I want to work on some particular
> thing:
>

This is essentially a git workflow question.  A simple and useful
workflow to consider is the GitHub Flow:
https://guides.github.com/introduction/flow/.  Others to consider are
on the GitLab Flow page:
https://docs.gitlab.com/ee/workflow/gitlab_flow.html and on Atlassian's
Git Flow page: https://docs.gitlab.com/ee/workflow/gitlab_flow.html.
Where will the GCC git repository be hosted?

> 1. git checkout -b pr1234-foo   # A private branch based on latest trunk
> 2. Then when I'm happy, I send out a patch for review, either manually
> or with git format-patch + send-email.
>

Will GCC allow workflows other than emailing patches?  It could make
contributing more inviting to new developers.  A large community of
developers has grown up around the above workflows and is used to the
related tools.  I realize emailing patches probably seems simple to GCC
developers, but that practice is one of the main reasons I haven't
contributed code to GCC even though I have supported GCC development
financially and I frequently interact with GCC developers.

My problems with email have been many.  I have often forgotten to set
my emails to plain text, so my emails to GCC lists bounce and I have to
resend them (often hours later if I didn't see the bounce right away).
When I receive patches from GCC developers, I get frustrated with
determining what -p argument to pass when applying the patch.  I'm
equally daunted by the process of searching through emails to find
related discussions rather than having all the dialogue about a pull
request (which contains the same information as a patch) in one place.
And with plain-text emails as the medium, I really miss the ability to
format dialogues with Markdown, including inserting hyperlinks, and to
tie comments to specific lines of code in a browser interface to the
pull request, etc.

> 3. Patch goes through a few revisions, and is approved.
> 4. Now what?
> 4a) Do I merge my private branch to master (err, trunk?), then commit and push?
>

It's safer to first merge master into your branch and then retest with
all the new commits that have hit master since you branched.  If you
test right after merging and find no problems (and no new commits hit
master while you're testing), then the head of your branch will reflect
the state master will reach when you merge into master, so you know
it's safe to do so.

> 4b) Or do I first rebase my branch on top of the latest master, to
> produce a slightly less branchy history?
>

A lot of people find rebasing to be overly complicated and error-prone
(with the exception of interactive rebasing for the purpose of
squashing commits that haven't been pushed to the remote repository).
The above merging steps are easier at the expense of having merge
commits in the history, which I think is good to better document the
branching history.

> 4c) Or do I (manually?) apply my patch on master, to create a linear history?

See above.  I recommend "git merge" over manually applying patches.

> 4d) Something else entirely?
>

A lot of the testing can be automated.  For example, on GitHub, git
hooks can be set up to ensure that if a branch has an open pull request
against master (or other designated branches), tests run for that
branch every time a new commit is pushed to it.

Damian

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-19 12:04 ` Janne Blomqvist
  2019-09-19 14:43   ` Damian Rouson
@ 2019-09-19 15:30   ` Richard Earnshaw (lists)
  2019-09-19 15:49     ` Damian Rouson
  1 sibling, 1 reply; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-19 15:30 UTC (permalink / raw)
  To: Janne Blomqvist; +Cc: gcc mailing list

On 19/09/2019 13:04, Janne Blomqvist wrote:
> On Tue, Sep 17, 2019 at 3:02 PM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>> There should be NO CHANGE to the other processes and policies that we
>> have, eg patch reviews, ChangeLog policies etc at this time.  Adding
>> requirements for this will just slow down the transition by
>> over-complicating things.
> 
> A little aside; I fully support the above, let's change one thing at a
> time. But it would be nice to have some short documentation about the git
> workflow that we'll start with (which, presumably, at least initially
> shouldn't differ too much from the svn workflow many are familiar with
> for the reasons you mention above), particularly for those who are not
> that familiar with git, or have only used git together with github or such.
> 
> One thing that's unclear to me is how should I actually make my stuff
> appear in the public repo? Say I want to work on some particular
> thing:
> 
> 1. git checkout -b pr1234-foo   # A private branch based on latest trunk
> 2. Then when I'm happy, I send out a patch for review, either manually
> or with git format-patch + send-email.
> 3. Patch goes through a few revisions, and is approved.
> 4. Now what?
> 4a) Do I merge my private branch to master (err, trunk?), then commit and push?
> 4b) Or do I first rebase my branch on top of the latest master, to
> produce a slightly less branchy history?
> 4c) Or do I (manually?) apply my patch on master, to create a linear history?
> 4d) Something else entirely?
> 
> Thanks,
> 

I believe the current intent is that, at least for now, the trunk and 
release branches will be simple linear chains of commits (no merges).

This is the same workflow as is currently used in gdb, binutils and 
glibc, and we will likely lift the hooks to enforce this from those 
projects.  See the separate discussion on the git hooks for a bit more 
detail.

What individuals do on private branches is up to them.  Similarly for 
development branches (policy set by branch owner), but they will need 
linearizing (or maybe squashing) before they can be merged to trunk.
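
One way to keep trunk a linear chain of commits is to rebase the private
branch onto the latest trunk and then fast-forward trunk to it.  A minimal
sketch on a throwaway local repository; the branch name pr1234-foo is
borrowed from the question above, the file names are made up, and this is
an illustration rather than the policy that will be written up:

```shell
#!/bin/sh
# Sketch only: rebase a private branch, then fast-forward trunk (no merge commit).
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/trunk
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Private branch with the approved change.
git checkout -q -b pr1234-foo
echo fix > fix.txt
git add fix.txt
git commit -q -m "fix for pr1234"

# Meanwhile somebody else's change lands on trunk.
git checkout -q trunk
echo other > other.txt
git add other.txt
git commit -q -m "unrelated trunk change"

# Replay the private branch on top of the latest trunk ...
git checkout -q pr1234-foo
git rebase -q trunk

# ... and advance trunk without creating a merge commit.
git checkout -q trunk
git merge -q --ff-only pr1234-foo

merges=$(git rev-list --merges --count trunk)
commits=$(git rev-list --count trunk)
echo "merge commits: $merges, total commits: $commits"
```

With --ff-only the merge refuses to proceed if trunk cannot simply be
moved forward, so a merge commit is never created by accident.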

The aim is to keep the workflow as close as possible to the existing one 
to start with.  I'd expect most developers to work by posting patches to 
gcc-patches as before, though 'git format-patch' emails may well be 
acceptable (quite a few developers use a workflow much like that already).

This will all be written up before the switch...

R.

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-19 14:43   ` Damian Rouson
@ 2019-09-19 15:30     ` Janne Blomqvist
  2019-10-25 14:10     ` Richard Earnshaw (lists)
  1 sibling, 0 replies; 198+ messages in thread
From: Janne Blomqvist @ 2019-09-19 15:30 UTC (permalink / raw)
  To: Damian Rouson; +Cc: Richard Earnshaw (lists), gcc mailing list

On Thu, Sep 19, 2019 at 5:43 PM Damian Rouson
<damian@sourceryinstitute.org> wrote:
>
>
>
> On Thu, Sep 19, 2019 at 5:04 AM Janne Blomqvist <blomqvist.janne@gmail.com> wrote:
>>
>>
>> One thing that's unclear to me is how should I actually make my stuff
>> appear in the public repo? Say I want to work on some particular
>> thing:
>
>
> This is essentially a git workflow question.

Yes. What I'm saying is we ("we" as in whoever is responsible for the
git conversion, or the gcc development community in general) should
have some kind of workflow documented. Doesn't have to be an academic
paper or a best-seller book written by some self-styled "thought
leader", but there should be some guidance. A page in the wiki or in
wwwdocs is good enough for me.

>  A simple and useful workflow to consider is the
> GitHub Flow: https://guides.github.com/introduction/flow/.  Others to consider are on the
> GitLab Flow page: https://docs.gitlab.com/ee/workflow/gitlab_flow.html and on Atlassian's
> Git Flow page: https://docs.gitlab.com/ee/workflow/gitlab_flow.html.  Where will the GCC
> git repository be hosted?

Yes, I'm aware (though I do think gitflow is a bit too overcomplicated
for its own good, but YMMV).

>> 1. git checkout -b pr1234-foo   # A private branch based on latest trunk
>> 2. Then when I'm happy, I send out a patch for review, either manually
>> or with git format-patch + send-email.
>
>
> Will GCC allow workflows other than emailing patches?  It could make contributing more
> inviting to new developers.   A large community of developers has grown up around the
> above workflows and are used to using the related tools.  I realize emailing patches
> probably seems simple to GCC developers, but that practice is one of the main reasons I
> haven't contributed code to GCC even though I have supported GCC development financially
> and I frequently interact with GCC developers. My problems with email have been many.
> I have often forgotten to set my emails to plain text so my emails to GCC lists bounce and
> I have to resend them (often hours later if I didn't see the bounce right away).  When I
> receive patches from GCC developers, I get frustrated with determining what -p argument
> to pass when applying the patch. I'm equally daunted with the process of searching through
> emails to find related discussions rather than having all the dialogue about a pull request
> (which contains the same information as a patch) in one place.  And with plain-text emails
> as the medium, I really miss the ability to format dialogues with Markdown, including inserting
> hyperlinks but also to tie comments to specific lines of code in a browser interface to the
> pull request, etc.

I do see the attractiveness of these kinds of tools, however as the
original message in this thread stated, at this point we have enough
to chew on just getting the git transition done. Spending another year
(or more!) bikeshedding various other workflow improvements to tack on
to the git transition would be a mistake. After the git transition is
done and the smoke has settled, we can start thinking whether we want
to move away from the current email-based workflow.

As for the email-based workflow, the nice thing with git is that it
has nice support for it, via the format-patch, send-email, am, and
apply commands. So at least it will be an improvement upon the current
svn-based workflow.
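
A minimal sketch of that round trip, assuming two throwaway local
repositories ("author" and "reviewer" are made-up names) and skipping
send-email, which needs a mail setup; the patch files are simply handed
over through a directory instead:

```shell
#!/bin/sh
# Sketch only: export a commit with format-patch and apply it elsewhere with am.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# The patch author's repository.
git init -q "$work/author"
cd "$work/author"
git symbolic-ref HEAD refs/heads/trunk
echo base > base.txt
git add base.txt
git commit -q -m "base"

# The reviewer starts from the same trunk.
git clone -q "$work/author" "$work/reviewer"

# The author commits on a private branch and exports it as patch files.
git checkout -q -b pr1234-foo
echo fix > fix.txt
git add fix.txt
git commit -q -m "Fix pr1234"
git format-patch -q -o "$work/patches" trunk

# The reviewer applies the patch, keeping author and commit message.
cd "$work/reviewer"
git am -q "$work/patches"/*.patch

subject=$(git log -1 --format=%s)
echo "applied: $subject"
```

Because format-patch writes paths in the form am expects, there is no -p
level to work out by hand when the patch is applied this way.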

>> 3. Patch goes through a few revisions, and is approved.
>> 4. Now what?
>> 4a) Do I merge my private branch to master (err, trunk?), then commit and push?
>
>
> It's safer to first merge master into your branch and then retest with all the new commits
> that have hit master since you branched.  If you test right after merging and find no
> problems (and no new commits hit master while you're testing), then the head of your
> branch will reflect the state master will reach when you merge into master so you know
> it's safe to do so.

Ugh. Merging master into your branch and then merging your branch back
into master makes the history somewhat convoluted, IMHO.

> A lot of the testing can be automated.  For example, on GitHub, git hooks can be set up
> to ensure that if a branch has an open pull request against master (or other designated
> branches), tests run for that branch every time a new commit is pushed to it.

Sure. Again, something to look into once the git transition itself is
done, IMHO.

-- 
Janne Blomqvist

* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:02 Proposal for the transition timetable for the move to GIT Richard Earnshaw (lists)
                   ` (2 preceding siblings ...)
  2019-09-19 12:04 ` Janne Blomqvist
@ 2019-09-19 15:35 ` Maxim Kuvyrkov
  2019-12-06 14:44   ` Maxim Kuvyrkov
  2019-09-19 17:04 ` Paul Koning
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-09-19 15:35 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc

> On Sep 17, 2019, at 3:02 PM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> At the Cauldron this weekend the overwhelming view for the move to GIT soon was finally expressed.
> 
...
> 
> So in summary my proposed timetable would be:
> 
> Monday 16th December 2019 - cut off date for picking which git conversion to use
> 
> Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
> 
> Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes on line for live commits.
> 
> Doing this over the new year holiday period has both advantages and disadvantages.  On the one hand the traffic is light, so the impact to most developers will be quite low; on the other, it is a holiday period, so getting the right key folk to help might be difficult.  I won't object strongly if others feel that slipping a few days (but not weeks) would make things significantly easier.

The timetable looks entirely reasonable to me.

I have regenerated my primary version this week, and it's up at https://git.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/ .  So far I have received only minor issue reports about it, and all known problems have been fixed.  I could use a bit more scrutiny :-).

Regards,

--
Maxim Kuvyrkov
www.linaro.org



* Re: Proposal for the transition timetable for the move to GIT
  2019-09-19 15:30   ` Richard Earnshaw (lists)
@ 2019-09-19 15:49     ` Damian Rouson
  0 siblings, 0 replies; 198+ messages in thread
From: Damian Rouson @ 2019-09-19 15:49 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: Janne Blomqvist, gcc mailing list

Thanks to you and Janne for the thoughtful replies.  I understand better
the immediate goals now.

Damian

On Thu, Sep 19, 2019 at 08:31 Richard Earnshaw (lists) <
Richard.Earnshaw@arm.com> wrote:

> On 19/09/2019 13:04, Janne Blomqvist wrote:
> > On Tue, Sep 17, 2019 at 3:02 PM Richard Earnshaw (lists)
> > <Richard.Earnshaw@arm.com> wrote:
> >> There should be NO CHANGE to the other processes and policies that we
> >> have, eg patch reviews, ChangeLog policies etc at this time.  Adding
> >> requirements for this will just slow down the transition by
> >> over-complicating things.
> >
> > A little aside; I fully support the above, lets change one thing at a
> > time. But it would be nice with some short documentation about the git
> > workflow that we'll start with (which, presumably, at least initially
> > shouldn't differ too much from the svn workflow many are familiar with
> > for the reasons you mention above), particularly for those not that
> > familiar with git, or have only used git together with github or such.
> >
> > One thing that's unclear to me is how should I actually make my stuff
> > appear in the public repo? Say I want to work on some particular
> > thing:
> >
> > 1. git checkout -b pr1234-foo   # A private branch based on latest trunk
> > 2. Then when I'm happy, I send out a patch for review, either manually
> > or with git format-patch + send-email.
> > 3. Patch goes through a few revisions, and is approved.
> > 4. Now what?
> > 4a) Do I merge my private branch to master (err, trunk?), then commit
> and push?
> > 4b) Or do I first rebase my branch on top of the latest master, to
> > produce a slightly less branchy history?
> > 4c) Or do I (manually?) apply my patch on master, to create a linear
> history?
> > 4d) Something else entirely?
> >
> > Thanks,
> >
>
> I believe the current intent is that, at least for now, the trunk and
> release branches will be simple linear chains of commits (no merges).
>
> This is the same workflow as is currently used in gdb, binutils and
> glibc, and we will likely lift the hooks to enforce this from those
> projects.  See the separate discussion on the git hooks for a bit more
> detail.
>
> What individuals do on private branches is up to them.  Similarly for
> development branches (policy set by branch owner), but they will need
> linearizing (or maybe squashing) before they can be merged to trunk.
>
> The aim is to keep the workflow as close as possible to the existing one
> to start with.  I'd expect most developers to work by posting patches to
> gcc-patches as before, though 'git format-patch' emails may well be
> acceptable (quite a few developers use a workflow much like that already).
>
> This will all be written up before the switch...
>
> R.
>
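
The "simple linear chains of commits (no merges)" rule quoted above can
be modelled in a few lines. This is only a toy sketch in Python under
stated assumptions: it is not the hook used by gdb, binutils or glibc,
and the commit graph (a dict from commit name to its list of parents) is
hypothetical data. It accepts an update of a branch tip only if the new
tip reaches the old tip through single-parent commits.

```python
def is_linear_fast_forward(parents, old_tip, new_tip):
    """True if new_tip descends from old_tip via single-parent commits only."""
    commit = new_tip
    while commit != old_tip:
        commit_parents = parents.get(commit, [])
        if len(commit_parents) != 1:
            # Either a merge commit (several parents) or we ran off the
            # start of history without ever meeting old_tip.
            return False
        commit = commit_parents[0]
    return True


# Hypothetical history: A <- B <- C is linear; M merges B and X.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "X": ["A"],
    "M": ["B", "X"],
}

print(is_linear_fast_forward(parents, "B", "C"))
print(is_linear_fast_forward(parents, "B", "M"))
```

A branch that was rebased onto the current tip passes such a check,
while a branch that was merged does not, which is why development
branches need linearizing or squashing first.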


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:02 Proposal for the transition timetable for the move to GIT Richard Earnshaw (lists)
                   ` (3 preceding siblings ...)
  2019-09-19 15:35 ` Maxim Kuvyrkov
@ 2019-09-19 17:04 ` Paul Koning
  2019-10-25 14:02   ` Richard Earnshaw (lists)
  2019-09-20 15:49 ` Jeff Law
  2019-09-21  9:26 ` Segher Boessenkool
  6 siblings, 1 reply; 198+ messages in thread
From: Paul Koning @ 2019-09-19 17:04 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc

> On Sep 17, 2019, at 8:02 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> ...
> So in summary my proposed timetable would be:
> 
> Monday 16th December 2019 - cut off date for picking which git conversion to use
> 
> Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
> 
> Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes on line for live commits.

That sounds ok but it feels incomplete; there are additional steps and dates needed leading up to the 16th December decision point.

I would suggest: 1 December 2019: final version of each proposed conversion tool is available, trial conversion repository of the full GCC SVN repository is posted for public examination.

That allows 2 weeks for the different tools and their output to get the scrutiny needed for the picking decision to be made.  2 weeks may be more than needed (or possibly, less), but in any case I think this piece needs to be called out.

	paul


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:02 Proposal for the transition timetable for the move to GIT Richard Earnshaw (lists)
                   ` (4 preceding siblings ...)
  2019-09-19 17:04 ` Paul Koning
@ 2019-09-20 15:49 ` Jeff Law
  2019-09-21  9:11   ` Segher Boessenkool
  2019-09-21  9:26 ` Segher Boessenkool
  6 siblings, 1 reply; 198+ messages in thread
From: Jeff Law @ 2019-09-20 15:49 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc

On 9/17/19 6:02 AM, Richard Earnshaw (lists) wrote:
> At the Cauldron this weekend the overwhelming view for the move to GIT
> soon was finally expressed.
[ ... proposal itself ... ]
So there's nothing in the proposal I would object to, nor do I object to
being slightly flexible.  If we need to move the transition a few days
into the new year because of developer availability, that seems fine.
Similarly if we want to move up the date for a decision to be made
that's fine as well so long as the potentially affected parties are
notified ASAP what that date is.

With the SVN repo going read-only it becomes our fallback plan in case
of major unexpected problems.

Joseph's recommendation for having the old objects/refs in the new repo
makes a lot of sense. So if it works, it's got my support as well.

Anyway, just wanted to chime in with my support for the plan and make it
clear that I'll be happy as long as we get a conversion that is as good
as or better than the mirror is now :-)

jeff


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-20 15:49 ` Jeff Law
@ 2019-09-21  9:11   ` Segher Boessenkool
  2019-09-21  9:39     ` Andreas Schwab
  0 siblings, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-09-21  9:11 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Earnshaw (lists), gcc

On Fri, Sep 20, 2019 at 09:49:36AM -0600, Jeff Law wrote:
> With the SVN repo going read-only it becomes our fallback plan in case
> of major unexpected problems.
> 
> Joseph's recommendation for having the old objects/refs in the new repo
> makes a lot of sense. So if it works, it's got my support as well.

That potentially (and probably) makes the repository a lot bigger.  Better
test that before deciding to do this.

> Anyway, just wanted to chime in with my support for the plan and make it
> clear that as long as we get a conversion that is as good as or better
> than the mirror is now that I'll be happy :-)

Yup.  And if everyone would use it, that already makes it better than
the mirror.


Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-17 12:02 Proposal for the transition timetable for the move to GIT Richard Earnshaw (lists)
                   ` (5 preceding siblings ...)
  2019-09-20 15:49 ` Jeff Law
@ 2019-09-21  9:26 ` Segher Boessenkool
  6 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-09-21  9:26 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc

On Tue, Sep 17, 2019 at 01:02:20PM +0100, Richard Earnshaw (lists) wrote:
> At the Cauldron this weekend the overwhelming view for the move to GIT 
> soon was finally expressed.

[ cutting and pasting a bit ]

> There should be NO CHANGE to the other processes and policies that we 
> have, eg patch reviews, ChangeLog policies etc at this time.  Adding 
> requirements for this will just slow down the transition by 
> over-complicating things.

And I would add or generalise, NO SCOPE CREEP.  We need to get this done
now (or months or years ago).

And that includes:

> - tweaked committer history (email ids etc - nice to have)
> - fixes for accidental trunk/branch deletions/restores (preferred)
> - correctness around branch points (nice to have)

Whatever of that is already done is fine of course, but we should not
let any of this delay us a second further.

> So in summary my proposed timetable would be:
> 
> Monday 16th December 2019 - cut off date for picking which git 
> conversion to use
> 
> Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
> 
> Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes 
> on line for live commits.

And from then until the end of stage 4, everyone should learn how to use
git, get used to the new workflow (people's *local* workflow; the global
workflow does not change), etc.  We should help each other get things
done where needed -- it is stage 4, we want that to go smoothly as well! --
but ideally when GCC 11 opens up everyone has learnt how to use Git
efficiently.

Thank you for this timeline Richard, I support it wholeheartedly.

  - - -  NO MORE SCOPE CREEP  - - -

Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-21  9:11   ` Segher Boessenkool
@ 2019-09-21  9:39     ` Andreas Schwab
  2019-09-21  9:51       ` Segher Boessenkool
  0 siblings, 1 reply; 198+ messages in thread
From: Andreas Schwab @ 2019-09-21  9:39 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, Richard Earnshaw (lists), gcc

On Sep 21 2019, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Fri, Sep 20, 2019 at 09:49:36AM -0600, Jeff Law wrote:
>> With the SVN repo going read-only it becomes our fallback plan in case
>> of major unexpected problems.
>> 
>> Joseph's recommendation for having the old objects/refs in the new repo
>> makes a lot of sense. So if it works, it's got my support as well.
>
> That potentially (and probably) makes the repository a lot bigger.

Since all the blobs are identical the overhead will be very small.
There is also the possibility to put the old refs in a different refs
namespace so that they are not cloned by default.
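
To illustrate the namespace point: a plain clone fetches branches with
the refspec +refs/heads/*:refs/remotes/origin/*, so refs stored
elsewhere are not picked up unless asked for. The Python sketch below
models only that prefix matching; it ignores tags and every other
refspec feature, and the refs/old/ namespace and branch names are
hypothetical.

```python
DEFAULT_CLONE_REFSPEC = "+refs/heads/*:refs/remotes/origin/*"


def apply_refspec(refspec, remote_refs):
    """Map remote ref names to local names under a 'src/*:dst/*' refspec."""
    src, dst = refspec.lstrip("+").split(":")
    if not (src.endswith("*") and dst.endswith("*")):
        raise ValueError("this sketch only handles 'prefix/*:prefix/*' refspecs")
    src_prefix, dst_prefix = src[:-1], dst[:-1]
    return {
        ref: dst_prefix + ref[len(src_prefix):]
        for ref in remote_refs
        if ref.startswith(src_prefix)
    }


# Hypothetical refs on the server: two live branches plus one old ref
# kept in a separate namespace.
remote_refs = [
    "refs/heads/master",
    "refs/heads/some-branch",
    "refs/old/heads/master",
]

default_fetch = apply_refspec(DEFAULT_CLONE_REFSPEC, remote_refs)
explicit_fetch = apply_refspec("+refs/old/*:refs/old/*", remote_refs)
print(sorted(default_fetch))
print(sorted(explicit_fetch))
```

Anyone who wants the old refs can still add the second refspec to their
fetch configuration; everyone else never downloads them.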

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-21  9:39     ` Andreas Schwab
@ 2019-09-21  9:51       ` Segher Boessenkool
  2019-09-21 10:04         ` Andreas Schwab
  0 siblings, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-09-21  9:51 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Jeff Law, Richard Earnshaw (lists), gcc

On Sat, Sep 21, 2019 at 11:39:38AM +0200, Andreas Schwab wrote:
> On Sep 21 2019, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> > On Fri, Sep 20, 2019 at 09:49:36AM -0600, Jeff Law wrote:
> >> With the SVN repo going read-only it becomes our fallback plan in case
> >> of major unexpected problems.
> >> 
> >> Joseph's recommendation for having the old objects/refs in the new repo
> >> makes a lot of sense. So if it works, it's got my support as well.
> >
> > That potentially (and probably) makes the repository a lot bigger.
> 
> Since all the blobs are identical the overhead will be very small.

Are they though?

> There is also the possibility to put the old refs in a different refs
> namespace so that they are not cloned by default.

Yeah, that is a good point (and a good idea), thanks.


Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-21  9:51       ` Segher Boessenkool
@ 2019-09-21 10:04         ` Andreas Schwab
  0 siblings, 0 replies; 198+ messages in thread
From: Andreas Schwab @ 2019-09-21 10:04 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, Richard Earnshaw (lists), gcc

On Sep 21 2019, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Sat, Sep 21, 2019 at 11:39:38AM +0200, Andreas Schwab wrote:
>> On Sep 21 2019, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> 
>> > On Fri, Sep 20, 2019 at 09:49:36AM -0600, Jeff Law wrote:
>> >> With the SVN repo going read-only it becomes our fallback plan in case
>> >> of major unexpected problems.
>> >> 
>> >> Joseph's recommendation for having the old objects/refs in the new repo
>> >> makes a lot of sense. So if it works, it's got my support as well.
>> >
>> > That potentially (and probably) makes the repository a lot bigger.
>> 
>> Since all the blobs are identical the overhead will be very small.
>
> Are they though?

The conversion is not supposed to mangle the contents of the files in
the repository.
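
The reason identical file contents cost almost nothing extra is that git
names a blob by a hash of its content. A minimal Python sketch of that
rule, assuming the SHA-1 object format; the file content below is
hypothetical:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob with this content (SHA-1 format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file content as produced by two independent conversions.
from_old_mirror = b"int main (void) { return 0; }\n"
from_new_conversion = b"int main (void) { return 0; }\n"

same = git_blob_id(from_old_mirror) == git_blob_id(from_new_conversion)
print(same)
```

If two conversions produce byte-identical files, they produce the same
blob ids, and the repository stores each blob once. Only objects that
actually differ, for example commits with rewritten metadata, need new
storage.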

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-19 17:04 ` Paul Koning
@ 2019-10-25 14:02   ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-10-25 14:02 UTC (permalink / raw)
  To: Paul Koning; +Cc: gcc, Maxim Kuvyrkov, Eric S. Raymond

On 19/09/2019 18:04, Paul Koning wrote:
> 
> 
>> On Sep 17, 2019, at 8:02 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>
>> ...
>> So in summary my proposed timetable would be:
>>
>> Monday 16th December 2019 - cut off date for picking which git conversion to use
>>
>> Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
>>
>> Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes on line for live commits.
> 
> That sounds ok but it feels incomplete; there are additional steps and dates needed leading up to the 16th December decision point.
> 
> I would suggest: 1 December 2019: final version of each proposed conversion tool is available, trial conversion repository of the full GCC SVN repository is posted for public examination.
> 
> That allows 2 weeks for the different tools and their output to get the scrutiny needed for the picking decision to be made.  2 weeks may be more than needed (or possibly, less), but in any case I think this piece needs to be called out.

I don't think I want to start setting more hard deadlines.  I'll simply 
say, the decision will be made on that date, based on the evidence we 
have.  It's up to the proponents of the various conversion alternatives 
to be able to demonstrate before that time that their conversion meets 
the required criteria.

I'm not interested in reviewing the tools used, per se.  I am interested 
in ensuring that the conversions produced by them meet our requirements.

Of course, if a sample conversion is only delivered the day before the 
deadline, we may have to decide that that doesn't leave us enough time 
to review it.  But that might depend on what accompanying evidence is 
supplied along with such a conversion.

R.


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-19 14:43   ` Damian Rouson
  2019-09-19 15:30     ` Janne Blomqvist
@ 2019-10-25 14:10     ` Richard Earnshaw (lists)
  2019-10-25 16:32       ` Jeff Law
  1 sibling, 1 reply; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-10-25 14:10 UTC (permalink / raw)
  To: Damian Rouson, Janne Blomqvist; +Cc: gcc mailing list

On 19/09/2019 15:42, Damian Rouson wrote:
> On Thu, Sep 19, 2019 at 5:04 AM Janne Blomqvist <blomqvist.janne@gmail.com>
> wrote:
> 
>>
>> One thing that's unclear to me is how should I actually make my stuff
>> appear in the public repo? Say I want to work on some particular
>> thing:
>>
> 
> This is essentially a git workflow question.  A simple and useful workflow
> to consider is the
> GitHub Flow: https://guides.github.com/introduction/flow/.  Others to
> consider are on the
> GitLab Flow page: https://docs.gitlab.com/ee/workflow/gitlab_flow.html and
> on Atlassian's
> Git Flow page: https://docs.gitlab.com/ee/workflow/gitlab_flow.html.  Where
> will the GCC
> git repository be hosted?
> 
> 
>> 1. git checkout -b pr1234-foo   # A private branch based on latest trunk
>> 2. Then when I'm happy, I send out a patch for review, either manually
>> or with git format-patch + send-email.
>>
> 
> Will GCC allow workflows other than emailing patches?  It could make
> contributing more
> inviting to new developers.   A large community of developers has grown up
> around the
> above workflows and are used to using the related tools.  I realize
> emailing patches
> probably seems simple to GCC developers, but that practice is one of the
> main reasons I
> haven't contributed code to GCC even though I have supported GCC
> development financially
> and I frequently interact with GCC developers. My problems with email have
> been many.
> I have often forgotten to set my emails to plain text so my emails to GCC
> lists bounce and
> I have to resend them (often hours later if I didn't see the bounce right
> away).  When I
> receive patches from GCC developers, I get frustrated with determining what
> -p argument
> to pass when applying the patch. I'm equally daunted with the process of
> searching through
> emails to find related discussions rather than having all the dialogue
> about a pull request
> (which contains the same information as a patch) in one place.  And with
> plain-text emails
> as the medium, I really miss the ability to format dialogues with Markdown,
> including inserting
> hyperlinks but also to tie comments to specific lines of code in a browser
> interface to the
> pull request, etc.
> 
> 3. Patch goes through a few revisions, and is approved.
>> 4. Now what?
>> 4a) Do I merge my private branch to master (err, trunk?), then commit and
>> push?
>>
> 
> It's safer to first merge master into your branch and then retest with all
> the new commits
> that have hit master since you branched.  If you test right after merging
> and find no
> problems (and no new commits hit master while you're testing), then the
> head of your
> branch will reflect the state master will reach when you merge into master
> so you know
> it's safe to do so.
> 
> 
>> 4b) Or do I first rebase my branch on top of the latest master, to
>> produce a slightly less branchy history?
>>
> 
> A lot of people find rebasing to be overly complicated and error-prone
> (with the exception
> of interactive rebasing for the purpose of squashing commits that haven't
> been pushed to
> the remote repository).  The above merging steps are easier at the expense
> of having
> merge commits in the history, which I think is good to better document the
> branching
> history.
> 
> 
>> 4c) Or do I (manually?) apply my patch on master, to create a linear
>> history?
> 
> 
> See above.  I recommend "git merge" over manually applying patches.
> 
> 
>> 4d) Something else entirely?
>>
> 
> A lot of the testing can be automated.  For example, on GitHub, git hooks
> can be set up
> to ensure that if a branch has an open pull request against master (or
> other designated
> branches), tests run for that branch every time a new commit is pushed to
> it.
> 
> Damian
> 

There will be NO changes to the basic workflow at the time of the 
transition, other than those that are strictly required by using git 
instead of SVN (ie you now have to type "git clone" rather than "svn 
checkout" and, for committers, "git push" rather than "svn checkin").

This is not to suggest that at some later date the workflows cannot 
change, but at this point in time the only change will be the underlying 
storage mechanism for the master repository.  As Segher said, NO SCOPE 
CREEP.

R.


* Re: Proposal for the transition timetable for the move to GIT
  2019-10-25 14:10     ` Richard Earnshaw (lists)
@ 2019-10-25 16:32       ` Jeff Law
  0 siblings, 0 replies; 198+ messages in thread
From: Jeff Law @ 2019-10-25 16:32 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Damian Rouson, Janne Blomqvist; +Cc: gcc mailing list

On 10/25/19 8:10 AM, Richard Earnshaw (lists) wrote:
> On 19/09/2019 15:42, Damian Rouson wrote:
>> On Thu, Sep 19, 2019 at 5:04 AM Janne Blomqvist
>> <blomqvist.janne@gmail.com>
>> wrote:
>>
>>>
>>> One thing that's unclear to me is how should I actually make my stuff
>>> appear in the public repo? Say I want to work on some particular
>>> thing:
>>>
>>
>> This is essentially a git workflow question.  A simple and useful
>> workflow
>> to consider is the
>> GitHub Flow: https://guides.github.com/introduction/flow/.  Others to
>> consider are on the
>> GitLab Flow page: https://docs.gitlab.com/ee/workflow/gitlab_flow.html
>> and
>> on Atlassian's
>> Git Flow page: https://docs.gitlab.com/ee/workflow/gitlab_flow.html. 
>> Where
>> will the GCC
>> git repository be hosted?
>>
>>
>>> 1. git checkout -b pr1234-foo   # A private branch based on latest trunk
>>> 2. Then when I'm happy, I send out a patch for review, either manually
>>> or with git format-patch + send-email.
>>>
>>
>> Will GCC allow workflows other than emailing patches?  It could make
>> contributing more
>> inviting to new developers.   A large community of developers has
>> grown up
>> around the
>> above workflows and are used to using the related tools.  I realize
>> emailing patches
>> probably seems simple to GCC developers, but that practice is one of the
>> main reasons I
>> haven't contributed code to GCC even though I have supported GCC
>> development financially
>> and I frequently interact with GCC developers. My problems with email
>> have
>> been many.
>> I have often forgotten to set my emails to plain text so my emails to GCC
>> lists bounce and
>> I have to resend them (often hours later if I didn't see the bounce right
>> away).  When I
>> receive patches from GCC developers, I get frustrated with determining
>> what
>> -p argument
>> to pass when applying the patch. I'm equally daunted with the process of
>> searching through
>> emails to find related discussions rather than having all the dialogue
>> about a pull request
>> (which contains the same information as a patch) in one place.  And with
>> plain-text emails
>> as the medium, I really miss the ability to format dialogues with
>> Markdown,
>> including inserting
>> hyperlinks but also to tie comments to specific lines of code in a
>> browser
>> interface to the
>> pull request, etc.
>>
>> 3. Patch goes through a few revisions, and is approved.
>>> 4. Now what?
>>> 4a) Do I merge my private branch to master (err, trunk?), then commit
>>> and
>>> push?
>>>
>>
>> It's safer to first merge master into your branch and then retest with
>> all
>> the new commits
>> that have hit master since you branched.  If you test right after merging
>> and find no
>> problems (and no new commits hit master while you're testing), then the
>> head of your
>> branch will reflect the state master will reach when you merge into
>> master
>> so you know
>> it's safe to do so.
>>
>>
>>> 4b) Or do I first rebase my branch on top of the latest master, to
>>> produce a slightly less branchy history?
>>>
>>
>> A lot of people find rebasing to be overly complicated and error-prone
>> (with the exception
>> of interactive rebasing for the purpose of squashing commits that haven't
>> been pushed to
>> the remote repository).  The above merging steps are easier at the
>> expense
>> of having
>> merge commits in the history, which I think is good to better document
>> the
>> branching
>> history.
>>
>>
>>> 4c) Or do I (manually?) apply my patch on master, to create a linear
>>> history?
>>
>>
>> See above.  I recommend "git merge" over manually applying patches.
>>
>>
>>> 4d) Something else entirely?
>>>
>>
>> A lot of the testing can be automated.  For example, on GitHub, git hooks
>> can be set up
>> to ensure that if a branch has an open pull request against master (or
>> other designated
>> branches), tests run for that branch every time a new commit is pushed to
>> it.
>>
>> Damian
>>
> 
> There will be NO changes to the basic workflow at the time of the
> transition, other than those that are strictly required by using git
> instead of SVN (ie you now have to type "git clone" rather than "svn
> checkout" and, for committers, "git push" rather than "svn checkin").
> 
> This is not to suggest that at some later date the workflows cannot
> change, but at this point in time the only change will be the underlying
> storage mechanism for the master repository.  As Segher said, NO SCOPE
> CREEP.
Agreed.  Let's deal with the conversion and only the conversion.  Any
discussion about changing the workflows should be deferred until after
the conversion is done.

jeff


* Re: Proposal for the transition timetable for the move to GIT
  2019-09-19 15:35 ` Maxim Kuvyrkov
@ 2019-12-06 14:44   ` Maxim Kuvyrkov
  2019-12-06 17:21     ` Eric S. Raymond
  2019-12-16  9:53     ` Mark Wielaard
  0 siblings, 2 replies; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-12-06 14:44 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Richard Earnshaw (lists), gcc

> On Sep 19, 2019, at 6:34 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> 
>> On Sep 17, 2019, at 3:02 PM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>> 
>> At the Cauldron this weekend the overwhelming view for the move to GIT soon was finally expressed.
>> 
> ...
>> 
>> So in summary my proposed timetable would be:
>> 
>> Monday 16th December 2019 - cut off date for picking which git conversion to use
>> 
>> Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
>> 
>> Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes on line for live commits.
>> 
>> Doing this over the new year holiday period has both advantages and disadvantages.  On the one hand the traffic is light, so the impact to most developers will be quite low; on the other, it is a holiday period, so getting the right key folk to help might be difficult.  I won't object strongly if others feel that slipping a few days (but not weeks) would make things significantly easier.
> 
> The timetable looks entirely reasonable to me.
> 
> I have regenerated my primary version this week, and it's up at https://git.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/ .  So far I have received only minor issue reports about it, and all known problems have been fixed.  I could use a bit more scrutiny :-).

I think now is a good time to give a status update on the svn->git conversion I maintain.  See https://git.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/ .

1. The conversion has all SVN live branches converted as branches under refs/heads/* .

2. The conversion has all SVN live tags converted as annotated tags under refs/tags/* .

3. If desired, it would be trivial to add all deleted / leaf SVN branches and tags.  They would be named as branches/my-deleted-branch@12345, where @12345 is the revision at which the branch was deleted.  Branches created and deleted multiple times would have separate entries corresponding to delete revisions.

4. Git committer and git author entries are very accurate (imo, better than reposurgeon's, but I'm biased).  Developers' names and email addresses are mined from commit logs, changelogs and source code, and have historically accurate attributions to employers' email addresses.

5. Since there is interest in reparenting branches to fix cvs2svn merge issues, I've added this feature to my scripts as well (turned out to be trivial).  I'll keep the original gcc-pretty.git repo intact and will upload the new one at https://git.linaro.org/people/maxim-kuvyrkov/gcc-reparent.git/  -- should be live by Monday.

Finally, there seem to be quite a few misunderstandings about the scripts I've developed and their limitations.  Most of these misunderstandings stem from the assumption that all git-svn limitations must apply to my scripts.  That's not the case.  SVN merges, branch/tag reparenting, and adjusting of commit logs are all handled correctly in my scripts.  I welcome criticism with pointers to revisions which have been incorrectly converted.

The general conversion workflow is (this really is a poor-man's translator of one DAG into another):

1. Parse SVN history of entire SVN root (svn log -qv file:///svnrepo/) and build a list of branch points.
2. From the branch points build a DAG of "basic blocks" of revision history.  Each basic block is a consecutive set of commits where only the last commit can be a branchpoint.
3. Walk the DAG and ...
4. ... use git-svn to individually convert these basic blocks.
4a. Optionally, post-process git result of basic block conversion using "git filter-branch" and similar tools.

Git-svn is used in a limited role, and it does its job very well in this role.
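
Step 2 above can be sketched as follows. This is not taken from the
actual conversion scripts; it is a minimal Python illustration that
assumes a single ordered list of revisions and a known set of branch
points, both hypothetical data here.

```python
def split_into_basic_blocks(revisions, branch_points):
    """Split an ordered list of revisions into basic blocks.

    A block is closed right after a branch point, so only the last
    revision of a block can be a branch point.
    """
    blocks, current = [], []
    for rev in revisions:
        current.append(rev)
        if rev in branch_points:
            blocks.append(current)
            current = []
    if current:  # trailing revisions after the last branch point
        blocks.append(current)
    return blocks


# Hypothetical data: six trunk revisions, branches created from r2 and r5.
trunk = [1, 2, 3, 4, 5, 6]
blocks = split_into_basic_blocks(trunk, branch_points={2, 5})
print(blocks)  # [[1, 2], [3, 4, 5], [6]]
```

Each resulting block is a straight run of history with no branching
inside it, which is the unit that would then be handed to a converter
one at a time.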

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-06 14:44   ` Maxim Kuvyrkov
@ 2019-12-06 17:21     ` Eric S. Raymond
  2019-12-06 17:39       ` Richard Biener
  2019-12-06 20:49       ` Bernd Schmidt
  2019-12-16  9:53     ` Mark Wielaard
  1 sibling, 2 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-06 17:21 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Richard Earnshaw (lists), gcc

Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>:
> The general conversion workflow is (this really is a poor-man's translator of one DAG into another):
> 
> 1. Parse SVN history of entire SVN root (svn log -qv file:///svnrepo/) and build a list of branch points.
> 2. From the branch points build a DAG of "basic blocks" of revision history.  Each basic block is a consecutive set of commits where only the last commit can be a branchpoint.
> 3. Walk the DAG and ...
> 4. ... use git-svn to individually convert these basic blocks.
> 4a. Optionally, post-process git result of basic block conversion using "git filter-branch" and similar tools.
> 
> Git-svn is used in a limited role, and it does its job very well in this role.

Your approach sounds pretty reasonable except for that part. I don't
trust git-svn at *all* - I've collided with it too often during
past conversions.  It has a nasty habit of leaving damage in places
that are difficult to audit.

I agree that you've made a best possible effort to avoid being bitten
by using it only for basic blocks. That was clever and the right thing
to do, and I *still* don't trust it.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



* Re: Proposal for the transition timetable for the move to GIT
  2019-12-06 17:21     ` Eric S. Raymond
@ 2019-12-06 17:39       ` Richard Biener
  2019-12-06 19:46         ` Eric S. Raymond
  2019-12-06 20:49       ` Bernd Schmidt
  1 sibling, 1 reply; 198+ messages in thread
From: Richard Biener @ 2019-12-06 17:39 UTC (permalink / raw)
  To: esr, Eric S. Raymond, Maxim Kuvyrkov; +Cc: Richard Earnshaw (lists), gcc

On December 6, 2019 6:21:11 PM GMT+01:00, "Eric S. Raymond" <esr@thyrsus.com> wrote:
>Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>:
>> The general conversion workflow is (this really is a poor-man's
>translator of one DAG into another):
>> 
>> 1. Parse SVN history of entire SVN root (svn log -qv
>file:///svnrepo/) and build a list of branch points.
>> 2. From the branch points build a DAG of "basic blocks" of revision
>history.  Each basic block is a consecutive set of commits where only
>the last commit can be a branchpoint.
>> 3. Walk the DAG and ...
>> 4. ... use git-svn to individually convert these basic blocks.
>> 4a. Optionally, post-process git result of basic block conversion
>using "git filter-branch" and similar tools.
>> 
>> Git-svn is used in a limited role, and it does its job very well in
>this role.
>
>Your approach sounds pretty reasonable except for that part. I don't
>trust git-svn at *all* - I've collided with it too often during
>past conversions.  It has a nasty habit of leaving damage in places
>that are difficult to audit.
>
>I agree that you've made a best possible effort to avoid being bitten
>by using it only for basic blocks. That was clever and the right thing
>to do, and I *still* don't trust it.

To me, looking from the outside, the talk about reposurgeon doing damage, and about a (last-minute) rewrite that would fix it, doesn't inspire trust either ;)

I guess the basic block usage could be emulated by svn checkouts, svn log, and manual diffing and committing of revs to git. And I can't really imagine how that could fail to work with git-svn, given that it is used in the wild.

Richard. 



* Re: Proposal for the transition timetable for the move to GIT
  2019-12-06 17:39       ` Richard Biener
@ 2019-12-06 19:46         ` Eric S. Raymond
  2019-12-06 20:43           ` Sandra Loosemore
                             ` (2 more replies)
  0 siblings, 3 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-06 19:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

Richard Biener <richard.guenther@gmail.com>:
> To me, looking from the outside, the talk about reposurgeon doing damage, and about a (last-minute) rewrite that would fix it, doesn't inspire trust either ;)

*shrug* Hard problems are hard.

Every time I do a conversion that is at a record size I have to
rebuild parts of the analyzer, because the problem domain is seriously
gnarly. I'm having to rebuild more than usual this time because the
GCC repo is a monster that stresses the analyzer in particularly
unusual ways.

Reposurgeon has been used for several major conversions, including groff and Emacs.  
I don't mean to be nasty to Maxim, but I have not yet seen *anybody* who thought they
could get the job done with ad-hoc scripts turn out to be correct.  Unfortunately,
the costs of failure are often well-hidden problems in the converted history
that people trip over months and years later.

Experience matters at this.  So does staying away from tools like git-svn that
are known to be bad.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-06 19:46         ` Eric S. Raymond
@ 2019-12-06 20:43           ` Sandra Loosemore
  2019-12-07  2:57           ` Segher Boessenkool
  2019-12-09 18:19           ` Joseph Myers
  2 siblings, 0 replies; 198+ messages in thread
From: Sandra Loosemore @ 2019-12-06 20:43 UTC (permalink / raw)
  To: esr, Richard Biener; +Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On 12/6/19 12:46 PM, Eric S. Raymond wrote:
> Richard Biener <richard.guenther@gmail.com>:
>> To me, looking from the outside, the talk about reposurgeon doing damage, and about a (last-minute) rewrite that would fix it, doesn't inspire trust either ;)
> 
> *shrug* Hard problems are hard.
> 
> Every time I do a conversion that is at a record size I have to
> rebuild parts of the analyzer, because the problem domain is seriously
> gnarly. I'm having to rebuild more than usual this time because the
> GCC repo is a monster that stresses the analyzer in particularly
> unusual ways.
> 
> Reposurgeon has been used for several major conversions, including groff and Emacs.
> I don't mean to be nasty to Maxim, but I have not yet seen *anybody* who thought they
> could get the job done with ad-hoc scripts turn out to be correct.  Unfortunately,
> the costs of failure are often well-hidden problems in the converted history
> that people trip over months and years later.
> 
> Experience matters at this.  So does staying away from tools like git-svn that
> are known to be bad.

I have nothing useful to contribute regarding the actual mechanics of 
the repository conversion (I'm a total dummy about the internals of both 
git and svn and stick with only the most basic usages in my daily work), 
but from a software engineering and project management perspective....

I'm also put off by the talk of having to do last-minute rewrites of a 
massively complex project.  [Insert image of prehistoric animals trapped 
in tar pit here.]  Shouldn't it be possible to *test* whether Maxim's 
git-svn conversion is correct, e.g. by diffing the git and svn versions 
at appropriate places in the history, or comparing revision histories of 
each file at branch tips, or something like that?  Instead of just 
asserting that it's full of bugs, without any evidence either way?  I'd 
expect that the same testing would need to be performed on the 
reposurgeon version in order to have any confidence that it is any less 
buggy.  Do we have any volunteers who could independently work on QA of 
whatever git repository we end up with?
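
One way such a check could look, as a rough Python sketch: compare an SVN checkout and a git checkout of what should be the same revision, file by file.  The function names are made up for illustration, the two directories are assumed to have been checked out by hand beforehand, and this is not an existing validation script from either conversion effort.

```python
import hashlib
import os


def tree_digest(root, ignore=(".git", ".svn")):
    """Map relative path -> SHA-256 hex digest for every regular file under root."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata directories in place.
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # e.g. a dangling symlink
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def diff_trees(svn_root, git_root):
    """Return (only_in_svn, only_in_git, changed) as sorted lists of relative paths."""
    a, b = tree_digest(svn_root), tree_digest(git_root)
    only_svn = sorted(set(a) - set(b))
    only_git = sorted(set(b) - set(a))
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_svn, only_git, changed
```

Running this at every branch tip and release tag would only show that file contents agree; it says nothing about parent links, authorship, or commit messages.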

-Sandra


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-06 17:21     ` Eric S. Raymond
  2019-12-06 17:39       ` Richard Biener
@ 2019-12-06 20:49       ` Bernd Schmidt
  1 sibling, 0 replies; 198+ messages in thread
From: Bernd Schmidt @ 2019-12-06 20:49 UTC (permalink / raw)
  To: esr, Maxim Kuvyrkov; +Cc: Richard Earnshaw (lists), gcc

On 12/6/19 6:21 PM, Eric S. Raymond wrote:

> Your approach sounds pretty reasonable except for that part. I don't
> trust git-svn at *all* - I've collided with it too often during
> past conversions.  It has a nasty habit of leaving damage in places
> that are difficult to audit.

So, which steps are we taking to ensure such damage does not occur with 
either method of conversion? Do we have any verification scripts already?


Bernd


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-06 19:46         ` Eric S. Raymond
  2019-12-06 20:43           ` Sandra Loosemore
@ 2019-12-07  2:57           ` Segher Boessenkool
  2019-12-09 18:19           ` Joseph Myers
  2 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-07  2:57 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Richard Biener, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Fri, Dec 06, 2019 at 02:46:04PM -0500, Eric S. Raymond wrote:
> Experience matters at this.  So does staying away from tools like git-svn that
> are known to be bad.

git-svn is an excellent tool, if you use it for something it is fit for.
And that is what Maxim did.  Knowing which tool to use, and how, when, and
where to use it, is what experience is.


Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-06 19:46         ` Eric S. Raymond
  2019-12-06 20:43           ` Sandra Loosemore
  2019-12-07  2:57           ` Segher Boessenkool
@ 2019-12-09 18:19           ` Joseph Myers
  2019-12-09 18:40             ` Bernd Schmidt
                               ` (2 more replies)
  2 siblings, 3 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-09 18:19 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Richard Biener, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Fri, 6 Dec 2019, Eric S. Raymond wrote:

> Reposurgeon has been used for several major conversions, including groff 
> and Emacs.  I don't mean to be nasty to Maxim, but I have not yet seen 
> *anybody* who thought they could get the job done with ad-hoc scripts 
> turn out to be correct.  Unfortunately, the costs of failure are often 
> well-hidden problems in the converted history that people trip over 
> months and years later.

I think the ad hoc script is the risk factor here as much as the fact that 
the ad hoc script makes limited use of git-svn.

For any conversion we're clearly going to need to run various validation 
(comparing properties of the converted repository, such as contents at 
branch tips, with expected values of those properties based on the SVN 
repository) and fix issues shown up by that validation.  reposurgeon has 
its own tools for such validation; I also intend to write some validation 
scripts myself.  And clearly we need to fix issues shown up by such 
validation - that's what various recent reposurgeon issues Richard and I 
have reported are about, fixing the most obvious issues that show up, 
which in turn will enable running more detailed validation.

The main risks are about issues that are less obvious in validation and so 
don't get fixed in that process.  There, if you're using an ad hoc script, 
the risks are essentially unknown.  But using a known conversion tool with 
an extensive testsuite, such as reposurgeon, gives confidence based on 
reposurgeon passing its own testsuite (once the SVN dump reader rewrite 
does so) that a wide range of potential conversion bugs, that might appear 
without showing up in the kinds of validation people try, are less likely 
because of all the regression tests for conversion issues seen in past 
conversions.  When using an ad hoc script specific to one conversion you 
lose that confidence that comes from a conversion tool having been used in 
previous conversions and having tests to ensure bugs found in those 
conversions don't come back.

I think we should fix whatever the remaining relevant bugs are in 
reposurgeon and do the conversion with reposurgeon being used to read and 
convert the SVN history and do any desired surgical operations on it.

Ad hoc scripts identifying specific proposed local changes to the 
repository content, such as the proposed commit message improvements from 
Richard or my branch parent fixes, to be performed with reposurgeon, seem 
a lot safer than ad hoc code doing the conversion itself.  And for 
validation, the more validation scripts people come up with the better.  
If anyone has or wishes to write custom scripts to analyze the SVN 
repository branch structure and turn that into verifiable assertions about 
what a git conversion should look like, rather than into directly 
generating a git repository or doing surgery on history, that helps us 
check a reposurgeon-converted repository in areas that might be 
problematic - and in that case it's OK for the custom script to have 
unknown bugs because issues it shows up are just pointing out places where 
the converted repository needs checking more carefully to decide whether 
there is a conversion bug or not.
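
As an illustration of the kind of assertion-generating script meant here, a small Python sketch follows.  It reads the text produced by "svn log -v" and records, for each path under /branches/ or /tags/ that was added as a copy, where it was copied from.  The function name is hypothetical, paths containing spaces are not handled, and copies of individual files under a branch are recorded too, so a real script would need to filter further.

```python
import re

# Matches changed-path lines such as "   A /branches/foo (from /trunk:1234)".
COPY_LINE = re.compile(r"^\s+[AR] (/\S+) \(from (/\S+):(\d+)\)$")


def expected_copy_sources(svn_log_text, prefixes=("/branches/", "/tags/")):
    """Map each copied path to the list of (source_path, source_rev) it was copied from.

    Entries appear in the order they occur in the log text.
    """
    sources = {}
    for line in svn_log_text.splitlines():
        match = COPY_LINE.match(line)
        if match and match.group(1).startswith(prefixes):
            sources.setdefault(match.group(1), []).append(
                (match.group(2), int(match.group(3)))
            )
    return sources
```

The resulting table can then be checked against a converted repository, for example by confirming that the first commit on each git branch has the commit for the recorded source revision as its parent; a mismatch is a place to look more closely, not proof of a bug.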

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-09 18:19           ` Joseph Myers
@ 2019-12-09 18:40             ` Bernd Schmidt
  2019-12-09 20:45               ` Joseph Myers
  2019-12-09 22:12               ` Eric S. Raymond
  2019-12-09 19:28             ` Eric S. Raymond
  2019-12-11 14:40             ` Maxim Kuvyrkov
  2 siblings, 2 replies; 198+ messages in thread
From: Bernd Schmidt @ 2019-12-09 18:40 UTC (permalink / raw)
  To: Joseph Myers, Eric S. Raymond
  Cc: Richard Biener, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On 12/9/19 7:19 PM, Joseph Myers wrote:
> 
> For any conversion we're clearly going to need to run various validation
> (comparing properties of the converted repository, such as contents at
> branch tips, with expected values of those properties based on the SVN
> repository) and fix issues shown up by that validation.  reposurgeon has
> its own tools for such validation; I also intend to write some validation
> scripts myself.

Would it be feasible to require that both conversions produce the same 
output repository to some degree? Can we just look at release tags and 
require that they have the same hash in both conversions, or are there 
good reasons why the two would produce different outputs?



Bernd


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-09 18:19           ` Joseph Myers
  2019-12-09 18:40             ` Bernd Schmidt
@ 2019-12-09 19:28             ` Eric S. Raymond
  2019-12-11 14:40             ` Maxim Kuvyrkov
  2 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-09 19:28 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Biener, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

Joseph Myers <joseph@codesourcery.com>:
> I think we should fix whatever the remaining relevant bugs are in 
> reposurgeon and do the conversion with reposurgeon being used to read and 
> convert the SVN history and do any desired surgical operations on it.

On behalf of the reposurgeon crew - Julien Rivaud, Daniel Brooks, and myself -
we thank you for that expression of confidence.

We'll do our damnedest to deliver rapidly.  We welcome oversight and
discussion at #reposurgeon on freenode, because we're just the mechanics.
You guys have to make the policy decisions.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



* Re: Proposal for the transition timetable for the move to GIT
  2019-12-09 18:40             ` Bernd Schmidt
@ 2019-12-09 20:45               ` Joseph Myers
  2019-12-09 22:12               ` Eric S. Raymond
  1 sibling, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-09 20:45 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Eric S. Raymond, Richard Biener, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, 9 Dec 2019, Bernd Schmidt wrote:

> On 12/9/19 7:19 PM, Joseph Myers wrote:
> > 
> > For any conversion we're clearly going to need to run various validation
> > (comparing properties of the converted repository, such as contents at
> > branch tips, with expected values of those properties based on the SVN
> > repository) and fix issues shown up by that validation.  reposurgeon has
> > its own tools for such validation; I also intend to write some validation
> > scripts myself.
> 
> Would it be feasible to require that both conversions produce the same output
> repository to some degree? Can we just look at release tags and require that
> they have the same hash in both conversions, or are there good reasons why the
> two would produce different outputs?

The same hashes are not practical.  There are several areas where two 
perfectly correct conversions are still expected to have different 
contents because of subjective decisions and heuristics involved in the 
conversion.

If some alternative heuristic is found to be clearly better than an 
existing one in reposurgeon, so that it would be better for any project 
converting with reposurgeon, or if some preference in the GCC case can 
readily be represented as a configuration option to choose between 
different approaches, it makes sense to implement the improvements in 
reposurgeon so that any project with similar issues can benefit.  For 
example, see Richard's suggestions in reposurgeon issue 174 of two 
possible improvements to ChangeLog handling: disregarding ChangeLog data 
if a commit adds multiple ChangeLog entries by different authors, and 
specifying a wildcard to allow ChangeLog processing on ChangeLog* files to 
cover ChangeLog.<branch>.  GCC is hardly the last project converting from 
SVN to git, so we can benefit from the experiences of past conversions, 
and help contribute to having useful features available for future 
conversions.

Here are some cases for differences between two correct conversions:

* Tree contents should mostly be identical at any given commit, but 
reposurgeon deliberately produces a .gitignore with contents based on 
svn:ignore if the SVN tree contents don't have a .gitignore (we use 
--user-ignores to prefer the .gitignore file in SVN if it exists), and 
removes any .cvsignore file.

* The first parent of a commit should typically be the same between 
conversions, but (a) might be corrected in some way for cvs2svn issues, 
(b) might skip SVN commits that would translate into empty git commits, 
depending on the choices made for handling of such commits.

* Cases that give rise to no tree changes in a commit (which thus might 
not become a git commit at all depending on the choices made and whether 
they also don't change any merge information properties) include (a) 
branch or tag creation as an exact copy of some revision of some branch, 
(b) branch recreation as a copy, e.g. when trunk was deleted accidentally, 
(c) commits that in SVN only add or remove empty directories, as git does 
not store empty directories, (d) commits that in SVN just remove some file 
or directory and replace it with a copy from some revision of some branch 
that happens to have identical contents to the file or directory removed 
(yes, we do have commits like that in GCC SVN).

* Subsequent parents of a commit based on merge info handling may well 
have subjective differences between correct conversions.

* Commit messages might differ, both because of heuristics to improve 
them, like Richard's work on that, and because of different choices for 
how to represent the SVN revision number information in commit messages.

* Author and committer identifications, and commit timestamps (especially 
timezones, something git has, SVN doesn't and reposurgeon has a per-author 
map for) may vary because of different heuristics or author maps used, 
especially when there is no ChangeLog entry for a commit or the ChangeLog 
entry is in some way malformed or the commit adds ChangeLog entries for 
multiple changes with different authors.
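
For the first case above, a comparison script can simply treat those files as expected noise.  A minimal Python sketch, assuming some earlier step has already produced a list of paths whose contents differ between two conversions at the same commit (the names here are illustrative, not from any existing tool):

```python
import posixpath

# Files that two correct conversions may legitimately disagree on.
EXPECTED_DIFFERENCES = {".gitignore", ".cvsignore"}


def unexpected_differences(differing_paths):
    """Drop paths whose basename is an ignore file; return the rest, sorted."""
    return sorted(
        path
        for path in differing_paths
        if posixpath.basename(path) not in EXPECTED_DIFFERENCES
    )
```

Anything this returns is a tree difference that is not explained by ignore-file handling and would need a closer look.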

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-09 18:40             ` Bernd Schmidt
  2019-12-09 20:45               ` Joseph Myers
@ 2019-12-09 22:12               ` Eric S. Raymond
  1 sibling, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-09 22:12 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Joseph Myers, Richard Biener, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Bernd Schmidt <bernds_cb1@t-online.de>:
> On 12/9/19 7:19 PM, Joseph Myers wrote:
> > 
> > For any conversion we're clearly going to need to run various validation
> > (comparing properties of the converted repository, such as contents at
> > branch tips, with expected values of those properties based on the SVN
> > repository) and fix issues shown up by that validation.  reposurgeon has
> > its own tools for such validation; I also intend to write some validation
> > scripts myself.
> 
> Would it be feasible to require that both conversions produce the same
> output repository to some degree? Can we just look at release tags and
> require that they have the same hash in both conversions, or are there good
> reasons why the two would produce different outputs?

There are a couple of areas that could produce divergences.

One is the part of the history before SVN was adopted. There's a lot of 
weird junk back there, artifacts from the cvs2svn conversion, that can produce
issues like fundamental uncertainty about where a child branch should actually be
rooted on its parent.  Reposurgeon makes choices that are a-priori reasonable
in cases of doubt, but there are edge cases where a different conversion pipeline
could make different ones.

Another is how to translate tags. I don't know what Maxim's scripts do, but 
under reposurgeon a copy commit can have one of two dispositions:

(1) Become a lightweight tag (git reference) if the tag comment looks like 
it was autogenerated and carries no real information.

(2) Become a git annotated tag if we want to preserve the tag metadata (comment,
date stamp)

There's room for a certain amount of artistic license here.
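
To make the two dispositions concrete, here is a toy Python sketch of such a decision.  It is not reposurgeon's actual logic: the function name is made up, and the test for "looks autogenerated" (an empty comment, or one starting with the cvs2svn "manufactured" boilerplate) is only an assumed example of a heuristic.

```python
def tag_disposition(comment):
    """Return "lightweight" or "annotated" for a tag-creating copy commit.

    Assumed heuristic: an empty comment, or one that starts with the
    cvs2svn boilerplate, carries no real information.
    """
    text = comment.strip()
    if not text or text.startswith("This commit was manufactured by cvs2svn"):
        return "lightweight"
    return "annotated"
```

A different but equally defensible heuristic would put the dividing line elsewhere, which is one reason two correct conversions need not agree on tags.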

Most conversions have few enough disputable cases that the differences between
renderings can be reviewed by eyeball. I'm not going to bet that will be true
of this one.  At the scale of this conversion, any form of comparative auditing
is pretty hopeless.  You get your assurance, if you get it, from believing
the correctness of the conversion tool.

Which is a major reason that reposurgeon has a *large* test suite. 98
general operations tests, 55 Subversion test dumps including a rogue's
gallery of metadata perversions gathered from previous conversions,
and a cloud of surrounding auxiliary checks.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-09 18:19           ` Joseph Myers
  2019-12-09 18:40             ` Bernd Schmidt
  2019-12-09 19:28             ` Eric S. Raymond
@ 2019-12-11 14:40             ` Maxim Kuvyrkov
  2019-12-11 15:03               ` Richard Earnshaw (lists)
  2 siblings, 1 reply; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-12-11 14:40 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Eric S. Raymond, Richard Guenther, Richard Earnshaw (lists), gcc

> On Dec 9, 2019, at 9:19 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> On Fri, 6 Dec 2019, Eric S. Raymond wrote:
> 
>> Reposurgeon has been used for several major conversions, including groff 
>> and Emacs.  I don't mean to be nasty to Maxim, but I have not yet seen 
>> *anybody* who thought they could get the job done with ad-hoc scripts 
>> turn out to be correct.  Unfortunately, the costs of failure are often 
>> well-hidden problems in the converted history that people trip over 
>> months and years later.
> 
> I think the ad hoc script is the risk factor here as much as the fact that 
> the ad hoc script makes limited use of git-svn.
> 
> For any conversion we're clearly going to need to run various validation 
> (comparing properties of the converted repository, such as contents at 
> branch tips, with expected values of those properties based on the SVN 
> repository) and fix issues shown up by that validation.  reposurgeon has 
> its own tools for such validation; I also intend to write some validation 
> scripts myself.  And clearly we need to fix issues shown up by such 
> validation - that's what various recent reposurgeon issues Richard and I 
> have reported are about, fixing the most obvious issues that show up, 
> which in turn will enable running more detailed validation.
> 
> The main risks are about issues that are less obvious in validation and so 
> don't get fixed in that process.  There, if you're using an ad hoc script, 
> the risks are essentially unknown.  But using a known conversion tool with 
> an extensive testsuite, such as reposurgeon, gives confidence based on 
> reposurgeon passing its own testsuite (once the SVN dump reader rewrite 
> does so) that a wide range of potential conversion bugs, that might appear 
> without showing up in the kinds of validation people try, are less likely 
> because of all the regression tests for conversion issues seen in past 
> conversions.  When using an ad hoc script specific to one conversion you 
> lose that confidence that comes from a conversion tool having been used in 
> previous conversions and having tests to ensure bugs found in those 
> conversions don't come back.
> 
> I think we should fix whatever the remaining relevant bugs are in 
> reposurgeon and do the conversion with reposurgeon being used to read and 
> convert the SVN history and do any desired surgical operations on it.
> 
> Ad hoc scripts identifying specific proposed local changes to the 
> repository content, such as the proposed commit message improvements from 
> Richard or my branch parent fixes, to be performed with reposurgeon, seem 
> a lot safer than ad hoc code doing the conversion itself.  And for 
> validation, the more validation scripts people come up with the better.  
> If anyone has or wishes to write custom scripts to analyze the SVN 
> repository branch structure and turn that into verifiable assertions about 
> what a git conversion should look like, rather than into directly 
> generating a git repository or doing surgery on history, that helps us 
> check a reposurgeon-converted repository in areas that might be 
> problematic - and in that case it's OK for the custom script to have 
> unknown bugs because issues it shows up are just pointing out places where 
> the converted repository needs checking more carefully to decide whether 
> there is a conversion bug or not.

Firstly, I am not going to defend my svn-git-* scripts or the git-svn tool they are using.  They are likely to have bugs and problems.  I am, though, going to defend the conversion that these tools produced.  No matter the conversion tool, all that matters is the final result.  I have asked many times for people to scrutinize the git repository that I uploaded several months ago and to point out any artifacts or mistakes.  Surely, it can't be hard for one to find a mistake or two in my converted repository by comparing it against any other /better/ repository that one has.

[FWIW, I am going to privately compare the reposurgeon-generated repo that Richard E. uploaded against my repo.  The results of such a comparison can appear biased, so I'm not planning to publish them.]

Secondly, the GCC community has overwhelmingly supported the move to git, and in private conversations many developers have expressed the same view:

1. all we care about is history of trunk and recent release branches
2. current gcc-mirror is really all we need
3. having vendor branches and author info would be nice, but not so nice as to delay the switch any longer

Granted, the above is not the /official/ consensus of the GCC community, and I don't want to represent it as such.  However, it is equally not the consensus of the GCC community to delay the switch to git until we have a confirmed perfect repo.

--
Maxim Kuvyrkov
https://www.linaro.org


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 14:40             ` Maxim Kuvyrkov
@ 2019-12-11 15:03               ` Richard Earnshaw (lists)
  2019-12-11 15:19                 ` Jonathan Wakely
  0 siblings, 1 reply; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-11 15:03 UTC (permalink / raw)
  To: Maxim Kuvyrkov, Joseph S. Myers; +Cc: Eric S. Raymond, Richard Guenther, gcc

On 11/12/2019 14:40, Maxim Kuvyrkov wrote:
>> On Dec 9, 2019, at 9:19 PM, Joseph Myers <joseph@codesourcery.com> wrote:
>>
>> On Fri, 6 Dec 2019, Eric S. Raymond wrote:
>>
>>> Reposurgeon has been used for several major conversions, including groff 
>>> and Emacs.  I don't mean to be nasty to Maxim, but I have not yet seen 
>>> *anybody* who thought they could get the job done with ad-hoc scripts 
>>> turn out to be correct.  Unfortunately, the costs of failure are often 
>>> well-hidden problems in the converted history that people trip over 
>>> months and years later.
>>
>> I think the ad hoc script is the risk factor here as much as the fact that 
>> the ad hoc script makes limited use of git-svn.
>>
>> For any conversion we're clearly going to need to run various validation 
>> (comparing properties of the converted repository, such as contents at 
>> branch tips, with expected values of those properties based on the SVN 
>> repository) and fix issues shown up by that validation.  reposurgeon has 
>> its own tools for such validation; I also intend to write some validation 
>> scripts myself.  And clearly we need to fix issues shown up by such 
>> validation - that's what various recent reposurgeon issues Richard and I 
>> have reported are about, fixing the most obvious issues that show up, 
>> which in turn will enable running more detailed validation.
>>
>> The main risks are about issues that are less obvious in validation and so 
>> don't get fixed in that process.  There, if you're using an ad hoc script, 
>> the risks are essentially unknown.  But using a known conversion tool with 
>> an extensive testsuite, such as reposurgeon, gives confidence based on 
>> reposurgeon passing its own testsuite (once the SVN dump reader rewrite 
>> does so) that a wide range of potential conversion bugs, that might appear 
>> without showing up in the kinds of validation people try, are less likely 
>> because of all the regression tests for conversion issues seen in past 
>> conversions.  When using an ad hoc script specific to one conversion you 
>> lose that confidence that comes from a conversion tool having been used in 
>> previous conversions and having tests to ensure bugs found in those 
>> conversions don't come back.
>>
>> I think we should fix whatever the remaining relevant bugs are in 
>> reposurgeon and do the conversion with reposurgeon being used to read and 
>> convert the SVN history and do any desired surgical operations on it.
>>
>> Ad hoc scripts identifying specific proposed local changes to the 
>> repository content, such as the proposed commit message improvements from 
>> Richard or my branch parent fixes, to be performed with reposurgeon, seem 
>> a lot safer than ad hoc code doing the conversion itself.  And for 
>> validation, the more validation scripts people come up with the better.  
>> If anyone has or wishes to write custom scripts to analyze the SVN 
>> repository branch structure and turn that into verifiable assertions about 
>> what a git conversion should look like, rather than into directly 
>> generating a git repository or doing surgery on history, that helps us 
>> check a reposurgeon-converted repository in areas that might be 
>> problematic - and in that case it's OK for the custom script to have 
>> unknown bugs because issues it shows up are just pointing out places where 
>> the converted repository needs checking more carefully to decide whether 
>> there is a conversion bug or not.
> 
> Firstly, I am not going to defend my svn-git-* scripts or the git-svn tool they use.  They are likely to have bugs and problems.  I am, though, going to defend the conversion that these tools produced.  No matter the conversion tool, all that matters is the final result.  I have asked many times for people to scrutinize the git repository that I uploaded several months ago and to point out any artifacts or mistakes.  Surely, it can't be hard to find a mistake or two in my converted repository by comparing it against any other /better/ repository that one has.
> 
> [FWIW, I am going to privately compare the reposurgeon-generated repo that Richard E. uploaded against my repo.  The results of such a comparison can appear biased, so I'm not planning to publish them.]
> 

I wouldn't bother with that.  The version of reposurgeon that I used to
produce that repo had known defects, which have since been fixed.  It
was *never* the point of that upload to ask for correctness checks on
the conversion (I said so at the time).  Instead it was intended to
demonstrate the improvements to the commit summaries that I think we can
make.

R.

> Secondly, the GCC community has overwhelmingly supported the move to git, and in private conversations many developers have expressed the same view:
> 
> 1. all we care about is history of trunk and recent release branches
> 2. current gcc-mirror is really all we need
> 3. having vendor branches and author info would be nice, but not so nice as to delay the switch any longer
> 
> Granted, the above is not the /official/ consensus of the GCC community, and I don't want to represent it as such.  However, it is equally not the consensus of the GCC community to delay the switch to git until we have a confirmed perfect repo.
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 15:03               ` Richard Earnshaw (lists)
@ 2019-12-11 15:19                 ` Jonathan Wakely
  2019-12-11 15:21                   ` Richard Earnshaw (lists)
                                     ` (2 more replies)
  0 siblings, 3 replies; 198+ messages in thread
From: Jonathan Wakely @ 2019-12-11 15:19 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Maxim Kuvyrkov, Joseph S. Myers, Eric S. Raymond, Richard Guenther, gcc

On Wed, 11 Dec 2019 at 15:03, Richard Earnshaw (lists) wrote:
> I wouldn't bother with that.  There are known defects in the version of
> reposurgeon that I used to produce that which have since been fixed.  It
> was *never* the point of that upload to ask for correctness checks on
> the conversion (I said so at the time).  Instead it was intended to
> demonstrate the improvements to the commit summaries that I think we can
> make.

My concern is that there is no conversion done using reposurgeon that
*can* be used to do correctness checks.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 15:19                 ` Jonathan Wakely
@ 2019-12-11 15:21                   ` Richard Earnshaw (lists)
  2019-12-11 15:36                     ` Joseph Myers
  2019-12-11 15:30                   ` Dennis Luehring
  2019-12-11 17:36                   ` Eric S. Raymond
  2 siblings, 1 reply; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-11 15:21 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Maxim Kuvyrkov, Joseph S. Myers, Eric S. Raymond, Richard Guenther, gcc

On 11/12/2019 15:19, Jonathan Wakely wrote:
> On Wed, 11 Dec 2019 at 15:03, Richard Earnshaw (lists) wrote:
>> I wouldn't bother with that.  There are known defects in the version of
>> reposurgeon that I used to produce that which have since been fixed.  It
>> was *never* the point of that upload to ask for correctness checks on
>> the conversion (I said so at the time).  Instead it was intended to
>> demonstrate the improvements to the commit summaries that I think we can
>> make.
> 
> My concern is that there is no conversion done using reposurgeon that
> *can* be used to do correctness checks.
> 

I have concerns too, but I'm in contact with the reposurgeon guys and
progress *is* being made.

R.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 15:19                 ` Jonathan Wakely
  2019-12-11 15:21                   ` Richard Earnshaw (lists)
@ 2019-12-11 15:30                   ` Dennis Luehring
  2019-12-11 15:36                     ` Richard Earnshaw
  2019-12-11 17:36                   ` Eric S. Raymond
  2 siblings, 1 reply; 198+ messages in thread
From: Dennis Luehring @ 2019-12-11 15:30 UTC (permalink / raw)
  To: gcc

The differences between Maxim's and Eric's final results will hopefully
show the open bugs in both tools and allow them to be fixed.  I think this
comparison phase is needed if the result is to be the best possible.

On 11.12.2019 at 16:19, Jonathan Wakely wrote:
> On Wed, 11 Dec 2019 at 15:03, Richard Earnshaw (lists) wrote:
> > I wouldn't bother with that.  There are known defects in the version of
> > reposurgeon that I used to produce that which have since been fixed.  It
> > was *never* the point of that upload to ask for correctness checks on
> > the conversion (I said so at the time).  Instead it was intended to
> > demonstrate the improvements to the commit summaries that I think we can
> > make.
>
> My concern is that there is no conversion done using reposurgeon that
> *can* be used to do correctness checks.


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 15:30                   ` Dennis Luehring
@ 2019-12-11 15:36                     ` Richard Earnshaw
  0 siblings, 0 replies; 198+ messages in thread
From: Richard Earnshaw @ 2019-12-11 15:36 UTC (permalink / raw)
  To: Dennis Luehring, gcc

On 11/12/2019 15:30, Dennis Luehring wrote:
> The differences between Maxim's and Eric's final results will hopefully
> show the open bugs in both tools and allow them to be fixed.  I think this
> comparison phase is needed if the result is to be the best possible.
> 

I don't disagree.  But I don't think it's as simple as just comparing
the tips in reality.  Though that is certainly a major part of it.

One of the things I've discovered while working on this is that subtle
errors in recovering the history properly can lead to "git blame"
hitting the wrong path and thus giving confusing answers.  Most of those
problems date back to the CVS era, but they are there and they probably
will show through in the final conversion if we don't get it right, even
if they appear to be ancient history.

Having to go back to the SVN repos to do archaeology will be painful,
especially as SVN becomes less and less used by developers.

R.

PS. I'm not trying to suggest that Maxim's conversion has got this
wrong.  Just that it is an issue that has come to light as I've been
working on this, and I do know that the plain git svn conversion *is*
wrong in this area.

> On 11.12.2019 at 16:19, Jonathan Wakely wrote:
>> On Wed, 11 Dec 2019 at 15:03, Richard Earnshaw (lists) wrote:
>> > I wouldn't bother with that.  There are known defects in the version of
>> > reposurgeon that I used to produce that which have since been fixed.  It
>> > was *never* the point of that upload to ask for correctness checks on
>> > the conversion (I said so at the time).  Instead it was intended to
>> > demonstrate the improvements to the commit summaries that I think we can
>> > make.
>>
>> My concern is that there is no conversion done using reposurgeon that
>> *can* be used to do correctness checks.
> 
> 

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 15:21                   ` Richard Earnshaw (lists)
@ 2019-12-11 15:36                     ` Joseph Myers
  2019-12-11 16:02                       ` Jonathan Wakely
  2019-12-16  2:19                       ` Joseph Myers
  0 siblings, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-11 15:36 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Jonathan Wakely, Maxim Kuvyrkov, Eric S. Raymond, Richard Guenther, gcc

On Wed, 11 Dec 2019, Richard Earnshaw (lists) wrote:

> On 11/12/2019 15:19, Jonathan Wakely wrote:
> > On Wed, 11 Dec 2019 at 15:03, Richard Earnshaw (lists) wrote:
> >> I wouldn't bother with that.  There are known defects in the version of
> >> reposurgeon that I used to produce that which have since been fixed.  It
> >> was *never* the point of that upload to ask for correctness checks on
> >> the conversion (I said so at the time).  Instead it was intended to
> >> demonstrate the improvements to the commit summaries that I think we can
> >> make.
> > 
> > My concern is that there is no conversion done using reposurgeon that
> > *can* be used to do correctness checks.
> > 
> 
> I have concerns too, but I'm in contact with the reposurgeon guys and
> progress *is* being made.

Concretely: when I did a comparison of the tip of trunk against master 
from a reposurgeon conversion on 29 November, there were 1421 differences 
(files or directories only present in one of SVN or git or with different 
contents).  As of today with the SVN dump reader rewrite, this is down to 
just two differences (plus two empty directories present in SVN as git 
doesn't represent empty directories), and we understand exactly where the 
problem arises with a trunk deletion and recreation and what's odd about 
that particular trunk deletion and recreation.  All the deleted tags and 
branches are now placed neatly in refs/deleted/; we no longer have any 
problems with deleted tags or branches wrongly appearing in the main tag 
and branch namespaces; all the mess with deleted branches appearing in the 
reposurgeon-generated "root" branch has gone, everything there now appears 
to relate to commits that genuinely and correctly do not go in any branch 
or tag (changes to the hooks directory, branches wrongly created at top 
level, etc.).  Almost all the branches that previously weren't created in 
git by reposurgeon because they were not changed in SVN after branch 
creation are now properly present in the conversion to git.
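
For anyone wanting to run a similar tip comparison themselves, here is a
minimal Python sketch.  It is not the script behind the numbers above; the
function names and the choice to skip .svn and .git directories are
assumptions made for illustration.  It walks an SVN checkout and a git
checkout of the same tip and reports files present on only one side or with
different contents.  Only files are listed, so empty directories drop out
of the comparison automatically.

```python
import filecmp
import os


def list_files(root, skip_dirs=(".svn", ".git")):
    """Return the set of file paths under root, relative to root.

    Only files are collected, so empty directories never appear; that
    mirrors git, which does not represent empty directories.
    skip_dirs is an illustrative assumption, not a fixed convention.
    """
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata in place so os.walk skips it.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_trees(svn_root, git_root):
    """Classify differences between two checkouts of the same tip."""
    svn_files = list_files(svn_root)
    git_files = list_files(git_root)
    common = svn_files & git_files
    # shallow=False forces a byte-for-byte content comparison.
    differing = sorted(
        path for path in common
        if not filecmp.cmp(os.path.join(svn_root, path),
                           os.path.join(git_root, path),
                           shallow=False)
    )
    return {
        "only_in_svn": sorted(svn_files - git_files),
        "only_in_git": sorted(git_files - svn_files),
        "different_contents": differing,
    }
```

The total number of differences is the combined length of the three lists.
Symbolic links get no special handling in this sketch, so a real run would
need to decide how to treat them.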

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 15:36                     ` Joseph Myers
@ 2019-12-11 16:02                       ` Jonathan Wakely
  2019-12-11 17:47                         ` Eric S. Raymond
  2019-12-16  2:19                       ` Joseph Myers
  1 sibling, 1 reply; 198+ messages in thread
From: Jonathan Wakely @ 2019-12-11 16:02 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Earnshaw (lists),
	Maxim Kuvyrkov, Eric S. Raymond, Richard Guenther, gcc

On Wed, 11 Dec 2019 at 15:36, Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Wed, 11 Dec 2019, Richard Earnshaw (lists) wrote:
>
> > On 11/12/2019 15:19, Jonathan Wakely wrote:
> > > On Wed, 11 Dec 2019 at 15:03, Richard Earnshaw (lists) wrote:
> > >> I wouldn't bother with that.  There are known defects in the version of
> > >> reposurgeon that I used to produce that which have since been fixed.  It
> > >> was *never* the point of that upload to ask for correctness checks on
> > >> the conversion (I said so at the time).  Instead it was intended to
> > >> demonstrate the improvements to the commit summaries that I think we can
> > >> make.
> > >
> > > My concern is that there is no conversion done using reposurgeon that
> > > *can* be used to do correctness checks.
> > >
> >
> > I have concerns too, but I'm in contact with the reposurgeon guys and
> > progress *is* being made.
>
> Concretely: when I did a comparison of the tip of trunk against master
> from a reposurgeon conversion on 29 November, there were 1421 differences
> (files or directories only present in one of SVN or git or with different
> contents).  As of today with the SVN dump reader rewrite, this is down to
> just two differences (plus two empty directories present in SVN as git
> doesn't represent empty directories), and we understand exactly where the
> problem arises with a trunk deletion and recreation and what's odd about
> that particular trunk deletion and recreation.  All the deleted tags and
> branches are now placed neatly in refs/deleted/; we no longer have any
> problems with deleted tags or branches wrongly appearing in the main tag
> and branch namespaces; all the mess with deleted branches appearing in the
> reposurgeon-generated "root" branch has gone, everything there now appears
> to relate to commits that genuinely and correctly do not go in any branch
> or tag (changes to the hooks directory, branches wrongly created at top
> level, etc.).  Almost all the branches that previously weren't created in
> git by reposurgeon because they were not changed in SVN after branch
> creation are now properly present in the conversion to git.

That's good news and I'm relieved to hear it. Thanks.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 15:19                 ` Jonathan Wakely
  2019-12-11 15:21                   ` Richard Earnshaw (lists)
  2019-12-11 15:30                   ` Dennis Luehring
@ 2019-12-11 17:36                   ` Eric S. Raymond
  2 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-11 17:36 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Richard Earnshaw (lists),
	Maxim Kuvyrkov, Joseph S. Myers, Richard Guenther, gcc

Jonathan Wakely <jwakely.gcc@gmail.com>:
> My concern is that there is no conversion done using reposurgeon that
> *can* be used to do correctness checks.

We can in fact verify revisions of a GCC conversion in place using
repotool compare. Joseph Myers has been using this with reposurgeon's
readlimit to run tests.

Unfortunately, on a repository this large, it's not practical to run a
verification on every single revision. The blocker is the slowness of
svn checkout. In practice, you have to sample key revisions, with
particular attention to those at and just after known metadata
defects.
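
To illustrate that sampling idea, here is a small Python sketch.  It is not
part of reposurgeon or repotool; the function name and the window and
stride parameters are hypothetical.  It only picks which revision numbers
to check: every known-defect revision, the few revisions right after each
one, an evenly spaced background sample, and the tip.

```python
def revisions_to_check(head, known_defects, window=2, stride=10000):
    """Pick SVN revision numbers worth a full checkout comparison.

    head          -- newest revision number in the repository
    known_defects -- revisions with known metadata problems
    window        -- how many revisions after each defect to include
    stride        -- spacing of the evenly spread background sample

    All parameter names and defaults are illustrative assumptions.
    """
    chosen = set(range(stride, head + 1, stride))
    chosen.add(head)  # always verify the tip
    for rev in known_defects:
        # The defect itself plus the revisions immediately after it.
        chosen.update(r for r in range(rev, rev + window + 1) if r <= head)
    return sorted(chosen)
```

Each selected revision would then go through the actual comparison step;
this sketch deliberately leaves that part out.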

The conversion crew - which now includes Joseph Myers and Richard
Earnshaw, in addition to my co-developers Daniel Brooks and Julien
Rivaud - is diligently testing as it refines the last bits of the
conversion.  

I believe everybody on the crew is now satisfied that we're converging
on a good result.  It helps that we now have a detailed characterization
of the pathological trunk deletion at r184996; most of the conversion
problems radiated from that.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 16:02                       ` Jonathan Wakely
@ 2019-12-11 17:47                         ` Eric S. Raymond
  0 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-11 17:47 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Joseph Myers, Richard Earnshaw (lists),
	Maxim Kuvyrkov, Richard Guenther, gcc

Jonathan Wakely <jwakely.gcc@gmail.com>:
> That's good news and I'm relieved to hear it. Thanks.

Defect resolution has sped up noticeably since jsm28 and rearnsha 
showed up on #reposurgeon and started working directly with my crew.

Relax.  As Joseph reported, we've got this well in hand now. We might
even have a final conversion on the original 16 Dec deadline, though
I'm personally guessing it will take a bit longer than that.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-11 15:36                     ` Joseph Myers
  2019-12-11 16:02                       ` Jonathan Wakely
@ 2019-12-16  2:19                       ` Joseph Myers
  1 sibling, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-16  2:19 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Jonathan Wakely, Maxim Kuvyrkov, Eric S. Raymond, Richard Guenther, gcc

On Wed, 11 Dec 2019, Joseph Myers wrote:

> Concretely: when I did a comparison of the tip of trunk against master 
> from a reposurgeon conversion on 29 November, there were 1421 differences 
> (files or directories only present in one of SVN or git or with different 
> contents).  As of today with the SVN dump reader rewrite, this is down to 
> just two differences (plus two empty directories present in SVN as git 
> doesn't represent empty directories), and we understand exactly where the 
> problem arises with a trunk deletion and recreation and what's odd about 
> that particular trunk deletion and recreation.

Update: having done comparisons for every branch tip and tag, and 
investigated all the problems found and collectively fixed the bugs in 
question based on reduced testcases from that investigation, I believe all 
such problems causing comparison failures have now been fixed (and all 
problems causing tags or branches to be missing); I'm running a fresh 
conversion and comparisons to confirm.  If those comparisons are clean I 
may also compare the refs/deleted tags and branches to provide additional 
points at which the tree contents are verified correct.

Note: these comparisons are after deleting empty directories from the SVN 
checkout, because git doesn't represent empty directories.  There are also 
two branches (c++-modules and melt-branch) where some files have SVN 
keyword expansion enabled, and the files with keyword expansion enabled 
need excluding manually from the comparison process.
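
A rough Python sketch of those two preparation steps follows.  It is an
illustration under stated assumptions, not the procedure actually used: the
function names are made up, and scanning file contents for expanded
keywords is only a heuristic.  Whether expansion is really enabled is
recorded in the svn:keywords property, and a file can contain such a string
literally, so the result is a list of candidates to review by hand.

```python
import os
import re

# Expanded form of the standard SVN keywords, e.g. "$Revision: 1234 $".
EXPANDED_KEYWORD = re.compile(
    rb"\$(Id|Header|Date|LastChangedDate|Revision|Rev|LastChangedRevision"
    rb"|Author|LastChangedBy|HeadURL|URL):[^$\n]*\$"
)


def remove_empty_dirs(root):
    """Delete directories under root that contain nothing at all."""
    removed = []
    # Bottom-up, so a parent left empty by removing its children goes too.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(os.path.relpath(dirpath, root))
    return sorted(removed)


def keyword_candidates(root):
    """List files whose contents contain an expanded SVN keyword.

    Heuristic only: a match means "look at this file", nothing more.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                if EXPANDED_KEYWORD.search(handle.read()):
                    hits.append(os.path.relpath(full, root))
    return sorted(hits)
```

Run remove_empty_dirs on a scratch copy of the SVN checkout, since it
deletes directories in place.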

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-06 14:44   ` Maxim Kuvyrkov
  2019-12-06 17:21     ` Eric S. Raymond
@ 2019-12-16  9:53     ` Mark Wielaard
  2019-12-16 11:29       ` Joseph Myers
  2019-12-16 13:33       ` Segher Boessenkool
  1 sibling, 2 replies; 198+ messages in thread
From: Mark Wielaard @ 2019-12-16  9:53 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Richard Earnshaw (lists), gcc

Hi Maxim,

On Fri, 2019-12-06 at 17:44 +0300, Maxim Kuvyrkov wrote:
> > On Sep 19, 2019, at 6:34 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> > > On Sep 17, 2019, at 3:02 PM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> > > 
> > > Monday 16th December 2019 - cut off date for picking which git conversion to use
> > > 
> > > Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
> > > 
> > > Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes on line for live commits.
> > 
> > I have regenerated my primary version this week, and it's up at
> > https://git.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/ .
> > So far I have received only minor issue reports about it, and all known problems have been fixed.
> > I could use a bit more scrutiny :-).
> 
> I think now is a good time to give status update on the svn->git conversion I maintain.
> See https://git.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/ .
> 
> 1. The conversion has all SVN live branches converted as branches under refs/heads/* .
> 
> 2. The conversion has all SVN live tags converted as annotated tags under refs/tags/* .
> 
> 3. If desired, it would be trivial to add all deleted / leaf SVN branches and tags.
>    They would be named as branches/my-deleted-branch@12345,
>    where @12345 is the revision at which the branch was deleted.
>    Branches created and deleted multiple times would have separate entries
>    corresponding to delete revisions.
> 
> 4. Git committer and git author entries are very accurate
>    (imo, better than reposurgeon's, but I'm biased).
>    Developers' names and email addresses are mined from commit logs,
>    changelogs and source code and have historically accurate attributions
>    to employers' email addresses.
> 
> 5. Since there is interest in reparenting branches to fix cvs2svn merge issues,
>    I've added this feature to my scripts as well (turned out to be trivial).
>    I'll keep the original gcc-pretty.git repo intact and will upload the new one at
>    https://git.linaro.org/people/maxim-kuvyrkov/gcc-reparent.git/
>    -- should be live by Monday.

Should we go with the gcc-reparent.git repo now?

Where exactly should it be installed under https://gcc.gnu.org/git/ ?
Replacing the existing gcc.git will be confusing, but then how would we
name the repo that will become the main git gcc repo in 2 weeks?

Where are the tools/scripts that should be installed on gcc.gnu.org to
keep it up to date during the next 2 week transition period?

Thanks,

Mark

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16  9:53     ` Mark Wielaard
@ 2019-12-16 11:29       ` Joseph Myers
  2019-12-16 12:43         ` Mark Wielaard
  2019-12-16 16:55         ` Jeff Law
  2019-12-16 13:33       ` Segher Boessenkool
  1 sibling, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 11:29 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Mon, 16 Dec 2019, Mark Wielaard wrote:

> Should we go with the gcc-reparent.git repo now?

I think we should go with the reposurgeon conversion, with all Richard's 
improvements to commit messages.  gcc-reparent.git has issues of its own; 
at least, checking the list of branches shows some branches are missing.  
So both conversions can still be considered works in progress.

However, we should also note that stage 3 is intended to last two months, 
ending with the move to git 
<https://gcc.gnu.org/ml/gcc/2019-10/msg00143.html> 
<https://gcc.gnu.org/ml/gcc/2019-11/msg00117.html>, and given that it 
didn't start at the start of November as anticipated in the originally 
proposed timetable, that implies corresponding updates to all the dates.  
By now, enough people are away until the new year that now isn't a good 
time for deciding things anyway.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 11:29       ` Joseph Myers
@ 2019-12-16 12:43         ` Mark Wielaard
  2019-12-16 13:36           ` Segher Boessenkool
  2019-12-16 13:53           ` Joseph Myers
  2019-12-16 16:55         ` Jeff Law
  1 sibling, 2 replies; 198+ messages in thread
From: Mark Wielaard @ 2019-12-16 12:43 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Mon, 2019-12-16 at 11:29 +0000, Joseph Myers wrote:
> On Mon, 16 Dec 2019, Mark Wielaard wrote:
> 
> > Should we go with the gcc-reparent.git repo now?
> 
> I think we should go with the reposurgeon conversion, with all Richard's 
> improvements to commit messages.  gcc-reparent.git has issues of its own; 
> at least, checking the list of branches shows some branches are missing.  
> So both conversions can still be considered works in progress.

I thought we would pick the best available conversion today.
If we keep tweaking the conversions till they are "perfect" we will
probably never reach that point.

> However, we should also note that stage 3 is intended to last two months, 
> ending with the move to git 
> <https://gcc.gnu.org/ml/gcc/2019-10/msg00143.html> 
> <https://gcc.gnu.org/ml/gcc/2019-11/msg00117.html>, and given that it 
> didn't start at the start of November as anticipated in the originally 
> proposed timetable, that implies corresponding updates to all the dates.  
> By now, enough people are away until the new year that now isn't a good 
> time for deciding things anyway.

The idea was to do it while most people were away to have the least
impact. The timeline at https://gcc.gnu.org/wiki/GitConversion does say
we can slip the read-only date (2019/12/31) by a few days for logistical
reasons.

Do people really want to keep tweaking the conversions and postpone the
git switchover? What would the new timetable be then?

Cheers,

Mark

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16  9:53     ` Mark Wielaard
  2019-12-16 11:29       ` Joseph Myers
@ 2019-12-16 13:33       ` Segher Boessenkool
  1 sibling, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-16 13:33 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

Hi!

On Mon, Dec 16, 2019 at 10:53:05AM +0100, Mark Wielaard wrote:
> On Fri, 2019-12-06 at 17:44 +0300, Maxim Kuvyrkov wrote:
> > > On Sep 19, 2019, at 6:34 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> > > > On Sep 17, 2019, at 3:02 PM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> > > > 
> > > > Monday 16th December 2019 - cut off date for picking which git conversion to use
> > > > 
> > > > Tuesday 31st December 2019 - SVN repo becomes read-only at end of stage 3.
> > > > 
> > > > Thursday 2nd January 2020 - (ie read-only + 2 days) new git repo comes on line for live commits.
> > > 
> > > I have regenerated my primary version this week, and it's up at
> > > https://git.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/ .
> > > So far I have received only minor issue reports about it, and all known problems have been fixed.
> > > I could use a bit more scrutiny :-).
> > 
> > I think now is a good time to give status update on the svn->git conversion I maintain.
> > See https://git.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/ .
> > 
> > 1. The conversion has all SVN live branches converted as branches under refs/heads/* .

That is true as far as I can see.  All branches I care about are there,
at least, and I don't see anything missing.

> > 2. The conversion has all SVN live tags converted as annotated tags under refs/tags/* .

Yup.

> > 3. If desired, it would be trivial to add all deleted / leaf SVN branches and tags.
> >    They would be named as branches/my-deleted-branch@12345,
> >    where @12345 is the revision at which the branch was deleted.
> >    Branches created and deleted multiple times would have separate entries
> >    corresponding to delete revisions.

I don't think this is desirable.

> > 4. Git committer and git author entries are very accurate
> >    (imo, better than reposurgeon's, but I'm biased).
> >    Developers' names and email addresses are mined from commit logs,
> >    changelogs and source code and have historically accurate attributions
> >    to employers' email addresses.

They are very good, yes.  I have verified this *a lot*, months ago.  This
was all ready to go before the Cauldron.

> > 5. Since there is interest in reparenting branches to fix cvs2svn merge issues,
> >    I've added this feature to my scripts as well (turned out to be trivial).
> >    I'll keep the original gcc-pretty.git repo intact and will upload the new one at
> >    https://git.linaro.org/people/maxim-kuvyrkov/gcc-reparent.git/
> >    -- should be live by Monday.
> 
> Should we go with the gcc-reparent.git repo now?

I don't actually know what the difference is.  As far as I understand it
changes nothing for anything from this century, so either is fine with me.
And it is not very useful to have this old history cleaned up a bit: the
really *big* problem with the old history is that a) people did omnibus
commits a lot, not small self-contained commits changing one thing only;
and b) we really need to have the motivation that goes with those patches,
but that is not available (no mail archives).

> Where exactly should it be installed under https://gcc.gnu.org/git/ ?
> Replacing the existing gcc.git will be confusing, but then how would we
> name the repo that will become the main git gcc repo in 2 weeks?

I think we should rename the old gcc.git mirror.  That pain is temporary.

> Where are the tools/scripts that should be installed on gcc.gnu.org to
> keep it up to date during the next 2 week transition period?

I think Maxim mentioned it before, but it's hard to find in this
humongous thread :-)


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 12:43         ` Mark Wielaard
@ 2019-12-16 13:36           ` Segher Boessenkool
  2019-12-16 13:54             ` Eric S. Raymond
  2019-12-16 13:56             ` Joseph Myers
  2019-12-16 13:53           ` Joseph Myers
  1 sibling, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-16 13:36 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Joseph Myers, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Mon, Dec 16, 2019 at 01:43:48PM +0100, Mark Wielaard wrote:
> On Mon, 2019-12-16 at 11:29 +0000, Joseph Myers wrote:
> > On Mon, 16 Dec 2019, Mark Wielaard wrote:
> > 
> > > Should we go with the gcc-reparent.git repo now?
> > 
> > I think we should go with the reposurgeon conversion, with all Richard's 
> > improvements to commit messages.  gcc-reparent.git has issues of its own; 
> > at least, checking the list of branches shows some branches are missing.  

You need to provide proof of that.

> > So both conversions can still be considered works in progress.
> 
> I thought we would pick the best available conversion today.

Yes.

> If we keep tweaking the conversions till they are "perfect" we probably
> never reach that point.
> 
> > However, we should also note that stage 3 is intended to last two months, 
> > ending with the move to git 
> > <https://gcc.gnu.org/ml/gcc/2019-10/msg00143.html> 
> > <https://gcc.gnu.org/ml/gcc/2019-11/msg00117.html>, and given that it 
> > didn't start at the start of November as anticipated in the originally 
> > proposed timetable, that implies corresponding updates to all the dates.  

I do not agree.

> > By now, enough people are away until the new year that now isn't a good 
> > time for deciding things anyway.
> 
> The idea was to do it while most people were away to have the least
> impact. The timeline https://gcc.gnu.org/wiki/GitConversion does say we
> can slip for logistical reasons the read-only date (2019/12/31) by a
> few days.

Yes.

> Do people really want to keep tweaking the conversions and postpone the
> git switchover?

No.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 12:43         ` Mark Wielaard
  2019-12-16 13:36           ` Segher Boessenkool
@ 2019-12-16 13:53           ` Joseph Myers
  2019-12-16 16:39             ` Jeff Law
  1 sibling, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 13:53 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc, esr

On Mon, 16 Dec 2019, Mark Wielaard wrote:

> > I think we should go with the reposurgeon conversion, with all Richard's 
> > improvements to commit messages.  gcc-reparent.git has issues of its own; 
> > at least, checking the list of branches shows some branches are missing.  
> > So both conversions can still be considered works in progress.
> 
> I thought we would pick the best available conversion today.
> If we keep tweaking the conversions till they are "perfect" we probably
> never reach that point.

There is a difference between tweaking until they are perfect, and 
allowing a little more time to get major improvements for which code 
actually exists (Richard's commit message scripts).

If the go port of reposurgeon were still unfinished, or there were major 
problems with the conversion that weren't understood, considerations might 
be different.  But the go port is fully functional and all the known 
issues affecting tree contents are fixed; to the extent there are other 
issues, they are less significant and also well-understood and should be 
fixed soon.

> > However, we should also note that stage 3 is intended to last two months, 
> > ending with the move to git 
> > <https://gcc.gnu.org/ml/gcc/2019-10/msg00143.html> 
> > <https://gcc.gnu.org/ml/gcc/2019-11/msg00117.html>, and given that it 
> > didn't start at the start of November as anticipated in the originally 
> > proposed timetable, that implies corresponding updates to all the dates.  
> > By now, enough people are away until the new year that now isn't a good 
> > time for deciding things anyway.
> 
> The idea was to do it while most people were away to have the least
> impact. The timeline https://gcc.gnu.org/wiki/GitConversion does say we
> can slip for logistical reasons the read-only date (2019/12/31) by a
> few days.

It was also that doing it at the end of stage 3 would mean the least 
disruption to development for stage 3.  That suggests converting over the 
weekend of 18/19 January, given the current stage 3 timings.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 13:36           ` Segher Boessenkool
@ 2019-12-16 13:54             ` Eric S. Raymond
  2019-12-16 14:05               ` Segher Boessenkool
  2019-12-16 16:04               ` Jeff Law
  2019-12-16 13:56             ` Joseph Myers
  1 sibling, 2 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-16 13:54 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Mark Wielaard, Joseph Myers, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Segher Boessenkool <segher@kernel.crashing.org>:
> > Do people really want to keep tweaking the conversions and postpone the
> > git switchover?
> 
> No.

It may not be my place to say, but...I think the stakes are pretty
high here.  If I were a GCC developer, I think I'd want the best
possible conversion even if that takes a little longer.

jsm28, rearnsha, and my reposurgeon crew are pretty close to a final
deliverable now. We know what the remaining issues are, they're not
major, and we have a strategy for fixing them. Have a little patience,
please.

Better yet, come over to #reposurgeon on freenode and help out. Anyone
who can run tests on a machine with >128GB RAM would be especially
welcome.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 13:36           ` Segher Boessenkool
  2019-12-16 13:54             ` Eric S. Raymond
@ 2019-12-16 13:56             ` Joseph Myers
  2019-12-16 14:17               ` Mark Wielaard
  1 sibling, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 13:56 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Mon, 16 Dec 2019, Segher Boessenkool wrote:

> On Mon, Dec 16, 2019 at 01:43:48PM +0100, Mark Wielaard wrote:
> > On Mon, 2019-12-16 at 11:29 +0000, Joseph Myers wrote:
> > > On Mon, 16 Dec 2019, Mark Wielaard wrote:
> > > 
> > > > Should we go with the gcc-reparent.git repo now?
> > > 
> > > I think we should go with the reposurgeon conversion, with all Richard's 
> > > improvements to commit messages.  gcc-reparent.git has issues of its own; 
> > > at least, checking the list of branches shows some branches are missing.  
> 
> You need to provide proof of that.

classpath-generics gcj/classpath-095-import-branch libstdcxx_so_7-2-branch 
st/binutils st/mono-based-binutils.  There are also tags in the GCC SVN 
repository in /branches/st/tags, some of which are missing in the 
conversion and the rest of which are in refs/heads/ when refs/tags/ would 
be more appropriate.
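
A check of this kind can be sketched in a few lines of Python.  This is
only an illustration, not the script used for either conversion: it
assumes SVN branch and tag names map one-to-one onto git ref names
(a real conversion may rename them), and the function name
compare_refs and the inputs shown are hypothetical.

```python
def compare_refs(svn_branches, svn_tags, git_refs):
    """Compare SVN branch/tag names against the refs of a converted git repo.

    Assumes (hypothetically) that names are carried over unchanged.
    """
    heads = {r[len("refs/heads/"):] for r in git_refs
             if r.startswith("refs/heads/")}
    tags = {r[len("refs/tags/"):] for r in git_refs
            if r.startswith("refs/tags/")}
    svn_branches = set(svn_branches)
    svn_tags = set(svn_tags)
    return {
        # SVN branches with no corresponding refs/heads/ entry.
        "missing_branches": sorted(svn_branches - heads),
        # SVN tags that appear nowhere in the conversion.
        "missing_tags": sorted(svn_tags - tags - heads),
        # SVN tags that were converted, but as branches rather than tags.
        "misplaced_tags": sorted((svn_tags & heads) - tags),
    }


# Illustrative input only; the tag names here are made up.
report = compare_refs(
    svn_branches=["classpath-generics", "st/binutils", "some-branch"],
    svn_tags=["example-tag-1", "example-tag-2"],
    git_refs=["refs/heads/some-branch", "refs/heads/example-tag-1"],
)
for kind, names in report.items():
    print(kind, names)
```

The ref list for a real repository could come from the output of
"git for-each-ref", and the SVN side from listing the branch and tag
directories.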

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 13:54             ` Eric S. Raymond
@ 2019-12-16 14:05               ` Segher Boessenkool
  2019-12-16 14:13                 ` Joseph Myers
  2019-12-16 16:04               ` Jeff Law
  1 sibling, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-16 14:05 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Mark Wielaard, Joseph Myers, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, Dec 16, 2019 at 08:54:51AM -0500, Eric S. Raymond wrote:
> Segher Boessenkool <segher@kernel.crashing.org>:
> > > Do people really want to keep tweaking the conversions and postpone the
> > > git switchover?
> > 
> > No.
> 
> It may not be my place to say, but...I think the stakes are pretty
> high here.  If I were a GCC developer, I think I'd want the best
> possible conversion even if that takes a little longer.

Most of us are perfectly happy even with the current git mirror, for
old commits.  We want "real" git to make the workflow for new commits
better.

No more delays, _please_.

If the reposurgeon conversion is not ready now, then it is too late
to be selected.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 14:05               ` Segher Boessenkool
@ 2019-12-16 14:13                 ` Joseph Myers
  2019-12-16 15:37                   ` Segher Boessenkool
  2019-12-16 16:27                   ` Eric S. Raymond
  0 siblings, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 14:13 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Eric S. Raymond, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, 16 Dec 2019, Segher Boessenkool wrote:

> Most of us are perfectly happy even with the current git mirror, for
> old commits.  We want "real" git to make the workflow for new commits
> better.
> 
> No more delays, _please_.

The timetable is a useful guideline.  It should not be our master when 
there are clear improvements with implementations already available; 
waiting to the actual end of stage 3 makes sense (when waiting another 
year would not make sense).  When we're talking about something to be used 
for the next 20 years we should make sure to get it right.

All conversions clearly need more validation work.  That missing branches 
in Maxim's conversion could be noted only today clearly shows that 
validation of that conversion is also at a very early stage (and 
conversions with an ad hoc script need much more thorough, trickier 
validation because you don't benefit from knowing the tool has worked for 
other conversions).

> If the reposurgeon conversion is not ready now, then it is too late
> to be selected.

I believe it's at least as ready as Maxim's.  The last public version has 
some known issues, most of those have been addressed since that conversion 
run, others are being addressed.  I fully expect it would in fact be in a 
good state to run the final conversion on the original dates, even though 
those are before the end of stage 3.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 13:56             ` Joseph Myers
@ 2019-12-16 14:17               ` Mark Wielaard
  2019-12-16 16:29                 ` Joseph Myers
  0 siblings, 1 reply; 198+ messages in thread
From: Mark Wielaard @ 2019-12-16 14:17 UTC (permalink / raw)
  To: Joseph Myers, Segher Boessenkool
  Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Mon, 2019-12-16 at 13:56 +0000, Joseph Myers wrote:
> classpath-generics gcj/classpath-095-import-branch
> libstdcxx_so_7-2-branch st/binutils st/mono-based-binutils.

The classpath "branches" should not be in the final git repo. Those
"branches" are really separate from the actual gcc code tree.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 14:13                 ` Joseph Myers
@ 2019-12-16 15:37                   ` Segher Boessenkool
  2019-12-16 16:36                     ` Joseph Myers
  2019-12-16 17:40                     ` Jeff Law
  2019-12-16 16:27                   ` Eric S. Raymond
  1 sibling, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-16 15:37 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Eric S. Raymond, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, Dec 16, 2019 at 02:13:06PM +0000, Joseph Myers wrote:
> On Mon, 16 Dec 2019, Segher Boessenkool wrote:
> 
> > Most of us are perfectly happy even with the current git mirror, for
> > old commits.  We want "real" git to make the workflow for new commits
> > better.
> > 
> > No more delays, _please_.
> 
> The timetable is a useful guideline.  It should not be our master when 
> there are clear improvements with implementations already available; 
> waiting to the actual end of stage 3 makes sense (when waiting another 
> year would not make sense).  When we're talking about something to be used 
> for the next 20 years we should make sure to get it right.

We should not take five years to get it done.

And the current mirror is "right", already, as Jeff said at the Cauldron
(a minute before we unanimously decided to do the conversion soon; this
is over three months ago already).

> All conversions clearly need more validation work.

No, I do not agree with that.  We have had the opportunity to look at
Maxim's conversions for months already, since before the Cauldron, and
it has been perfectly adequate from the start imnsho, and it has been
improved a little since even.

> That missing branches 
> in Maxim's conversion could be noted only today clearly shows that 

... clearly shows that *no one cares* about those branches.

> (and 
> conversions with an ad hoc script need much more thorough, trickier 
> validation because you don't benefit from knowing the tool has worked for 
> other conversions).

Reposurgeon is ad-hoc as well, and the current implementation is a
complete rewrite, and not proven *at all*.  At least Maxim's scripts are
just that: scripts, using some very well-tested very widely used tools
as building blocks.

> > If the reposurgeon conversion is not ready now, then it is too late
> > to be selected.
> 
> I believe it's at least as ready as Maxim's.

I do not agree.  You say the reposurgeon conversion is not ready today.
Maxim's conversion has been ready for many months.

Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 13:54             ` Eric S. Raymond
  2019-12-16 14:05               ` Segher Boessenkool
@ 2019-12-16 16:04               ` Jeff Law
  2019-12-16 16:37                 ` Eric S. Raymond
  1 sibling, 1 reply; 198+ messages in thread
From: Jeff Law @ 2019-12-16 16:04 UTC (permalink / raw)
  To: esr, Segher Boessenkool
  Cc: Mark Wielaard, Joseph Myers, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, 2019-12-16 at 08:54 -0500, Eric S. Raymond wrote:
> Segher Boessenkool <segher@kernel.crashing.org>:
> > > Do people really want to keep tweaking the conversions and postpone the
> > > git switchover?
> > 
> > No.
> 
> It may not be my place to say, but...I think the stakes are pretty
> high here.  If I were a GCC developer, I think I'd want the best
> possible conversion even if that takes a little longer.
Well, I'm not sure that's entirely true.

I do a ton of historical digging, possibly more than anyone else
involved with GCC.  The git-svn mirror has been sufficient for that for
years, even with the warts that folks have pointed out.

Given that, delaying to achieve a perfect conversion is, IMHO, just
silly.  I don't mind delaying a few days here or there because we want
to do verification, or to line up better with our own development
schedules.  What I don't want to do is delay because any particular
tool is still being tweaked to get closer to that "perfect" conversion.

So the question I would ask is the state of each converter today and
how they compare to each other.  That argues we need time to compare
the result, which as I noted above is fine by me.  But we ought to be
comparing the converter's state as of right now.

Jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 14:13                 ` Joseph Myers
  2019-12-16 15:37                   ` Segher Boessenkool
@ 2019-12-16 16:27                   ` Eric S. Raymond
  2019-12-16 16:47                     ` Segher Boessenkool
  1 sibling, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-16 16:27 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Segher Boessenkool, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Joseph Myers <joseph@codesourcery.com>:
>                             When we're talking about something to be used 
> for the next 20 years we should make sure to get it right.

Segher and others should note that I'm not in the habit of sinking most of
a year of my time into problems that I don't think are extremely
important. This conversion *is* that important.

> conversions with an ad hoc script need much more thorough, trickier 
> validation because you don't benefit from knowing the tool has worked for 
> other conversions).

Nor, as far as I am aware, do the scripts have anything resembling
reposurgeon's test suite.

Segher Boessenkool:
> > If the reposurgeon conversion is not ready now, then it is too late
> > to be selected.

Maxim's conversion pipeline isn't ready either -- there are known
bugs with its result. Does that mean it's too late to select Maxim's
conversion? If so, what do you propose be done?

Please stop bellyaching and pitch in. Whether it's by fixing up
Maxim's conversion, helping improve the reposurgeon one,
or writing a conversion method of your own - I don't much care
and it's not my job to tell you what to do, anyway. Any of those
choices might be helpful; sniping from the sidelines is not.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 14:17               ` Mark Wielaard
@ 2019-12-16 16:29                 ` Joseph Myers
  0 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 16:29 UTC (permalink / raw)
  To: Mark Wielaard
  Cc: Segher Boessenkool, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Mon, 16 Dec 2019, Mark Wielaard wrote:

> On Mon, 2019-12-16 at 13:56 +0000, Joseph Myers wrote:
> > classpath-generics gcj/classpath-095-import-branch
> > libstdcxx_so_7-2-branch st/binutils st/mono-based-binutils.
> 
> The classpath "branches" should not be in the final git repo. Those
> "branches" are really separate from the actual gcc code tree.

They're branches of part of GCC (and duly have their contents in a 
libjava/ subdirectory).  I think something present as a non-deleted branch 
in the GCC repository is appropriate for the conversion.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 15:37                   ` Segher Boessenkool
@ 2019-12-16 16:36                     ` Joseph Myers
  2019-12-16 17:40                     ` Jeff Law
  1 sibling, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 16:36 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Eric S. Raymond, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, 16 Dec 2019, Segher Boessenkool wrote:

> And the current mirror is "right", already, as Jeff said at the Cauldron
> (a minute before we unanimously decided to do the conversion soon; this
> is over three months ago already).

All the discussion at the Cauldron tells us is many people like the idea 
of moving to git and it not taking forever.  Inviting a crowd to agree 
with a proposition is not a useful way to judge the technical merits of 
any particular detailed conversion approach.

> > That missing branches 
> > in Maxim's conversion could be noted only today clearly shows that 
> 
> ... clearly shows that *no one cares* about those branches.

Since Maxim said that all branches were present, that indicates a lack of 
validation, and a lack of validation by other people since then.  Checking 
the set of branches and tags present is one of the most basic checks on a 
conversion to identify problems.

> > I believe it's at least as ready as Maxim's.
> 
> I do not agree.  You say the reposurgeon conversion is not ready today.
> Maxim's conversion has been ready for many months.

I believe it's ready in the form of source code (gcc-conversion repository 
and newsvn3 branch in the reposurgeon repository).  I'm running a test 
conversion to check this and produce the binary form (converted git 
repository); conversions just take a while to run.  With correctness 
issues having been addressed, we're working on performance issues, and I'm 
running a second test conversion on a second machine with both a patch 
I've just written that passes reposurgeon's tests and I hope will save 
about 8 hours on the conversion time, and further performance improvements 
that went in overnight that should save some more hours via saving memory 
usage.  (A significant proportion of the time for a conversion is spent by 
git-fast-import reading the fast-import stream, which places a lower bound 
of a few hours on the time taken for a conversion even if everything 
outside of git is infinitely fast.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 16:04               ` Jeff Law
@ 2019-12-16 16:37                 ` Eric S. Raymond
  2019-12-16 16:47                   ` Jeff Law
  0 siblings, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-16 16:37 UTC (permalink / raw)
  To: Jeff Law
  Cc: Segher Boessenkool, Mark Wielaard, Joseph Myers, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Jeff Law <law@redhat.com>:
> > It may not be my place to say, but...I think the stakes are pretty
> > high here.  If I were a GCC developer, I think I'd want the best
> > possible conversion even if that takes a little longer.
> Well, I'm not sure that's entirely true.

OK, that's a policy choice the GCC project is going to have to make.
I'm just the mechanic here.

Joseph Myers has made his choice.  He has said repeatedly that he
wants to follow through with the reposurgeon conversion, and he's
putting his effort behind that by writing tests and even contributing
code to reposurgeon.

We'll get this done faster if nobody is joggling his elbow. Or mine.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 13:53           ` Joseph Myers
@ 2019-12-16 16:39             ` Jeff Law
  2019-12-16 17:57               ` Richard Biener
  0 siblings, 1 reply; 198+ messages in thread
From: Jeff Law @ 2019-12-16 16:39 UTC (permalink / raw)
  To: Joseph Myers, Mark Wielaard
  Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc, esr

On Mon, 2019-12-16 at 13:53 +0000, Joseph Myers wrote:
> 
> > > However, we should also note that stage 3 is intended to last two months, 
> > > ending with the move to git 
> > > <https://gcc.gnu.org/ml/gcc/2019-10/msg00143.html> 
> > > <https://gcc.gnu.org/ml/gcc/2019-11/msg00117.html>, and given that it 
> > > didn't start at the start of November as anticipated in the originally 
> > > proposed timetable, that implies corresponding updates to all the dates.  
> > > By now, enough people are away until the new year that now isn't a good 
> > > time for deciding things anyway.
> > 
> > The idea was to do it while most people were away to have the least
> > impact. The timeline https://gcc.gnu.org/wiki/GitConversion does say we
> > can slip for logistical reasons the read-only date (2019/12/31) by a
> > few days.
> 
> It was also that doing it at the end of stage 3 would mean the least 
> disruption to development for stage 3.  That suggests converting over the 
> weekend of 18/19 January, given the current stage 3 timings.
My recollection was the timing was meant to land roughly at the
stage3->stage4 transition.  So the question is whether or not we're on
target with that.  Based on the regression counts, probably not at this
point, but I'll let the release managers chime in on that point.

Jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 16:27                   ` Eric S. Raymond
@ 2019-12-16 16:47                     ` Segher Boessenkool
  0 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-16 16:47 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, Dec 16, 2019 at 11:27:56AM -0500, Eric S. Raymond wrote:
> Joseph Myers <joseph@codesourcery.com>:
> >                             When we're talking about something to be used 
> > for the next 20 years we should make sure to get it right.
> 
> Segher and others should note that I'm not in the habit of sinking most of
> a year of my time into problems that I don't think are extremely
> important. This conversion *is* that important.

To you, whatever reposurgeon does that nothing else can, is important.
To many other people, not.  Most people are totally pragmatic and want
to use a git-based workflow with GCC, and then soon upgrade some other
things in our workflow, to improve our day-to-day experience, and to
allow us to do things we couldn't do before.

Most people do not care about fixing the imperfections in the CVS->SVN
conversion.  We have been using the SVN->Git mirror for over ten years
now, and it is perfectly workable.  Now we want to finally finally
_FINALLY_ have an actual git repo that we can commit patches to
directly.  Which we unanimously decided to do over three months ago.

> Nor, as far as I am aware, do the scripts have anything resembling
> reposurgeon's test suite.

So?  Such a test suite does not magically prevent bugs (whatever type
it is: regressions, unit tests, whatever methodology).

The only thing that matters is acceptance testing (which includes such
trivial things as "are all the files on trunk what they should be").
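
For illustration, a tree-contents check of that kind could look roughly
like the Python sketch below.  It is not taken from either conversion;
the function names are hypothetical, and it assumes two plain checkouts
on disk (say, an SVN export of trunk and a checkout of the converted git
trunk).  It ignores empty directories, file modes and SVN keyword
expansion, and would need extra care for dangling symlinks.

```python
import hashlib
import os


def tree_manifest(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into version-control metadata directories.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".svn")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def diff_trees(root_a, root_b):
    """Report files present on only one side, and files whose contents differ."""
    a, b = tree_manifest(root_a), tree_manifest(root_b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "differ": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }
```

An empty result in all three lists means the two trees have identical
file contents at that revision.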

> Segher Boessenkool:
> > > If the reposurgeon conversion is not ready now, then it is too late
> > > to be selected.
> 
> Maxim's conversion pipeline isn't ready either -- there are known
> bugs with its result.

Are there?  The last I heard is that branches that do not share any
history with GCC are not in there.  That's a feature, not a bug, imnsho.

If you know of any other bugs, detail them, don't make unfounded
statements please.

> Does that mean it's too late to select Maxim's
> conversion? If so, what do you propose be done?

Maxim's conversion was perfectly acceptable many months ago already.

> Please stop bellyaching and pitch in. Whether it's by fixing up
> Maxim's conversion, helping improve the reposurgeon one,
> or writing a conversion method of your own - I don't much care
> and it's not my job to tell you what to do, anyway. Any of those
> choices might be helpful; sniping from the sidelines is not.

Lol.  Yeah, I won't answer that at all, I guess.

Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 16:37                 ` Eric S. Raymond
@ 2019-12-16 16:47                   ` Jeff Law
  2019-12-31 13:43                     ` Joseph Myers
  0 siblings, 1 reply; 198+ messages in thread
From: Jeff Law @ 2019-12-16 16:47 UTC (permalink / raw)
  To: esr
  Cc: Segher Boessenkool, Mark Wielaard, Joseph Myers, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, 2019-12-16 at 11:37 -0500, Eric S. Raymond wrote:
> Jeff Law <law@redhat.com>:
> > > It may not be my place to say, but...I think the stakes are pretty
> > > high here.  If I were a GCC developer, I think I'd want the best
> > > possible conversion even if that takes a little longer.
> > Well, I'm not sure that's entirely true.
> 
> OK, that's a policy choice the GCC project is going to have to make.
> I'm just the mechanic here.
Yup.  And I wouldn't be surprised if there is dissent regardless of
what final decision is made.
> 
> Joseph Myers has made his choice.  He has said repeatedly that he
> wants to follow through with the reposurgeon conversion, and he's
> putting his effort behind that by writing tests and even contributing
> code to reposurgeon.
> 
> We'll get this done faster if nobody is joggling his elbow. Or mine.
And just to be clear, my preference is for reposurgeon, if it's ready. 
But if it isn't, then I'm absolutely comfortable dropping back to
Maxim's conversion or even the existing mirror.



Jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 11:29       ` Joseph Myers
  2019-12-16 12:43         ` Mark Wielaard
@ 2019-12-16 16:55         ` Jeff Law
  2019-12-16 17:08           ` Joseph Myers
  1 sibling, 1 reply; 198+ messages in thread
From: Jeff Law @ 2019-12-16 16:55 UTC (permalink / raw)
  To: Joseph Myers, Mark Wielaard; +Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

On Mon, 2019-12-16 at 11:29 +0000, Joseph Myers wrote:
> On Mon, 16 Dec 2019, Mark Wielaard wrote:
> 
> > Should we go with the gcc-reparent.git repo now?
> 
> I think we should go with the reposurgeon conversion, with all Richard's 
> improvements to commit messages.  gcc-reparent.git has issues of its own; 
> at least, checking the list of branches shows some branches are missing.  
> So both conversions can still be considered works in progress.
So it seems like your position is that the reposurgeon conversion is as
good as or better than conversion from Maxim's scripts.

If that's a fair assessment of your position, then my vote would be
that we select reposurgeon based on your assessment, which I absolutely
trust. 

jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 16:55         ` Jeff Law
@ 2019-12-16 17:08           ` Joseph Myers
  2019-12-16 19:15             ` Eric S. Raymond
  2019-12-16 21:59             ` Segher Boessenkool
  0 siblings, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 17:08 UTC (permalink / raw)
  To: Jeff Law
  Cc: Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc, esr

On Mon, 16 Dec 2019, Jeff Law wrote:

> On Mon, 2019-12-16 at 11:29 +0000, Joseph Myers wrote:
> > On Mon, 16 Dec 2019, Mark Wielaard wrote:
> > 
> > > Should we go with the gcc-reparent.git repo now?
> > 
> > I think we should go with the reposurgeon conversion, with all Richard's 
> > improvements to commit messages.  gcc-reparent.git has issues of its own; 
> > at least, checking the list of branches shows some branches are missing.  
> > So both conversions can still be considered works in progress.
> So it seems like your position is that the reposurgeon conversion is as
> good as or better than conversion from Maxim's scripts.

Yes.  It should be possible to confirm branch tip conversions and other 
properties of the repository (e.g. that all branch tips are properly 
descended from the first commit on trunk except for the specific branches 
that shouldn't be) once my current conversions have finished running.
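
As a rough sketch of the descent check (not the actual validation code,
and with hypothetical names throughout), the property can be tested on a
commit graph held in memory as a mapping from each commit to its
parents.  On a real repository, "git merge-base --is-ancestor" answers
the same question for one pair of commits.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def unrooted_tips(parents, root, tips, expected_unrooted=()):
    """List branches whose tip does not descend from `root`.

    Branches named in `expected_unrooted` (the ones that legitimately
    share no history with trunk) are skipped.
    """
    return sorted(
        name for name, tip in tips.items()
        if name not in expected_unrooted
        and not is_ancestor(parents, root, tip)
    )
```

Any name this returns is either a conversion problem or a branch that
belongs on the list of known exceptions.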

I think there may well be things to *learn* from Maxim's conversion to 
improve the reposurgeon one further (if they don't take that long to 
implement).  In particular, we should look carefully at the commit 
attributions in both conversions and Maxim's may well give ideas for 
improving the reposurgeon changelogs command (Richard came up with three 
ideas recently, which I've just filed in the reposurgeon issue tracker).  
But I also think:

* reposurgeon is a safer approach than ad hoc scripts, provided we get 
clean verification of basic properties such as branch tip contents.

* Richard's improvements to commit messages are a great improvement to the 
resulting repository (and it's OK if a small percentage end up misleading 
because someone used the wrong PR number; sometimes people use the wrong 
commit message or commit changes they didn't mean to and so having some 
misleading messages is unavoidable).

* As we're part of the free software community as a whole rather than 
something in isolation, choosing to make a general-purpose tool work for 
our conversion is somewhat preferable to choosing an ad hoc approach 
because it contributes something of value for other repository conversions 
by other projects in future.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 15:37                   ` Segher Boessenkool
  2019-12-16 16:36                     ` Joseph Myers
@ 2019-12-16 17:40                     ` Jeff Law
  2019-12-25  8:12                       ` Alexandre Oliva
  1 sibling, 1 reply; 198+ messages in thread
From: Jeff Law @ 2019-12-16 17:40 UTC (permalink / raw)
  To: Segher Boessenkool, Joseph Myers
  Cc: Eric S. Raymond, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, 2019-12-16 at 09:36 -0600, Segher Boessenkool wrote:
> On Mon, Dec 16, 2019 at 02:13:06PM +0000, Joseph Myers wrote:
> > On Mon, 16 Dec 2019, Segher Boessenkool wrote:
> > 
> > > Most of us are perfectly happy even with the current git mirror, for
> > > old commits.  We want "real" git to make the workflow for new commits
> > > better.
> > > 
> > > No more delays, _please_.
> > 
> > The timetable is a useful guideline.  It should not be our master when 
> > there are clear improvements with implementations already available; 
> > waiting to the actual end of stage 3 makes sense (when waiting another 
> > year would not make sense).  When we're talking about something to be used 
> > for the next 20 years we should make sure to get it right.
> 
> We should not take five years to get it done.
> 
> And the current mirror is "right", already, as Jeff said at the Cauldron
> > (a minute before we unanimously decided to do the conversion soon; this
> is over three months ago already).
Yup.   I want to convert sooner, not later.  I don't mind slipping a
little for validation work or because it doesn't line up with our
schedules.  I don't want to slip for major changes in the tooling.

My preference has always been, in order, reposurgeon, Maxim's scripts
and the existing mirror.  However, I'm confident that all of them will
be sufficient for our needs.


> > All conversions clearly need more validation work.
> 
> No, I do not agree with that.  We have had the opportunity to look at
> Maxim's conversions for months already, since before the Cauldron, and
> it has been perfectly adequate from the start imnsho, and it has been
> improved a little since even.
Yet Joseph just indicated today Maxim's conversion is missing some
branches.  While I don't consider any of the missed branches important,
others might.   More importantly, it raises the issue of what other
branches might be missing and what validation work has been done on
that conversion.


> 
> > That missing branches 
> > in Maxim's conversion could be noted only today clearly shows that 
> 
> ... clearly shows that *no one cares* about those branches.
> 
>  (and 
> > conversions with an ad hoc script need much more thorough, trickier 
> > validation because you don't benefit from knowing the tool has worked for 
> > other conversions).
> 
> Reposurgeon is ad-hoc as well, and the current implementation is a
> complete rewrite, and not proven *at all*.  At least Maxim's scripts are
> just that: scripts, using some very well-tested very widely used tools
> as building blocks.
I wouldn't really classify it that way.  reposurgeon has significant
history.  Are we using a rewrite?  Yes, but there's extensive experience
behind it as well as testsuites to validate the work.


> 
> > > If the reposurgeon conversion is not ready now, then it is too late
> > > to be selected.
> > 
> > I believe it's at least as ready as Maxim's.
> 
> I do not agree.  You say the reposurgeon conversion is not ready today.
> Maxim's conversion has been ready for many months.
Actually Joseph's position is that reposurgeon is already beyond
Maxim's conversion in terms of quality.

My interpretation of the messages flying by is they're ironing out some
details on the edges, but I don't think those are major intrusive
changes.

jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 16:39             ` Jeff Law
@ 2019-12-16 17:57               ` Richard Biener
  0 siblings, 0 replies; 198+ messages in thread
From: Richard Biener @ 2019-12-16 17:57 UTC (permalink / raw)
  To: law, Jeff Law, Joseph Myers, Mark Wielaard
  Cc: Maxim Kuvyrkov, Richard Earnshaw (lists), gcc, esr

On December 16, 2019 5:39:06 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>On Mon, 2019-12-16 at 13:53 +0000, Joseph Myers wrote:
>> 
>> > > However, we should also note that stage 3 is intended to last two months,
>> > > ending with the move to git
>> > > <https://gcc.gnu.org/ml/gcc/2019-10/msg00143.html>
>> > > <https://gcc.gnu.org/ml/gcc/2019-11/msg00117.html>, and given that it
>> > > didn't start at the start of November as anticipated in the originally
>> > > proposed timetable, that implies corresponding updates to all the dates.
>> > > By now, enough people are away until the new year that now isn't a good
>> > > time for deciding things anyway.
>> > 
>> > The idea was to do it while most people were away to have the least
>> > impact. The timeline https://gcc.gnu.org/wiki/GitConversion does say we
>> > can slip for logistical reasons the read-only date (2019/12/31) by a
>> > few days.
>> 
>> It was also that doing it at the end of stage 3 would mean the least
>> disruption to development for stage 3.  That suggests converting over the
>> weekend of 18/19 January, given the current stage 3 timings.
>My recollection was the timing was meant to land roughly at the
>stage3->stage4 transition.  So the question is whether or not we're on
>target with that.  Based on the regression counts, probably not at this
>point, but I'll let the release managers chime in on that point.
>

Fortunately that's a pure timing thing and I'd expect stage3 to end mid January, possibly shaving off a week to help the git conversion.

Richard. 

>Jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 17:08           ` Joseph Myers
@ 2019-12-16 19:15             ` Eric S. Raymond
  2019-12-16 21:59             ` Segher Boessenkool
  1 sibling, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-16 19:15 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jeff Law, Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc

Joseph Myers <joseph@codesourcery.com>:
> * As we're part of the free software community as a whole rather than 
> something in isolation, choosing to make a general-purpose tool work for 
> our conversion is somewhat preferable to choosing an ad hoc approach 
> because it contributes something of value for other repository conversions 
> by other projects in future.

That's not just theory or sentiment. Reposurgeon is the best
any-VCS-to-any-VCS converter there is because every time I do a
conversion, I learn things, and that knowledge gets incorporated in
the code and the documentation around it.

Yes, in theory someone else could build a tool as good that
incorporates as much domain knowledge. So far, nobody has tried.  It's
unlikely anyone will, at this point, when they can join my dev team
and get the results they want with much less effort by improving
reposurgeon or one of its auxiliary tools.

Every time that happens, everybody - into the indefinite future - wins.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 17:08           ` Joseph Myers
  2019-12-16 19:15             ` Eric S. Raymond
@ 2019-12-16 21:59             ` Segher Boessenkool
  2019-12-16 22:14               ` Jeff Law
                                 ` (2 more replies)
  1 sibling, 3 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-16 21:59 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jeff Law, Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc, esr

Hi,

On Mon, Dec 16, 2019 at 05:07:48PM +0000, Joseph Myers wrote:
> Yes.  It should be possible to confirm branch tip conversions and other 
> properties of the repository (e.g. that all branch tips are properly 
> descended from the first commit on trunk except for the specific branches 
> that shouldn't be) once my current conversions have finished running.

Please do that for Maxim's conversion as well then?

(If the way you do the verification requires reposurgeon, the
verification methodology itself is fatally flawed).

> I think there may well be things to *learn* from Maxim's conversion to 
> improve the reposurgeon one further (if they don't take that long to 
> implement).

Or the other way around.

> In particular, we should look carefully at the commit 
> attributions in both conversions and Maxim's may well give ideas for 
> improving the reposurgeon changelogs command (Richard came up with three 
> ideas recently, which I've just filed in the reposurgeon issue tracker).  
> But I also think:
> 
> * reposurgeon is a safer approach than ad hoc scripts, provided we get 
> clean verification of basic properties such as branch tip contents.

I totally do not agree.  Black boxes are not safe.  *New* black boxes
are even worse.

I trust scripts that have low internal complexity much better.

There is absolutely no reason to trust a system that supposedly was
already very mature, but that required lots of complex modifications,
and even a complete rewrite in a different language, that even has its
own bug tracker, to work without problems (although we all have *seen*
some of its many problems over the last years), and at the same time
bad-mouthing simple scripts that simply work, and have simple problems.

> * Richard's improvements to commit messages are a great improvement to the 
> resulting repository (and it's OK if a small percentage end up misleading 
> because someone used the wrong PR number, sometimes people use the wrong 
> commit message or commit changes they didn't mean to and so having some 
> misleading messages is unavoidable).

As long as the original commit message is kept, verbatim, and you only
add a new summary line, all is fine.  If not -> nope, not okay.

> * As we're part of the free software community as a whole rather than 
> something in isolation, choosing to make a general-purpose tool work for 
> our conversion is somewhat preferable to choosing an ad hoc approach 
> because it contributes something of value for other repository conversions 
> by other projects in future.

This, I don't agree with at all either: having some lean-and-mean
scripts that worked for the GCC conversion will be at least as helpful
for another conversion as a "general" tool that first requires you to
build a custom machine before you can use it properly, would be.

Anyway: yes, please verify all conversion candidates for your criteria.
Thanks!

Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 21:59             ` Segher Boessenkool
@ 2019-12-16 22:14               ` Jeff Law
  2019-12-16 22:42                 ` Segher Boessenkool
  2019-12-16 23:34                 ` Eric S. Raymond
  2019-12-16 23:18               ` Joseph Myers
  2019-12-16 23:19               ` Eric S. Raymond
  2 siblings, 2 replies; 198+ messages in thread
From: Jeff Law @ 2019-12-16 22:14 UTC (permalink / raw)
  To: Segher Boessenkool, Joseph Myers
  Cc: Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists), gcc, esr

On Mon, 2019-12-16 at 15:59 -0600, Segher Boessenkool wrote:
> > In particular, we should look carefully at the commit
> > attributions in both conversions and Maxim's may well give ideas for 
> > improving the reposurgeon changelogs command (Richard came up with three 
> > ideas recently, which I've just filed in the reposurgeon issue tracker).  
> > But I also think:
> > 
> > * reposurgeon is a safer approach than ad hoc scripts, provided we get 
> > clean verification of basic properties such as branch tip contents.
> 
> I totally do not agree.  Black boxes are not safe.  *New* black boxes
> are even worse.
> 
> I trust scripts that have low internal complexity much better.
Perhaps.  But there's also limits to what scripts can do.

> 
> There is absolutely no reason to trust a system that supposedly was
> already very mature, but that required lots of complex modifications,
> and even a complete rewrite in a different language, that even has its
> own bug tracker, to work without problems (although we all have *seen*
> some of its many problems over the last years), and at the same time
> bad-mouthing simple scripts that simply work, and have simple problems.
I'd disagree.  The experience and testsuites from that system are a
significant benefit. 

> 
> > * Richard's improvements to commit messages are a great improvement to the 
> > resulting repository (and it's OK if a small percentage end up misleading 
> > because someone used the wrong PR number, sometimes people use the wrong 
> > commit message or commit changes they didn't mean to and so having some 
> > misleading messages is unavoidable).
> 
> As long as the original commit message is kept, verbatim, and you only
> add a new summary line, all is fine.  If not -> nope, not okay.
Sorry, have to disagree here.  I think what Richard has done is a
significant step forward. 


> 
> > * As we're part of the free software community as a whole rather than 
> > something in isolation, choosing to make a general-purpose tool work for 
> > our conversion is somewhat preferable to choosing an ad hoc approach 
> > because it contributes something of value for other repository conversions 
> > by other projects in future.
> 
> This, I don't agree with at all either: having some lean-and-mean
> scripts that worked for the GCC conversion will be at least as helpful
> for another conversion as a "general" tool that first requires you to
> build a custom machine before you can use it properly, would be.
> 
> 
> Anyway: yes, please verify all conversion candidates for your criteria.
> Thanks!
And if they're the same, then I'm still going to prefer reposurgeon.

So unless there's something Maxim's scripts are getting right that
reposurgeon isn't, reposurgeon is the right choice.

jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 22:14               ` Jeff Law
@ 2019-12-16 22:42                 ` Segher Boessenkool
  2019-12-16 23:26                   ` Joseph Myers
  2019-12-18 18:07                   ` Jeff Law
  2019-12-16 23:34                 ` Eric S. Raymond
  1 sibling, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-16 22:42 UTC (permalink / raw)
  To: Jeff Law
  Cc: Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc, esr

Hi!

On Mon, Dec 16, 2019 at 03:13:49PM -0700, Jeff Law wrote:
> On Mon, 2019-12-16 at 15:59 -0600, Segher Boessenkool wrote:
> > > In particular, we should look carefully at the commit
> > > attributions in both conversions and Maxim's may well give ideas for 
> > > improving the reposurgeon changelogs command (Richard came up with three 
> > > ideas recently, which I've just filed in the reposurgeon issue tracker).  
> > > But I also think:
> > > 
> > > * reposurgeon is a safer approach than ad hoc scripts, provided we get 
> > > clean verification of basic properties such as branch tip contents.
> > 
> > I totally do not agree.  Black boxes are not safe.  *New* black boxes
> > are even worse.
> > 
> > I trust scripts that have low internal complexity much better.
> Perhaps.  But there's also limits to what scripts can do.

Absolutely.  Just this trust in complicated things is very misplaced, in
my opinion.

> > There is absolutely no reason to trust a system that supposedly was
> > already very mature, but that required lots of complex modifications,
> > and even a complete rewrite in a different language, that even has its
> > own bug tracker, to work without problems (although we all have *seen*
> > some of its many problems over the last years), and at the same time
> > bad-mouthing simple scripts that simply work, and have simple problems.
> I'd disagree.  The experience and testsuites from that system are a
> significant benefit. 

That isn't what I said.  I said that freshly constructed complex software
will have more and deeper errors than stupid simple scripts do (or I
implied that at least, maybe it wasn't clear).  And I only say this
because the opposite was claimed, which is laughable imnsho.

> > > * Richard's improvements to commit messages are a great improvement to the 
> > > resulting repository (and it's OK if a small percentage end up misleading 
> > > because someone used the wrong PR number, sometimes people use the wrong 
> > > commit message or commit changes they didn't mean to and so having some 
> > > misleading messages is unavoidable).
> > 
> > As long as the original commit message is kept, verbatim, and you only
> > add a new summary line, all is fine.  If not -> nope, not okay.
> Sorry, have to disagree here.  I think what Richard has done is a
> significant step forward. 

We talked about it for days, and as far as I understand it Richard agreed.

But, there is no way I can verify this yet, or is there?  Is there a repo
we can look at?  Something close to final.

> > > * As we're part of the free software community as a whole rather than 
> > > something in isolation, choosing to make a general-purpose tool work for 
> > > our conversion is somewhat preferable to choosing an ad hoc approach 
> > > because it contributes something of value for other repository conversions 
> > > by other projects in future.
> > 
> > This, I don't agree with at all either: having some lean-and-mean
> > scripts that worked for the GCC conversion will be at least as helpful
> > for another conversion as a "general" tool that first requires you to
> > build a custom machine before you can use it properly, would be.
> > 
> > 
> > Anyway: yes, please verify all conversion candidates for your criteria.
> > Thanks!
> And if they're the same, then I'm still going to prefer reposurgeon.

But they aren't the same, so you will have to make an actual choice,
based on actual data ;-)

> So unless there's something Maxim's scripts are getting right that
> reposurgeon isn't, reposurgeon is the right choice.

... in your opinion.

Anyway, I'd love to hear Richard's input too, but we will have to wait
for that a few days.

The quality of the conversion should be judged by the output of it, not
by anything else.  You do not want to see how sausage is made, as they
say.  The GCC community has nothing to gain from a generic conversion
tool: what we want and need is *one* conversion, and it should be a
good one.

So let us compare *that*!


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 21:59             ` Segher Boessenkool
  2019-12-16 22:14               ` Jeff Law
@ 2019-12-16 23:18               ` Joseph Myers
  2019-12-16 23:19               ` Eric S. Raymond
  2 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 23:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc, esr

On Mon, 16 Dec 2019, Segher Boessenkool wrote:

> Hi,
> 
> On Mon, Dec 16, 2019 at 05:07:48PM +0000, Joseph Myers wrote:
> > Yes.  It should be possible to confirm branch tip conversions and other 
> > properties of the repository (e.g. that all branch tips are properly 
> > descended from the first commit on trunk except for the specific branches 
> > that shouldn't be) once my current conversions have finished running.
> 
> Please do that for Maxim's conversion as well then?
> 
> (If the way you do the verification requires reposurgeon, the
> verification methodology itself is fatally flawed).

It does not require reposurgeon.  The inputs for verification of branch 
tips are (a) a list of correspondences between SVN paths and git refs and 
(b) the SVN revision number at which those refs should correspond to those 
SVN paths.  A list can readily be generated for any git repository not 
using too complicated a mapping from SVN branch names (with a more 
complicated mapping, it's natural for the process modifying the names also 
to generate the list for use in verification).
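
The branch-tip check described above could be sketched roughly as
follows.  This is an illustration only, not the script actually used
for validation: the "svn_path git_ref" list format and all function
names are assumptions made for the example.  It relies on two things
git does provide: blob ids are the SHA-1 of "blob <size>\0" plus the
file content, and `git ls-tree -r` lists the blob id of every file at
a ref.

```python
import hashlib
import subprocess


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def parse_ref_map(text: str) -> list:
    """Parse 'svn_path git_ref' lines (a hypothetical list format)."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            svn_path, git_ref = line.split()
            pairs.append((svn_path, git_ref))
    return pairs


def git_tree(repo: str, ref: str) -> dict:
    """Map each file path at `ref` to its blob id, via `git ls-tree -r`."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", ref],
        check=True, capture_output=True, text=True,
    ).stdout
    tree = {}
    for line in out.splitlines():
        # Each line is "<mode> <type> <object>\t<path>".
        meta, path = line.split("\t", 1)
        _mode, _type, blob = meta.split()
        tree[path] = blob
    return tree


def differing_paths(svn_tree: dict, git_tree_map: dict) -> list:
    """Return paths missing on one side or whose blob ids differ."""
    paths = set(svn_tree) | set(git_tree_map)
    return sorted(p for p in paths if svn_tree.get(p) != git_tree_map.get(p))
```

The SVN side of the comparison would come from hashing the files of an
`svn export -r REV` of each listed path with git_blob_id(); an empty
result from differing_paths() then means the tip matches.  Symlinks,
file modes and SVN keyword expansion would need extra handling that
this sketch leaves out.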

> > * Richard's improvements to commit messages are a great improvement to the 
> > resulting repository (and it's OK if a small percentage end up misleading 
> > because someone used the wrong PR number, sometimes people use the wrong 
> > commit message or commit changes they didn't mean to and so having some 
> > misleading messages is unavoidable).
> 
> As long as the original commit message is kept, verbatim, and you only
> add a new summary line, all is fine.  If not -> nope, not okay.

That is how it works.  A new summary line is added, with the original 
message kept verbatim after that.
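
That property is easy to check mechanically.  A minimal sketch,
assuming the summary is a single line separated from the original
message by one blank line (the usual git subject/body layout; the
exact layout used in the conversion is an assumption here, and the
function names are made up for the example):

```python
def add_summary(original: str, summary: str) -> str:
    """Prepend a one-line summary, keeping the original message untouched."""
    return summary + "\n\n" + original


def original_kept_verbatim(converted: str, original: str) -> bool:
    """True if `converted` is `original`, optionally preceded by one summary line."""
    if converted == original:
        return True  # no summary was added at all
    head, sep, tail = converted.partition("\n\n")
    # The head must be a single line and the tail must be byte-for-byte
    # identical to the original SVN log message.
    return sep == "\n\n" and "\n" not in head and tail == original
```

Run over every (SVN log message, git commit message) pair, any False
result points at a commit whose original text was altered rather than
only prefixed.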

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 21:59             ` Segher Boessenkool
  2019-12-16 22:14               ` Jeff Law
  2019-12-16 23:18               ` Joseph Myers
@ 2019-12-16 23:19               ` Eric S. Raymond
  2019-12-18 17:27                 ` Segher Boessenkool
  2 siblings, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-16 23:19 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Joseph Myers, Jeff Law, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Segher Boessenkool <segher@kernel.crashing.org>:
> There is absolutely no reason to trust a system that supposedly was
> already very mature, but that required lots of complex modifications,
> and even a complete rewrite in a different language, that even has its
> own bug tracker, to work without problems (although we all have *seen*
> some of its many problems over the last years), and at the same time
> bad-mouthing simple scripts that simply work, and have simple problems.

Some factual corrections:

I didn't port to Go to fix bugs, I ported for better performance.
Python is a wonderful language for prototyping a tool like this, but
it's too slow and memory-hungry for use at the GCC conversion's
scale.  Also doesn't parallelize worth a damn.

I very carefully *didn't* bad-mouth Maxim's scripts - in fact I have
said on-list that I think his approach is on the whole pretty
intelligent. To anyone who didn't have some of the experiences I have
had, even using git-svn to analyze basic blocks would appear
reasonable and I don't actually fault Maxim for it.

I *did* bad-mouth git-svn - and I will continue to do so until it no
longer troubles the world with botched conversions.  Relying on it is,
in my subject-matter-expert opinion, unacceptably risky. While I don't
blame Maxim for not being aware of this, it remains a serious
vulnerability in his pipeline.

I don't know how it is on your planet, but here on Earth having a
bug tracker - and keeping it reasonably clean - is generally 
considered a sign of responsible maintainership.

In conclusion, I'm happy that you're so concerned about bugs in
reposurgeon. I am too. You're welcome to file issues and help us
improve our already-extensive test suite by shipping us dumps that
produce errors.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 22:42                 ` Segher Boessenkool
@ 2019-12-16 23:26                   ` Joseph Myers
  2019-12-16 23:44                     ` Eric S. Raymond
  2019-12-18 18:07                   ` Jeff Law
  1 sibling, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-16 23:26 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc, esr

On Mon, 16 Dec 2019, Segher Boessenkool wrote:

> We talked about it for days, and as far as I understand it Richard agreed.
> 
> But, there is no way I can verify this yet, or is there?  Is there a repo
> we can look at?  Something close to final.

The conversion run I started this afternoon is now running 
git-fast-import.  That part takes about four hours because that's how long 
git-fast-import takes to process a 240 GB file with all the GCC history 
(and then git gc --aggressive, to get the repository down to a reasonable 
size, takes an hour or two).

So tomorrow I expect to have a repository ready to make available.  (In 
fact, two variants - one with branch and tag names essentially as in SVN, 
one with rearrangements done as suggested by Richard.)  And that will be 
along with the results of validation, which I hope will be clean as we've 
fixed all the issues shown up in previous rounds of validation.

This includes the part of Richard's commit message improvements deduced 
purely from the existing commit messages.  It doesn't include the part 
based on Bugzilla data (because the attempt to download the data from 
Bugzilla fell over).  I expect the next conversion run, started after that 
one finishes, to include both parts of Richard's commit message 
improvements, as well as an improvement to commit attribution extraction 
from ChangeLog files (to include attributions from ChangeLog.<branch> 
files, not just plain ChangeLog).
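
To illustrate the kind of extraction meant here, a rough sketch
follows.  It is not reposurgeon's implementation: the regular
expressions and function names are assumptions for the example, based
only on the GNU ChangeLog convention that each entry starts with a
"DATE  NAME  <EMAIL>" header line.

```python
import os
import re

# Matches "ChangeLog" and "ChangeLog.<branch>", but not e.g. "ChangeLog-2018".
CHANGELOG_NAME = re.compile(r"^ChangeLog(\.[^/]+)?$")

# GNU-style entry header: "2019-12-16  Jane Hacker  <jane@example.org>"
ENTRY_HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^<>]+)>\s*$")


def is_changelog(path: str) -> bool:
    """True for plain ChangeLog files and per-branch ChangeLog.<branch> files."""
    return CHANGELOG_NAME.match(os.path.basename(path)) is not None


def first_attribution(text: str):
    """Return (name, email) from the first entry header found, else None."""
    for line in text.splitlines():
        m = ENTRY_HEADER.match(line)
        if m:
            return m.group(2), m.group(3)
    return None
```

For a commit that adds an entry to gcc/ChangeLog.mybranch (a
hypothetical file name), is_changelog() accepts the path and
first_attribution() applied to the added text yields the author to
credit instead of the SVN committer id.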

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 22:14               ` Jeff Law
  2019-12-16 22:42                 ` Segher Boessenkool
@ 2019-12-16 23:34                 ` Eric S. Raymond
  1 sibling, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-16 23:34 UTC (permalink / raw)
  To: Jeff Law
  Cc: Segher Boessenkool, Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Jeff Law <law@redhat.com>:
> So unless there's something Maxim's scripts are getting right that
> reposurgeon isn't, reposurgeon is the right choice.

It is still possible that the scripts could get things right that
reposurgeon doesn't. But the reverse question is also valid. Can
Maxim's scripts get everything right that reposurgeon does?

If anyone wants to audit for that, my test suite is open source.  May
the best program win!
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 23:26                   ` Joseph Myers
@ 2019-12-16 23:44                     ` Eric S. Raymond
  0 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-16 23:44 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Segher Boessenkool, Jeff Law, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Joseph Myers <joseph@codesourcery.com>:
>                     I expect the next conversion run, started after that 
> one finishes, to include both parts of Richard's commit message 
> improvements, as well as an improvement to commit attribution extraction 
> from ChangeLog files (to include attributions from ChangeLog.<branch> 
> files, not just plain ChangeLog).

There is also a known but minor bug in ChangeLog mining at branch roots.
I'm working on that and expect to have a fix shortly.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 23:19               ` Eric S. Raymond
@ 2019-12-18 17:27                 ` Segher Boessenkool
  0 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-18 17:27 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Joseph Myers, Jeff Law, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, Dec 16, 2019 at 06:19:26PM -0500, Eric S. Raymond wrote:
> Segher Boessenkool <segher@kernel.crashing.org>:
> > There is absolutely no reason to trust a system that supposedly was
> > already very mature, but that required lots of complex modifications,
> > and even a complete rewrite in a different language, that even has its
> > own bug tracker, to work without problems (although we all have *seen*
> > some of its many problems over the last years), and at the same time
> > bad-mouthing simple scripts that simply work, and have simple problems.
> 
> Some factual corrections:
> 
> I didn't port to Go to fix bugs, I ported for better performance.

That is not a correction, because that is not what I said.

> I very carefully *didn't* bad-mouth Maxim's scripts - in fact I have
> said on-list that I think his approach is on the whole pretty
> intelligent. To anyone who didn't have some of the experiences I have
> had, even using git-svn to analyze basic blocks would appear
> reasonable and I don't actually fault Maxim for it.

And yet, you do it once again now.

Judge the conversion candidates by what they are, not by if the tool
to create them is named "reposurgeon" or not.

> I *did* bad-mouth git-svn - and I will continue to do so until it no
> longer troubles the world with botched conversions.  Relying on it is,
> in my subject-matter-expert opinion, unacceptably risky. While I don't
> blame Maxim for not being aware of this, it remains a serious
> vulnerability in his pipeline.

More unfounded aspersions.  Yay.

> I don't know how it is on your planet, but here on Earth having a
> bug tracker - and keeping it reasonably clean - is generally 
> considered a sign of responsible maintainership.

Having a bugtracker is a sign of having more bugs than you can count on
one hand.

> In conclusion, I'm happy that you're so concerned about bugs in
> reposurgeon.

I am concerned if the conversion we eventually select will be usable
for our purposes.  And I am concerned it will take some more years.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 22:42                 ` Segher Boessenkool
  2019-12-16 23:26                   ` Joseph Myers
@ 2019-12-18 18:07                   ` Jeff Law
  2019-12-18 18:24                     ` Joseph Myers
                                       ` (2 more replies)
  1 sibling, 3 replies; 198+ messages in thread
From: Jeff Law @ 2019-12-18 18:07 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc, esr

On Mon, 2019-12-16 at 16:42 -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Dec 16, 2019 at 03:13:49PM -0700, Jeff Law wrote:
> > On Mon, 2019-12-16 at 15:59 -0600, Segher Boessenkool wrote:
> > > > In particular, we should look carefully at the commit
> > > > attributions in both conversions and Maxim's may well give ideas for 
> > > > improving the reposurgeon changelogs command (Richard came up with three 
> > > > ideas recently, which I've just filed in the reposurgeon issue tracker).  
> > > > But I also think:
> > > > 
> > > > * reposurgeon is a safer approach than ad hoc scripts, provided we get 
> > > > clean verification of basic properties such as branch tip contents.
> > > 
> > > I totally do not agree.  Black boxes are not safe.  *New* black boxes
> > > are even worse.
> > > 
> > > I trust scripts that have low internal complexity much better.
> > Perhaps.  But there's also limits to what scripts can do.
> 
> Absolutely.  Just this trust in complicated things is very misplaced, in
> my opinion.
It's as much about trusting the people as the tools.  While I
don't have the opportunity to work with Joseph all that much, I have
seen his work and attention to detail for 20+ years.   When he says
that, in his opinion, the reposurgeon conversion is already a better
conversion than Maxim's conversion, I absolutely trust him. 
Furthermore, I trust that if there are significant issues he'll
engage to fix them.


> > > There is absolutely no reason to trust a system that supposedly was
> > > already very mature, but that required lots of complex modifications,
> > > and even a complete rewrite in a different language, that even has its
> > > own bug tracker, to work without problems (although we all have *seen*
> > > some of its many problems over the last years), and at the same time
> > > bad-mouthing simple scripts that simply work, and have simple problems.
> > I'd disagree.  The experience and testsuites from that system are a
> > significant benefit. 
> 
> That isn't what I said.  I said that freshly constructed complex software
> will have more and deeper errors than stupid simple scripts do (or I
> implied that at least, maybe it wasn't clear).  And I only say this
> because the opposite was claimed, which is laughable imnsho.
But it's not that freshly constructed, at least not in my mind.  All
the experience ESR has from the python implementation carries to the Go
implementation.

And the "simple scripts" argument dismisses the fact that those scripts
are built on top of complex software.  It just doesn't hold water IMHO.

Where I think we could have done better would have been to get more
concrete detail from ESR about the problems with git-svn.  That was
never forthcoming and it's a disappointment.  Maybe some of the recent
discussions are in fact related to these issues and I simply missed
that point.

I do think we've gotten some details about the "scar tissue" from the
cvs->svn transition as well as some of our branch problems.  It's my
understanding reposurgeon cleans this up significantly whereas Maxim's
scripts don't touch this stuff IIUC.

> 
> > > > * Richard's improvements to commit messages are a great improvement to the 
> > > > resulting repository (and it's OK if a small percentage end up misleading 
> > > > because someone used the wrong PR number, sometimes people use the wrong 
> > > > commit message or commit changes they didn't mean to and so having some 
> > > > misleading messages is unavoidable).
> > > 
> > > As long as the original commit message is kept, verbatim, and you only
> > > add a new summary line, all is fine.  If not -> nope, not okay.
> > Sorry, have to disagree here.  I think what Richard has done is a
> > significant step forward. 
> 
> We talked about it for days, and as far as I understand it Richard agreed.
When Richard and I spoke we generally agreed that we felt a reposurgeon
conversion, if it could be made to work was the preferred solution,
followed by Maxim's approach and lastly the existing git-svn mirror. 
If I'm mis-representing Richard's position I hope he'll chime in and
correct the record.

> 
> But, there is no way I can verify this yet, or is there?  Is there a repo
> we can look at?  Something close to final.
I think Joseph posted something good enough to verify.  There's still
work going on, but I'd consider the outstanding issues nits and well
within the scope of what can reasonably still be changing.  I would
have had a similar position for Maxim's scripts if there were minor
changes still happening with them.


Jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-18 18:07                   ` Jeff Law
@ 2019-12-18 18:24                     ` Joseph Myers
  2019-12-19  0:57                       ` Eric S. Raymond
  2019-12-18 19:50                     ` Segher Boessenkool
  2019-12-19  0:46                     ` Proposal for the transition timetable for the move to GIT Eric S. Raymond
  2 siblings, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-18 18:24 UTC (permalink / raw)
  To: Jeff Law
  Cc: Segher Boessenkool, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc, esr

On Wed, 18 Dec 2019, Jeff Law wrote:

> > That isn't what I said.  I said that freshly constructed complex software
> > will have more and deeper errors than stupid simple scripts do (or I
> > implied that at least, maybe it wasn't clear).  And I only say this
> > because the opposite was claimed, which is laughable imnsho.
> But it's not that freshly constructed, at least not in my mind.  All
> the experience ESR has from the python implementation carries to the Go
> implementation.
> 
> And the "simple scripts" argument dismisses the fact that those scripts
> are built on top of complex software.  It just doesn't hold water IMHO.

Nor do I think reposurgeon (or at least the SVN reader, which is the main 
part engaged here) is significantly more complicated than implied by the 
task it's performing of translating between the different conceptual 
models of SVN and git.  I've found it straightforward to produce reduced 
testcases for issues found, and fixed several of them myself despite not 
actually knowing Go.  The remaining issues are generally conceptually 
straightforward, both to understand and to fix.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-18 18:07                   ` Jeff Law
  2019-12-18 18:24                     ` Joseph Myers
@ 2019-12-18 19:50                     ` Segher Boessenkool
  2019-12-18 20:43                       ` Jeff Law
  2019-12-19  2:34                       ` Unix philosopy vs. poor semantic locality Eric S. Raymond
  2019-12-19  0:46                     ` Proposal for the transition timetable for the move to GIT Eric S. Raymond
  2 siblings, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-18 19:50 UTC (permalink / raw)
  To: Jeff Law
  Cc: Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc, esr

On Wed, Dec 18, 2019 at 11:07:11AM -0700, Jeff Law wrote:
> > That isn't what I said.  I said that freshly constructed complex software
> > will have more and deeper errors than stupid simple scripts do (or I
> > implied that at least, maybe it wasn't clear).  And I only say this
> > because the opposite was claimed, which is laughable imnsho.
> But it's not that freshly constructed, at least not in my mind.  All
> the experience ESR has from the python implementation carries to the Go
> implementation.

What, writing code in Python made him learn Go?

> And the "simple scripts" argument dismisses the fact that those scripts
> are built on top of complex software.  It just doesn't hold water IMHO.

This is the Unix philosophy though!

> Where I think we could have done better would have been to get more
> concrete detail from ESR about the problems with git-svn.  That was
> never forthcoming and it's a disappointment.

Yes.  And as far as I can see you can wait forever for it.  Oh well, we
have a lot of experience in waiting.

> I do think we've gotten some details about the "scar tissue" from the
> cvs->svn transition as well as some of our branch problems.  It's my
> understanding reposurgeon cleans this up significantly whereas Maxim's
> scripts don't touch this stuff IIUC.

They do, I think?  This was easy to do, too:
git://git.linaro.org/people/maxim-kuvyrkov/gcc-reparent.git/

> > > > As long as the original commit message is kept, verbatim, and you only
> > > > add a new summary line, all is fine.  If not -> nope, not okay.
> > > Sorry, have to disagree here.  I think what Richard has done is a
> > > significant step forward. 
> > 
> > We talked about it for days, and as far as I understand it Richard agreed.
> When Richard and I spoke we generally agreed that we felt a reposurgeon
> conversion, if it could be made to work was the preferred solution,
> followed by Maxim's approach and lastly the existing git-svn mirror. 
> If I'm mis-representing Richard's position I hope he'll chime in and
> correct the record.

This is just about the "we should not try to change the commit message",
and Joseph confirmed that is what is done now.  So that is all fine.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-18 19:50                     ` Segher Boessenkool
@ 2019-12-18 20:43                       ` Jeff Law
  2019-12-20 16:28                         ` Segher Boessenkool
  2019-12-19  2:34                       ` Unix philosopy vs. poor semantic locality Eric S. Raymond
  1 sibling, 1 reply; 198+ messages in thread
From: Jeff Law @ 2019-12-18 20:43 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc, esr

On Wed, 2019-12-18 at 13:50 -0600, Segher Boessenkool wrote:
> On Wed, Dec 18, 2019 at 11:07:11AM -0700, Jeff Law wrote:
> > > That isn't what I said.  I said that freshly constructed complex software
> > > will have more and deeper errors than stupid simple scripts do (or I
> > > implied that at least, maybe it wasn't clear).  And I only say this
> > > because the opposite was claimed, which is laughable imnsho.
> > But it's not that freshly constructed, at least not in my mind.  All
> > the experience ESR has from the python implementation carries to the Go
> > implementation.
> 
> What, writing code in Python made him learn Go?
?!?  What does that question have to do with anything?

My point is the experience of writing reposurgeon and in particular
being intimately familiar with what goes on under the hood inside CVS,
SVN and GIT has great value, particularly in converting a large repo
like ours that has warts from the CVS and SVN days as well as warts
from the CVS->SVN conversion.


> > And the "simple scripts" argument dismisses the fact that those scripts
> > are built on top of complex software.  It just doesn't hold water IMHO.
> 
> This is the Unix philosophy though!
But your comment doesn't address the fact that in both cases,
reposurgeon and Maxim's scripts, there's complex code involved.  In
Maxim's case it's just under the covers.

Ultimately I don't care about the Unix philosophy.  I'm pragmatic.  If
reposurgeon gives us a better conversion, and it sounds very much like
it already does, then the fact that it doesn't follow the Unix
philosophy is irrelevant to me.



> 
> > Where I think we could have done better would have been to get more
> > concrete detail from ESR about the problems with git-svn.  That was
> > never forthcoming and it's a disappointment.
> 
> Yes.  And as far as I can see you can wait forever for it.  Oh well, we
> have a lot of experience in waiting.
Umm, no, I'm not suggesting we wait in any way at all.  Based on what
I've heard from Joseph, I'd vote today to go with reposurgeon as soon
as it's convenient for the people doing the conversion and our
development cycle.

This highlights one big issue that we have as a project.  Specifically
that we don't have a clear cut way to make these kinds of technical
decisions when there isn't unanimous consent.

> 
> > I do think we've gotten some details about the "scar tissue" from the
> > cvs->svn transition as well as some of our branch problems.  It's my
> > understanding reposurgeon cleans this up significantly whereas Maxim's
> > scripts don't touch this stuff IIUC.
> 
> They do, I think?  This was easy to do, too:
> git://git.linaro.org/people/maxim-kuvyrkov/gcc-reparent.git/
Good.  Thanks for clarifying.

> 
> > > > > As long as the original commit message is kept, verbatim, and you only
> > > > > add a new summary line, all is fine.  If not -> nope, not okay.
> > > > Sorry, have to disagree here.  I think what Richard has done is a
> > > > significant step forward. 
> > > 
> > > We talked about it for days, and as far as I understand it Richard agreed.
> > When Richard and I spoke we generally agreed that we felt a reposurgeon
> > conversion, if it could be made to work was the preferred solution,
> > followed by Maxim's approach and lastly the existing git-svn mirror. 
> > If I'm mis-representing Richard's position I hope he'll chime in and
> > correct the record.
> 
> This is just about the "we should not try to change the commit message",
> and Joseph confirmed that is what is done now.  So that is all fine.
OK
jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-18 18:07                   ` Jeff Law
  2019-12-18 18:24                     ` Joseph Myers
  2019-12-18 19:50                     ` Segher Boessenkool
@ 2019-12-19  0:46                     ` Eric S. Raymond
  2 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-19  0:46 UTC (permalink / raw)
  To: Jeff Law
  Cc: Segher Boessenkool, Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Jeff Law <law@redhat.com>:
> But it's not that freshly constructed, at least not in my mind.  All
> the experience ESR has from the python implementation carries to the Go
> implementation.

Not only do you have reposurgeon, you have me. I wish this mattered
less than it does.

I have *far* more experience doing big, messy repository moves than
anybody else.  I try to exteriorize that knowledge into the
reposurgeon code and documents as much as I can, but as with other
kinds of expertise a lot of it is implicit knowledge that is only
elicited by practice and answering questions.

On small conversions of clean repositories such implicit expertise
doesn't matter too much. You may be able to pull off a once-and-done
with the tools, especially if they're my tools and you've read all my
stuff on good practice.

As an example, the CVS-to-git conversion of groff didn't really need
me. Lifts from CVS are normally horrible, but the groff devs were the
best I've ever seen at not leaving debris from operator errors in the
history.  Any of them could have read my docs and done a clean
conversion in two hours. Only...there was no way to know that in
advance. The odds were heavily against it.

Emacs was, and GCC is, the messy opposite case.  You guys needed a
seasoned "I know these things so you don't have to" expert more than
you will probably ever really understand. And, sadly, there aren't any
others but me yet.  Nobody else has been interested enough in the
problem to invest the time.

> Where I think we could have done better would have been to get more
> concrete detail from ESR about the problems with git-svn.  That was
> never forthcoming and it's a disappointment.  Maybe some of the recent
> discussions are in fact related to these issues and I simply missed
> that point.

I posted this link before: http://esr.ibiblio.org/?p=6778

I can't actually tell you much more than that. Actually, if I
understood git-svn's failure modes in enough detail to tell you more I
might be less frightened of it.

Mostly what I know is that during several other conversions I have
stumbled across trails of metadata damage for which use of git-svn
seems to have been to blame. Though, admittedly, I'm not certain of
that in any individual case; the ways git-svn screws up are not
necessarily distinguishable from the aftereffects of cvs2svn conversion
damage, or from normal kinds of operator error.

Overall, though, defect rates seemed noticeably higher when git-svn had
been used as a front end. I learned to flinch when people wanting me
to do a full conversion of an SVN repo admitted git-svn had been deployed,
even though I was hard-put to explain why I was flinching.

> I do think we've gotten some details about the "scar tissue" from the
> cvs->svn transition as well as some of our branch problems.  It's my
> understanding reposurgeon cleans this up significantly whereas Maxim's
> scripts don't touch this stuff IIUC.

That's correct.  And again, no blame to Maxim for this; he took a
conventional approach that does as little analysis as it can get away
with, which can be a good tradeoff on smaller, cleaner repositories without
a CVS back-history.

>                                                      There's still
> work going on, but I'd consider the outstanding issues nits and well
> within the scope of what can reasonably still be changing.

Issue list here: 

https://gitlab.com/esr/reposurgeon/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=GCC

Presently 6 items including 2 bugs. One of those bugs may already be
fixed, we're waiting on Joseph's current conversion to see.

Counting time to do all the RFEs requested, polishing, and final review,
I think we're looking at another week, maybe a bit less if things go
well.  You guys could get a final conversion under your Yule tree.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-18 18:24                     ` Joseph Myers
@ 2019-12-19  0:57                       ` Eric S. Raymond
  0 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-19  0:57 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jeff Law, Segher Boessenkool, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Joseph Myers <joseph@codesourcery.com>:
> Nor do I think reposurgeon (or at least the SVN reader, which is the main 
> part engaged here) is significantly more complicated than implied by the 
> task it's performing of translating between the different conceptual 
> models of SVN and git.  I've found it straightforward to produce reduced 
> testcases for issues found, and fixed several of them myself despite not 
> actually knowing Go.  The remaining issues are generally conceptually 
> straightforward, both to understand and to fix.

Let me note for the record that I found Joseph's ability to find and
fix bugs in the reader quite impressive.

Maybe not as impressive as it would have been before the recent
rewrite.  That code used to be a pretty nasty hairball.  It's a lot
cleaner and easier to understand now.

But impedance-matching the two data models is tricky, subtler than it
looks, and has rebarbative edge cases.  Even given the cleanest
possible implementation, troubleshooting it is no mean feat.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Unix philosopy vs. poor semantic locality
  2019-12-18 19:50                     ` Segher Boessenkool
  2019-12-18 20:43                       ` Jeff Law
@ 2019-12-19  2:34                       ` Eric S. Raymond
  2019-12-19  3:16                         ` Joseph Myers
  1 sibling, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-19  2:34 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

[New thread]

Segher Boessenkool <segher@kernel.crashing.org>:
> > And the "simple scripts" argument dismisses the fact that those scripts
> > are built on top of complex software.  It just doesn't hold water IMHO.
> 
> This is the Unix philosophy though!

I'm now finishing a book in which I have a lot to say about this, inspired
in part by experience with reposurgeon.

One of the major concepts I introduce in the book is "semantic
locality".  This is a property of data representations and structures.
A representation has good semantic locality when the context you need
to interpret any individual part of it is reliably nearby.

A classic example of a representation with good semantic locality is a Unix
password file.  All the information associated with a username is 
on one line. It is accordingly easy to parse and extract individual 
records.
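
To make the password-file point concrete, here is a minimal Python
sketch, assuming the conventional seven colon-separated passwd fields.
The names PasswdEntry and parse_passwd_line and the sample record are
made up for illustration.  The thing to notice is that the parser never
looks outside the one line it was handed.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """One passwd-style record (hypothetical helper type)."""
    name: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    # All context needed to interpret the record is on this one line:
    # seven colon-separated fields, no lookups anywhere else.
    name, password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


# Illustrative record, not taken from any real system.
entry = parse_passwd_line("alice:x:1001:1001:Alice Example:/home/alice:/bin/bash\n")
print(entry.name, entry.uid, entry.shell)
```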

Databases have very poor semantic locality.  So do version-control
systems.  You need a lot of context to understand any individual data
element, and that context can be arbitrarily far away in terms of
retrieval complexity and time.

The Unix philosophy of small loosely-coupled tools has few more
fervent advocates than me. But I have come to understand that
it almost necessarily fails in the presence of data representations
with poor semantic locality.

This constraint can be inverted and used as a guide to good design: 
to enable loose coupling, design your representations to have
good semantic locality.

If the Unix password file were an SQL database, could you grep it?
No. You'd have to use an SQL-specific query method rather than a
generic utility like grep that is uncoupled from the specifics of
the database's schema.

The ideal data representation for enabling the Unix ecology of tools
is textual, self-describing, and has good semantic locality.

Historically, Unix programmers have understood the importance of
textuality and self-description.  But we've lacked the concept of
and a term for semantic locality.  Having that allows one to
talk about some things that were hard to even notice before.

Here's one: the effort required to parallelize an operation on
a data structure is inversely proportional to its semantic locality.

If it has good semantic locality, you can carve it into pieces that
are easy units of work for parallelization.  If it doesn't...you
can't. Best case is you'll need locking for shared parts. Worst case
is that the referential structure of the representation is so
tangled that you can't parallelize at all.
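
A rough Python sketch of the easy case, assuming passwd-style records
where each line stands alone.  The function names and sample records
are hypothetical.  Because no record refers to any other, the work
splits per line with no locking and no shared state.

```python
from concurrent.futures import ThreadPoolExecutor


def login_shell(line: str) -> str:
    # The last colon-separated field of one passwd-style record.
    return line.rstrip("\n").split(":")[-1]


def shells_in_parallel(lines, workers=4):
    # Each record is an independent unit of work; pool.map returns
    # results in the same order as the input lines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(login_shell, lines))


# Illustrative records only.
records = [
    "alice:x:1001:1001::/home/alice:/bin/bash",
    "bob:x:1002:1002::/home/bob:/bin/zsh",
]
print(shells_in_parallel(records))
```

Nothing comparable is possible for a commit in a version-control
history, where interpreting one element can require walking to
ancestors arbitrarily far away.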

Version-control systems rely on data structures with very poor
semantic locality.  It is therefore predictable that attacking them
with small unspecialized tools and scripting is...difficult.

It can be done, sometimes, with sufficient cleverness, but the results
are too often like making a pig fly by strapping JATO units to
it. That is to say: a brief and glorious ascent followed by entirely
predictable catastrophe.

Having trouble believing me?  OK, here's a challenge: rewrite GCC's
code-generation stage in awk/sed/m4.  

The attempt, if you actually made it, would teach you that poor
semantic locality forces complexity on the tools that have to deal
with it.

And that, ladies and gentlemen, is why reposurgeon has to be as
large and complex as it is.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Unix philosopy vs. poor semantic locality
  2019-12-19  2:34                       ` Unix philosopy vs. poor semantic locality Eric S. Raymond
@ 2019-12-19  3:16                         ` Joseph Myers
  2019-12-19  5:46                           ` Eric S. Raymond
  0 siblings, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-19  3:16 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Segher Boessenkool, Jeff Law, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Wed, 18 Dec 2019, Eric S. Raymond wrote:

> And that, ladies and gentlemen, is why reposurgeon has to be as
> large and complex as it is.

And, in the end, it *is* complex software on which you build simple 
scripts.  gcc.lift is a simple script, written in the domain-specific 
reposurgeon language.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Unix philosopy vs. poor semantic locality
  2019-12-19  3:16                         ` Joseph Myers
@ 2019-12-19  5:46                           ` Eric S. Raymond
  0 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-19  5:46 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Segher Boessenkool, Jeff Law, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Joseph Myers <joseph@codesourcery.com>:
> On Wed, 18 Dec 2019, Eric S. Raymond wrote:
> 
> > And that, ladies and gentlemen, is why reposurgeon has to be as
> > large and complex as it is.
> 
> And, in the end, it *is* complex software on which you build simple 
> scripts.  gcc.lift is a simple script, written in the domain-specific 
> reposurgeon language.

The Patterns crowd speaks of "alternating hard and soft layers".

The design of reposurgeon was driven by two insights:

1. Previous VCS-conversion tools sucked in part because they tried to
be too automatic, eliminating human judgment. Reposurgeon is designed
and intended to be a *judgment amplifier*, doing mechanics and freeing
the human operator to think about conversion policy. Hence the DSL.

2. git fast-import streams are a pretty capable format for interchanging
version-control histories. Not perfect, but good enough that you can gain
a lot by co-opting existing importers and exporters.

Mate the idea of a judgment-amplifying DSL to a structure editor for
git fast-import streams and reposurgeon is what you get.
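
For readers who have not looked at a fast-import stream, the following
Python sketch walks one and counts its top-level commands.  It is only
an illustration of the interchange format, not how reposurgeon reads
streams.  It assumes the exact-byte-count form of the "data" command
and does not handle the delimited "data <<EOF" form.  The function name
and the sample stream are made up.

```python
import io
from typing import BinaryIO, Dict


def count_commands(stream: BinaryIO) -> Dict[str, int]:
    """Count blob/commit/tag/reset commands in a fast-import stream."""
    counts: Dict[str, int] = {}
    while True:
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"data "):
            # Payloads are skipped by byte count, so text inside a
            # commit message or blob is never mistaken for a command.
            stream.read(int(line[5:].strip()))
            continue
        word = line.split(b" ", 1)[0].strip().decode()
        if word in ("blob", "commit", "tag", "reset"):
            counts[word] = counts.get(word, 0) + 1
    return counts


# Illustrative stream: one blob and one commit that references it.
sample = (
    b"blob\nmark :1\ndata 6\nhello\n\n"
    b"commit refs/heads/master\nmark :2\n"
    b"committer A U Thor <author@example.com> 1576713600 +0000\n"
    b"data 8\ninitial\n"
    b"M 100644 :1 README\n\n"
)
print(count_commands(io.BytesIO(sample)))
```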
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-18 20:43                       ` Jeff Law
@ 2019-12-20 16:28                         ` Segher Boessenkool
  0 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-20 16:28 UTC (permalink / raw)
  To: Jeff Law
  Cc: Joseph Myers, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc, esr

Hi!

On Wed, Dec 18, 2019 at 01:43:19PM -0700, Jeff Law wrote:
> On Wed, 2019-12-18 at 13:50 -0600, Segher Boessenkool wrote:
> > On Wed, Dec 18, 2019 at 11:07:11AM -0700, Jeff Law wrote:
> > > > That isn't what I said.  I said that freshly constructed complex software
> > > > will have more and deeper errors than stupid simple scripts do (or I
> > > > implied that at least, maybe it wasn't clear).  And I only say this
> > > > because the opposite was claimed, which is laughable imnsho.
> > > But it's not that freshly constructed, at least not in my mind.  All
> > > the experience ESR has from the python implementation carries to the Go
> > > implementation.
> > 
> > What, writing code in Python made him learn Go?
> ?!?  What does that question have to do with anything?

There is a lot more needed to write reliable programs than just domain
knowledge.  git-svn is used for this exact purpose (converting svn
commits to git commits) millions of times per day, for I-don't-know-
how long already.  Yes, I trust that better than newly written code.

The point is completely moot if we actually verify and compare the
resulting trees, of course.

> Ultimately I don't care about the Unix philosophy.  I'm pragmatic.  If
> reposurgeon gives us a better conversion, and it sounds very much like
> it already does, then the fact that it doesn't follow the Unix
> philosophy is irrelevant to me.

Exactly the same here!

But we need to look at the actual candidate conversions to determine this.
Not just say "I like X better than Y".  That is at best subjective; we can
do better than that.

> > > Where I think we could have done better would have been to get more
> > > concrete detail from ESR about the problems with git-svn.  That was
> > > never forthcoming and it's a disappointment.
> > 
> > Yes.  And as far as I can see you can wait forever for it.  Oh well, we
> > have a lot of experience in waiting.
> Umm, no, I'm not suggesting we wait in any way at all.

And neither am I.  I don't wait for things I do not expect to happen.
And I want a Git conversion soon, not wait more years for it.

> Based on what
> I've heard from Joseph, I'd vote today to go with reposurgeon as soon
> as it's convenient for the people doing the conversion and our
> development cycle.
> 
> This highlights one big issue that we have as a project.  Specifically
> that we don't have a clear cut way to make these kinds of technical
> decisions when there isn't unanimous consent.

This isn't a technical decision really.  Both candidate conversions are
perfect technically already (or will be soon).  All that is left is a)
aesthetics, so everyone wants something else; b) some people are dead
set against falsifying history (including me), while other people think
something that looks slightly better is more important; and c) what tags
and branches do we not carry over from svn at all?  We'll keep the svn
repo around as well, anyway.

If Joseph and Richard agree a candidate is good, then I will agree as
well.  All that can be left is nit-picking, and that is not worth it
anyway: the repository will not be perfect no matter what, people have
made mistakes, we can only fix some superficial ones.  Some of those
are practically important (because they are annoying), but most are not.

Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 17:40                     ` Jeff Law
@ 2019-12-25  8:12                       ` Alexandre Oliva
  2019-12-25 12:07                         ` Eric S. Raymond
  2019-12-25 12:10                         ` Segher Boessenkool
  0 siblings, 2 replies; 198+ messages in thread
From: Alexandre Oliva @ 2019-12-25  8:12 UTC (permalink / raw)
  To: Jeff Law
  Cc: Segher Boessenkool, Joseph Myers, Eric S. Raymond, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Dec 16, 2019, Jeff Law <law@redhat.com> wrote:

> Yet Joseph just indicated today Maxim's conversion is missing some
> branches.  While I don't consider any of the missed branches important,
> others might.   More importantly, it raises the issue of what other
> branches might be missing and what validation work has been done on
> that conversion.

It also raises another issue, namely the ability to *incrementally* fix
such problems should we find them after the switch-over.

I've got enough experience with git-svn to tell that, if it missed a
branch for whatever reason, it is reasonably easy to create a
configuration that will enable it to properly identify the point of
creation of the branch, and bring in subsequent changes to the branch,
in such a way that the newly-converted branch can be easily pushed onto
live git so that it becomes indistinguishable from other branches that
had been converted before.

I know very little about reposurgeon, but I'm concerned that, should we
make the conversion with it, and later identify e.g. missed branches, we
might be unable to make such an incremental recovery.  Can anyone
alleviate my concerns and let me know we could indeed make such an
incremental recovery of a branch missed in the initial conversion, in
such a way that its commit history would be shared with that of the
already-converted branch it branched from?

Anyway, hopefully we won't have to go through that.  Having not just one
but *two* fully independent conversions of the SVN repo to GIT, using
different tools, makes it a lot less likely that whatever result we
choose contains a significant error, as both can presumably help catch
conversion errors in each other, and the odds that both independent
implementations make the same error are pretty thin, I'd think.

Now, would it be too much of a burden to insist that the commit graphs
out of both conversions be isomorphic, and maybe mappings between the
commit ids (if they can't be made identical to begin with, that is) be
generated and shared, so that the results of both conversions can be
efficiently and mechanically compared (disregarding expected
differences) not only in terms of branch and tag names and commit
graphs, but also tree contents, commit messages and any other metadata?
Has anything like this been done yet?

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist           Stallman was right, but he's left :(
GNU Toolchain Engineer    FSMatrix: It was he who freed the first of us
FSF & FSFLA board member                The Savior shall return (true);

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25  8:12                       ` Alexandre Oliva
@ 2019-12-25 12:07                         ` Eric S. Raymond
  2019-12-25 12:24                           ` Segher Boessenkool
  2019-12-26  6:09                           ` Alexandre Oliva
  2019-12-25 12:10                         ` Segher Boessenkool
  1 sibling, 2 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-25 12:07 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Jeff Law, Segher Boessenkool, Joseph Myers, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

Alexandre Oliva <oliva@gnu.org>:
> I know very little about reposurgeon, but I'm concerned that, should we
> make the conversion with it, and later identify e.g. missed branches, we
> might be unable to make such an incremental recovery.  Can anyone
> alleviate my concerns and let me know we could indeed make such an
> incremental recovery of a branch missed in the initial conversion, in
> such a way that its commit history would be shared with that of the
> already-converted branch it branched from?

Reposurgeon has a reparent command.  If you have determined that a
branch is detached or has an incorrect attachment point, patching the
metadata of the root node to fix that is very easy.

> Now, would it be too much of a burden to insist that the commit graphs
> out of both conversions be isomorphic, and maybe mappings between the
> commit ids (if they can't be made identical to begin with, that is) be
> generated and shared, so that the results of both conversions can be
> efficiently and mechanically compared (disregarding expected
> differences) not only in terms of branch and tag names and commit
> graphs, but also tree contents, commit messages and any other metadata?
> Has anything like this been done yet?

On the GCC repository, no. 

There are very serious practical problems with full verification of
git against SVN stemming mainly from the fact that Subversion checkout
on a repository of this size is extremely slow. IIRC Joseph at one
point estimated a check time on the order of months due to that
overhead alone.

If you're talking about a commit-by-commit comparison between two
conversions that assumes one or the other is correct, that is
theoretically possible and - because git retrieval is much faster -
could theoretically be done in a reasonable amount of time.  But there
is a lot of devil in the practical details.

The reposurgeon suite once included a tool for such comparisons.
Last year this happened:

commit b8a609925ba70a6b68f9eda1d748eb667ad2fa59
Author: Eric S. Raymond <esr@thyrsus.com>
Date:   Fri Aug 24 12:40:46 2018 -0400

    Retire repodiffer.  Its only use case was checks against git-svn...

    ...which we now know to make such bad conversions that on larger than trivial
    repos the differ would be prohibitively noisy.

Maxim's scripts probably make a better conversion than bare git-svn,
because he uses git-svn only for linear basic blocks and thereby
avoids its worst failure modes. In theory I could dust off repodiffer
and apply it.

That's in theory. In practice, on a repository this size I am not
greatly optimistic about getting a result that could be interpreted by
a Mark I brain.  The reasons go beyond git-svn's brain damage to the
same ontological-mismatch problems that make SVN-to-git conversion a 
headache in general.

You might think at least there'd be a 1:1 correspondence between
commits in the two conversions, but that's not going to be true for a
couple of different reasons.

1. Split commits. Reposurgeon decomposes these into pieces one per 
git branch.  I don't know what Maxim's scripts do.  I think Joseph turned
up that there are over a thousand of these in the GCC history.

2. There are three classes of commits in Subversion that don't really fit 
the git data model: (1) directory creation/deletion commits, (2) directory
copy commits, (3) property changes with no associated blob.

For each of these exceptional commits a converter to Git has a choice
of dropping the commit, turning it into some sort of annotated tag, or
leaving it in place as a zero-op commit (anomalous but not forbidden
in the git model). It is pretty much guaranteed that different
converters will make different choices about these, which will make
for huge amounts of noise in your attempt at a diff.

Checking for DAG isomorphism: again, theoretically possible,
practically pretty daunting.  It could be worse - general graph
isomorphism is not even known to be polynomial-time - but in this case
we can label corresponding commits with matching legacy IDs, which
should make possible an isomorphism check in linear time with a trivial
algorithm.

Well, except for split commits. That one would be solvable, albeit
painful.
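
The legacy-ID labelling described above can be sketched as follows.  This
is only an illustration, not anything taken from reposurgeon or from
Maxim's scripts: it assumes each conversion has already been exported as
a mapping from legacy ID (SVN revision number) to the legacy IDs of that
commit's parents, and that no revision is split across several commits.
The function name compare_dags is hypothetical.

```python
def compare_dags(first, second):
    """List the differences between two commit graphs keyed by legacy ID.

    Each graph maps a legacy ID (an SVN revision number) to the list of
    legacy IDs of that commit's parents.  The sorting is only there to
    make the report stable; the comparison itself visits each commit once.
    """
    problems = []
    for legacy_id in sorted(first.keys() - second.keys()):
        problems.append(f"r{legacy_id}: only in first conversion")
    for legacy_id in sorted(second.keys() - first.keys()):
        problems.append(f"r{legacy_id}: only in second conversion")
    for legacy_id in sorted(first.keys() & second.keys()):
        # Parent order is ignored here; that is an assumption of this sketch.
        if sorted(first[legacy_id]) != sorted(second[legacy_id]):
            problems.append(
                f"r{legacy_id}: parents {sorted(first[legacy_id])} "
                f"vs {sorted(second[legacy_id])}"
            )
    return problems
```

An empty result would mean the two graphs are isomorphic under the
legacy-ID labelling.  Split commits break the one-ID-per-commit
assumption and would need the keys extended, for example to
(revision, branch) pairs.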

The real problem here would be mergeinfo links.  It's not even obvious
what "correct" mapping of mergeinfo links is, in general, due to the
mismatch between Subversion's cherry-pick-based merge model and git's
branch merging.  Again, different converters will make different
choices. Reconciling them would be not fun.

There is another world of hurt lurking in "(disregarding expected
differences)".  How do you know what differences to expect? How are
you going to specify them?  What will interpret that spec?

There are more months of work here - nasty, wearing toil, with no
guarantee of a result with a decent signal-to-noise ratio.  Even
though I'm quite literally the best-qualified person on earth to do
it, I flinch at the thought.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25  8:12                       ` Alexandre Oliva
  2019-12-25 12:07                         ` Eric S. Raymond
@ 2019-12-25 12:10                         ` Segher Boessenkool
  2019-12-25 14:13                           ` Joseph Myers
  2019-12-29 16:47                           ` Mark Wielaard
  1 sibling, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-25 12:10 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Jeff Law, Joseph Myers, Eric S. Raymond, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

Hi!

On Wed, Dec 25, 2019 at 05:11:31AM -0300, Alexandre Oliva wrote:
> On Dec 16, 2019, Jeff Law <law@redhat.com> wrote:
> > Yet Joseph just indicated today Maxim's conversion is missing some
> > branches.  While I don't consider any of the missed branches important,
> > others might.   More importantly, it raises the issue of what other
> > branches might be missing and what validation work has been done on
> > that conversion.
> 
> It also raises another issue, namely the ability to *incrementally* fix
> such problems should we find them after the switch-over.
> 
> I've got enough experience with git-svn to tell that, if it missed a
> branch for whatever reason, it is reasonably easy to create a
> configuration that will enable it to properly identify the point of
> creation of the branch, and bring in subsequent changes to the branch,
> in such a way that the newly-converted branch can be easily pushed onto
> live git so that it becomes indistinguishable from other branches that
> had been converted before.

git-svn did not miss any branches.  Finding branches is not done by
git-svn at all, for this.  These branches were skipped because they
have nothing to do with GCC, have no history in common (they are not
descendants of revision 1).  They can easily be added -- Maxim might
already have done that, not sure, imo it's better to just drop the
garbage, it's in svn if anyone cares.

> I know very little about reposurgeon, but I'm concerned that, should we
> make the conversion with it, and later identify e.g. missed branches, we
> might be unable to make such an incremental recovery.  Can anyone
> alleviate my concerns and let me know we could indeed make such an
> incremental recovery of a branch missed in the initial conversion, in
> such a way that its commit history would be shared with that of the
> already-converted branch it branched from?

Git of course allows you to transplant whatever you want.  Whether it
is easy with reposurgeon to convert just some branches, I have no idea.
With some Git jiujitsu it can be done, of course.

> Anyway, hopefully we won't have to go through that.  Having not just one
> but *two* fully independent conversions of the SVN repo to GIT, using
> different tools, makes it a lot less likely that whatever result we
> choose contains a significant error, as both can presumably help catch
> conversion errors in each other, and the odds that both independent
> implementations make the same error are pretty thin, I'd think.

We need to make a good comparison between the two.  This is needed so
we can choose what conversion to use finally, but also to verify both
options (and various sub-options).

Ideally they will become identical :-)

> Now, would it be too much of a burden to insist that the commit graphs
> out of both conversions be isomorphic, and maybe mappings between the
> commit ids (if they can't be made identical to begin with, that is) be
> generated and shared,

Each conversion has a mapping of svn ids to git commits --  that is
part of the deliverable!

> so that the results of both conversions can be
> efficiently and mechanically compared (disregarding expected
> differences) not only in terms of branch and tag names and commit
> graphs, but also tree contents, commit messages and any other metadata?
> Has anything like this been done yet?

I haven't seen such a thing yet, no.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25 12:07                         ` Eric S. Raymond
@ 2019-12-25 12:24                           ` Segher Boessenkool
  2019-12-25 14:16                             ` Joseph Myers
  2019-12-25 18:50                             ` Eric S. Raymond
  2019-12-26  6:09                           ` Alexandre Oliva
  1 sibling, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-25 12:24 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Alexandre Oliva, Jeff Law, Joseph Myers, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Wed, Dec 25, 2019 at 07:07:47AM -0500, Eric S. Raymond wrote:
> For each of these exceptional commits a converter to Git has a choice
> of dropping the commit, turning it into some sort of annotated tag, or
> leaving it in place as a zero-op commit (anomalous but not forbidden
> in the git model).

Or doing what everyone else does: put an empty .gitignore file in
otherwise empty directories.  This is safe even if your conversion
creates such files from metadata (why on earth would you do that?!)


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25 12:10                         ` Segher Boessenkool
@ 2019-12-25 14:13                           ` Joseph Myers
  2019-12-29 16:47                           ` Mark Wielaard
  1 sibling, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-25 14:13 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Alexandre Oliva, Jeff Law, Eric S. Raymond, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Wed, 25 Dec 2019, Segher Boessenkool wrote:

> git-svn did not miss any branches.  Finding branches is not done by
> git-svn at all, for this.  These branches were skipped because they
> have nothing to do with GCC, have no history in common (they are not
> descendants of revision 1).  They can easily be added -- Maxim might

Whether they have history in common is, in the case of 
libstdcxx_so_7-2-branch, a subjective question (that branch was created in 
SVN by creating the directory for the branch, then copying just the 
libstdc++-v3 subdirectory, so that subdirectory does have history in 
common with trunk).  In fact I've seen it go, in the reposurgeon 
conversion, from being an orphan branch to a non-orphan and back again, as 
details of the exact handling of empty commits (a commit that only created 
the directory for the branch, in this case) changed (and I've wondered 
about making the reposurgeon conversion explicitly reparent the commit 
that copies the libstdc++-v3 subdirectory, on the basis that being 
connected to the main history of trunk is more useful than not being so 
connected).

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25 12:24                           ` Segher Boessenkool
@ 2019-12-25 14:16                             ` Joseph Myers
  2019-12-25 18:50                             ` Eric S. Raymond
  1 sibling, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-25 14:16 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Eric S. Raymond, Alexandre Oliva, Jeff Law, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Wed, 25 Dec 2019, Segher Boessenkool wrote:

> On Wed, Dec 25, 2019 at 07:07:47AM -0500, Eric S. Raymond wrote:
> > For each of these exceptional commits a converter to Git has a choice
> > of dropping the commit, turning it into some sort of annotated tag, or
> > leaving it in place as a zero-op commit (anomalous but not forbidden
> > in the git model).
> 
> Or doing what everyone else does: put an empty .gitignore file in
> otherwise empty directories.  This is safe even if your conversion
> creates such files from metadata (why on earth would you do that?!)

An empty .gitignore is only needed in the case where the empty directory 
is genuinely needed.  I'm pretty sure that none of those in GCC are 
needed, especially given that building in the source tree has been 
discouraged for a very long time so an empty directory shouldn't be 
present for use in the build; the most common reason (empirically) to have 
an empty directory in the source tree is that someone deleted the contents 
using git-svn to push the commit back to SVN (which thus only deleted the 
contents, and not the directory itself, in SVN).

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25 12:24                           ` Segher Boessenkool
  2019-12-25 14:16                             ` Joseph Myers
@ 2019-12-25 18:50                             ` Eric S. Raymond
  2019-12-25 19:18                               ` Segher Boessenkool
  1 sibling, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-25 18:50 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Alexandre Oliva, Jeff Law, Joseph Myers, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

Segher Boessenkool <segher@kernel.crashing.org>:
> Or doing what everyone else does: put an empty .gitignore file in
> otherwise empty directories.

That is an ugly kludge that I will have no part of whatsoever.

Conversion artifacts like this are sources of cognitive friction and
confusion that take developers' attention away from the substantive
part of their work.  Each individual one may be minor, but the
cumulative effect can be a chronic distraction that is not less
because developers are unaware or only half-aware of it.

Thus, the goal of a repository converter should be to bridge smoothly
between the native idioms of the source and target systems,
*minimizing* conversion artifacts.

The ideal should be to produce a converted history that looks as much
as possible like it has always been under the target
system. Developers should have no need to know or care that the
history used to be managed differently unless they need to do
something that *unavoidably* crosses that boundary, like looking up a
legacy ID from an old bug report.

Reposurgeon was designed for this goal from the beginning.   
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25 18:50                             ` Eric S. Raymond
@ 2019-12-25 19:18                               ` Segher Boessenkool
  0 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-25 19:18 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Alexandre Oliva, Jeff Law, Joseph Myers, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Wed, Dec 25, 2019 at 01:50:14PM -0500, Eric S. Raymond wrote:
> Segher Boessenkool <segher@kernel.crashing.org>:
> > Or doing what everyone else does: put an empty .gitignore file in
> > otherwise empty directories.
> 
> That is an ugly kludge that I will have no part of whatsoever.
> 
> Conversion artifacts like this

It's not a conversion artifact.  It's what people do if for whatever
reason they want to commit an empty directory.  It works, it is simple,
it doesn't conflict with other things, and above all, it is the common
way of handling this.

Of course since (as Joseph notes) we do not really care about having the
empty directories here at all, it is moot anyway.

> the goal of a repository converter

The only goal *we* (GCC) have is *one* converted repository.

> The ideal should be to produce a converted history that looks as much
> as possible like it has always been under the target
> system.

I don't know anyone who wants that, either.  Why would that be useful?

> Developers should have no need to know or care that the
> history used to be managed differently

Wait until they find out about changelogs.  A *much* bigger change, and
it will happen *later*, some time after the conversion to Git.

Since historical commit messages are pretty much non-existent (to say
it nicely), and it takes some sleuthing to line that up with the ML
discussions (which is the important part of archaeology!), the only
really useful part of most historical commits is the actual file
changes.  Which we hopefully have perfectly fine already, in all
candidate conversions.

Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25 12:07                         ` Eric S. Raymond
  2019-12-25 12:24                           ` Segher Boessenkool
@ 2019-12-26  6:09                           ` Alexandre Oliva
  2019-12-26 11:04                             ` Joseph Myers
  2019-12-26 19:16                             ` Eric S. Raymond
  1 sibling, 2 replies; 198+ messages in thread
From: Alexandre Oliva @ 2019-12-26  6:09 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Jeff Law, Segher Boessenkool, Joseph Myers, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Dec 25, 2019, "Eric S. Raymond" <esr@thyrsus.com> wrote:

> Reposurgeon has a reparent command.  If you have determined that a
> branch is detached or has an incorrect attachment point, patching the
> metadata of the root node to fix that is very easy.

Thanks, I see how that can enable a missed branch to be converted and
added incrementally to a converted repo even after it went live, at
least as long as there aren't subsequent merges from a converted branch
to the missed one.  I don't quite see how this helps if there are,
though.

> If you're talking about a commit-by-commit comparison between two
> conversions that assumes one or the other is correct

Yeah, minus the assumption; the point would be to find errors in either
one, maybe even in both.  With git, given that the converted trees
should look the same, at least the tree diffs would likely be pretty
fast, since the top-level dir blobs are likely to be identical even if
the commits don't share the same hash, right?  And, should they differ,
a bisection to find the divergence point should be very fast too.

Could we make it a requirement that at least the commits associated with
head branches and published tags compare equal in both conversions, or
that differences are known, understood and accepted, before we switch
over to either one?  Going over all corresponding commits might be too
much, but at least a representative random sample would be desirable to
check IMHO.

Of course, besides checking trees, it would be nice to compare metadata
as well.  Alas, the more either conversion diverges from the raw
metadata in svn, the harder it becomes to mechanically ignore expected
differences and identify unexpected ones.  Unless both conversions agree
on transformations to make, such metadata fixes end up conflicting with
the (proposed) goal of enabling mechanical verification of the
conversion results against each other.

> Well, except for split commits. That one would be solvable, albeit
> painful.

Even for split SVN commits, that will amount to at most one GIT commit
per GIT branch/tag, no?  That extra info should make it easy to identify
corresponding GIT commits between two conversions, so as to compare
trees, metadata and DAGs.

> The real problem here would be mergeinfo links.

*nod*.  I don't consider this all that important, considering that GIT
doesn't keep track of cherry-picks at all.  On the same note, it's nice
to identify merges, but since the info is NOT readily available in SVN,
it's arguably not essential that a SVN merge commit be represented as a
GIT merge commit rather than as multi cherry picking, at least provided
that merge metadata is somehow preserved/mapped across the conversion,
perhaps as GIT annotations or so.

I suppose if there are active branches that get merges frequently,
coming up with a merge parent that names at least the latest merged
commit would make the first merge after the transition a lot easier.

> There is another world of hurt lurking in "(disregarding expected
> differences)".  How do you know what differences to expect?

I was thinking someone would look at the differences, possibly
investigate a bit, and then decide whether they indicate a problem in
either conversion or something to be expected, ideally that could be
mechanically identified as expected in subsequent compares, until we
converge on a pair of conversions with only expected differences, if
any.

I suppose we're sort of doing that in a distributed but not very
organized fashion, as repos converted by both tools are made available
for assessment and verification.  Alas, the specification of expected
differences is not (to my knowledge) consolidated in a
publicly-available way, so there may be plenty of duplicated effort in
filtering out differences.  If we organized the comparing effort by
sharing configuration data, scripts and tools to compare and to filter
out expected differences, we might be able to do that more efficiently.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist           Stallman was right, but he's left :(
GNU Toolchain Engineer    FSMatrix: It was he who freed the first of us
FSF & FSFLA board member                The Savior shall return (true);

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26  6:09                           ` Alexandre Oliva
@ 2019-12-26 11:04                             ` Joseph Myers
  2019-12-26 11:17                               ` Jakub Jelinek
  2019-12-26 19:16                             ` Eric S. Raymond
  1 sibling, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-26 11:04 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Thu, 26 Dec 2019, Alexandre Oliva wrote:

> Could make it a requirement that at least the commits associated with
> head branches and published tags compare equal in both conversions, or
> that differences are known, understood and accepted, before we switch
> over to either one?  Going over all corresponding commits might be too

The checks I run on every conversion with reposurgeon include checking the 
tree contents at the tip of every (non-deleted) branch and tag agree with 
SVN (this check now includes checking that execute permissions match).  
Empty directories are removed from SVN checkouts before that comparison; 
.gitignore files are ignored because of those generated automatically by 
reposurgeon from svn:ignore properties (but the conversion is set to 
prefer .gitignore files checked into SVN where they exist, and empirically 
that works as expected, so differences relating to automatically-generated 
.gitignore files are only relevant to older branches).  Two branches 
(c++-modules and melt-branch) have some files with SVN keyword expansion 
enabled, which causes expected differences in such comparisons.

The scripts used for those checks are checked into the gcc-conversion 
repository; the input needed is a mapping from SVN branch / tag paths to 
git refs (along with the SVN revision number the branch tips should 
match).  The main thing that consumes time in the checks is switching SVN 
checkouts to a different branch.
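
A tip comparison of the kind described above might look roughly like the
sketch below.  It is not the script from the gcc-conversion repository;
the function names snapshot and compare_checkouts are hypothetical, and
it assumes both trees are already checked out on disk.  Empty
directories drop out because only files are recorded, .gitignore files
are skipped, and the owner execute bit is compared along with content.
Symlinks and SVN keyword expansion get no special handling.

```python
import hashlib
import os
import stat


def snapshot(root, skip_dirs=(".git", ".svn")):
    """Map each relative file path under root to (sha256 digest, executable?)."""
    files = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata in place so os.walk skips it.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name == ".gitignore":
                continue  # may be generated from svn:ignore, so not compared
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            executable = bool(os.stat(full).st_mode & stat.S_IXUSR)
            files[os.path.relpath(full, root)] = (digest, executable)
    return files


def compare_checkouts(svn_root, git_root):
    """Return the sorted relative paths whose content or exec bit differ."""
    svn_files = snapshot(svn_root)
    git_files = snapshot(git_root)
    return sorted(
        path
        for path in svn_files.keys() | git_files.keys()
        if svn_files.get(path) != git_files.get(path)
    )
```

A path present on only one side is reported too, since the lookup on
the other side yields None.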

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 11:04                             ` Joseph Myers
@ 2019-12-26 11:17                               ` Jakub Jelinek
  2019-12-26 12:10                                 ` Joseph Myers
                                                   ` (2 more replies)
  0 siblings, 3 replies; 198+ messages in thread
From: Jakub Jelinek @ 2019-12-26 11:17 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Alexandre Oliva, Eric S. Raymond, Jeff Law, Segher Boessenkool,
	Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
Is there some easy way (e.g. file in the conversion scripts) to correct
spelling and other mistakes in the commit authors?
E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
Jakub Jakub Jelinek (1):
Jakub Jeilnek (1):
Jelinek (1):
entries next to the expected one with most of the commits.
For the misspellings, wonder if e.g. we couldn't compute edit distances from
other names and if we have one with many commits and then one with very few
with small edit distance from those, flag it for human review.
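
The edit-distance idea above could be prototyped along these lines.
This is a sketch only: the function names and the thresholds
(max_distance, rare, common) are made up for illustration, and the
input is assumed to be a mapping from author name to commit count, such
as could be derived from git shortlog output.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def flag_suspects(counts, max_distance=3, rare=2, common=50):
    """Pair each rarely seen name with a frequent name it nearly matches.

    Returns (suspect, likely_intended) tuples for human review; nothing
    is corrected automatically.
    """
    frequent = [name for name, count in counts.items() if count >= common]
    suspects = []
    for name, count in counts.items():
        if count > rare:
            continue
        for good in frequent:
            if edit_distance(name, good) <= max_distance:
                suspects.append((name, good))
                break
    return suspects
```

A transposition such as "Jakub Jeilnek" is two edits away from the
expected spelling and would be flagged; "Jelinek" alone or a doubled
first name is further away and would need a separate rule, such as
matching on surname.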

Or I see in git shortlog parts of date being parsed as name, e.g.
(basically anything in git shortlog after the "..." wrapped names and before
Aaron Conole (2): in alphabetical sorting, or after Zuxy Meng (4):.
00:27 -0700  Zack Weinberg (1):
      c-typeck.c (c_expand_start_case): Return immediately if exp is an ERROR_MARK.

01:17 -0500  Zack Weinberg (1):
      cpplib.h (struct cpp_buffer): Replace dir and dlen members with a struct file_name_list pointer.

02:50  Ulrich Drepper (1):
      Handle __set_errno correctly.

04:08  Ulrich Drepper (1):
      Fix all problems reported by the test suite.

07:51 -0500  Zack Weinberg (1):
      gcc.c: Split out Objective-C specs to...
...
Or e.g.
linux.org.pl) & Denis Chertykov (1):
      avr.c (avr_case_values_threshold): New.

lsd.ic.unicamp.br),  Jakub Jelinek (1):
      configure.in: When target is sparc* and tm_file contains 64, test for 64bit support in assembler.

lsd.ic.unicamp.br), Richard Henderson (1):
      resource.c (mark_referenced_resources): Mark a set strict_low_part as used.

m17n.org), Kaz Kojima (1):
      lib1funcs.asm (GLOBAL): Define.

redhat.com), Alexandre Oliva (1):
      * g++.dg/init/pm1.C: New test.

redhat.com), Bernd Schmidt (1):
      reload.c (find_reloads_address_1): Generate reloads for auto_inc pseudos that refer to the original pseudos...

redhat.com), DJ Delorie (1):
      configure.in (FLAGS_FOR_TARGET): Use -nostdinc even for Canadian crosses...

redhat.com), J"orn Rennecke (1):
      reload1.c (move2add_note_store): Treat all registers about which no information is known as potential bases...

redhat.com), Jakub Jelinek (1):
      re PR debug/54693 (VTA guality issues with loops)

redhat.com), Jan Hubicka (1):
      tree-ssa-live.c (remove_unused_scope_block_p): Drop declarations and blocks only after inlining.

redhat.com), Jeff Sturm (1):
      Makefile.in (AS_FOR_TARGET, [...]): If gcc/xgcc is built, use -print-prog-name to find out the program name to use.

redhat.com), Kazu Hirata (1):
      h8300.md: Remove the memory alternative and correct the insn lengths in the templates for...

redhat.com), NIIBE Yutaka (1):
      sh-protos.h (symbol_ref_operand): Declare.

<A0>Eric Botcazou (1):
      config.gcc (sparc64-*-solaris2*, [...]): Add tm-dwarf2.h to tm_file.

	Jakub

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 11:17                               ` Jakub Jelinek
@ 2019-12-26 12:10                                 ` Joseph Myers
  2019-12-26 16:11                                 ` Maxim Kuvyrkov
  2019-12-26 22:33                                 ` Joseph Myers
  2 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-26 12:10 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Alexandre Oliva, Eric S. Raymond, Jeff Law, Segher Boessenkool,
	Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?

These can be corrected via reposurgeon commands in gcc.lift (see the 
existing "/<jwakely-gcc@gmail.com>/ attribution =A set 
jwakely.gcc@gmail.com" command), or the msgout/msgin mechanism used in 
Richard's script for commit message improvements could also make changes 
to authors (don't know the exact syntax offhand, but I believe authors are 
among the things that mechanism allows to be changed in commit metadata, 
so the script could gain a table of author corrections to apply).

> Or I see in git shortlog parts of date being parsed as name, e.g.
> (basically anything in git shortlog after the "..." wrapped names and before
> Aaron Conole (2): in alphabetical sorting, or after Zuxy Meng (4):.
> 00:27 -0700  Zack Weinberg (1):

> lsd.ic.unicamp.br),  Jakub Jelinek (1):

Filed https://gitlab.com/esr/reposurgeon/issues/218 for these kinds of 
ChangeLog entries - some changes to regular expressions should be able to 
make the code handle them better (possibly by reverting to committer 
identities in some more cases where the ChangeLog header line looks odd in 
some way).

> <A0>Eric Botcazou (1):

I didn't include anything for this in my reduced test.  I'd noted some of 
the invalid attribution warnings from reposurgeon also involving bytes 
0xA0 (= ISO-8859-1 NBSP).  If anything is appropriate there, it might be 
something like "change any 0xA0 that's preceded by an ASCII byte to ASCII 
space before processing further" ("preceded by an ASCII byte" being needed 
to avoid the case of 0xA0 in the middle of a UTF-8 character).
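
As a rough illustration of that rule (not reposurgeon code; the function
name normalize_nbsp is hypothetical), a byte-level substitution with a
lookbehind for an ASCII byte would be:

```python
import re

# 0xA0 immediately preceded by an ASCII byte (0x00-0x7F).  Inside a UTF-8
# multi-byte character the preceding byte is >= 0x80, so it is left alone.
_NBSP_AFTER_ASCII = re.compile(rb"(?<=[\x00-\x7f])\xa0")


def normalize_nbsp(raw: bytes) -> bytes:
    """Replace stray ISO-8859-1 NBSP bytes with ASCII spaces."""
    return _NBSP_AFTER_ASCII.sub(b" ", raw)
```

Taken literally, the rule leaves a 0xA0 at the very start of the input
untouched, since nothing precedes it; a 0xA0 at the start of a later
line is converted because the preceding newline is ASCII.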

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 11:17                               ` Jakub Jelinek
  2019-12-26 12:10                                 ` Joseph Myers
@ 2019-12-26 16:11                                 ` Maxim Kuvyrkov
  2019-12-26 16:58                                   ` Joseph Myers
  2019-12-29 18:31                                   ` Maxim Kuvyrkov
  2019-12-26 22:33                                 ` Joseph Myers
  2 siblings, 2 replies; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-12-26 16:11 UTC (permalink / raw)
  To: GCC Development
  Cc: Joseph Myers, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek

> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?
> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
> Jakub Jakub Jelinek (1):
> Jakub Jeilnek (1):
> Jelinek (1):
> entries next to the expected one with most of the commits.
> For the misspellings, wonder if e.g. we couldn't compute edit distances from
> other names and if we have one with many commits and then one with very few
> with small edit distance from those, flag it for human review.

This is close to what svn-git-author.sh script is doing in gcc-pretty and gcc-reparent conversions.  It ignores 1-3 character differences in author/committer names and email addresses.  I've audited results for all branches and didn't spot any mistakes.

In other news, I'm working on comparison of gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves.  Below are current notes for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.

== Merges on trunk ==

Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk.  This brings entire development history from the branch to trunk, which is both good and bad.  The good part is that we get more visibility into how the code evolved.  The bad part is that we get many "noisy" commits from merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready for review/commit quality.  It's common for files to be re-written in large chunks on branches.

Also, reposurgeon's commit logs don't have information on SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit.  Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.

My conversion follows current GCC development policy that trunk history should be linear.  Branch merges to trunk are squashed.  Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.

== Differences in trees ==

Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from the current tip and up to SVN's r130805.
Here is SVN log of that revision (restoration of deleted trunk):
------------------------------------------------------------------------
r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
Changed paths:
   A /trunk (from /trunk:130802)
------------------------------------------------------------------------

Reposurgeon conversion has:
-------------
commit 7e6f2a96e89d96c2418482788f94155d87791f0a
Author: Daniel Berlin <dberlin@gcc.gnu.org>
Date:   Thu Dec 13 01:53:37 2007 +0000

    Readd trunk

    Legacy-ID: 130805

 .gitignore | 17 -----------------
 1 file changed, 17 deletions(-)
-------------
and my conversion has:
-------------
commit fb128f3970789ce094c798945b4fa20eceb84cc7
Author: Daniel Berlin <dberlin@dberlin.org>
Date:   Thu Dec 13 01:53:37 2007 +0000

    Readd trunk

    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
-------------

It appears that .gitignore was added in r1 by reposurgeon and then deleted at r130805.  In the SVN repository, .gitignore was added in r195087.  I speculate that the addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.

== Committer entries ==

Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects the author name from the ChangeLog.

reposurgeon-5a:
r278995 Martin Liska <mliska@suse.cz> Martin Liska <marxin@gcc.gnu.org>
r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozefl@gcc.gnu.org>
r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@gcc.gnu.org>
r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <gjl@gcc.gnu.org>
r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenth@gcc.gnu.org>

pretty:
r278995 Martin Liska <mliska@suse.cz> Martin Liska <mliska@suse.cz>
r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozef.l@mittosystems.com>
r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@codesourcery.com>
r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <avr@gjlay.de>
r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenther@suse.de>
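
A listing in this format can be checked mechanically.  The sketch below is only an illustration: the regular expression and the function name are assumptions, and it expects lines of the form "r<rev> <author> <email> <committer> <email>" as shown above.

```python
import re

# One line per revision: author identity first, committer identity second.
ENTRY = re.compile(
    r"^r(?P<rev>\d+) (?P<author>.+?) <(?P<author_email>[^>]+)> "
    r"(?P<committer>.+?) <(?P<committer_email>[^>]+)>$"
)


def committer_differs(line):
    """True when the committer email is not the same as the author email."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    return m.group("author_email") != m.group("committer_email")


reposurgeon_line = ("r278995 Martin Liska <mliska@suse.cz> "
                    "Martin Liska <marxin@gcc.gnu.org>")
pretty_line = ("r278995 Martin Liska <mliska@suse.cz> "
               "Martin Liska <mliska@suse.cz>")
print(committer_differs(reposurgeon_line), committer_differs(pretty_line))
```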

== Bad summary line ==

While looking around r138087, the commit below caught my eye.  Are the contents of the summary line as expected?

commit cc2726884d56995c514d8171cc4a03657851657e
Author: Chris Fairles <chris.fairles@gmail.com>
Date:   Wed Jul 23 14:49:00 2008 +0000

    acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.

    2008-07-23  Chris Fairles <chris.fairles@gmail.com>

            * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
            Holds the lib that defines clock_gettime (-lrt or -lposix4).
            * src/Makefile.am: Use it.
            * configure: Regenerate.
            * configure.in: Likewise.
            * Makefile.in: Likewise.
            * src/Makefile.in: Likewise.
            * libsup++/Makefile.in: Likewise.
            * po/Makefile.in: Likewise.
            * doc/Makefile.in: Likewise.

    Legacy-ID: 138087

--
Maxim Kuvyrkov
https://www.linaro.org

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 16:11                                 ` Maxim Kuvyrkov
@ 2019-12-26 16:58                                   ` Joseph Myers
  2019-12-26 18:36                                     ` Jakub Jelinek
                                                       ` (2 more replies)
  2019-12-29 18:31                                   ` Maxim Kuvyrkov
  1 sibling, 3 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-26 16:58 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: GCC Development, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek

On Thu, 26 Dec 2019, Maxim Kuvyrkov wrote:

> Reposurgeon creates merge entries on trunk when changes from a branch 
> are merged into trunk.  This brings entire development history from the 
> branch to trunk, which is both good and bad.  The good part is that we 
> get more visibility into how the code evolved.  The bad part is that we 
> get many "noisy" commits from merged branch (e.g., "Merge in trunk" 
> every few revisions) and that our SVN branches are work-in-progress 
> quality, not ready for review/commit quality.  It's common for files to 
> be re-written in large chunks on branches.

Seeing "noisy" or possibly confusing commits in "git log" output for 
master is simply a consequence of the possibly confusing defaults for how 
git log behaves (showing all commits in the ancestry in reverse committer 
date order).  I often find "git log --first-parent" output less confusing 
when dealing with any git repository making heavy use of branches (but 
there are other options as well to control how it shows such histories).
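
To make the difference concrete, here is a toy Python model of a history with one merge.  It is only a sketch of the idea behind first-parent traversal, not git's implementation; the commit names and the parents mapping are made up.

```python
def first_parent_history(parents, tip):
    """Follow only the first parent of each commit, starting from tip."""
    history = []
    commit = tip
    while commit is not None:
        history.append(commit)
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None
    return history


def full_ancestry(parents, tip):
    """Every commit reachable from tip through any parent link."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# Hypothetical history: B1 and B2 live on a branch that M merges into trunk.
parents = {
    "T1": [],
    "T2": ["T1"],
    "B1": ["T1"],
    "B2": ["B1"],
    "M": ["T2", "B2"],   # first parent is the trunk side
    "T3": ["M"],
}
print(first_parent_history(parents, "T3"))
print(sorted(full_ancestry(parents, "T3")))
```

The first-parent walk visits only T3, M, T2 and T1, while the full ancestry also contains the branch commits B1 and B2, which are the "noisy" entries in question.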

If we don't want merge commits on git master for the cases where people 
put merge properties on trunk in the past, we can use a reposurgeon 
"unmerge" command in gcc.lift to stop the few commits in question from 
being merge commits (while keeping all other merges as-is).  (The merges 
of trunk into other branches that copied merge properties from trunk into 
those branches will still be handled correctly, with exactly two parents 
rather than regaining the extra parents corresponding to the merges into 
trunk that Bernd noted in an earlier version of the conversion, because 
the processing that avoids redundant merge parents takes place well before 
any unmerge commands are executed - so at the time of that processing, 
reposurgeon knows that those other branches are in fact in the ancestry of 
trunk, even if we remove that information in the final git repository.)

> Also, reposurgeon's commit logs don't have information on SVN path from 
> which the change came, so there is no easy way to determine that a given 
> commit is from a merged branch, not an original trunk commit.  Git-svn, 

I think it's idiomatic in git for a branch commit not to say "this is a 
commit on X branch", i.e. this is a general property of branchy git 
histories (and unmerge is the solution if we don't want a branchy history 
of master, or use of smarter git tools for viewing the history that people 
may well make more use of when dealing with repositories with that kind of 
history).

> It appears that .gitignore has been added in r1 by reposurgeon and then 
> deleted at r130805.  In SVN repository .gitignore was added in r195087.  
> I speculate that addition of .gitignore at r1 is expected, but it's 
> deletion at r130805 is highly suspicious.

I suspect this is one of the known issues related to reposurgeon-generated 
.gitignore files.  Since such files are not really part of the GCC 
history, and the .gitignore files checked into SVN are properly preserved 
as far as I can see, I don't think it's a particularly important issue for 
the GCC conversion (since auto-generated .gitignore files are only 
nice-to-have, not required).  I've filed 
https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test 
for this oddity.

> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even 
> when it correctly detects author name from ChangeLog.

I think that's logically accurate (and certainly harmless) as a 
description of commits made to a central repository on gcc.gnu.org, 
although using committer = author would also be OK.

> == Bad summary line ==
> 
> While looking around r138087, below caught my eye.  Is the contents of 
> summary line as expected?
> 
> commit cc2726884d56995c514d8171cc4a03657851657e
> Author: Chris Fairles <chris.fairles@gmail.com>
> Date:   Wed Jul 23 14:49:00 2008 +0000
> 
>     acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.

Yes.  This seems to be Richard's script working exactly as intended, by 
extracting the first bit of the ChangeLog entry *after* the date/author 
header as a better description than "2008-07-23 Chris Fairles 
<chris.fairles@gmail.com>" (i.e. it certainly gives more distinctive 
information about the commit and is more useful than having a date/author 
line as the summary line).  I don't think it's a bad summary line (but 
Richard's script supports hardcoding new summary lines for individual 
commits where desired).
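
The idea can be sketched in a few lines of Python.  This is not Richard's script: the header pattern and the function name are assumptions, and only the simple "date, author, email" header form is handled.

```python
import re

# A ChangeLog-style header such as "2008-07-23  Name <email>".
HEADER = re.compile(r"^\d{4}-\d{2}-\d{2}\s+\S.*<[^>]+>\s*$")


def summary_line(message):
    """Pick a summary: skip a leading date/author header if there is one."""
    lines = [line.strip() for line in message.splitlines()]
    lines = [line for line in lines if line]
    if not lines:
        return ""
    if HEADER.match(lines[0]) and len(lines) > 1:
        first = lines[1]
        # Drop the ChangeLog bullet so the summary starts at the file name.
        return first[2:] if first.startswith("* ") else first
    return lines[0]


message = """2008-07-23  Chris Fairles <chris.fairles@gmail.com>

\t* acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
\tHolds the lib that defines clock_gettime (-lrt or -lposix4).
\t* src/Makefile.am: Use it.
"""
print(summary_line(message))
```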

-- 
Joseph S. Myers
jsm@polyomino.org.uk

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 16:58                                   ` Joseph Myers
@ 2019-12-26 18:36                                     ` Jakub Jelinek
  2019-12-26 18:59                                       ` Joseph Myers
  2019-12-26 20:31                                     ` Richard Biener
  2019-12-27  1:32                                     ` Joseph Myers
  2 siblings, 1 reply; 198+ messages in thread
From: Jakub Jelinek @ 2019-12-26 18:36 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard,
	Richard Earnshaw (lists)

On Thu, Dec 26, 2019 at 04:58:22PM +0000, Joseph Myers wrote:
> If we don't want merge commits on git master for the cases where people 
> put merge properties on trunk in the past, we can use a reposurgeon 
> "unmerge" command in gcc.lift to stop the few commits in question from 
> being merge commits (while keeping all other merges as-is).  (The merges 
> of trunk into other branches that copied merge properties from trunk into 
> those branches will still be handled correctly, with exactly two parents 
> rather than regaining the extra parents corresponding to the merges into 
> trunk that Bernd noted in an earlier version of the conversion, because 
> the processing that avoids redundant merge parents takes place well before 
> any unmerge commands are executed - so at the time of that processing, 
> reposurgeon knows that those other branches are in fact in the ancestry of 
> trunk, even if we remove that information in the final git repository.)

Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
svn:mergeinfo property on the trunk when it appeared too).

	Jakub

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 18:36                                     ` Jakub Jelinek
@ 2019-12-26 18:59                                       ` Joseph Myers
  2019-12-27 11:21                                         ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-26 18:59 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard,
	Richard Earnshaw (lists)

On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> On Thu, Dec 26, 2019 at 04:58:22PM +0000, Joseph Myers wrote:
> > If we don't want merge commits on git master for the cases where people 
> > put merge properties on trunk in the past, we can use a reposurgeon 
> > "unmerge" command in gcc.lift to stop the few commits in question from 
> > being merge commits (while keeping all other merges as-is).  (The merges 
> > of trunk into other branches that copied merge properties from trunk into 
> > those branches will still be handled correctly, with exactly two parents 
> > rather than regaining the extra parents corresponding to the merges into 
> > trunk that Bernd noted in an earlier version of the conversion, because 
> > the processing that avoids redundant merge parents takes place well before 
> > any unmerge commands are executed - so at the time of that processing, 
> > reposurgeon knows that those other branches are in fact in the ancestry of 
> > trunk, even if we remove that information in the final git repository.)
> 
> Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
> svn:mergeinfo property on the trunk when it appeared too).

I've added the unmerge commands for the three commits in question to 
gcc.lift.

-- 
Joseph S. Myers
jsm@polyomino.org.uk

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26  6:09                           ` Alexandre Oliva
  2019-12-26 11:04                             ` Joseph Myers
@ 2019-12-26 19:16                             ` Eric S. Raymond
  2019-12-26 20:08                               ` Alexandre Oliva
  1 sibling, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-26 19:16 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Jeff Law, Segher Boessenkool, Joseph Myers, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

Alexandre Oliva <oliva@gnu.org>:
> On Dec 25, 2019, "Eric S. Raymond" <esr@thyrsus.com> wrote:
> 
> > Reposurgeon has a reparent command.  If you have determined that a
> > branch is detached or has an incorrect attachment point, patching the
> > metadata of the root node to fix that is very easy.
> 
> Thanks, I see how that can enable a missed branch to be converted and
> added incrementally to a converted repo even after it went live, at
> least as long as there aren't subsequent merges from a converted branch
> to the missed one.  I don't quite see how this helps if there are,
> though.

There's also a command for cutting parent links, if that helps.

> Could make it a requirement that at least the commits associated with
> head branches and published tags compare equal in both conversions, or
> that differences are known, understood and accepted, before we switch
> over to either one?  Going over all corresponding commits might be too
> much, but at least a representative random sample would be desirable to
> check IMHO.

repotool compare does that, and there's a production in the conversion
makefile that applies it.

As Joseph says in another reply, he's already doing a lot of the 
verifications you are suggesting.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 19:16                             ` Eric S. Raymond
@ 2019-12-26 20:08                               ` Alexandre Oliva
  2019-12-26 20:28                                 ` Joseph Myers
  2019-12-26 21:19                                 ` Eric S. Raymond
  0 siblings, 2 replies; 198+ messages in thread
From: Alexandre Oliva @ 2019-12-26 20:08 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Jeff Law, Segher Boessenkool, Joseph Myers, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Dec 26, 2019, "Eric S. Raymond" <esr@thyrsus.com> wrote:

> Alexandre Oliva <oliva@gnu.org>:
>> On Dec 25, 2019, "Eric S. Raymond" <esr@thyrsus.com> wrote:
>> 
>> > Reposurgeon has a reparent command.  If you have determined that a
>> > branch is detached or has an incorrect attachment point, patching the
>> > metadata of the root node to fix that is very easy.
>> 
>> Thanks, I see how that can enable a missed branch to be converted and
>> added incrementally to a converted repo even after it went live, at
>> least as long as there aren't subsequent merges from a converted branch
>> to the missed one.  I don't quite see how this helps if there are,
>> though.

> There's also a command for cutting parent links, if that helps.

I don't see that it does (help).  Incremental conversion of a missed
branch should include the very same parent links that the conversion of
the entire repo would, just linking to the proper commits in the adopted
conversion.  git-svn can do that incrementally, after the fact; I'm not
sure whether either conversion tool we're contemplating does, but being
able to undertake such recovery seems like a desirable feature to me.

> repotool compare does that, and there's a production in the conversion
> makefile that applies it.

> As Joseph says in another reply, he's already doing a lot of the 
> verifications you are suggesting.

From what I read, he's doing verifications against SVN.  What I'm
suggesting, at this final stage, is for us to verify one git
converted repo against the other.

Since both claim to be nearing readiness for adoption, I gather it's the
time for both to be comparing with each other (which should be far more
efficient than comparing with SVN) and attempting to narrow down on
differences and converge, so that the community can choose one repo or
another on the actual merits of the converted repositories (e.g. slight
policy differences in metadata conversion), rather than on allegations
by developers of either conversion tool about the reliability of the
tool used by the other.
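
As a sketch of what such a repo-against-repo check could look like, the Python below compares two mappings from SVN revision to git tree hash.  Everything here is an assumption for illustration: the function name is made up, and the hashes are placeholders rather than real data.  In practice the revision could be recovered from the "git-svn-id:" or "Legacy-ID:" line of each commit message and the tree hash from "git log --format=%T".

```python
def compare_conversions(trees_a, trees_b):
    """Compare two {svn_revision: tree_hash} mappings.

    Returns (only_in_a, only_in_b, mismatched), each a sorted revision list.
    """
    revs_a, revs_b = set(trees_a), set(trees_b)
    only_a = sorted(revs_a - revs_b)
    only_b = sorted(revs_b - revs_a)
    mismatched = sorted(r for r in revs_a & revs_b if trees_a[r] != trees_b[r])
    return only_a, only_b, mismatched


# Placeholder data: the hash strings are NOT real tree hashes.
repo_a = {130802: "aaa1", 130805: "aaa2", 130806: "aaa3"}
repo_b = {130802: "aaa1", 130805: "bbb2", 130806: "aaa3", 130807: "aaa4"}

only_a, only_b, mismatched = compare_conversions(repo_a, repo_b)
print("missing from B:", only_a)
print("missing from A:", only_b)
print("different trees:", mismatched)
```

Comparing tree hashes checks only file content at each revision; differences in authors, commit messages or parent links would need separate comparisons.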

Maxim appears to be doing so and finding (easy-to-fix) problems in the
reposurgeon conversion; it would be nice for reposurgeon folks to
reciprocate and maybe even point out problems in the gcc-pretty
conversion, if they can find any, otherwise the allegations of
unsuitability of the tools would have to be taken on blind faith.

I wouldn't like the community to have to decide based on blind faith,
rather than hard data.  I'd much rather we had two great, maybe even
equivalent repos to choose from, possibly with a coin toss if they're
close enough, than pick one over the other on unsubstantiated faith.  It
appears to me that this final stage of collaboration and coopetition,
namely comparing the converted repos proposed for adoption and aiming at
convergence, is in the best interest of our community, even if seemingly
at odds with the promotion of either conversion tool.  I hope we can set
aside these slight conflicts of interest, and do what's best for the
community.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist           Stallman was right, but he's left :(
GNU Toolchain Engineer    FSMatrix: It was he who freed the first of us
FSF & FSFLA board member                The Savior shall return (true);

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 20:08                               ` Alexandre Oliva
@ 2019-12-26 20:28                                 ` Joseph Myers
  2019-12-27 12:06                                   ` Alexandre Oliva
  2019-12-26 21:19                                 ` Eric S. Raymond
  1 sibling, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-26 20:28 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Thu, 26 Dec 2019, Alexandre Oliva wrote:

> I don't see that it does (help).  Incremental conversion of a missed
> branch should include the very same parent links that the conversion of
> the entire repo would, just linking to the proper commits in the adopted
> conversion.  git-svn can do that incrementally, after the fact; I'm not
> sure whether either conversion tool we're contemplating does, but being
> able to undertake such recovery seems like a desirable feature to me.

We should ensure we don't have missing branches in the first place (for 
whatever definition of what branches we should have).  Adding a branch 
after the fact is a fundamentally different kind of operation from 
including one in the conversion, because it comes with an extra constraint 
of not changing any existing commit hashes (even if the missing branch 
were e.g. merged into some existing branch and maybe logically an ideal 
conversion would thus have had different hashes for existing commits).
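
The constraint follows from how commit identifiers are derived.  The Python sketch below uses a deliberately simplified model (it is not git's exact object format, and the function name and sample values are made up) to show that adding a parent to one commit changes the identifier of every descendant.

```python
import hashlib


def commit_id(tree, parents, message):
    """Simplified stand-in for a git commit hash.

    Real git also hashes an object header plus author and committer lines;
    the point here is only that the parent ids are part of the hashed data.
    """
    parent_lines = "".join(f"parent {p}\n" for p in parents)
    data = f"tree {tree}\n{parent_lines}\n{message}\n".encode()
    return hashlib.sha1(data).hexdigest()


# Existing history: A <- B <- C.
a = commit_id("tree-a", [], "A")
b = commit_id("tree-b", [a], "B")
c = commit_id("tree-c", [b], "C")

# Same content, but B also records a merge from a previously missed branch.
missed = commit_id("tree-x", [a], "commit on missed branch")
b_merged = commit_id("tree-b", [a, missed], "B")
c_after = commit_id("tree-c", [b_merged], "C")

print(b != b_merged, c != c_after)
```

So a branch added after the switch can only be attached in ways that leave the parents of existing commits untouched, even where the ideal conversion would have recorded a merge.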

> Maxim appears to be doing so and finding (easy-to-fix) problems in the
> reposurgeon conversion; it would be nice for reposurgeon folks to
> reciprocate and maybe even point out problems in the gcc-pretty
> conversion, if they can find any, otherwise the allegations of

That's exactly where information on missing branches, tags in 
branches/st/tags appearing as branches, reparented commits appearing as 
merges came from - I examined properties of those conversions by 
comparison to reposurgeon conversions.

-- 
Joseph S. Myers
jsm@polyomino.org.uk

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 16:58                                   ` Joseph Myers
  2019-12-26 18:36                                     ` Jakub Jelinek
@ 2019-12-26 20:31                                     ` Richard Biener
  2019-12-27  1:32                                     ` Joseph Myers
  2 siblings, 0 replies; 198+ messages in thread
From: Richard Biener @ 2019-12-26 20:31 UTC (permalink / raw)
  To: gcc, Joseph Myers, Maxim Kuvyrkov
  Cc: GCC Development, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek

On December 26, 2019 5:58:22 PM GMT+01:00, Joseph Myers <jsm@polyomino.org.uk> wrote:
>On Thu, 26 Dec 2019, Maxim Kuvyrkov wrote:
>
>> Reposurgeon creates merge entries on trunk when changes from a branch
>> are merged into trunk.  This brings entire development history from the
>> branch to trunk, which is both good and bad.  The good part is that we
>> get more visibility into how the code evolved.  The bad part is that we
>> get many "noisy" commits from merged branch (e.g., "Merge in trunk"
>> every few revisions) and that our SVN branches are work-in-progress
>> quality, not ready for review/commit quality.  It's common for files to
>> be re-written in large chunks on branches.
>
>Seeing "noisy" or possibly confusing commits in "git log" output for
>master is simply a consequence of the possibly confusing defaults for how
>git log behaves (showing all commits in the ancestry in reverse committer
>date order).  I often find "git log --first-parent" output less confusing
>when dealing with any git repository making heavy use of branches (but
>there are other options as well to control how it shows such histories).
>
>If we don't want merge commits on git master for the cases where people
>put merge properties on trunk in the past, we can use a reposurgeon

We've never wanted merge properties on trunk, and have even deleted them from time to time.  For this reason I don't think we want any merge commits to appear in git (non-official branches might be fine).

Richard. 

>"unmerge" command in gcc.lift to stop the few commits in question from
>being merge commits (while keeping all other merges as-is).  (The merges
>of trunk into other branches that copied merge properties from trunk into
>those branches will still be handled correctly, with exactly two parents
>rather than regaining the extra parents corresponding to the merges into
>trunk that Bernd noted in an earlier version of the conversion, because
>the processing that avoids redundant merge parents takes place well before
>any unmerge commands are executed - so at the time of that processing,
>reposurgeon knows that those other branches are in fact in the ancestry of
>trunk, even if we remove that information in the final git repository.)
>
>> Also, reposurgeon's commit logs don't have information on SVN path from
>> which the change came, so there is no easy way to determine that a given
>> commit is from a merged branch, not an original trunk commit.  Git-svn,
>
>I think it's idiomatic in git for a branch commit not to say "this is a
>commit on X branch", i.e. this is a general property of branchy git
>histories (and unmerge is the solution if we don't want a branchy history
>of master, or use of smarter git tools for viewing the history that people
>may well make more use of when dealing with repositories with that kind of
>history).
>
>> It appears that .gitignore has been added in r1 by reposurgeon and then
>> deleted at r130805.  In SVN repository .gitignore was added in r195087.
>> I speculate that addition of .gitignore at r1 is expected, but it's
>> deletion at r130805 is highly suspicious.
>
>I suspect this is one of the known issues related to reposurgeon-generated
>.gitignore files.  Since such files are not really part of the GCC
>history, and the .gitignore files checked into SVN are properly preserved
>as far as I can see, I don't think it's a particularly important issue for
>the GCC conversion (since auto-generated .gitignore files are only
>nice-to-have, not required).  I've filed
>https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test
>for this oddity.
>
>> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even
>> when it correctly detects author name from ChangeLog.
>
>I think that's logically accurate (and certainly harmless) as a
>description of commits made to a central repository on gcc.gnu.org,
>although using committer = author would also be OK.
>
>> == Bad summary line ==
>>
>> While looking around r138087, below caught my eye.  Is the contents of
>> summary line as expected?
>>
>> commit cc2726884d56995c514d8171cc4a03657851657e
>> Author: Chris Fairles <chris.fairles@gmail.com>
>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>
>>     acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>
>Yes.  This seems to be Richard's script working exactly as intended, by
>extracting the first bit of the ChangeLog entry *after* the date/author
>header as a better description than "2008-07-23 Chris Fairles
><chris.fairles@gmail.com>" (i.e. it certainly gives more distinctive
>information about the commit and is more useful than having a date/author
>line as the summary line).  I don't think it's a bad summary line (but
>Richard's script supports hardcoding new summary lines for individual
>commits where desired).

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 20:08                               ` Alexandre Oliva
  2019-12-26 20:28                                 ` Joseph Myers
@ 2019-12-26 21:19                                 ` Eric S. Raymond
  1 sibling, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-26 21:19 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Jeff Law, Segher Boessenkool, Joseph Myers, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

Alexandre Oliva <oliva@gnu.org>:
> I don't see that it does (help).  Incremental conversion of a missed
> branch should include the very same parent links that the conversion of
> the entire repo would, just linking to the proper commits in the adopted
> conversion.  git-svn can do that incrementally, after the fact; I'm not
> sure whether either conversion tool we're contemplating does, but being
> able to undertake such recovery seems like a desirable feature to me.

It's all in what you have in the lift script.  Reposurgeon can do any kind
of branch surgery you want, and that can be added to the conversion pipeline
and replicated every time.

> From what I read, he's doing verifications against SVN.  What I'm
> suggesting, at this final stage, is for us to verify one git
> converted repo against the other.

There are no tools for that, and probably won't be unless somebody
revives repodiffer. There isn't a lot of time left in the schedule for
that, and I have my hands full fixing other glitches.  (Minor issues
about parsing ChangeLogs and generated .gitignores; the serious
problems are well behind us at this point.)

> Maxim appears to be doing so and finding (easy-to-fix) problems in the
> reposurgeon conversion; it would be nice for reposurgeon folks to
> reciprocate and maybe even point out problems in the gcc-pretty
> conversion, if they can find any, otherwise the allegations of
> unsuitability of the tools would have to be taken on blind faith.

Joseph has already made the call to go with a reposurgeon-based
conversion for reasons he explained in detail on this list. Given
that, it really doesn't make any sense for me to do any of what
you're proposing with time I could use working on Joseph's RFEs
instead.

If you're concerned about the quality of reposurgeon's conversion,
you'd be a good person to work on a comparison tool. Should I email you
a copy of the repodiffer code as it last existed in my repository?
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 11:17                               ` Jakub Jelinek
  2019-12-26 12:10                                 ` Joseph Myers
  2019-12-26 16:11                                 ` Maxim Kuvyrkov
@ 2019-12-26 22:33                                 ` Joseph Myers
  2 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-26 22:33 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Alexandre Oliva, Eric S. Raymond, Jeff Law, Segher Boessenkool,
	Mark Wielaard, Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Thu, 26 Dec 2019, Jakub Jelinek wrote:

> Is there some easy way (e.g. file in the conversion scripts) to correct
> spelling and other mistakes in the commit authors?

I've added author fixups to bugdb.py, so you can add any number of fixes 
(e.g. based on authors that look suspicious in "git shortlog -s -e --all" 
output) to the author_fixups array (and send a merge-request for the 
gcc-conversion project, or a patch).

The case of multiple consecutive spaces in an attribution is now 
normalized to a single space in reposurgeon, so no fixes are needed for 
that (and fixups should be given in the form with a single space).  In 
addition to that array of fixes, bugdb.py does the following so they don't 
need listing in the array of fixups: converts ISO-8859-1 NBSP to space 
(and trims such spaces at left or right or where the result is multiple 
consecutive spaces); converts ISO-8859-1 author names (coming from 
ChangeLog files) to UTF-8 (there are manual fixups for cases where the 
author in the ChangeLog file didn't seem to be ISO-8859-1 but wasn't valid 
UTF-8 either); fixes up the cases you found where certain forms of 
timestamp from the ChangeLog header, or header specifying multiple 
authors, were used but handled badly in conversion to authors.  I've found 
and reported another case where a form of ChangeLog header used in the 
past isn't handled at all, and Eric is looking at it.
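
For readers who want a feel for these normalizations, here is a minimal Python sketch.  It is not the code in bugdb.py: the function names, the example fixup entry and the choice to try UTF-8 before falling back to ISO-8859-1 are all assumptions made for illustration.

```python
import re

# Hypothetical fixup table: misspelled attribution -> corrected attribution.
AUTHOR_FIXUPS = {
    "Jakub Jeilnek": "Jakub Jelinek",
}


def decode_author(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise assume ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def normalize_author(raw: bytes) -> str:
    name = decode_author(raw)
    name = name.replace("\u00a0", " ")           # NBSP becomes a plain space
    name = re.sub(r" {2,}", " ", name).strip()   # collapse runs, trim the ends
    return AUTHOR_FIXUPS.get(name, name)         # apply manual fixups last


print(normalize_author(b"Jakub  Jeilnek"))
print(normalize_author(b"Jos\xe9\xa0Example "))  # ISO-8859-1 bytes, made-up name
```

Fixup keys are written with single spaces, so they match the name only after whitespace has been normalized.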

-- 
Joseph S. Myers
jsm@polyomino.org.uk

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 16:58                                   ` Joseph Myers
  2019-12-26 18:36                                     ` Jakub Jelinek
  2019-12-26 20:31                                     ` Richard Biener
@ 2019-12-27  1:32                                     ` Joseph Myers
  2019-12-27 10:14                                       ` Maxim Kuvyrkov
  2 siblings, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-27  1:32 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: GCC Development, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek

On Thu, 26 Dec 2019, Joseph Myers wrote:

> > It appears that .gitignore has been added in r1 by reposurgeon and then 
> > deleted at r130805.  In SVN repository .gitignore was added in r195087.  
> > I speculate that addition of .gitignore at r1 is expected, but it's 
> > deletion at r130805 is highly suspicious.
> 
> I suspect this is one of the known issues related to reposurgeon-generated 
> .gitignore files.  Since such files are not really part of the GCC 
> history, and the .gitignore files checked into SVN are properly preserved 
> as far as I can see, I don't think it's a particularly important issue for 
> the GCC conversion (since auto-generated .gitignore files are only 
> nice-to-have, not required).  I've filed 
> https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test 
> for this oddity.

This has now been fixed, so future conversion runs with reposurgeon should 
have the automatically-generated .gitignore present until replaced by the 
one checked into SVN.  (If people don't want automatically-generated 
.gitignore files at all, we could always add an option to reposurgeon not 
to generate them.)

I'll do another GCC conversion run to pick up all the accumulated fixes 
and improvements (including many more PR whitelist entries / fixes in 
Richard's script), once another ChangeLog-related fix is in.

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27  1:32                                     ` Joseph Myers
@ 2019-12-27 10:14                                       ` Maxim Kuvyrkov
  2019-12-28  1:55                                         ` Eric S. Raymond
  0 siblings, 1 reply; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-12-27 10:14 UTC (permalink / raw)
  To: Joseph Myers
  Cc: GCC Development, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek

> On Dec 27, 2019, at 4:32 AM, Joseph Myers <jsm@polyomino.org.uk> wrote:
> 
> On Thu, 26 Dec 2019, Joseph Myers wrote:
> 
>>> It appears that .gitignore has been added in r1 by reposurgeon and then 
>>> deleted at r130805.  In SVN repository .gitignore was added in r195087.  
>>> I speculate that addition of .gitignore at r1 is expected, but its 
>>> deletion at r130805 is highly suspicious.
>> 
>> I suspect this is one of the known issues related to reposurgeon-generated 
>> .gitignore files.  Since such files are not really part of the GCC 
>> history, and the .gitignore files checked into SVN are properly preserved 
>> as far as I can see, I don't think it's a particularly important issue for 
>> the GCC conversion (since auto-generated .gitignore files are only 
>> nice-to-have, not required).  I've filed 
>> https://gitlab.com/esr/reposurgeon/issues/219 anyway with a reduced test 
>> for this oddity.
> 
> This has now been fixed, so future conversion runs with reposurgeon should 
> have the automatically-generated .gitignore present until replaced by the 
> one checked into SVN.  (If people don't want automatically-generated 
> .gitignore files at all, we could always add an option to reposurgeon not 
> to generate them.)

Removing auto-generated .gitignore files from reposurgeon conversion would allow comparison of git trees vs gcc-pretty and gcc-reparent beyond r195087.  So, while we are evaluating the conversion candidates, it is best to disable conversion features that cause hard-to-workaround differences.
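
To make the kind of comparison meant here concrete, a small sketch follows. It assumes each conversion's tree at a given revision has already been dumped into a mapping from path to blob id (for example from the output of `git ls-tree -r`); the helper name `tree_diff` and the sample data are hypothetical and are not part of any of the conversion tools.

```python
def tree_diff(a: dict[str, str], b: dict[str, str], ignore=(".gitignore",)):
    """Return the sorted paths whose blob ids differ between two tree listings."""
    def keep(path: str) -> bool:
        # Compare on the file name only, so nested ignore files are skipped too.
        return path.rsplit("/", 1)[-1] not in ignore

    paths = {p for p in a.keys() | b.keys() if keep(p)}
    # A path missing on one side counts as a difference (get() returns None).
    return sorted(p for p in paths if a.get(p) != b.get(p))
```

With `ignore=()` every auto-generated .gitignore shows up as a difference, which is the noise that disabling the feature at conversion time would avoid.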

> 
> I'll do another GCC conversion run to pick up all the accumulated fixes 
> and improvements (including many more PR whitelist entries / fixes in 
> Richard's script), once another ChangeLog-related fix is in.


--
Maxim Kuvyrkov
https://www.linaro.org

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 18:59                                       ` Joseph Myers
@ 2019-12-27 11:21                                         ` Richard Earnshaw (lists)
  2019-12-27 11:33                                           ` Andrew Pinski
                                                             ` (2 more replies)
  0 siblings, 3 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-27 11:21 UTC (permalink / raw)
  To: Joseph Myers, Jakub Jelinek
  Cc: Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard

On 26/12/2019 18:59, Joseph Myers wrote:
> On Thu, 26 Dec 2019, Jakub Jelinek wrote:
> 
>> On Thu, Dec 26, 2019 at 04:58:22PM +0000, Joseph Myers wrote:
>>> If we don't want merge commits on git master for the cases where people 
>>> put merge properties on trunk in the past, we can use a reposurgeon 
>>> "unmerge" command in gcc.lift to stop the few commits in question from 
>>> being merge commits (while keeping all other merges as-is).  (The merges 
>>> of trunk into other branches that copied merge properties from trunk into 
>>> those branches will still be handled correctly, with exactly two parents 
>>> rather than regaining the extra parents corresponding to the merges into 
>>> trunk that Bernd noted in an earlier version of the conversion, because 
>>> the processing that avoids redundant merge parents takes place well before 
>>> any unmerge commands are executed - so at the time of that processing, 
>>> reposurgeon knows that those other branches are in fact in the ancestry of 
>>> trunk, even if we remove that information in the final git repository.)
>>
>> Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
>> svn:mergeinfo property on the trunk when it appeared too).
> 
> I've added the unmerge commands for the three commits in question to 
> gcc.lift.
> 

I'm not really sure I understand why we don't want merge commits into
trunk, especially for large changes.  Performing archaeology on a change
is just so much easier if the development history is just there.

Without the merge information, if you're tracking down the reason for a
bug, you get to the merge, and then have to go find the branch where the
development was done and start the process all over again.  With merge
information, tools like git blame will show which commit during
development touched the relevant line last and a major step in analysis
is vastly simplified.

R.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 11:21                                         ` Richard Earnshaw (lists)
@ 2019-12-27 11:33                                           ` Andrew Pinski
  2019-12-27 13:35                                             ` Segher Boessenkool
  2019-12-27 11:35                                           ` Joseph Myers
  2019-12-27 13:29                                           ` Segher Boessenkool
  2 siblings, 1 reply; 198+ messages in thread
From: Andrew Pinski @ 2019-12-27 11:33 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Joseph Myers, Jakub Jelinek, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Eric S. Raymond, Jeff Law, Segher Boessenkool,
	Mark Wielaard

On Fri, Dec 27, 2019 at 3:22 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 26/12/2019 18:59, Joseph Myers wrote:
> > On Thu, 26 Dec 2019, Jakub Jelinek wrote:
> >
> >> On Thu, Dec 26, 2019 at 04:58:22PM +0000, Joseph Myers wrote:
> >>> If we don't want merge commits on git master for the cases where people
> >>> put merge properties on trunk in the past, we can use a reposurgeon
> >>> "unmerge" command in gcc.lift to stop the few commits in question from
> >>> being merge commits (while keeping all other merges as-is).  (The merges
> >>> of trunk into other branches that copied merge properties from trunk into
> >>> those branches will still be handled correctly, with exactly two parents
> >>> rather than regaining the extra parents corresponding to the merges into
> >>> trunk that Bernd noted in an earlier version of the conversion, because
> >>> the processing that avoids redundant merge parents takes place well before
> >>> any unmerge commands are executed - so at the time of that processing,
> >>> reposurgeon knows that those other branches are in fact in the ancestry of
> >>> trunk, even if we remove that information in the final git repository.)
> >>
> >> Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
> >> svn:mergeinfo property on the trunk when it appeared too).
> >
> > I've added the unmerge commands for the three commits in question to
> > gcc.lift.
> >
>
> I'm not really sure I understand why we don't want merge commits into
> trunk, especially for large changes.  Performing archaeology on a change
> is just so much easier if the development history is just there.
>
> Without the merge information, if you're tracking down the reason for a
> bug, you get to the merge, and then have to go find the branch where the
> development was done and start the process all over again.  With merge
> information, tools like git blame will show which commit during
> development touched the relevant line last and a major step in analysis
> is vastly simplified.

The one branch merge which would have helped me track down why a
testcase was added is the tree-ssa branch merge.  If we had the commit
for the merge to have the merge info, it would have been easier for me
to track down that.  Note this testcase failed with a new patch I am
working on and I decided in the end, the testcase is bogus and not
even testing what it was testing for anyways.  There are a few other
instances like that which would have been helpful.

Thanks,
Andrew Pinski

>
> R.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 11:21                                         ` Richard Earnshaw (lists)
  2019-12-27 11:33                                           ` Andrew Pinski
@ 2019-12-27 11:35                                           ` Joseph Myers
  2019-12-27 12:37                                             ` Richard Earnshaw (lists)
  2019-12-28 12:19                                             ` Segher Boessenkool
  2019-12-27 13:29                                           ` Segher Boessenkool
  2 siblings, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-27 11:35 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Jakub Jelinek, Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard

On Fri, 27 Dec 2019, Richard Earnshaw (lists) wrote:

> I'm not really sure I understand why we don't want merge commits into
> trunk, especially for large changes.  Performing archaeology on a change
> is just so much easier if the development history is just there.

To some extent it fits with the principle of separating changes to 
workflow from the actual move to git (as the existing state is that we 
have a linear history on trunk and the few merge properties that were 
there were later deleted).  So after the conversion we could consider if 
for future merges we wish to use merge commits.

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 20:28                                 ` Joseph Myers
@ 2019-12-27 12:06                                   ` Alexandre Oliva
  2019-12-27 12:21                                     ` Joseph Myers
  0 siblings, 1 reply; 198+ messages in thread
From: Alexandre Oliva @ 2019-12-27 12:06 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Dec 26, 2019, Joseph Myers <jsm@polyomino.org.uk> wrote:

> We should ensure we don't have missing branches in the first place (for 
> whatever definition of what branches we should have).

*nod*

> Adding a branch after the fact is a fundamentally different kind of
> operation

That depends on the used tool.  A reproducible one, or at least one that
aimed at stability across multiple conversions, could make this easier,
but I guess reposurgeon is not such a tool.  Which suggests to me we
have to be even more reassured of the correctness of its moving-target
output before we adopt it, unlike other conversion tools that have long
had a certain stability of output built into their design.

I understand you're on it, and I thank you for undertaking much of that
validation and verification work.  Your well-known attention to detail
is very valuable.

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist           Stallman was right, but he's left :(
GNU Toolchain Engineer    FSMatrix: It was he who freed the first of us
FSF & FSFLA board member                The Savior shall return (true);

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 12:06                                   ` Alexandre Oliva
@ 2019-12-27 12:21                                     ` Joseph Myers
  2019-12-28  2:33                                       ` Eric S. Raymond
  0 siblings, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-27 12:21 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Fri, 27 Dec 2019, Alexandre Oliva wrote:

> That depends on the used tool.  A reproducible one, or at least one that
> aimed at stability across multiple conversions, could make this easier,
> but I guess reposurgeon is not such a tool.  Which suggests to me we
> have to be even more reassured of the correctness of its moving-target
> output before we adopt it, unlike other conversion tools that have long
> had a certain stability of output built into their design.

reposurgeon results are fully reproducible (by design, the same inputs to 
the same version of reposurgeon should produce the same output as a 
git-fast-import stream, and git should then produce the same objects given 
the same fast-import stream) - note that "same inputs" here includes the 
same bug data used for adding commit summary lines (and of course past 
commit messages, when people fix commit messages in SVN that consequently 
changes the hashes for all descendent commits in any git conversion).  It 
is, however, a tool that works with the *global* commit history.

Most of reposurgeon's own tests verify an output fast-import stream is as 
expected and thus would be liable to fail if the output were not 
reproducible as a function of the input.

Even with two completely separate conversions done with different tools, 
adding a new branch into a git repository should not be more complicated 
than a rebase operation, possibly with some fixups to merge commit 
parents.
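
The point about hashes can be seen with a toy model. This is only a sketch: a real git commit id also covers author, committer and timestamps, and the names `commit_id` and `convert` are invented for the example. It shows that the same inputs give the same ids, and that changing one commit message changes the id of that commit and of every descendant.

```python
import hashlib


def commit_id(message: str, tree: str, parents: list[str]) -> str:
    """Toy stand-in for a git commit hash: it covers tree, parents and message."""
    payload = "\n".join([tree, *parents, message]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def convert(messages: list[str]) -> list[str]:
    """Build a linear history and return the id of every commit, oldest first."""
    ids: list[str] = []
    for n, message in enumerate(messages):
        parents = ids[-1:]  # empty list for the root commit
        ids.append(commit_id(message, f"tree-{n}", parents))
    return ids
```

Running `convert` twice on the same list gives identical results; editing the second message leaves the first id alone and changes all later ones.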

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 11:35                                           ` Joseph Myers
@ 2019-12-27 12:37                                             ` Richard Earnshaw (lists)
  2019-12-28  2:27                                               ` Eric S. Raymond
  2019-12-28 12:19                                             ` Segher Boessenkool
  1 sibling, 1 reply; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-27 12:37 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jakub Jelinek, Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard

On 27/12/2019 11:35, Joseph Myers wrote:
> On Fri, 27 Dec 2019, Richard Earnshaw (lists) wrote:
> 
>> I'm not really sure I understand why we don't want merge commits into
>> trunk, especially for large changes.  Performing archaeology on a change
>> is just so much easier if the development history is just there.
> 
> To some extent it fits with the principle of separating changes to 
> workflow from the actual move to git (as the existing state is that we 
> have a linear history on trunk and the few merge properties that were 
> there were later deleted).  So after the conversion we could consider if 
> for future merges we wish to use merge commits.
> 

Well, personally, I'd rather we didn't throw away data we have in our
current SVN repo unless it's unpresentable in the final conversion.
Merge info is not one of those cases.

R.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 11:21                                         ` Richard Earnshaw (lists)
  2019-12-27 11:33                                           ` Andrew Pinski
  2019-12-27 11:35                                           ` Joseph Myers
@ 2019-12-27 13:29                                           ` Segher Boessenkool
  2 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-27 13:29 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Joseph Myers, Jakub Jelinek, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Eric S. Raymond, Jeff Law, Mark Wielaard

On Fri, Dec 27, 2019 at 11:21:41AM +0000, Richard Earnshaw (lists) wrote:
> On 26/12/2019 18:59, Joseph Myers wrote:
> > On Thu, 26 Dec 2019, Jakub Jelinek wrote:
> >> Yes, I'd prefer the trunk to have no merge commits (in svn I've removed the
> >> svn:mergeinfo property on the trunk when it appeared too).
> > 
> > I've added the unmerge commands for the three commits in question to 
> > gcc.lift.
> 
> I'm not really sure I understand why we don't want merge commits into
> trunk, especially for large changes.  Performing archaeology on a change
> is just so much easier if the development history is just there.
> 
> Without the merge information, if you're tracking down the reason for a
> bug, you get to the merge, and then have to go find the branch where the
> development was done and start the process all over again.  With merge
> information, tools like git blame will show which commit during
> development touched the relevant line last and a major step in analysis
> is vastly simplified.

Archaeology is much simpler still if people do not do merges at all, but
use a rebase (or rebase-like, e.g. quilt) workflow.  That way, there are
no bad changes that have to be undone later, etc.  Ideally everything
comes in as small, well thought out patches.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 11:33                                           ` Andrew Pinski
@ 2019-12-27 13:35                                             ` Segher Boessenkool
  0 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-27 13:35 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Richard Earnshaw (lists),
	Joseph Myers, Jakub Jelinek, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Eric S. Raymond, Jeff Law, Mark Wielaard

On Fri, Dec 27, 2019 at 03:32:57AM -0800, Andrew Pinski wrote:
> The one branch merge which would have helped me track down why a
> testcase was added is the tree-ssa branch merge.  If we had the commit
> for the merge to have the merge info, it would have been easier for me
> to track down that.  Note this testcase failed with a new patch I am
> working on and I decided in the end, the testcase is bogus and not
> even testing what it was testing for anyways.  There are a few other
> instances like that which would have been helpful.

It sounds like it would have helped you if the testcase had stated what
it is for, what it is testing, in the testcase file itself.  As all tests
should, imnsho.

In the more general case you need to find the discussion on the mailing
list archives.  Which is a difficult problem in itself.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 10:14                                       ` Maxim Kuvyrkov
@ 2019-12-28  1:55                                         ` Eric S. Raymond
  0 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-28  1:55 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: Joseph Myers, GCC Development, Alexandre Oliva, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek

Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>:
> Removing auto-generated .gitignore files from reposurgeon conversion
> would allow comparison of git trees vs gcc-pretty and gcc-reparent
> beyond r195087.  So, while we are evaluating the conversion
> candidates, it is best to disable conversion features that cause
> hard-to-workaround differences.

I was going to write that feature yesterday, then Julien nipped in and
did it while my back was turned.  It's a read option,
--no-automatic-ignores.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 12:37                                             ` Richard Earnshaw (lists)
@ 2019-12-28  2:27                                               ` Eric S. Raymond
  2019-12-28 11:23                                                 ` Joseph Myers
  0 siblings, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-28  2:27 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Joseph Myers, Jakub Jelinek, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Segher Boessenkool, Mark Wielaard

Richard Earnshaw (lists) <Richard.Earnshaw@arm.com>:
> Well, personally, I'd rather we didn't throw away data we have in our
> current SVN repo unless it's unpresentable in the final conversion.

I agree with this philosophy. You will have noticed by now, I hope,
that reposurgeon preserves as much as it can, leaving deletions to be 
a matter of user policy.

In the normal case, reposurgeon could save its users a significant
amount of work by being more aggressive about automatically deleting
remnant bits that are merely *very unlikely* to be useful. I deliberately
refused to go that route.

> Merge info is not one of those cases.

Sometimes. Some Subversion mergeinfo operations map to Git's
branch-centric merging.  Many do not, corresponding to cherry-picks
that cannot be expressed in a Git history.

Reposurgeon does a correct but not complete job of translating 
mergeinfos that compose into branch merges.  It handles the simple,
common cases and punts the tricky ones.

More coverage would theoretically be possible, but I don't
have the faintest clue what a general resolution rule would
look like.  Except I'm pretty sure the problem is bitchy-hard
and the solution really easy to get subtly wrong.

Frankly, I don't want to touch this mess with insulated
tongs. Somebody would have to offer me serious money to
compensate for the expected level of pain.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 12:21                                     ` Joseph Myers
@ 2019-12-28  2:33                                       ` Eric S. Raymond
  0 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-28  2:33 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Alexandre Oliva, Jeff Law, Segher Boessenkool, Mark Wielaard,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

Joseph Myers <jsm@polyomino.org.uk>:
> reposurgeon results are fully reproducible (by design, the same inputs to 
> the same version of reposurgeon should produce the same output as a 
> git-fast-import stream,

Designer confirms, and adds that we have a *very* stringent test suite
to verify this.

Much of it consists of bizarre malformations collected during past
conversions. GCC has added its share.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-28  2:27                                               ` Eric S. Raymond
@ 2019-12-28 11:23                                                 ` Joseph Myers
  0 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-28 11:23 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Richard Earnshaw (lists),
	Jakub Jelinek, Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Jeff Law, Segher Boessenkool, Mark Wielaard

On Fri, 27 Dec 2019, Eric S. Raymond wrote:

> > Merge info is not one of those cases.
> 
> Sometimes. Some Subversion mergeinfo operations map to Git's
> branch-centric merging.  Many do not, corresponding to cherry-picks
> that cannot be expressed in a Git history.

And in the case of merge commits on master: *deletion* of SVN merge 
properties (which is what was done some time ago on trunk when it was 
decided we didn't want them there) cannot be expressed in a git history.  
But using "unmerge" for the three commits in question (so they don't 
appear as merge commits in the git conversion) is one reasonable choice 
for how to represent it.

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-27 11:35                                           ` Joseph Myers
  2019-12-27 12:37                                             ` Richard Earnshaw (lists)
@ 2019-12-28 12:19                                             ` Segher Boessenkool
  2019-12-28 17:11                                               ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-28 12:19 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Earnshaw (lists),
	Jakub Jelinek, Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Eric S. Raymond, Jeff Law, Mark Wielaard

On Fri, Dec 27, 2019 at 11:35:21AM +0000, Joseph Myers wrote:
> On Fri, 27 Dec 2019, Richard Earnshaw (lists) wrote:
> 
> > I'm not really sure I understand why we don't want merge commits into
> > trunk, especially for large changes.  Performing archaeology on a change
> > is just so much easier if the development history is just there.
> 
> To some extent it fits with the principle of separating changes to 
> workflow from the actual move to git (as the existing state is that we 
> have a linear history on trunk and the few merge properties that were 
> there were later deleted).  So after the conversion we could consider if 
> for future merges we wish to use merge commits.

SVN mergeinfo is not representable in Git.  It records which changesets
have been copied over from one branch to another.  Git doesn't do
changesets *at all*: it just stores tree contents, and it records one or
multiple parents for every commit.  That isn't actually derivable from
the SVN info.  You can guess, and you can guess wrong.
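
To make the mismatch concrete, here is one possible guess sketched in code. It is an illustration only, not the rule any of the conversion tools uses: the names `GitCommit` and `as_merge_parent` are made up, and the heuristic (treat mergeinfo as a branch merge only when it covers every branch revision up to some point) is an assumption that can be wrong in exactly the way described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GitCommit:
    tree: str                 # snapshot of the whole tree, not a changeset
    parents: tuple[str, ...]  # one parent, or several for a merge commit


def as_merge_parent(merged: set[int], branch_revs: list[int]) -> Optional[int]:
    """Guess the branch revision to record as a merge parent, if any.

    `merged` is the set of branch revisions listed in svn:mergeinfo;
    `branch_revs` is every revision on that branch.
    """
    if not merged:
        return None
    tip = max(merged)
    wanted = {r for r in branch_revs if r <= tip}
    # Only a complete prefix of the branch looks like "merge up to tip";
    # anything with holes is a cherry-pick and has no parent equivalent.
    return tip if merged == wanted else None
```

A mergeinfo set of `{10, 12, 15}` on a branch with revisions 10, 12, 15 and 20 would map to a second parent at revision 15, while `{10, 15}` would not map to any parent at all.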

Branch merges do not mesh well with our commit policies, fwiw:
everything should normally be posted for public review on the mailing
lists.  This does not really work for commits that have been set in
stone months before.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-28 12:19                                             ` Segher Boessenkool
@ 2019-12-28 17:11                                               ` Richard Earnshaw (lists)
  2019-12-28 20:28                                                 ` Segher Boessenkool
  0 siblings, 1 reply; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-28 17:11 UTC (permalink / raw)
  To: Segher Boessenkool, Joseph Myers
  Cc: Jakub Jelinek, Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Eric S. Raymond, Jeff Law, Mark Wielaard

On 28/12/2019 12:19, Segher Boessenkool wrote:
> Branch merges do not mesh well with our commit policies, fwiw:
> everything should normally be posted for public review on the mailing
> lists.  This does not really work for commits that have been set in
> stone months before.
> 

I disagree.  The review comments will show up as additional commits on
the branch and can be tracked back to such events.  Once history gets
flattened into a major single commit it's significantly more effort to
drill down into the history and find out why if we've lost the merge
information.

R.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-28 17:11                                               ` Richard Earnshaw (lists)
@ 2019-12-28 20:28                                                 ` Segher Boessenkool
  2019-12-29  1:45                                                   ` Julien "FrnchFrgg" Rivaud
  0 siblings, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-28 20:28 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Joseph Myers, Jakub Jelinek, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Eric S. Raymond, Jeff Law, Mark Wielaard

On Sat, Dec 28, 2019 at 05:11:47PM +0000, Richard Earnshaw (lists) wrote:
> On 28/12/2019 12:19, Segher Boessenkool wrote:
> > Branch merges do not mesh well with our commit policies, fwiw:
> > everything should normally be posted for public review on the mailing
> > lists.  This does not really work for commits that have been set in
> > stone months before.
> 
> I disagree.  The review comments will show up as additional commits on
> the branch and can be tracked back to such events.  Once history gets
> flattened into a major single commit it's significantly more effort to
> drill down into the history and find out why if we've lost the merge
> information.

Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
future merges, where we can help that.  It disagrees with our documented
"submitting patches" protocol.

Nothing should ever be flattened to a single commit.  But before patches
hit trunk, the patch series can be made nicer than it was at the start
of its development.

All merges lose information.  All of them.  You take two branches, and
cut and paste between the two, but you never show which part is from
what, or how conflicts were resolved, etc.  All this can be reconstructed
of course -- you know the inputs, and you have the output -- but the info
isn't there directly, and there is no why or what.  If you're lucky there
is a mail about it, or the merge commit itself goes into it a bit.

Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-28 20:28                                                 ` Segher Boessenkool
@ 2019-12-29  1:45                                                   ` Julien "FrnchFrgg" Rivaud
  2019-12-29 10:41                                                     ` Segher Boessenkool
  0 siblings, 1 reply; 198+ messages in thread
From: Julien "FrnchFrgg" Rivaud @ 2019-12-29  1:45 UTC (permalink / raw)
  To: gcc

On 28/12/2019 at 21:28, Segher Boessenkool wrote:
> On Sat, Dec 28, 2019 at 05:11:47PM +0000, Richard Earnshaw (lists) wrote:
>> I disagree.  The review comments will show up as additional commits on
>> the branch and can be tracked back to such events.  Once history gets
>> flattened into a major single commit it's significantly more effort to
>> drill down into the history and find out why if we've lost the merge
>> information.

Review comments should *not* correspond to any *new* commit on any 
branch. At least not in the vast majority of cases. They should trigger 
modifications of the existing commits.

Commits are units of meaning, and their sequence is not a timeline, but 
a logical relationship (in a single patch set at least). When submitting 
a feature, the different commits *should not* correspond to passing time 
in the development but to incremental building of the feature. And when 
a patch set is finished, in the sense that its end point is what you want, 
you should rewrite it so that the changes are coherent units from a 
"small provable steps" point of view.

Just as in maths it is customary to first prove a weaker result that 
you know will never be used once you get to the stronger version, if 
that is the cleanest, most coherent, most beautiful or most convincing 
way to write the proof, I quite often introduce a method or structure 
early in a patch set *that I know will not survive the whole patch set 
and will never be in the final result*, just because that makes the 
transition easier and easier to prove right, be it with suitable 
passing tests, concise code changes, and extensive commit messages that 
explain the reasoning (all of those are required together IMHO).

The goal behind a patch set is to look like you wrote it linearly, with 
god-like insight throughout, which enabled you to modify some code in 
small, trivial steps and smoothly lead the reader towards an end goal 
that is unrecognizable from the starting point. I work very hard to 
attain that, because I do not have a god-like brain.

History littered with « That didn't work, now try that » or « Fix that 
to make the reviewer happy » is just that, history. A VCS is not to 
track history, but to track meaningful changes (IMHO again).

> 
> Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
> future merges, where we can help that.  It disagrees with our documented
> "submitting patches" protocol.

I don't see how that can be correct. Linux is heavily "submitting 
patches" based, with stringent reviews on LKML, yet heavily uses merges. 
The git project itself uses that workflow, and reading the ML is quite 
enlightening (even enjoyable). Patch sets have to be rebased on the last 
*release*, and every one of them then gets merged by Junio C Hamano.
That means the "first parent" history of git is only merge commits.

> Nothing should ever be flattened to a single commit.  But before patches
> hit trunk, the patch series can be made nicer than it was at the start
> of its development.

I quite agree with that, and it resonates with my TL;DR chunk of text above.

> All merges lose information.  All of them.  You take two branches, and
> cut and paste between the two, but you never show which part is from
> what, or how conflicts were resolved, etc.  All this can be reconstructed
> of course -- you know the inputs, and you have the output -- but the info
> isn't there directly, and there is no why or what.  If you're lucky there
> is a mail about it, or the merge commit itself goes into it a bit.

In fact, that's one of the reasons I argue for the « always use merge 
commits » rule, *even if you rebase beforehand*: the individual commits 
are the logical steps, their commit messages explain the local why of the 
incremental changes, and the merge commit is the cover letter. It 
explains the rationale of the complete set, describes the feature, etc.

Again, have a look at the git ML to see what I consider as near-perfect 
application of these principles. And that can be done even without 
hugely branchy repositories (look at the git repo in gitk and cringe).

I would add that of course, no merge commit should be non-trivial. If 
you had to introduce code changes *in* the merge commit itself, 
something is wrong. Rebase before merging (but keep the merge commit, 
using the --no-ff flag).
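
To make the shape of such a history concrete, here is a minimal toy 
model in Python. It is only a sketch and not how git is implemented: 
the Commit class, the helper names and the commit names are all made 
up for illustration. It shows a series rebased onto trunk and then 
merged with a real merge commit, in the spirit of "git merge --no-ff", 
and what a first-parent walk (as with "git log --first-parent") sees.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical stand-in for a commit; parents[0] is the first parent."""
    name: str
    parents: list = field(default_factory=list)


def merge_no_ff(trunk_tip, branch_tip, cover_letter):
    # Always create a merge commit, even when a fast-forward would be possible.
    return Commit(cover_letter, [trunk_tip, branch_tip])


def first_parent_log(tip):
    # Follow only first parents, newest commit first.
    names = []
    while tip is not None:
        names.append(tip.name)
        tip = tip.parents[0] if tip.parents else None
    return names


def reachable(tip):
    # Every commit reachable from tip, through any parent.
    seen, stack = [], [tip]
    while stack:
        commit = stack.pop()
        if commit.name not in seen:
            seen.append(commit.name)
            stack.extend(commit.parents)
    return seen


# A three-step series, already rebased onto the current trunk tip.
base = Commit("trunk: base")
step1 = Commit("series: step 1", [base])
step2 = Commit("series: step 2", [step1])
step3 = Commit("series: step 3", [step2])

# The merge commit plays the role of the cover letter.
merged = merge_no_ff(base, step3, "Merge: feature X (cover letter)")

print(first_parent_log(merged))  # only the merge commit and the trunk base
print(len(reachable(merged)))    # all five commits are still in the history
```

The first-parent view lists one entry per merged series, while the 
individual steps stay reachable for anyone who wants to drill down.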

Julien "_FrnchFrgg_" Rivaud


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29  1:45                                                   ` Julien "FrnchFrgg" Rivaud
@ 2019-12-29 10:41                                                     ` Segher Boessenkool
  2019-12-29 11:02                                                       ` Richard Biener
  2019-12-29 11:42                                                       ` Julien '_FrnchFrgg_' RIVAUD
  0 siblings, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-29 10:41 UTC (permalink / raw)
  To: Julien FrnchFrgg Rivaud; +Cc: gcc

On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
> >Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
> >future merges, where we can help that.  It disagrees with our documented
> >"submitting patches" protocol.
> 
> I don't see how that can be correct. Linux is heavily "submitting 
> patches" based, with stringent reviews on LKML, yet heavily uses merges. 

Linux has most development done in separate trees, one for each maintainer.
That is not how GCC works.

I was talking about https://gcc.gnu.org/contribute.html , see heading
"submitting patches" :-)

> >Nothing should ever be flattened to a single commit.  But before patches
> >hit trunk, the patch series can be made nicer than it was at the start
> >of its development.
> 
> I quite agree with that, and it resonates with my TL;DR chunk of text above.

Yup.  Rebasing is superior to merging in many ways.  Merging is appropriate
if there is parallel development of (mostly) independent things.  Features
aren't that, usually: they can be rebased easily, and they should be posted
for review anyway.

It is very easy to use merges more often than is useful, and it hurts.


Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 10:41                                                     ` Segher Boessenkool
@ 2019-12-29 11:02                                                       ` Richard Biener
  2019-12-29 11:47                                                         ` Julien '_FrnchFrgg_' RIVAUD
  2019-12-29 12:15                                                         ` Segher Boessenkool
  2019-12-29 11:42                                                       ` Julien '_FrnchFrgg_' RIVAUD
  1 sibling, 2 replies; 198+ messages in thread
From: Richard Biener @ 2019-12-29 11:02 UTC (permalink / raw)
  To: gcc, Segher Boessenkool, Julien FrnchFrgg Rivaud

On December 29, 2019 11:41:00 AM GMT+01:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
>> >Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
>> >future merges, where we can help that.  It disagrees with our documented
>> >"submitting patches" protocol.
>> 
>> I don't see how that can be correct. Linux is heavily "submitting 
>> patches" based, with stringent reviews on LKML, yet heavily uses merges. 
>
>Linux has most development done in separate trees, one for each maintainer.
>That is not how GCC works.
>
>I was talking about https://gcc.gnu.org/contribute.html , see heading
>"submitting patches" :-)
>
>> >Nothing should ever be flattened to a single commit.  But before patches
>> >hit trunk, the patch series can be made nicer than it was at the start
>> >of its development.
>> 
>> I quite agree with that, and it resonates with my TL;DR chunk of text above.
>
>Yup.  Rebasing is superior to merging in many ways.  Merging is appropriate
>if there is parallel development of (mostly) independent things.  Features
>aren't that, usually: they can be rebased easily, and they should be posted
>for review anyway.
>
>It is very easy to use merges more often than is useful, and it hurts.

For bisecting trunk a merge would be a single commit, right? So I could see value in preserving a patch series where individual steps might introduce temporary issues as a branch merge (after rebasing) so the series is visible but not when bisecting (by default). It would also make the series relatedness obvious and avoids splitting it with a commit race (if that is possible with git). 

IMHO exact workflow for merging a patch series as opposed to a single patch should be documented. 

Richard. 

>
>Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 10:41                                                     ` Segher Boessenkool
  2019-12-29 11:02                                                       ` Richard Biener
@ 2019-12-29 11:42                                                       ` Julien '_FrnchFrgg_' RIVAUD
  2019-12-29 13:26                                                         ` Segher Boessenkool
  1 sibling, 1 reply; 198+ messages in thread
From: Julien '_FrnchFrgg_' RIVAUD @ 2019-12-29 11:42 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc

On 29/12/2019 at 11:41, Segher Boessenkool wrote:
> On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
>>> Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
>>> future merges, where we can help that.  It disagrees with our documented
>>> "submitting patches" protocol.
>> I don't see how that can be correct. Linux is heavily "submitting
>> patches" based, with stringent reviews on LKML, yet heavily uses merges.
> Linux has most development done in separate trees, one for each maintainer.
> That is not how GCC works.

I mentioned the git development for a reason. They use merges for 
*everything*, including patchsets by people who never contributed before 
and might never contribute afterwards. The very *concept* of a DVCS is 
that each developer has a separate tree, not each maintainer.

I'm not arguing that you should go that route, it seems a bit extreme to 
me. But outright refusing merges on the basis they are painful is (if 
you can accept the strong word) ludicrous.

>>> Nothing should ever be flattened to a single commit.  But before patches
>>> hit trunk, the patch series can be made nicer than it was at the start
>>> of its development.
>> I quite agree with that, and it resonates with my TL;DR chunk of text above.
> Yup.  Rebasing is superior to merging in many ways.

That's not what I agreed with. I agreed with « the patch series can be 
made nicer », which I took to be the opposite of « append patches at the 
end ». Rebasing is *one* of the ways to do that, especially interactive 
rebasing to shuffle patches around, check that each step compiles and 
passes the full test suite (updating it if needed and correct), reword 
messages, and think hard about the best progression. But I never set 
rebasing against merging. In particular, I clearly wrote that *even if 
you rebased*, there are very strong arguments for refusing fast-forward 
merges, that is, for *always* generating a real merge commit, with a 
cover letter message roughly corresponding to the mail people send to 
the ML to convince others that their patch series is worth including in 
GCC.

That leaves individual commit messages to explain the local rationale 
behind each discrete change (not the how, as it is readily apparent from 
the code, unless the code is very clever and then an in-code comment is 
warranted).

> Merging is appropriate if there is parallel development of (mostly) independent things.

Which is almost always the case.

> Features aren't that, usually: they can be rebased easily, and they should be posted
> for review anyway.
How often are successive features checked into GCC dependent on each 
other? The fact that they can be rebased either way, and easily, is 
almost a testimony to that. And the fact that they need review has 
nothing to do with anything.
> It is very easy to use merges more often than is useful, and it hurts.

And it is very easy to use SVN-like workflows, and it hurts far more. 
SVN, due to its centralized design and its inherent inability to encode 
logical relationships between changes (as opposed to time-based 
evolution), has slowly narrowed most developers' view of what can be 
done in a worthwhile VCS. Moving to git is an opportunity to free 
yourselves at last, not to keep treading the narrow SVN paths.

SVN was like an almanac listing successive events without any analysis. 
That's not History (as in the field of study). Git at least can let you 
express and use to your common benefit logical links between 
modifications. Don't miss that train.

Merges are not scary when the tools are good. Even the logs are totally 
usable with a lot of merges, with suitable tools. The tool has to adapt, 
not you.

Julien


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 11:02                                                       ` Richard Biener
@ 2019-12-29 11:47                                                         ` Julien '_FrnchFrgg_' RIVAUD
  2019-12-29 13:31                                                           ` Segher Boessenkool
  2019-12-29 12:15                                                         ` Segher Boessenkool
  1 sibling, 1 reply; 198+ messages in thread
From: Julien '_FrnchFrgg_' RIVAUD @ 2019-12-29 11:47 UTC (permalink / raw)
  To: Richard Biener, gcc, Segher Boessenkool

On 29/12/2019 at 12:02, Richard Biener wrote:
> On December 29, 2019 11:41:00 AM GMT+01:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
>>>> Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
>>>> future merges, where we can help that.  It disagrees with our documented
>>>> "submitting patches" protocol.
>>> I don't see how that can be correct. Linux is heavily "submitting
>>> patches" based, with stringent reviews on LKML, yet heavily uses merges.
>>
>> Linux has most development done in separate trees, one for each maintainer.
>> That is not how GCC works.
>>
>> I was talking about https://gcc.gnu.org/contribute.html , see heading
>> "submitting patches" :-)
>>
>>>> Nothing should ever be flattened to a single commit.  But before patches
>>>> hit trunk, the patch series can be made nicer than it was at the start
>>>> of its development.
>>> I quite agree with that, and it resonates with my TL;DR chunk of text above.
>>
>> Yup.  Rebasing is superior to merging in many ways.  Merging is appropriate
>> if there is parallel development of (mostly) independent things.  Features
>> aren't that, usually: they can be rebased easily, and they should be posted
>> for review anyway.
>>
>> It is very easy to use merges more often than is useful, and it hurts.
> For bisecting trunk a merge would be a single commit, right?
Not exactly. It will be if the bug was not introduced by the merge; but 
if it was, then "git bisect" will start looking at the individual 
commits in the branch, which is IMHO very good. It is far easier to 
have a bug pinned to a single change (or say 5-6 commits, if not all of 
them were buildable or testable) than to a whole branch. At worst, no 
commit in the branch is testable except the last, and git will say that 
the bug was introduced in the branch, which is no worse than what you'd 
get without a merge commit.
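
As a rough illustration of that behaviour, here is a toy two-stage 
search in Python. It is a sketch under simplifying assumptions, not 
git's actual algorithm (which works on the whole commit graph): the 
function names, commit names and the "bad" set are all hypothetical, 
and every commit is assumed to be testable.

```python
def first_bad(commits, is_bad):
    """Binary search an oldest-to-newest list for the first bad commit.

    Assumes every commit before the culprit is good and every commit from
    the culprit onwards is bad. Returns None if nothing in the list is bad.
    """
    lo, hi = 0, len(commits)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid + 1
    return commits[lo] if lo < len(commits) else None


def locate(trunk, branches, is_bad):
    """Search trunk first; if the hit is a merge, look inside its branch."""
    hit = first_bad(trunk, is_bad)
    if hit in branches:
        inner = first_bad(branches[hit], is_bad)
        # If every commit of the series is good, the merge itself is to blame.
        return inner if inner is not None else hit
    return hit


# Trunk as seen along first parents; "merge-X" merges a rebased series.
trunk = ["t1", "t2", "merge-X", "t3"]
branches = {"merge-X": ["x1", "x2", "x3"]}

# Suppose the bug came in with x2, so everything after it is bad too.
bad = {"x2", "x3", "merge-X", "t3"}
culprit = locate(trunk, branches, lambda c: c in bad)
print(culprit)
```

With a squashed or flattened series the search could only ever stop at 
the equivalent of "merge-X"; keeping the individual commits lets it 
narrow the problem down to one step of the series.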

Julien


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 11:02                                                       ` Richard Biener
  2019-12-29 11:47                                                         ` Julien '_FrnchFrgg_' RIVAUD
@ 2019-12-29 12:15                                                         ` Segher Boessenkool
  2019-12-29 16:32                                                           ` Richard Earnshaw
  1 sibling, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-29 12:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, Julien FrnchFrgg Rivaud

On Sun, Dec 29, 2019 at 12:02:51PM +0100, Richard Biener wrote:
> For bisecting trunk a merge would be a single commit, right? So I could see value in preserving a patch series where individual steps might introduce temporary issues as a branch merge (after rebasing) so the series is visible but not when bisecting (by default). It would also make the series relatedness obvious and avoids splitting it with a commit race (if that is possible with git). 

"git bisect" actually goes all the way down the rabbit hole, it tries to
find the first bad commit in the range you marked as starting as "good",
ending as "bad".

It is pretty confusing to do if there are many merges, especially if many
commits end up not building at all.  But you can always "git bisect skip"
stuff (it just eats time, and it hampers automated bisecting).
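
A small Python sketch of why skipping costs precision (a toy model, not 
git's implementation; the function and commit names are made up, and it 
assumes a linear history whose newest commit is known to be bad): every 
untestable commit sitting between the last known-good and the first 
known-bad commit stays a suspect.

```python
def suspects(commits, status):
    """Return the commits that could be the first bad one.

    commits: names ordered oldest to newest.
    status:  name -> "good", "bad" or "skip" (skip = could not be tested).
    Assumes no commit is good after a bad one, and that at least one
    commit after the last good one is known to be bad.
    """
    last_good = -1
    for i, name in enumerate(commits):
        if status[name] == "good":
            last_good = i
    first_known_bad = next(
        i for i, name in enumerate(commits)
        if i > last_good and status[name] == "bad"
    )
    return commits[last_good + 1 : first_known_bad + 1]


history = ["a", "b", "c", "d", "e"]

# c and d do not build, so they cannot be marked either way.
with_skips = suspects(
    history,
    {"a": "good", "b": "good", "c": "skip", "d": "skip", "e": "bad"},
)

# When every commit builds and can be tested, the answer is a single commit.
all_testable = suspects(
    history,
    {"a": "good", "b": "good", "c": "good", "d": "bad", "e": "bad"},
)

print(with_skips)
print(all_testable)
```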

The really nasty cases are when the code does build, but fails for
unrelated reasons.

We require every commit to be individually tested, and if we *do* allow
merges, that should still be done imo.  Which again makes merging less
useful: if you are going to rebase your branch anyway (to fix simple
stuff), why not rebase it onto trunk!

> IMHO exact workflow for merging a patch series as opposed to a single patch should be documented. 

Yes.  It isn't actually documented in so many words for what we do now,
either, but it would be good to have.

Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 11:42                                                       ` Julien '_FrnchFrgg_' RIVAUD
@ 2019-12-29 13:26                                                         ` Segher Boessenkool
  2019-12-29 13:48                                                           ` Julien '_FrnchFrgg_' RIVAUD
  2019-12-29 21:31                                                           ` Thomas Koenig
  0 siblings, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-29 13:26 UTC (permalink / raw)
  To: Julien '_FrnchFrgg_' RIVAUD; +Cc: gcc

Hi!

On Sun, Dec 29, 2019 at 12:42:07PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> I'm not arguing that you should go that route, it seems a bit extreme to 
> me. But outright refusing merges on the basis they are painful is (if 
> you can accept the strong word) ludicrous.

They are painful for everyone working with the history later.  Something
that we do in GCC more often than in most other projects.

> >Merging is appropriate if there is parallel development of (mostly) 
> >independent things.
> 
> Which is almost always the case.

Which is almost *never* the case for GCC, in my opinion.  Almost all
commits are smallish improvements / bugfixes.  And most bigger things are
not independent enough -- we require the resulting thing to be (regression)
tested before pushing it upstream, and that is because often that *does*
find problems!

> >Features aren't that, usually: they can be rebased easily, and they should 
> >be posted
> >for review anyway.
> How often successive features checked into GCC are dependent on each 
> other ?

Almost always, one way or the other.  It's not just the GCC code itself
you have to consider here, there things are easily independent enough,
but looking at the code generated by GCC often shows unexpected
interactions.

> The fact that they can be rebased either way and easily is 
> almost a testimony of that. And the fact that they need review has 
> nothing to do with anything.

Every patch should normally be posted to the mailing lists for review.
Such patches should be against trunk.  And *that* patch will be approved,
so *that* is the one you will commit and push upstream eventually.

Those are the procedures we currently have, and it is necessary to keep
the tree even somewhat working most of the time.  Too often the tree is
broken for days on end :-(

> >It is very easy to use merges more often than is useful, and it hurts.
> 
> And it is very easy to use SVN-like workflows, and it hurts far more. 
> SVN, due to its centrality and inherent impossibility to encode logical 
> relationships between changes (as opposed to time-based evolution), 
> slowly impaired most developers mind openness about what can be done in 
> a worthwhile VCS. Moving to git is an opportunity to at last free 
> yourselves, not continue that narrow treading on SVN paths.

We cannot waste a year on a social experiment.  We can slowly and carefully
adopt new procedures, certainly.  But anything drastic isn't advisable imo.

Also, many GCC developers aren't familiar with Git at all.  It takes time
to learn it, and to learn new ways of working.  Small steps are needed.

> SVN was like an almanac listing successive events without any analysis. 
> That's not History (as in the field of study). Git at least can let you 
> express and use to your common benefit logical links between 
> modifications. Don't miss that train.

I think you seriously overestimate how much information content is in a
merge (esp. as applied to the GCC context).  Let's start with using good
commit messages (or actual commit messages *at all*), that has a much
better pain/gain ratio.

> Merges are not scary when the tools are good. Even the logs are totally 
> usable with a lot of merges, with suitable tools. The tool has to adapt, 
> not you.

Merges aren't scary.  Merges are inconvenient.  And yes, there is no way
that all of us will change on a non-geological time scale.

Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 11:47                                                         ` Julien '_FrnchFrgg_' RIVAUD
@ 2019-12-29 13:31                                                           ` Segher Boessenkool
  2019-12-29 13:51                                                             ` Julien '_FrnchFrgg_' RIVAUD
  0 siblings, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-29 13:31 UTC (permalink / raw)
  To: Julien '_FrnchFrgg_' RIVAUD; +Cc: Richard Biener, gcc

On Sun, Dec 29, 2019 at 12:46:50PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> At worst, no commit is testable in the 
> branch except the last, and git will say that the bug was introduced in 
> the branch, which is not worse that what you'd get without a merge commit.

We normally require every commit to be tested, so it is a lot worse, yes.


Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 13:26                                                         ` Segher Boessenkool
@ 2019-12-29 13:48                                                           ` Julien '_FrnchFrgg_' RIVAUD
  2019-12-29 15:01                                                             ` Segher Boessenkool
  2019-12-29 17:31                                                             ` Ian Lance Taylor via gcc
  2019-12-29 21:31                                                           ` Thomas Koenig
  1 sibling, 2 replies; 198+ messages in thread
From: Julien '_FrnchFrgg_' RIVAUD @ 2019-12-29 13:48 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc

On 29/12/2019 at 14:26, Segher Boessenkool wrote:
> Hi!
>
> On Sun, Dec 29, 2019 at 12:42:07PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
>> I'm not arguing that you should go that route, it seems a bit extreme to
>> me. But outright refusing merges on the basis they are painful is (if
>> you can accept the strong word) ludicrous.
> They are painful for everyone working with the history later.

I don't think merges make looking at history more or less painful, 
unless you consider projects like git where there is an inordinate 
number of merges. And even then, I think they have solutions.

>   Something that we do in GCC more often than in most other projects.
I would have expected many if not all projects to look at history 
often, at least projects with significant complexity.
>
> Which is almost *never* the case for GCC, in my opinion.  Almost all
> commits are smallish improvements / bugfixes.
Which are independent, clearly.
> Every patch should normally be posted to the mailing lists for review.
> Such patches should be against trunk.  And *that* patch will be approved,
> so *that* is the one you will commit and push upstream eventually.

Indeed, the rebased series would be what is reviewed and pushed 
upstream. Which can be done with a merge commit anyway. I think you 
really should look at the workflow of the git project (and they have 
their share of interdependent strange things that happen too; of course 
less than GCC due to the complexity of the project, but the techniques 
to ensure you don't get bitten by that are the same).

They use merges extensively, and have a very very good track record of 
non-broken master (or at least had last time I looked).

>
> We cannot waste a year on a social experiment.  We can slowly and carefully
> adopt new procedures, certainly.  But anything drastic isn't advisable imo.
>
> Also, many GCC developers aren't familiar with Git at all.  It takes time
> to learn it, and to learn new ways of working.  Small steps are needed.

Of course! I am not suggesting you change everything. But setting in 
stone hard rules that force the SVN mindset is harmful too.

> Merges aren't scary.  Merges are inconvenient.

No they are not. You are unaccustomed to them, which is different. 
People who have only ever used a DVCS feel that merges are much more 
natural, and even productivity-increasing. Some even do "bad merges", 
like "sync from trunk" every other commit, which I very much frown upon.

Which brings me to something I find strange in your policy: to me, 
merges from trunk to branches should be rare if not nonexistent. And you 
are deciding to banish merges the other way around.

Julien


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 13:31                                                           ` Segher Boessenkool
@ 2019-12-29 13:51                                                             ` Julien '_FrnchFrgg_' RIVAUD
  0 siblings, 0 replies; 198+ messages in thread
From: Julien '_FrnchFrgg_' RIVAUD @ 2019-12-29 13:51 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Biener, gcc

On 29/12/2019 at 14:31, Segher Boessenkool wrote:
> On Sun, Dec 29, 2019 at 12:46:50PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
>> At worst, no commit is testable in the
>> branch except the last, and git will say that the bug was introduced in
>> the branch, which is not worse that what you'd get without a merge commit.
> We normally require every commit to be tested, so it is a lot worse, yes.

That's very good, and should not change. I test every commit of every 
merge request I submit, even on projects that use real merges. It is 
easy to create CI/CD configurations and/or hooks that enforce that when 
trying to push a patch set, with or without a merge commit.

Merge commits have the great effect of separating the history into 
related chunks. Without them, you don't really know if a single bugfix 
is logically part of a set (because it fixes something important to pave 
the way) or not, and you have to think harder to detect the end of a set 
and the start of another (with maybe single commits in between).

>
>
> Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 13:48                                                           ` Julien '_FrnchFrgg_' RIVAUD
@ 2019-12-29 15:01                                                             ` Segher Boessenkool
  2019-12-29 17:31                                                             ` Ian Lance Taylor via gcc
  1 sibling, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-29 15:01 UTC (permalink / raw)
  To: Julien '_FrnchFrgg_' RIVAUD; +Cc: gcc

On Sun, Dec 29, 2019 at 02:48:31PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> >Merges aren't scary.  Merges are inconvenient.
> 
> No they are not. You are unaccustomed to them, which is different. 

Lol.  Okay, end of discussion.  You are assuming all the wrong things.


Segher


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 12:15                                                         ` Segher Boessenkool
@ 2019-12-29 16:32                                                           ` Richard Earnshaw
  2019-12-29 16:37                                                             ` Julien '_FrnchFrgg_' RIVAUD
  0 siblings, 1 reply; 198+ messages in thread
From: Richard Earnshaw @ 2019-12-29 16:32 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Biener; +Cc: gcc, Julien FrnchFrgg Rivaud

On 29/12/2019 12:15, Segher Boessenkool wrote:
> On Sun, Dec 29, 2019 at 12:02:51PM +0100, Richard Biener wrote:
>> For bisecting trunk a merge would be a single commit, right? So I could see value in preserving a patch series where individual steps might introduce temporary issues as a branch merge (after rebasing) so the series is visible but not when bisecting (by default). It would also make the series relatedness obvious and avoids splitting it with a commit race (if that is possible with git). 
> 
> "git bisect" actually goes all the way down the rabbit hole, it tries to
> find the first bad commit in the range you marked as starting as "good",
> ending as "bad".
> 
> It is pretty confusing to do if there are many merges, especially if many
> commits end up not building at all.  But you can always "git bisect skip"
> stuff (it just eats time, and it hampers automated bisecting).
> 
> The really nasty cases are when the code does build, but fails for
> unrelated reasons.
> 
> We require every commit to be individually tested, and if we *do* allow
> merges, that should still be done imo.  Which again makes merging less
> useful: if you are going to rebase your branch anyway (to fix simple
> stuff), why not rebase it onto trunk!
> 
>> IMHO exact workflow for merging a patch series as opposed to a single patch should be documented. 
> 
> Yes.  It isn't actually documented in so many words for what we do now,
> either, but it would be good to have.
> 
> 
> Segher
> 

We agreed that for changes in our current workflow practices we'd defer
that until *after* we'd switched to git; so this is getting off topic.

On the other hand, we do need to sort out what we do with existing merge
history, as that forms part of the conversion.  Can we stick to what's
relevant, please, at least in this thread?

R.


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 16:32                                                           ` Richard Earnshaw
@ 2019-12-29 16:37                                                             ` Julien '_FrnchFrgg_' RIVAUD
  0 siblings, 0 replies; 198+ messages in thread
From: Julien '_FrnchFrgg_' RIVAUD @ 2019-12-29 16:37 UTC (permalink / raw)
  To: Richard Earnshaw, Segher Boessenkool, Richard Biener; +Cc: gcc

On 29/12/2019 at 17:32, Richard Earnshaw wrote:
> We agreed that for changes in our current workflow practices we'd defer
> that until *after* we'd switched to git; so this is getting off topic.
>
> On the other hand, we do need to sort out what we do with existing merge
> history, as that forms part of the conversion.  Can we stick to what's
> relevant, please, at least in this thread?

I never wanted to make the GCC project choose new rules now. What I 
advise (and you are more than able to choose to follow or not) is only 
to avoid taking decisions right now, as part of the migration, that 
would impair establishing better rules later, especially if those 
decisions come from (bad?) habits that were taken during the SVN era, 
due to the idiosyncrasies of SVN itself.

Julien



* Re: Proposal for the transition timetable for the move to GIT
  2019-12-25 12:10                         ` Segher Boessenkool
  2019-12-25 14:13                           ` Joseph Myers
@ 2019-12-29 16:47                           ` Mark Wielaard
  2019-12-29 22:42                             ` Joseph Myers
  1 sibling, 1 reply; 198+ messages in thread
From: Mark Wielaard @ 2019-12-29 16:47 UTC (permalink / raw)
  To: Segher Boessenkool, Alexandre Oliva
  Cc: Jeff Law, Joseph Myers, Eric S. Raymond, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

Hi,

On Wed, 2019-12-25 at 06:10 -0600, Segher Boessenkool wrote:
> git-svn did not miss any branches.  Finding branches is not done by
> git-svn at all, for this.  These branches were skipped because they
> have nothing to do with GCC, have no history in common (they are not
> descendants of revision 1).  They can easily be added -- Maxim might
> already have done that, not sure, imo it's better to just drop the
> garbage, it's in svn if anyone cares.

I just looked at one of these "missed" branches, CLASSPATH.
It was created when GNU Classpath and gcc/libgcj were both in
cvs. The idea was that it was a kind of cvs vendor branch of the
upstream GNU Classpath releases (and some random checkouts) which would
make merging imports of new code into the main trunk easier. libgcj was
merged and then based on GNU Classpath in the past/when it was
officially imported into gcc. The CLASSPATH branch only contains files
under libjava/classpath.

Some of the commits look a little odd, probably because it was
converted from cvs2svn and then again to git. GNU Classpath moved to
git a long time ago and never was in subversion. And of course these
days gcj and libgcj aren't part of the main gcc trunk anymore.

There is also a classpath-generics branch, which has a couple of
snapshots of the GNU Classpath generics branch (some pre-releases of
classpath before 0.95 which had generics separately).

There are also some other branches containing classpath:
gcj/classpath-095-import-branch
gcj/classpath-095-merge-branch
gcj/classpath-0961-import-branch
gcj/classpath-098-merge-branch
gcj/classpath-20070727-import-branch

These branches contain all of gcc, not just the files under
libjava/classpath
I am not sure why these were separate from the CLASSPATH vendor branch.

Even though I have a (historical) interest in the gcj frontend and GNU
Classpath class library I am not sure these branches would really help
me. Also I think the branches aren't very interesting without the actual
GNU Classpath (git) tree history from which they were cherry-picked.
The classpath git tree does contain tags for each import already, so
you can get the real history there.

Seeing how big the git tree/conversion already is I would suggest
leaving these out of the main git repo if at all possible.

Maybe we should have a separate historical git repo which contains
everything that we were able to salvage and that people could git
remote add if they are really, really interested.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 13:48                                                           ` Julien '_FrnchFrgg_' RIVAUD
  2019-12-29 15:01                                                             ` Segher Boessenkool
@ 2019-12-29 17:31                                                             ` Ian Lance Taylor via gcc
  2019-12-30  0:31                                                               ` Julien "FrnchFrgg" Rivaud
  1 sibling, 1 reply; 198+ messages in thread
From: Ian Lance Taylor via gcc @ 2019-12-29 17:31 UTC (permalink / raw)
  To: Julien '_FrnchFrgg_' RIVAUD; +Cc: Segher Boessenkool, GCC Development

On Sun, Dec 29, 2019 at 5:49 AM Julien '_FrnchFrgg_' RIVAUD
<frnchfrgg@free.fr> wrote:
>
> Which brings me to something I find strange in your policy: to me,
> merges from trunk to branches should be rare if not nonexistent. And you
> are deciding to banish merges the other way around.

Out of curiosity, why do you say that merges from trunk to branches
should be rare?  It seems to me that any long-lived development branch
will require merges from trunk to the branch.  Are you saying that
those kinds of branches are rare?

In GCC we have historically had a pattern in which people use
long-lived parallel branches that maintain specific patches on top of
GCC trunk.  These branches provide a simple way to get a variant of
GCC with specific patches of interest to some people.  These branches
too require regular merges from trunk.

Ian

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-26 16:11                                 ` Maxim Kuvyrkov
  2019-12-26 16:58                                   ` Joseph Myers
@ 2019-12-29 18:31                                   ` Maxim Kuvyrkov
  2019-12-29 18:55                                     ` Joseph Myers
  2019-12-29 22:24                                     ` Richard Earnshaw (lists)
  1 sibling, 2 replies; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-12-29 18:31 UTC (permalink / raw)
  To: GCC Development
  Cc: Joseph Myers, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek

Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion.

I am sure these and whatever other problems I may find in the reposurgeon conversion can be fixed in time.  However, I don't see why we should bother.  My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it hasn't changed in any significant way since then.

With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready".  Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready".  The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/.

I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion.  I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help.

Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones, which don't share history with trunk@1) and restart daily updates of gcc-reparent conversion.

Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading.  Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot of development has happened and many problems have been fixed since then.  At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.


=== Missed merges ===

Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted.  Below is analysis for ARM/hard_vfp_branch.

$ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
----
commit ef92c24b042965dfef982349cd5994a2e0ff5fde
Author: Richard Earnshaw <rearnsha@gcc.gnu.org>
Date:   Mon Jul 20 08:15:51 2009 +0000

    Merge trunk through to r149768
    
    Legacy-ID: 149804

 COPYING.RUNTIME                                     |    73 +
 ChangeLog                                           |   270 +-
 MAINTAINERS                                         |    19 +-
<MANY OTHER FILES>
----

at the same time for svn-git scripts we have:

$ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
----
commit ce7d5c8df673a7a561c29f095869f20567a7c598
Merge: 4970119c20da 3a69b1e566a7
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Mon Jul 20 08:15:51 2009 +0000

    Merge trunk through to r149768
    
    git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
----

... which agrees with
$ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
/trunk:142588-149768
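
For anyone who wants to repeat this kind of spot-check on other branches, here is a minimal Python sketch that parses an svn:mergeinfo value like the one above and checks whether a given source revision is covered.  It is an illustration only and not part of either conversion; the function names parse_mergeinfo and records_merge_from are made up for this example, and it assumes the plain "path:ranges" form of the property.

```python
def parse_mergeinfo(text):
    """Parse an svn:mergeinfo value into {source_path: [(first, last), ...]}."""
    merged = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<source path>:<revision list>"; split on the last colon.
        path, _, revlist = line.rpartition(":")
        ranges = []
        for item in revlist.split(","):
            item = item.rstrip("*")  # a trailing "*" marks a non-inheritable range
            first, _, last = item.partition("-")
            ranges.append((int(first), int(last or first)))
        merged[path] = ranges
    return merged


def records_merge_from(mergeinfo, source, revision):
    """True if `revision` of `source` falls inside one of the recorded ranges."""
    return any(first <= revision <= last
               for first, last in mergeinfo.get(source, []))


info = parse_mergeinfo("/trunk:142588-149768")
print(info)
print(records_merge_from(info, "/trunk", 149768))
```

With the value shown above, r149768 on /trunk is reported as covered, which matches the "Merge trunk through to r149768" log message of r149804.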

=== Bad author entries ===

Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that a person's name is unlikely to start with a digit.
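
A check for this class of mistake is cheap to automate.  The following Python sketch is a hypothetical sanity rule, not something taken from either toolchain: it flags any author name that is empty or whose first character is not a letter.  The helper name suspicious_author is invented for the example.

```python
def suspicious_author(name):
    """Flag names that are empty or do not start with a letter.

    str.isalpha() is Unicode-aware, so accented initials are accepted.
    """
    name = name.strip()
    return not name or not name[0].isalpha()


authors = [
    "12:46:56 1998 Jim Wilson",
    "2005-03-18 Kazu Hirata",
    "Jim Wilson",
    "Kazu Hirata",
]
flagged = [a for a in authors if suspicious_author(a)]
print(flagged)
```

Run over the list of distinct authors in a conversion, this would pick out the two entries quoted above while leaving the correctly extracted names alone.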

=== Missed authors ===

Reposurgeon-6a conversion misses many authors, below is a list of people with names starting with "A".

Akos Kiss
Anders Bertelrud
Andrew Pochinsky
Anton Hartl
Arthur Norman
Aymeric Vincent

=== Conservative author entries ===

Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where svn-git conversion manages to extract valid email from commit data.  This happens for hundreds of author entries.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> 
> 
>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> 
>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>> Is there some easy way (e.g. file in the conversion scripts) to correct
>> spelling and other mistakes in the commit authors?
>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>> Jakub Jakub Jelinek (1):
>> Jakub Jeilnek (1):
>> Jelinek (1):
>> entries next to the expected one with most of the commits.
>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>> other names and if we have one with many commits and then one with very few
>> with small edit distance from those, flag it for human review.
> 
> This is close to what svn-git-author.sh script is doing in gcc-pretty and gcc-reparent conversions.  It ignores 1-3 character differences in author/committer names and email addresses.  I've audited results for all branches and didn't spot any mistakes.
> 
> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves.  Below are current notes for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
> 
> == Merges on trunk ==
> 
> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk.  This brings entire development history from the branch to trunk, which is both good and bad.  The good part is that we get more visibility into how the code evolved.  The bad part is that we get many "noisy" commits from merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready for review/commit quality.  It's common for files to be re-written in large chunks on branches.
> 
> Also, reposurgeon's commit logs don't have information on SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit.  Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.
> 
> My conversion follows current GCC development policy that trunk history should be linear.  Branch merges to trunk are squashed.  Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
> 
> == Differences in trees ==
> 
> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from current tip and up to SVN's r130805.
> Here is SVN log of that revision (restoration of deleted trunk):
> ------------------------------------------------------------------------
> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
> Changed paths:
>   A /trunk (from /trunk:130802)
> ------------------------------------------------------------------------
> 
> Reposurgeon conversion has:
> -------------
> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
> Author: Daniel Berlin <dberlin@gcc.gnu.org>
> Date:   Thu Dec 13 01:53:37 2007 +0000
> 
>    Readd trunk
> 
>    Legacy-ID: 130805
> 
> .gitignore | 17 -----------------
> 1 file changed, 17 deletions(-)
> -------------
> and my conversion has:
> -------------
> commit fb128f3970789ce094c798945b4fa20eceb84cc7
> Author: Daniel Berlin <dberlin@dbrelin.org>
> Date:   Thu Dec 13 01:53:37 2007 +0000
> 
>    Readd trunk
> 
> 
>    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
> -------------
> 
> It appears that .gitignore has been added in r1 by reposurgeon and then deleted at r130805.  In SVN repository .gitignore was added in r195087.  I speculate that addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
> 
> == Committer entries ==
> 
> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects author name from ChangeLog.
> 
> reposurgeon-5a:
> r278995 Martin Liska <mliska@suse.cz> Martin Liska <marxin@gcc.gnu.org>
> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozefl@gcc.gnu.org>
> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@gcc.gnu.org>
> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <gjl@gcc.gnu.org>
> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenth@gcc.gnu.org>
> 
> pretty:
> r278995 Martin Liska <mliska@suse.cz> Martin Liska <mliska@suse.cz>
> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozef.l@mittosystems.com>
> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@codesourcery.com>
> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <avr@gjlay.de>
> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenther@suse.de>
> 
> == Bad summary line ==
> 
> While looking around r138087, below caught my eye.  Is the contents of summary line as expected?
> 
> commit cc2726884d56995c514d8171cc4a03657851657e
> Author: Chris Fairles <chris.fairles@gmail.com>
> Date:   Wed Jul 23 14:49:00 2008 +0000
> 
>    acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
> 
>    2008-07-23  Chris Fairles <chris.fairles@gmail.com>
> 
>            * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>            Holds the lib that defines clock_gettime (-lrt or -lposix4).
>            * src/Makefile.am: Use it.
>            * configure: Regenerate.
>            * configure.in: Likewise.
>            * Makefile.in: Likewise.
>            * src/Makefile.in: Likewise.
>            * libsup++/Makefile.in: Likewise.
>            * po/Makefile.in: Likewise.
>            * doc/Makefile.in: Likewise.
> 
>    Legacy-ID: 138087
> 
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 18:31                                   ` Maxim Kuvyrkov
@ 2019-12-29 18:55                                     ` Joseph Myers
  2019-12-29 22:47                                       ` Eric S. Raymond
  2019-12-29 22:24                                     ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-29 18:55 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: GCC Development, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek, frnchfrgg

On Sun, 29 Dec 2019, Maxim Kuvyrkov wrote:

> With the "Missed merges" problem (see below) I don't see how reposurgeon 
> conversion can be considered "ready".

It aims to be conservatively safe regarding merges, erring on the side of 
not adding incorrect merges if in doubt.  Because of the difficulty in 
matching SVN and git merge semantics, it's inherently hard to define 
unambiguously exactly which merges are correct and which are cherry-picks 
or erroneous.  I think extra merges are something nice-to-have rather than 
critical.

The case you mention is one where there was a merge to a branch not from 
its immediate parent but from an indirect parent.  I don't think it would 
be hard to support detecting such merges in reposurgeon.

> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is unlikely to start with a digit.

These are already fixed in bugdb.py since that conversion, as part of the 
general review of authors to fix typos and make them more uniform.
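
A review of this kind can be partly mechanised along the lines suggested earlier in the thread: compute edit distances between author names and flag rarely seen names that sit close to a frequently seen one.  The Python sketch below is only an illustration of that idea.  It is not how bugdb.py or svn-git-author.sh work, the thresholds are arbitrary assumptions, and the commit count for the common spelling is invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def flag_for_review(commit_counts, max_distance=3, rare_max=2, frequent_min=50):
    """Return (rare_name, frequent_name, distance) triples for a human to check."""
    frequent = [n for n, c in commit_counts.items() if c >= frequent_min]
    flagged = []
    for name, count in commit_counts.items():
        if count > rare_max:
            continue
        for candidate in frequent:
            d = edit_distance(name, candidate)
            if 0 < d <= max_distance:
                flagged.append((name, candidate, d))
    return flagged


# Spellings reported earlier in the thread; 5000 is a made-up count.
counts = {
    "Jakub Jelinek": 5000,
    "Jakub Jakub Jelinek": 1,
    "Jakub Jeilnek": 1,
    "Jelinek": 1,
}
print(flag_for_review(counts))
```

With these thresholds only the transposed spelling "Jakub Jeilnek" is flagged.  The doubled first name and the bare surname are six edits away from the common spelling, so a pure edit-distance rule misses them and they would need a different heuristic or a manual pass.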

> Reposurgeon-6a conversion misses many authors, below is a list of people 
> with names starting with "A".
> 
> Akos Kiss

This is an example where the originally added ChangeLog entry was 
malformed (had the date in the form "2004-0630"), so a conservatively safe 
approach was taken of using the committer rather than trying to guess what 
a malformed ChangeLog entry means and risk extracting nonsense.

I expect other cases are being similarly careful in cases where there was 
a malformed ChangeLog entry or a commit edited ChangeLog entries by other 
authors so leaving its single-author nature ambiguous.  Parsing 
ChangeLogs, especially where malformed entries are involved, is inherently 
a heuristic matter.
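
To make the conservative approach concrete, here is a small Python sketch of strict header parsing with a fallback to the committer.  It is not reposurgeon's code: the regular expression, the function names and the placeholder addresses are all assumptions made for illustration.  The point is only that a header which does not match the expected "YYYY-MM-DD  Name  <email>" shape is rejected rather than guessed at.

```python
import re
from datetime import date

# Date, whitespace, name, whitespace, <email>.  Anything else is rejected.
HEADER = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})\s+(.+?)\s+<([^<>\s]+@[^<>\s]+)>\s*$")


def parse_changelog_header(line):
    """Return (date, name, email) for a well-formed header, else None."""
    m = HEADER.match(line)
    if m is None:
        return None
    year, month, day, name, email = m.groups()
    try:
        when = date(int(year), int(month), int(day))
    except ValueError:  # digits in the right places but not a real date
        return None
    return when, name, email


def choose_author(header_line, committer):
    """Return (author, needs_review); fall back to the committer when unsure."""
    parsed = parse_changelog_header(header_line)
    if parsed is None:
        return committer, True
    _, name, email = parsed
    return f"{name} <{email}>", False


good = "2008-07-23  Chris Fairles <chris.fairles@gmail.com>"
bad = "2004-0630  Some Author  <author@example.org>"   # placeholder entry
committer = "someuser <someuser@gcc.gnu.org>"          # placeholder committer

print(choose_author(good, committer))
print(choose_author(bad, committer))
```

The malformed date causes the second header to be rejected, so the committer is used and the entry is marked for a human to look at.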

-- 
Joseph S. Myers
jsm@polyomino.org.uk

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 13:26                                                         ` Segher Boessenkool
  2019-12-29 13:48                                                           ` Julien '_FrnchFrgg_' RIVAUD
@ 2019-12-29 21:31                                                           ` Thomas Koenig
  2019-12-29 23:57                                                             ` Jeff Law
  1 sibling, 1 reply; 198+ messages in thread
From: Thomas Koenig @ 2019-12-29 21:31 UTC (permalink / raw)
  To: Segher Boessenkool, Julien '_FrnchFrgg_' RIVAUD; +Cc: gcc

On 29.12.19 at 14:26, Segher Boessenkool wrote:
> We cannot waste a year on a social experiment.  We can slowly and carefully
> adopt new procedures, certainly.  But anything drastic isn't advisable imo.
> 
> Also, many GCC developers aren't familiar with Git at all.  It takes time
> to learn it, and to learn new ways of working.  Small steps are needed.

Amen to that.

My uses of git can be counted in a single digit (in decimal).  I am
just hoping you guys know what you are doing, and I am a bit
apprehensive about the change and my continued ability to contribute.

Talk of a radical new development model does not raise my confidence.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 18:31                                   ` Maxim Kuvyrkov
  2019-12-29 18:55                                     ` Joseph Myers
@ 2019-12-29 22:24                                     ` Richard Earnshaw (lists)
  2019-12-30  0:18                                       ` Joseph Myers
  2019-12-30 13:01                                       ` Maxim Kuvyrkov
  1 sibling, 2 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-29 22:24 UTC (permalink / raw)
  To: Maxim Kuvyrkov, GCC Development
  Cc: Joseph Myers, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Jakub Jelinek

On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
> Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion.
> 
> I am sure these and whatever other problems I may find in the reposurgeon conversion can be fixed in time.  However, I don't see why we should bother.  My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it hasn't changed in any significant way since then.
> 
> With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready".  Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready".  The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/.
> 
> I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion.  I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help.
> 

I don't think either of these conversions is any more ready to use than
the reposurgeon one, possibly less so.  In fact, there are still some
major issues to resolve first before they can be considered.

gcc-pretty has completely wrong parent information for the gcc-3 era
release tags, showing the tags as being made directly from trunk with
massive deltas representing the roll-up of all the commits that were
made on the gcc-3 release branch.

gcc-reparent is better, but many (most?) of the release tags are shown
as merge commits with a fake parent back to the gcc-3 branch point,
which is certainly not what happened when the tagging was done at that
time.

Both of these factually misrepresent the history at the time of the
release tag being made.

As for converting my script to work with your tools, I'm afraid I don't
have time to work on that right now.  I'm still bogged down validating
the incorrect bug ids that the script has identified for some commits.
I'm making good progress (we're down to 160 unreviewed commits now), but
it is still going to take what time I have over the next week to
complete that task.

Furthermore, there is no documentation on how your conversion scripts
work, so it is not possible for me to test any work I might do in order
to validate such changes.  Not being able to run the script locally to
test changes would be a non-starter.

You are welcome, of course, to clone the script I have and attempt to
modify it yourself, it's reasonably well documented.  The sources can be
found in esr's gcc-conversion repository here:
https://gitlab.com/esr/gcc-conversion.git


> Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones, which don't share history with trunk@1) and restart daily updates of gcc-reparent conversion.
> 
> Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading.  Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot of development has happened and many problems have been fixed since then.  At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.
> 
> 
> === Missed merges ===
> 
> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
> 
> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
> ----
> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
> Author: Richard Earnshaw <rearnsha@gcc.gnu.org>
> Date:   Mon Jul 20 08:15:51 2009 +0000
> 
>     Merge trunk through to r149768
>     
>     Legacy-ID: 149804
> 
>  COPYING.RUNTIME                                     |    73 +
>  ChangeLog                                           |   270 +-
>  MAINTAINERS                                         |    19 +-
> <MANY OTHER FILES>
> ----
> 
> at the same time for svn-git scripts we have:
> 
> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
> ----
> commit ce7d5c8df673a7a561c29f095869f20567a7c598
> Merge: 4970119c20da 3a69b1e566a7
> Author: Richard Earnshaw <rearnsha@arm.com>
> Date:   Mon Jul 20 08:15:51 2009 +0000
> 
>     Merge trunk through to r149768
>     
>     git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
> ----
> 
> ... which agrees with
> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
> /trunk:142588-149768
> 
> === Bad author entries ===
> 
> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is unlikely to start with a digit.
> 
> === Missed authors ===
> 
> Reposurgeon-6a conversion misses many authors, below is a list of people with names starting with "A".
> 
> Akos Kiss
> Anders Bertelrud
> Andrew Pochinsky
> Anton Hartl
> Arthur Norman
> Aymeric Vincent
> 
> === Conservative author entries ===
> 
> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where svn-git conversion manages to extract valid email from commit data.  This happens for hundreds of author entries.
> 
> Regards,
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
> 
>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>>
>>
>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>
>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>> spelling and other mistakes in the commit authors?
>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>> Jakub Jakub Jelinek (1):
>>> Jakub Jeilnek (1):
>>> Jelinek (1):
>>> entries next to the expected one with most of the commits.
>>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>>> other names and if we have one with many commits and then one with very few
>>> with small edit distance from those, flag it for human review.
>>
>> This is close to what svn-git-author.sh script is doing in gcc-pretty and gcc-reparent conversions.  It ignores 1-3 character differences in author/committer names and email addresses.  I've audited results for all branches and didn't spot any mistakes.
>>
>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves.  Below are current notes for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>
>> == Merges on trunk ==
>>
>> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk.  This brings entire development history from the branch to trunk, which is both good and bad.  The good part is that we get more visibility into how the code evolved.  The bad part is that we get many "noisy" commits from merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready for review/commit quality.  It's common for files to be re-written in large chunks on branches.
>>
>> Also, reposurgeon's commit logs don't have information on SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit.  Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.
>>
>> My conversion follows current GCC development policy that trunk history should be linear.  Branch merges to trunk are squashed.  Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>>
>> == Differences in trees ==
>>
>> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from current tip and up to SVN's r130805.
>> Here is SVN log of that revision (restoration of deleted trunk):
>> ------------------------------------------------------------------------
>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>> Changed paths:
>>   A /trunk (from /trunk:130802)
>> ------------------------------------------------------------------------
>>
>> Reposurgeon conversion has:
>> -------------
>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>> Author: Daniel Berlin <dberlin@gcc.gnu.org>
>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>
>>    Readd trunk
>>
>>    Legacy-ID: 130805
>>
>> .gitignore | 17 -----------------
>> 1 file changed, 17 deletions(-)
>> -------------
>> and my conversion has:
>> -------------
>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>> Author: Daniel Berlin <dberlin@dbrelin.org>
>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>
>>    Readd trunk
>>
>>
>>    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>> -------------
>>
>> It appears that .gitignore has been added in r1 by reposurgeon and then deleted at r130805.  In SVN repository .gitignore was added in r195087.  I speculate that addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
>>
>> == Committer entries ==
>>
>> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects author name from ChangeLog.
>>
>> reposurgeon-5a:
>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <marxin@gcc.gnu.org>
>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozefl@gcc.gnu.org>
>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@gcc.gnu.org>
>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <gjl@gcc.gnu.org>
>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenth@gcc.gnu.org>
>>
>> pretty:
>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <mliska@suse.cz>
>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozef.l@mittosystems.com>
>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@codesourcery.com>
>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <avr@gjlay.de>
>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenther@suse.de>
>>
>> == Bad summary line ==
>>
>> While looking around r138087, below caught my eye.  Is the contents of summary line as expected?
>>
>> commit cc2726884d56995c514d8171cc4a03657851657e
>> Author: Chris Fairles <chris.fairles@gmail.com>
>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>
>>    acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>
>>    2008-07-23  Chris Fairles <chris.fairles@gmail.com>
>>
>>            * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>            Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>            * src/Makefile.am: Use it.
>>            * configure: Regenerate.
>>            * configure.in: Likewise.
>>            * Makefile.in: Likewise.
>>            * src/Makefile.in: Likewise.
>>            * libsup++/Makefile.in: Likewise.
>>            * po/Makefile.in: Likewise.
>>            * doc/Makefile.in: Likewise.
>>
>>    Legacy-ID: 138087
>>
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
> 

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 16:47                           ` Mark Wielaard
@ 2019-12-29 22:42                             ` Joseph Myers
  0 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-29 22:42 UTC (permalink / raw)
  To: Mark Wielaard
  Cc: Segher Boessenkool, Alexandre Oliva, Jeff Law, Eric S. Raymond,
	Maxim Kuvyrkov, Richard Earnshaw (lists),
	gcc

On Sun, 29 Dec 2019, Mark Wielaard wrote:

> Maybe we should have a separate historical git repo which contains
> everything that we were able to salvage and that people could git
> remote add if they are really, really interested.

I'm not convinced that's very different from having one repo with 
everything but some pieces in refs that aren't fetched by default.  Maybe 
separate repos make fetching a bit more efficient if it allows packs to be 
reused on the server, but they also mean extra administrative overhead 
ensuring the correct configuration for each repo (for public access, not 
allowing pushes to the historical repo, etc.).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 18:55                                     ` Joseph Myers
@ 2019-12-29 22:47                                       ` Eric S. Raymond
  2019-12-29 23:00                                         ` Joseph Myers
  0 siblings, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-29 22:47 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Maxim Kuvyrkov, GCC Development, Alexandre Oliva, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek, frnchfrgg

Joseph Myers <jsm@polyomino.org.uk>:
> The case you mention is one where there was a merge to a branch not from 
> its immediate parent but from an indirect parent.  I don't think it would 
> be hard to support detecting such merges in reposurgeon.

We're working on it.

> This is an example where the originally added ChangeLog entry was 
> malformed (had the date in the form "2004-0630"), so a conservatively safe 
> approach was taken of using the committer rather than trying to guess what 
> a malformed ChangeLog entry means and risk extracting nonsense.
> 
> I expect other cases are being similarly careful in cases where there was 
> a malformed ChangeLog entry or a commit edited ChangeLog entries by other 
> authors so leaving its single-author nature ambiguous.  Parsing 
> ChangeLogs, especially where malformed entries are involved, is inherently 
> a heuristic matter.

As Joseph says, one of reposurgeon's design principles is "First, do no harm."

And yes, changelogs are full of malformations and junk like this. I
saw and dealt with a lifetime's worth while converting the Emacs
history from bzr to git.

If you try to interpret any random garbage in, you will assuredly
get garbage out when you least expect it. Often the cost of this 
sort of mistake is not fully realized until it is far too late
for correction.  This is *why* reposurgeon is conservative.

The correct thing for reposurgeon to do is flag unparseable entry
headers for human intervention, and as of today it does that.
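
The conservative behaviour described above can be sketched roughly as follows. This is only an illustration, not reposurgeon's actual code: the regular expression, the function names and the example name and address in the usage note are assumptions. A header is used for attribution only when it parses cleanly; anything else falls back to the committer and is flagged for a human.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed shape of a well-formed ChangeLog header line, e.g.
#   "2008-07-23  Chris Fairles <chris.fairles@gmail.com>"
HEADER_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})\s+(\S.*?)\s+<([^<>\s]+@[^<>\s]+)>\s*$"
)


def parse_changelog_header(line: str) -> Optional[Tuple[date, str, str]]:
    """Return (date, name, email) for a well-formed header, else None."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    year, month, day, name, email = match.groups()
    try:
        when = date(int(year), int(month), int(day))
    except ValueError:
        # Digits in the right places, but not a real calendar date.
        return None
    return when, name, email


def attribute(line: str, committer: str) -> Tuple[str, bool]:
    """Pick an author; the flag is True when a human should review the line."""
    parsed = parse_changelog_header(line)
    if parsed is None:
        # Do not guess what a malformed header means: keep the committer.
        return committer, True
    _, name, email = parsed
    return f"{name} <{email}>", False
```

With this sketch, a header dated "2004-0630" (the malformed form mentioned above, here with a made-up name and address) yields the committer plus a review flag, while the "2008-07-23  Chris Fairles <chris.fairles@gmail.com>" header yields that author with no flag.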
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 22:47                                       ` Eric S. Raymond
@ 2019-12-29 23:00                                         ` Joseph Myers
  2019-12-29 23:13                                           ` Segher Boessenkool
  0 siblings, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2019-12-29 23:00 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Maxim Kuvyrkov, GCC Development, Alexandre Oliva, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Richard Earnshaw (lists),
	Jakub Jelinek, frnchfrgg

On Sun, 29 Dec 2019, Eric S. Raymond wrote:

> Joseph Myers <jsm@polyomino.org.uk>:
> > The case you mention is one where there was a merge to a branch not from 
> > its immediate parent but from an indirect parent.  I don't think it would 
> > be hard to support detecting such merges in reposurgeon.
> 
> We're working on it.

And the other example branch mentioned (redhat/gcc-9-branch) is a 
different case: if the merge from gcc-9-branch to redhat/gcc-9-branch had 
been done in the idiomatic way with modern SVN (i.e. naming the branch to 
merge from and letting SVN deal with identifying the revisions involved), 
I think reposurgeon would have handled it just fine.  But the commit 
messages indicate the merge was done in an old-fashioned way (naming 
individual ranges of revisions to merge manually), which resulted in merge 
properties very slightly different from what SVN creates automatically.  
Now I understand what the difference is I expect we'll be able to fix that 
case as well.
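
For readers unfamiliar with the property involved, here is a small sketch of reading an svn:mergeinfo value such as the "/trunk:142588-149768" reported elsewhere in this thread. The thread does not say what exactly differed in the manually merged case, so the sketch makes no claim about that; the function name and the second example path in the usage note are assumptions.

```python
from typing import Dict, List, Tuple


def parse_mergeinfo(prop: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each merge source path to its (first, last) revision ranges."""
    result: Dict[str, List[Tuple[int, int]]] = {}
    for line in prop.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path is everything before the last colon.
        path, _, ranges = line.rpartition(":")
        parsed = []
        for item in ranges.split(","):
            item = item.rstrip("*")  # a trailing '*' marks a non-inheritable range
            first, _, last = item.partition("-")
            # A single revision "15" is kept as the range (15, 15).
            parsed.append((int(first), int(last or first)))
        result[path] = parsed
    return result
```

A merge recorded in one go shows up as a single range, for example parse_mergeinfo("/trunk:142588-149768") gives {"/trunk": [(142588, 149768)]}, whereas a hypothetical value like "/branches/example:10-12,15" gives two separate ranges for the same source.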

> As Joseph says, one of reposurgeon's design principles is "First, do no harm."
> 
> And yes, changelogs are full of malformations and junk like this. I
> saw and dealt with a lifetime's worth while converting the Emacs
> history from bzr to git.
> 
> If you try to interpret any random garbage in, you will assuredly
> get garbage out when you least expect it. Often the cost of this 
> sort of mistake is not fully realized until it is far too late
> for correction.  This is *why* reposurgeon is conservative.
> 
> The correct thing for reposurgeon to do is flag unparseable entry
> headers for human intervention, and as of today it does that.

Furthermore, we can compare authors in the different conversions to 
identify cases where, based on a manual review, Maxim's heuristics produce 
better results for a particular commit, and add those to the list of 
fixups in bugdb.py - and that way benefit both from reposurgeon making 
choices that are as conservatively safe as possible, which seems a 
desirable property for problem cases that haven't been manually reviewed, 
and from different heuristics helping suggest improvements in particular 
cases.
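
A minimal sketch of that comparison, assuming each conversion's authors have already been extracted into a mapping from SVN revision to author string (how that extraction is done, and how results would be recorded in bugdb.py, is not shown here). The function name is hypothetical; the sample values are taken from log excerpts quoted elsewhere in this thread.

```python
from typing import Dict, List, Tuple


def differing_authors(
    ours: Dict[int, str], theirs: Dict[int, str]
) -> List[Tuple[int, str, str]]:
    """List (revision, ours, theirs) for revisions both have but disagree on."""
    return [
        (rev, ours[rev], theirs[rev])
        for rev in sorted(ours.keys() & theirs.keys())
        if ours[rev] != theirs[rev]
    ]


# Authors as they appear in the reposurgeon log excerpts in this thread.
reposurgeon = {
    130805: "Daniel Berlin <dberlin@gcc.gnu.org>",
    138087: "Chris Fairles <chris.fairles@gmail.com>",
    149804: "Richard Earnshaw <rearnsha@gcc.gnu.org>",
}
# Authors as they appear in the git-svn based excerpts in this thread.
git_svn = {
    130805: "Daniel Berlin <dberlin@dbrelin.org>",
    149804: "Richard Earnshaw <rearnsha@arm.com>",
}

for_review = differing_authors(reposurgeon, git_svn)
```

Each entry in for_review is only a candidate for manual review; the sketch deliberately does not decide which side is right.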

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 23:00                                         ` Joseph Myers
@ 2019-12-29 23:13                                           ` Segher Boessenkool
  2019-12-30 15:36                                             ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-29 23:13 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Eric S. Raymond, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard,
	Richard Earnshaw (lists),
	Jakub Jelinek, frnchfrgg

On Sun, Dec 29, 2019 at 11:00:08PM +0000, Joseph Myers wrote:
> fixups in bugdb.py - and that way benefit both from reposurgeon making 
> choices that are as conservatively safe as possible, which seems a 
> desirable property for problem cases that haven't been manually reviewed, 

Problem cases that haven't been manually reviewed should *be* manually
reviewed, or the heuristics improved so there are fewer problem cases.

As I've said many many times now, we only have *one* repository to
convert here.  Taking shortcuts is *good*, making problems for ourselves
by pretending we do things more generically is *bad*.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 21:31                                                           ` Thomas Koenig
@ 2019-12-29 23:57                                                             ` Jeff Law
  0 siblings, 0 replies; 198+ messages in thread
From: Jeff Law @ 2019-12-29 23:57 UTC (permalink / raw)
  To: Thomas Koenig, Segher Boessenkool, Julien '_FrnchFrgg_' RIVAUD
  Cc: gcc

On Sun, 2019-12-29 at 22:30 +0100, Thomas Koenig wrote:
> Am 29.12.19 um 14:26 schrieb Segher Boessenkool:

> > We cannot waste a year on a social experiment.  We can slowly and carefully
> > adopt new procedures, certainly.  But anything drastic isn't advisable imo.
> > 
> > Also, many GCC developers aren't familiar with Git at all.  It takes time
> > to learn it, and to learn new ways of working.  Small steps are needed.
> 
> Amen to that.
> 
> My uses of git can be counted in a single digit (in decimal).  I am
> just hoping you guys know what you are doing, and I am a bit
> apprehensive about the change and my continued ability to contribute.
> 
> Talk of a radical new development model does not raise my confidence.
I was fairly anti-Git for a while, but there are simple workflows you
can use that are close enough to SVN that you're really just
changing the commands you're using, not your entire workflow.

You can add in "git specific" workflows later as you've become familiar
with the basics.  That's what I did, and boy once you wrap your head
around git rebase for dealing with work in progress it's a game
changer.

jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 22:24                                     ` Richard Earnshaw (lists)
@ 2019-12-30  0:18                                       ` Joseph Myers
  2019-12-30  0:44                                         ` Julien "FrnchFrgg" Rivaud
  2019-12-30 12:39                                         ` Maxim Kuvyrkov
  2019-12-30 13:01                                       ` Maxim Kuvyrkov
  1 sibling, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-30  0:18 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Maxim Kuvyrkov, GCC Development, Alexandre Oliva,
	Eric S. Raymond, Jeff Law, Segher Boessenkool, Mark Wielaard,
	Jakub Jelinek

On Sun, 29 Dec 2019, Richard Earnshaw (lists) wrote:

> gcc-reparent is better, but many (most?) of the release tags are shown
> as merge commits with a fake parent back to the gcc-3 branch point,
> which is certainly not what happened when the tagging was done at that
> time.

And looking at the history of gcc-reparent as part of preparing to compare 
authors to identify commits needing manual attention to author 
identification, I see other oddities.

Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example.  The 
history ends up containing two different versions of SVN r5 and of many 
other commits.  One of them looks normal:

commit c01d37f1690de9ea83b341780fad458f506b80c7
Author: Charles Hannum <mycroft@gcc.gnu.org>
Date:   Mon Nov 27 21:22:14 1989 +0000

    entered into RCS

    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-0d04-0410-961f-82ee72b054a4

The other looks strange:

commit 09c5a0fa5ed76e566668cc67f3d72bf397277fdd
Author: Charles Hannum <mycroft@gcc.gnu.org>
Date:   Mon Nov 27 21:22:14 1989 +0000

    entered into RCS

    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-0d04-0410-961f-82ee72b054a4
    Updated tag 'egcs_1_1_2_prerelease_2@279090' (was bc80be265a0)
    Updated tag 'egcs_1_1_2_prerelease_2@279154' (was f7cee65b219)
    Updated tag 'egcs_1_1_2_prerelease_2@279213' (was 74dcba9b414)
    Updated tag 'egcs_1_1_2_prerelease_2@279270' (was 7e63c9b344d)
    Updated tag 'egcs_1_1_2_prerelease_2@279336' (was 47894371e3c)
    Updated tag 'egcs_1_1_2_prerelease_2@279392' (was 3c3f6932316)
    Updated tag 'egcs_1_1_2_prerelease_2@279402' (was 29d9998f523b)

(and in fact it seems there are *four* commits corresponding to SVN r5 and 
reachable from refs in the gcc-reparent repository).  So we don't just 
have stray merge commits, they actually end up leading back to strange 
alternative versions of history (which I think is clearly worse than 
conservatively not having a merge commit in some case where a commit might 
or might not be unambiguously a merge - if a merge was missed on an active 
branch, the branch maintainer can easily correct that afterwards with "git 
merge -s ours" to avoid problems with future merges).

My expectation is that there are only multiple git commits corresponding 
to an SVN commit when the SVN commit touched more than one SVN branch or 
tag and so has to be split to represent it in git (there are about 1500 
such SVN commits, most of them automatic datestamp updates in the CVS era 
that cvs2svn turned into mixed-branch commits).
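
One way to check that expectation mechanically, sketched under the assumption that "git log" output with git-svn-id trailers is available as text (as in the excerpts above). The function names are hypothetical. Keying on the full "path@revision" means that a mixed-branch SVN commit split across several branches would not be reported, as long as each piece carries its own path, while two commits claiming the same path and revision would.

```python
import re
from collections import defaultdict
from typing import Dict, List

COMMIT_RE = re.compile(r"^commit ([0-9a-f]{7,40})$")
SVN_ID_RE = re.compile(r"^\s*git-svn-id: (\S+)@(\d+) \S+$")


def commits_by_svn_id(log_text: str) -> Dict[str, List[str]]:
    """Map 'path@revision' to the commit hashes carrying that git-svn-id."""
    found: Dict[str, List[str]] = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        match = COMMIT_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = SVN_ID_RE.match(line)
        if match and current is not None:
            found[f"{match.group(1)}@{match.group(2)}"].append(current)
    return dict(found)


def duplicated(log_text: str) -> Dict[str, List[str]]:
    """Keep only the ids that more than one commit claims."""
    return {k: v for k, v in commits_by_svn_id(log_text).items() if len(v) > 1}


# The two r5 commits quoted above, abbreviated.
SAMPLE = """\
commit c01d37f1690de9ea83b341780fad458f506b80c7
Author: Charles Hannum <mycroft@gcc.gnu.org>

    entered into RCS

    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-0d04-0410-961f-82ee72b054a4

commit 09c5a0fa5ed76e566668cc67f3d72bf397277fdd
Author: Charles Hannum <mycroft@gcc.gnu.org>

    entered into RCS

    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-0d04-0410-961f-82ee72b054a4
    Updated tag 'egcs_1_1_2_prerelease_2@279090' (was bc80be265a0)
"""

report = duplicated(SAMPLE)
```

Here report contains a single key, the trunk@5 id, listing both commit hashes.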

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 17:31                                                             ` Ian Lance Taylor via gcc
@ 2019-12-30  0:31                                                               ` Julien "FrnchFrgg" Rivaud
  0 siblings, 0 replies; 198+ messages in thread
From: Julien "FrnchFrgg" Rivaud @ 2019-12-30  0:31 UTC (permalink / raw)
  To: gcc

On 29/12/2019 at 18:30, Ian Lance Taylor via gcc wrote:
> On Sun, Dec 29, 2019 at 5:49 AM Julien '_FrnchFrgg_' RIVAUD
> <frnchfrgg@free.fr> wrote:
>>
>> Which brings me to something I find strange in your policy: to me,
>> merges from trunk to branches should be rare if not nonexistent. And you
>> are deciding to banish merges the other way around.
> 
> Out of curiosity, why do you say that merges from trunk to branches
> should be rare?  It seems to me that any long-lived development branch
> will require merges from trunk to the branch.  Are you saying that
> those kinds of branches are rare?

Because in most cases, the development branch should be periodically 
rebased on top of master, not use a merge from master to the branch.

Maybe that's easier to do while developing, but in the end a real 
rebase should be made (dropping the merge commits), because what you 
will send to the ML for review should be a logical stream of changes and 
"update" merge commits are not that.

Thankfully, if you have git rerere enabled, most conflict resolutions 
you did while merging will be reused when rebasing so it should not be 
too painful.

> 
> In GCC we have historically had a pattern in which people use
> long-lived parallel branches that maintain specific patches on top of
> GCC trunk.  These branches provide a simple way to get a variant of
> GCC with specific patches of interest to some people.  These branches
> too require regular merges from trunk.

In that case, sure. But I expect these branches to never be merged in 
trunk. So the real rule would be « branches that merge from trunk should 
not be merged into trunk » (rather than « forbid merges into trunk » or 
even « pretend nobody ever merged anything into trunk, these aren't the 
droids you are looking for »)

> 
> Ian
> 

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30  0:18                                       ` Joseph Myers
@ 2019-12-30  0:44                                         ` Julien "FrnchFrgg" Rivaud
  2019-12-30 12:39                                         ` Maxim Kuvyrkov
  1 sibling, 0 replies; 198+ messages in thread
From: Julien "FrnchFrgg" Rivaud @ 2019-12-30  0:44 UTC (permalink / raw)
  To: gcc

On 30/12/2019 at 01:18, Joseph Myers wrote:

> Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example.  The
> history ends up containing two different versions of SVN r5 and of many
> other commits.

When trying to migrate Blender from svn to git, we actually tried 
git-svn first, and it produced that kind of strangeness. Sometimes, 
something it didn't like in a commit made it duplicate or even multiply 
more the whole history predating that commit, with slight differences 
(that explain the differing sha1 and thus the multiple versions).
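
Why a slight difference multiplies the whole history can be shown with a short sketch of how git names objects: an object id is the SHA-1 of a small header plus the object body, and a commit body names its parents. The helper names and the author identity below are made up for the illustration; only the hashing scheme itself is git's.

```python
import hashlib
from typing import List


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of '<kind> <length>\\0' followed by the body, as git computes it."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: List[str], ident: str, message: str) -> bytes:
    """Assemble the text of a commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return "\n".join(lines).encode("utf-8")


EMPTY_TREE = git_object_id("tree", b"")
IDENT = "Example Author <author@example.org> 0 +0000"  # placeholder identity

# Two root commits that differ only by one extra line in the message.
root_a = git_object_id("commit", commit_body(EMPTY_TREE, [], IDENT, "entered into RCS\n"))
root_b = git_object_id(
    "commit", commit_body(EMPTY_TREE, [], IDENT, "entered into RCS\nextra line\n")
)

# Identical follow-up commits, one on top of each root.
child_a = git_object_id("commit", commit_body(EMPTY_TREE, [root_a], IDENT, "next change\n"))
child_b = git_object_id("commit", commit_body(EMPTY_TREE, [root_b], IDENT, "next change\n"))
```

The two roots get different ids, and so do the otherwise identical children, because each child records its parent's id. Every descendant of a slightly altered commit is therefore a new object, which is how one early difference ends up duplicating everything after it.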

That's actually the reason I got involved with reposurgeon in the first 
place, trying to make the then Python version able to cope with the 
Blender repository with less than 64GB of ram.

I thought that working around git-svn to only feed it linear commits 
would sidestep that bug, but it looks like it still can be triggered.

(At the time the bug was so common that we ended with maybe 20 or 30 
times the first 1500 commits in the repository, and of course with the 
speed of git-svn, doing 30 times the same work is horrendous)

Julien

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30  0:18                                       ` Joseph Myers
  2019-12-30  0:44                                         ` Julien "FrnchFrgg" Rivaud
@ 2019-12-30 12:39                                         ` Maxim Kuvyrkov
  1 sibling, 0 replies; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-12-30 12:39 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Eric S. Raymond, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Jakub Jelinek

> On Dec 30, 2019, at 3:18 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> On Sun, 29 Dec 2019, Richard Earnshaw (lists) wrote:
> 
>> gcc-reparent is better, but many (most?) of the release tags are shown
>> as merge commits with a fake parent back to the gcc-3 branch point,
>> which is certainly not what happened when the tagging was done at that
>> time.
> 
> And looking at the history of gcc-reparent as part of preparing to compare 
> authors to identify commits needing manual attention to author 
> identification, I see other oddities.
> 
> Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example.  The 
> history ends up containing two different versions of SVN r5 and of many 
> other commits.  One of them looks normal:
> 
> commit c01d37f1690de9ea83b341780fad458f506b80c7
> Author: Charles Hannum <mycroft@gcc.gnu.org>
> Date:   Mon Nov 27 21:22:14 1989 +0000
> 
>    entered into RCS
> 
> 
>    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-0d04-0410-961f-82ee72b054a4
> 
> The other looks strange:
> 
> commit 09c5a0fa5ed76e566668cc67f3d72bf397277fdd
> Author: Charles Hannum <mycroft@gcc.gnu.org>
> Date:   Mon Nov 27 21:22:14 1989 +0000
> 
>    entered into RCS
> 
> 
>    git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-0d04-0410-961f-82ee72b054a4
>    Updated tag 'egcs_1_1_2_prerelease_2@279090' (was bc80be265a0)
>    Updated tag 'egcs_1_1_2_prerelease_2@279154' (was f7cee65b219)
>    Updated tag 'egcs_1_1_2_prerelease_2@279213' (was 74dcba9b414)
>    Updated tag 'egcs_1_1_2_prerelease_2@279270' (was 7e63c9b344d)
>    Updated tag 'egcs_1_1_2_prerelease_2@279336' (was 47894371e3c)
>    Updated tag 'egcs_1_1_2_prerelease_2@279392' (was 3c3f6932316)
>    Updated tag 'egcs_1_1_2_prerelease_2@279402' (was 29d9998f523b)
> 
> (and in fact it seems there are *four* commits corresponding to SVN r5 and 
> reachable from refs in the gcc-reparent repository).  So we don't just 
> have stray merge commits, they actually end up leading back to strange 
> alternative versions of history (which I think is clearly worse than 
> conservatively not having a merge commit in some case where a commit might 
> or might not be unambiguously a merge - if a merge was missed on an active 
> branch, the branch maintainer can easily correct that afterwards with "git 
> merge -s ours" to avoid problems with future merges).
> 
> My expectation is that there are only multiple git commits corresponding 
> to an SVN commit when the SVN commit touched more than one SVN branch or 
> tag and so has to be split to represent it in git (there are about 1500 
> such SVN commits, most of them automatic datestamp updates in the CVS era 
> that cvs2svn turned into mixed-branch commits).

Thanks for catching this.  This is fallout from incremental rebuilds (rather than fresh builds) of gcc-reparent repository.  Incremental builds take about 1h and full rebuilds take about 30h.  I'll switch to doing full rebuilds.

--
Maxim Kuvyrkov
https://www.linaro.org

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 22:24                                     ` Richard Earnshaw (lists)
  2019-12-30  0:18                                       ` Joseph Myers
@ 2019-12-30 13:01                                       ` Maxim Kuvyrkov
  2019-12-30 15:31                                         ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-12-30 13:01 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: GCC Development, Joseph Myers, Alexandre Oliva, Eric S. Raymond,
	Jeff Law, Segher Boessenkool, Mark Wielaard, Jakub Jelinek

> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>> Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion.
>> 
>> I am sure these and whatever other problems I may find in the reposurgeon conversion can be fixed in time.  However, I don't see why we should bother.  My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it hasn't changed in any significant way since then.
>> 
>> With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready".  Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready".  The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/.
>> 
>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion.  I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help.
>> 
> 
> I don't think either of these conversions are any more ready to use than
> the reposurgeon one, possibly less so.  In fact, there are still some
> major issues to resolve first before they can be considered.
> 
> gcc-pretty has completely wrong parent information for the gcc-3 era
> release tags, showing the tags as being made directly from trunk with
> massive deltas representing the roll-up of all the commits that were
> made on the gcc-3 release branch.

I will clarify the above statement, and please correct me where you think I'm wrong.  Gcc-pretty conversion has the exact right parent information for the gcc-3 era release tags as recorded in SVN version history.  Gcc-pretty conversion aims to produce an exact copy of SVN history in git.  IMO, it manages to do so just fine.

It is a different thing that SVN history has a screwed up record of gcc-3 era tags.

> 
> gcc-reparent is better, but many (most?) of the release tags are shown
> as merge commits with a fake parent back to the gcc-3 branch point,
> which is certainly not what happened when the tagging was done at that
> time.

I agree with you here.

> 
> Both of these factually misrepresent the history at the time of the
> release tag being made.

Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the need for reparenting -- we lived with current history for gcc-3 release tags for a long time.  I would argue their continued brokenness is not a show-stopper.

Looking at this from a different perspective, when I posted the initial svn-git scripts back in Summer, the community roughly agreed on a plan to
1. Convert entire SVN history to git.
2. Use the stock git history rewrite tools (git filter-branch) to fixup what we want, e.g., reparent tags and branches or set better author/committer entries.

Gcc-pretty does (1) in entirety.

For reparenting, I tried a 15min fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from old and new parents.  I will drop this and instead use tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty.

> 
> As for converting my script to work with your tools, I'm afraid I don't
> have time to work on that right now.  I'm still bogged down validating
> the incorrect bug ids that the script has identified for some commits.
> I'm making good progress (we're down to 160 unreviewed commits now), but
> it is still going to take what time I have over the next week to
> complete that task.
> 
> Furthermore, there is no documentation on how your conversion scripts
> work, so it is not possible for me to test any work I might do in order
> to validate such changes.  Not being able to run the script locally to
> test changes would be a non-starter.
> 
> You are welcome, of course, to clone the script I have and attempt to
> modify it yourself, it's reasonably well documented.  The sources can be
> found in esr's gcc-conversion repository here:
> https://gitlab.com/esr/gcc-conversion.git

--
Maxim Kuvyrkov
https://www.linaro.org

> 
> 
>> Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones, which don't share history with trunk@1) and restart daily updates of gcc-reparent conversion.
>> 
>> Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading.  Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot of development has happened and many problems have been fixed since then.  At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.
>> 
>> 
>> === Missed merges ===
>> 
>> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
>> 
>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>> ----
>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>> Author: Richard Earnshaw <rearnsha@gcc.gnu.org>
>> Date:   Mon Jul 20 08:15:51 2009 +0000
>> 
>>    Merge trunk through to r149768
>> 
>>    Legacy-ID: 149804
>> 
>> COPYING.RUNTIME                                     |    73 +
>> ChangeLog                                           |   270 +-
>> MAINTAINERS                                         |    19 +-
>> <MANY OTHER FILES>
>> ----
>> 
>> at the same time for svn-git scripts we have:
>> 
>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>> ----
>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>> Merge: 4970119c20da 3a69b1e566a7
>> Author: Richard Earnshaw <rearnsha@arm.com>
>> Date:   Mon Jul 20 08:15:51 2009 +0000
>> 
>>    Merge trunk through to r149768
>> 
>>    git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>> ----
>> 
>> ... which agrees with
>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>> /trunk:142588-149768
>> 
>> === Bad author entries ===
>> 
>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that a person's name is unlikely to start with a digit.
>> 
>> === Missed authors ===
>> 
>> Reposurgeon-6a conversion misses many authors, below is a list of people with names starting with "A".
>> 
>> Akos Kiss
>> Anders Bertelrud
>> Andrew Pochinsky
>> Anton Hartl
>> Arthur Norman
>> Aymeric Vincent
>> 
>> === Conservative author entries ===
>> 
>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where svn-git conversion manages to extract valid email from commit data.  This happens for hundreds of author entries.
>> 
>> Regards,
>> 
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
>> 
>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>>> 
>>> 
>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>> 
>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>> spelling and other mistakes in the commit authors?
>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>> Jakub Jakub Jelinek (1):
>>>> Jakub Jeilnek (1):
>>>> Jelinek (1):
>>>> entries next to the expected one with most of the commits.
>>>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>>>> other names and if we have one with many commits and then one with very few
>>>> with small edit distance from those, flag it for human review.
>>> 
>>> This is close to what svn-git-author.sh script is doing in gcc-pretty and gcc-reparent conversions.  It ignores 1-3 character differences in author/committer names and email addresses.  I've audited results for all branches and didn't spot any mistakes.
>>> 
>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves.  Below are current notes for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>> 
>>> == Merges on trunk ==
>>> 
>>> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk.  This brings entire development history from the branch to trunk, which is both good and bad.  The good part is that we get more visibility into how the code evolved.  The bad part is that we get many "noisy" commits from merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready for review/commit quality.  It's common for files to be re-written in large chunks on branches.
>>> 
>>> Also, reposurgeon's commit logs don't have information on SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit.  Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.
>>> 
>>> My conversion follows current GCC development policy that trunk history should be linear.  Branch merges to trunk are squashed.  Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>>> 
>>> == Differences in trees ==
>>> 
>>> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from current tip and up to SVN's r130805.
>>> Here is SVN log of that revision (restoration of deleted trunk):
>>> ------------------------------------------------------------------------
>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>> Changed paths:
>>>  A /trunk (from /trunk:130802)
>>> ------------------------------------------------------------------------
>>> 
>>> Reposurgeon conversion has:
>>> -------------
>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>> Author: Daniel Berlin <dberlin@gcc.gnu.org>
>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>> 
>>>   Readd trunk
>>> 
>>>   Legacy-ID: 130805
>>> 
>>> .gitignore | 17 -----------------
>>> 1 file changed, 17 deletions(-)
>>> -------------
>>> and my conversion has:
>>> -------------
>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>> Author: Daniel Berlin <dberlin@dbrelin.org>
>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>> 
>>>   Readd trunk
>>> 
>>> 
>>>   git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>> -------------
>>> 
>>> It appears that .gitignore has been added in r1 by reposurgeon and then deleted at r130805.  In SVN repository .gitignore was added in r195087.  I speculate that addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
>>> 
>>> == Committer entries ==
>>> 
>>> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects author name from ChangeLog.
>>> 
>>> reposurgeon-5a:
>>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <marxin@gcc.gnu.org>
>>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozefl@gcc.gnu.org>
>>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@gcc.gnu.org>
>>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <gjl@gcc.gnu.org>
>>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenth@gcc.gnu.org>
>>> 
>>> pretty:
>>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <mliska@suse.cz>
>>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozef.l@mittosystems.com>
>>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@codesourcery.com>
>>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <avr@gjlay.de>
>>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenther@suse.de>
>>> 
>>> == Bad summary line ==
>>> 
>>> While looking around r138087, the commit below caught my eye.  Are the contents of the summary line as expected?
>>> 
>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>> Author: Chris Fairles <chris.fairles@gmail.com>
>>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>> 
>>>   acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>> 
>>>   2008-07-23  Chris Fairles <chris.fairles@gmail.com>
>>> 
>>>           * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>           Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>           * src/Makefile.am: Use it.
>>>           * configure: Regenerate.
>>>           * configure.in: Likewise.
>>>           * Makefile.in: Likewise.
>>>           * src/Makefile.in: Likewise.
>>>           * libsup++/Makefile.in: Likewise.
>>>           * po/Makefile.in: Likewise.
>>>           * doc/Makefile.in: Likewise.
>>> 
>>>   Legacy-ID: 138087
>>> 
>>> 
>>> --
>>> Maxim Kuvyrkov
>>> https://www.linaro.org
>>> 
>> 
> 


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 13:01                                       ` Maxim Kuvyrkov
@ 2019-12-30 15:31                                         ` Richard Earnshaw (lists)
  2019-12-30 15:49                                           ` Maxim Kuvyrkov
  0 siblings, 1 reply; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-30 15:31 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: GCC Development, Joseph Myers, Alexandre Oliva, Eric S. Raymond,
	Jeff Law, Segher Boessenkool, Mark Wielaard, Jakub Jelinek

On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>
>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>> Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion.
>>>
>>> I am sure, these and whatever other problems I may find in the reposurgeon conversion can be fixed in time.  However, I don't see why should bother.  My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it didn't change in any significant way since then.
>>>
>>> With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready".  Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready".  The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/.
>>>
>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion.  I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help.
>>>
>>
>> I don't think either of these conversions are any more ready to use than
>> the reposurgeon one, possibly less so.  In fact, there are still some
>> major issues to resolve first before they can be considered.
>>
>> gcc-pretty has completely wrong parent information for the gcc-3 era
>> release tags, showing the tags as being made directly from trunk with
>> massive deltas representing the roll-up of all the commits that were
>> made on the gcc-3 release branch.
> 
> I will clarify the above statement, and please correct me where you think I'm wrong.  Gcc-pretty conversion has the exact right parent information for the gcc-3 era
> release tags as recorded in SVN version history.  Gcc-pretty conversion aims to produce an exact copy of SVN history in git.  IMO, it manages to do so just fine.
> 
> It is a different thing that SVN history has a screwed up record of gcc-3 era tags.

It's not screwed up in svn.  Svn shows the correct history information for the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.

For example, looking at expr.c in gcc_3_0_release with git blame and svn blame shows:

git blame expr.c:

ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   396)         return temp;
ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   397)       }
5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   398)     /* Copy the address into a pseudo, so that the returned value
5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   399)        remains correct across calls to emit_queue.  */
5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   400)     XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
59f26b7caad9 (Richard Kenner         1994-01-11 00:23:47 +0000   401)     return new;

git log 5fbf0b0d5828
commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
Author: no-author <no-author@gcc.gnu.org>
Date:   Sun Jun 17 19:44:25 2001 +0000

    This commit was manufactured by cvs2svn to create tag
    'gcc_3_0_release'.

while svn blame expr.c correctly shows:

   386     kenner             return temp;
   386     kenner           }
 42209     bernds         /* Copy the address into a pseudo, so that the returned value
 42209     bernds            remains correct across calls to emit_queue.  */
 42209     bernds         XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
  6375     kenner         return new;

svn log -r42209 ^/
------------------------------------------------------------------------
r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines

Fix queueing-related bugs

In other words, svn can correctly track the files that were modified on the release branch, while the git conversion loses that information, rolling up all the diffs on the release branch into a single unattributed commit.
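
As a rough illustration of the comparison above, here is a minimal Python sketch that pairs the two blame outputs line by line and reports the lines that git attributes to the manufactured tag commit while svn still names a real revision.  It assumes the default output formats of "git blame" and "svn blame" as quoted above, and that both outputs cover the same lines in the same order.  The function name and the "no-author" placeholder check are illustrative assumptions, not part of any of the conversion tools.

```python
import re

# Default `git blame` line: "<hash> (<author> <date> <time> <tz> <lineno>) <text>"
GIT_RE = re.compile(
    r"^\^?(?P<commit>[0-9a-f]+) \((?P<author>.+?)\s+"
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}\s+(?P<lineno>\d+)\)"
)
# Default `svn blame` line: "<rev> <user> <text>"
SVN_RE = re.compile(r"^\s*(?P<rev>\d+)\s+(?P<user>\S+)")


def rolled_up_lines(git_blame, svn_blame, placeholder="no-author"):
    """Return (lineno, git_commit, svn_rev, svn_user) for every line that git
    blames on the placeholder author.  Inputs are lists of blame output lines,
    assumed to be aligned one-to-one (hypothetical helper)."""
    hits = []
    for g_line, s_line in zip(git_blame, svn_blame):
        g = GIT_RE.match(g_line)
        s = SVN_RE.match(s_line)
        if g is None or s is None:
            continue  # skip anything that is not a blame line
        if g["author"] == placeholder:
            hits.append((int(g["lineno"]), g["commit"], int(s["rev"]), s["user"]))
    return hits


# Three lines taken from the excerpts quoted in this message.
GIT_SAMPLE = [
    "ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   396)         return temp;",
    "5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   398)     /* Copy the address into a pseudo, so that the returned value",
    "59f26b7caad9 (Richard Kenner         1994-01-11 00:23:47 +0000   401)     return new;",
]
SVN_SAMPLE = [
    "   386     kenner             return temp;",
    " 42209     bernds         /* Copy the address into a pseudo, so that the returned value",
    "  6375     kenner         return new;",
]

print(rolled_up_lines(GIT_SAMPLE, SVN_SAMPLE))
```

On these three sample lines the sketch flags line 398: git points at 5fbf0b0d5828, svn at r42209 by bernds.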

As I said, gcc-reparent is better in this regard, but there are still artefacts from conversion, such as incorrect merge records, that show up.

R.

> 
>>
>> gcc-reparent is better, but many (most?) of the release tags are shown
>> as merge commits with a fake parent back to the gcc-3 branch point,
>> which is certainly not what happened when the tagging was done at that
>> time.
> 
> I agree with you here.
> 
>>
>> Both of these factually misrepresent the history at the time of the
>> release tag being made.
> 
> Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the need for reparenting -- we lived with current history for gcc-3 release tags for a long time.  I would argue their continued brokenness is not a show-stopper.
> 
> Looking at this from a different perspective, when I posted the initial svn-git scripts back in Summer, the community roughly agreed on a plan to
> 1. Convert entire SVN history to git.
> 2. Use the stock git history rewrite tools (git filter-branch) to fixup what we want, e.g., reparent tags and branches or set better author/committer entries.
> 
> Gcc-pretty does (1) in entirety.
> 
> For reparenting, I tried a 15min fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from old and new parents.  I will drop this and instead use tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty.
> 
>>
>> As for converting my script to work with your tools, I'm afraid I don't
>> have time to work on that right now.  I'm still bogged down validating
>> the incorrect bug ids that the script has identified for some commits.
>> I'm making good progress (we're down to 160 unreviewed commits now), but
>> it is still going to take what time I have over the next week to
>> complete that task.
>>
>> Furthermore, there is no documentation on how your conversion scripts
>> work, so it is not possible for me to test any work I might do in order
>> to validate such changes.  Not being able to run the script locally to
>> test change would be a non-starter.
>>
>> You are welcome, of course, to clone the script I have and attempt to
>> modify it yourself, it's reasonably well documented.  The sources can be
>> found in esr's gcc-conversion repository here:
>> https://gitlab.com/esr/gcc-conversion.git
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
>>
>>
>>> Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones, which don't share history with trunk@1) and restart daily updates of gcc-reparent conversion.
>>>
>>> Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading.  Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot of development has happened and many problems have been fixed since then.  At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.
>>>
>>>
>>> === Missed merges ===
>>>
>>> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
>>>
>>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>>> ----
>>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>>> Author: Richard Earnshaw <rearnsha@gcc.gnu.org>
>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>
>>>    Merge trunk through to r149768
>>>
>>>    Legacy-ID: 149804
>>>
>>> COPYING.RUNTIME                                     |    73 +
>>> ChangeLog                                           |   270 +-
>>> MAINTAINERS                                         |    19 +-
>>> <MANY OTHER FILES>
>>> ----
>>>
>>> at the same time for svn-git scripts we have:
>>>
>>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>>> ----
>>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>>> Merge: 4970119c20da 3a69b1e566a7
>>> Author: Richard Earnshaw <rearnsha@arm.com>
>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>
>>>    Merge trunk through to r149768
>>>
>>>    git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>>> ----
>>>
>>> ... which agrees with
>>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>>> /trunk:142588-149768
>>>
>>> === Bad author entries ===
>>>
>>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is unlikely to start with a digit.
>>>
>>> === Missed authors ===
>>>
>>> Reposurgeon-6a conversion misses many authors, below is a list of people with names starting with "A".
>>>
>>> Akos Kiss
>>> Anders Bertelrud
>>> Andrew Pochinsky
>>> Anton Hartl
>>> Arthur Norman
>>> Aymeric Vincent
>>>
>>> === Conservative author entries ===
>>>
>>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where svn-git conversion manages to extract valid email from commit data.  This happens for hundreds of author entries.
>>>
>>> Regards,
>>>
>>> --
>>> Maxim Kuvyrkov
>>> https://www.linaro.org
>>>
>>>
>>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>>>>
>>>>
>>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>>
>>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>>> spelling and other mistakes in the commit authors?
>>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>>> Jakub Jakub Jelinek (1):
>>>>> Jakub Jeilnek (1):
>>>>> Jelinek (1):
>>>>> entries next to the expected one with most of the commits.
>>>>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>>>>> other names and if we have one with many commits and then one with very few
>>>>> with small edit distance from those, flag it for human review.
>>>>
>>>> This is close to what svn-git-author.sh script is doing in gcc-pretty and gcc-reparent conversions.  It ignores 1-3 character differences in author/committer names and email addresses.  I've audited results for all branches and didn't spot any mistakes.
>>>>
>>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves.  Below are current notes for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>>>
>>>> == Merges on trunk ==
>>>>
>>>> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk.  This brings entire development history from the branch to trunk, which is both good and bad.  The good part is that we get more visibility into how the code evolved.  The bad part is that we get many "noisy" commits from merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready for review/commit quality.  It's common for files to be re-written in large chunks on branches.
>>>>
>>>> Also, reposurgeon's commit logs don't have information on SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit.  Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.
>>>>
>>>> My conversion follows current GCC development policy that trunk history should be linear.  Branch merges to trunk are squashed.  Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>>>>
>>>> == Differences in trees ==
>>>>
>>>> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from current tip and up to svn's r130805.
>>>> Here is SVN log of that revision (restoration of deleted trunk):
>>>> ------------------------------------------------------------------------
>>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>>> Changed paths:
>>>>  A /trunk (from /trunk:130802)
>>>> ------------------------------------------------------------------------
>>>>
>>>> Reposurgeon conversion has:
>>>> -------------
>>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>>> Author: Daniel Berlin <dberlin@gcc.gnu.org>
>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>
>>>>   Readd trunk
>>>>
>>>>   Legacy-ID: 130805
>>>>
>>>> .gitignore | 17 -----------------
>>>> 1 file changed, 17 deletions(-)
>>>> -------------
>>>> and my conversion has:
>>>> -------------
>>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>>> Author: Daniel Berlin <dberlin@dbrelin.org>
>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>
>>>>   Readd trunk
>>>>
>>>>
>>>>   git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>>> -------------
>>>>
>>>> It appears that .gitignore has been added in r1 by reposurgeon and then deleted at r130805.  In SVN repository .gitignore was added in r195087.  I speculate that addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
>>>>
>>>> == Committer entries ==
>>>>
>>>> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects author name from ChangeLog.
>>>>
>>>> reposurgeon-5a:
>>>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <marxin@gcc.gnu.org>
>>>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozefl@gcc.gnu.org>
>>>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@gcc.gnu.org>
>>>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <gjl@gcc.gnu.org>
>>>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenth@gcc.gnu.org>
>>>>
>>>> pretty:
>>>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <mliska@suse.cz>
>>>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozef.l@mittosystems.com>
>>>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@codesourcery.com>
>>>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <avr@gjlay.de>
>>>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenther@suse.de>
>>>>
>>>> == Bad summary line ==
>>>>
>>>> While looking around r138087, below caught my eye.  Is the contents of summary line as expected?
>>>>
>>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>>> Author: Chris Fairles <chris.fairles@gmail.com>
>>>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>>>
>>>>   acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>
>>>>   2008-07-23  Chris Fairles <chris.fairles@gmail.com>
>>>>
>>>>           * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>           Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>>           * src/Makefile.am: Use it.
>>>>           * configure: Regenerate.
>>>>           * configure.in: Likewise.
>>>>           * Makefile.in: Likewise.
>>>>           * src/Makefile.in: Likewise.
>>>>           * libsup++/Makefile.in: Likewise.
>>>>           * po/Makefile.in: Likewise.
>>>>           * doc/Makefile.in: Likewise.
>>>>
>>>>   Legacy-ID: 138087
>>>>
>>>>
>>>> --
>>>> Maxim Kuvyrkov
>>>> https://www.linaro.org
>>>>
>>>
>>
> 


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-29 23:13                                           ` Segher Boessenkool
@ 2019-12-30 15:36                                             ` Richard Earnshaw (lists)
  2019-12-30 22:37                                               ` Segher Boessenkool
  0 siblings, 1 reply; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-30 15:36 UTC (permalink / raw)
  To: Segher Boessenkool, Joseph Myers
  Cc: Eric S. Raymond, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	frnchfrgg

On 29/12/2019 23:13, Segher Boessenkool wrote:
> On Sun, Dec 29, 2019 at 11:00:08PM +0000, Joseph Myers wrote:
>> fixups in bugdb.py - and that way benefit both from reposurgeon making 
>> choices that are as conservatively safe as possible, which seems a 
>> desirable property for problem cases that haven't been manually reviewed, 
> 
> Problem cases that haven't been manually reviewed should *be* manually
> reviewed, or the heuristics improved so there are fewer problem cases.
> 

Thank you for offering to help with the checking.

;-)

R.

> As I've said many many times now, we only have *one* repository to
> convert here.  Taking shortcuts is *good*, making problems for ourselves
> by pretending we do things more generically is *bad*.
> 
> 
> Segher
> 


* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 15:31                                         ` Richard Earnshaw (lists)
@ 2019-12-30 15:49                                           ` Maxim Kuvyrkov
  2019-12-30 16:08                                             ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 198+ messages in thread
From: Maxim Kuvyrkov @ 2019-12-30 15:49 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: GCC Development, Joseph Myers, Alexandre Oliva, Eric S. Raymond,
	Jeff Law, Segher Boessenkool, Mark Wielaard, Jakub Jelinek

> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>> 
>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>> Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion.
>>>> 
>>>> I am sure, these and whatever other problems I may find in the reposurgeon conversion can be fixed in time.  However, I don't see why should bother.  My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it didn't change in any significant way since then.
>>>> 
>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready".  Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready".  The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/.
>>>> 
>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion.  I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help.
>>>> 
>>> 
>>> I don't think either of these conversions are any more ready to use than
>>> the reposurgeon one, possibly less so.  In fact, there are still some
>>> major issues to resolve first before they can be considered.
>>> 
>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>> release tags, showing the tags as being made directly from trunk with
>>> massive deltas representing the roll-up of all the commits that were
>>> made on the gcc-3 release branch.
>> 
>> I will clarify the above statement, and please correct me where you think I'm wrong.  Gcc-pretty conversion has the exact right parent information for the gcc-3 era
>> release tags as recorded in SVN version history.  Gcc-pretty conversion aims to produce an exact copy of SVN history in git.  IMO, it manages to do so just fine.
>> 
>> It is a different thing that SVN history has a screwed up record of gcc-3 era tags.
> 
> It's not screwed up in svn.  Svn shows the correct history information for the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.
> 
> For example, looking at gcc_3_0_release in expr.c with git blame and svn blame shows

In SVN history, tags/gcc_3_0_release was copied from /trunk:39596, and in the same commit a bunch of files were replaced from /branches/gcc-3_0-branch/ (and from different revisions of this branch!).

$ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c \|/tags/gcc_3_0_release/gcc/reload.c "
   A /tags/gcc_3_0_release (from /trunk:39596)
   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)

IMO, from such history (absent external knowledge about better reparenting options) the best choice for parent branch is /trunk@39596, not /branches/gcc-3_0-branch at a random revision from the replaced files.
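
To make the ambiguity concrete, here is a minimal Python sketch that reads the changed-path lines of "svn log -qv" and separates the copy source of the tag directory itself from the sources of the individually replaced files.  It assumes the changed-path format shown above and paths without spaces.  The function name and the return shape are illustrative assumptions and are not taken from the svn-git scripts.

```python
import re
from collections import defaultdict

# Changed-path line from `svn log -v`: "   A /path (from /src/path:REV)"
PATH_RE = re.compile(
    r"^\s*(?P<action>[AMDR]) (?P<path>\S+)"
    r"(?: \(from (?P<src>\S+):(?P<rev>\d+)\))?\s*$"
)


def copy_sources(changed_paths, tag_path):
    """Return (root_copy, replaced): root_copy is the (path, rev) that the tag
    directory was copied from, and replaced maps each source root to the
    sorted revisions that individual files were taken from."""
    root_copy = None
    replaced = defaultdict(set)
    for line in changed_paths:
        m = PATH_RE.match(line)
        if m is None or m["src"] is None:
            continue  # not a copy or replace-with-history entry
        path, src, rev = m["path"], m["src"], int(m["rev"])
        if path == tag_path:
            root_copy = (src, rev)
        elif path.startswith(tag_path + "/"):
            rel = path[len(tag_path):]  # e.g. "/gcc/expr.c"
            # Strip the file part so sources are grouped by branch root.
            base = src[: -len(rel)] if src.endswith(rel) else src
            replaced[base].add(rev)
    return root_copy, {base: sorted(revs) for base, revs in replaced.items()}


# The three lines from the svn log output quoted above.
SAMPLE = [
    "   A /tags/gcc_3_0_release (from /trunk:39596)",
    "   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)",
    "   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)",
]

print(copy_sources(SAMPLE, "/tags/gcc_3_0_release"))
```

For the three lines above, this gives a single directory copy from /trunk at r39596 and two file replacements from /branches/gcc-3_0-branch at different revisions (42007 and 43255), so no single branch revision stands out as the parent.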

Still, I see your point, and I will fix reparenting support.  Whether the GCC community opts to reparent or not is a different topic.

--
Maxim Kuvyrkov
https://www.linaro.org


> git blame expr.c:
> 
> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   396)         return temp;
> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   397)       }
> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   398)     /* Copy the address into a pseudo, so that the returned value
> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   399)        remains correct across calls to emit_queue.  */
> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   400)     XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
> 59f26b7caad9 (Richard Kenner         1994-01-11 00:23:47 +0000   401)     return new;
> 
> git log 5fbf0b0d5828
> commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
> Author: no-author <no-author@gcc.gnu.org>
> Date:   Sun Jun 17 19:44:25 2001 +0000
> 
>    This commit was manufactured by cvs2svn to create tag
>    'gcc_3_0_release'.
> 
> while svn blame expr.c correctly shows:
> 
>   386     kenner             return temp;
>   386     kenner           }
> 42209     bernds         /* Copy the address into a pseudo, so that the returned value
> 42209     bernds            remains correct across calls to emit_queue.  */
> 42209     bernds         XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>  6375     kenner         return new;
> 
> svn log -r42209 ^/
> ------------------------------------------------------------------------
> r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines
> 
> Fix queueing-related bugs
> 
> In other words, svn can correctly track the files that were modified on the release branch, while the git conversion loses that information, rolling up all the diffs on the release branch into a single unattributed commit.
> 
> As I said, gcc-reparent is better in this regard, but there are still artefacts from conversion, such as incorrect merge records, that show up.
> 
> R.
> 
>> 
>>> 
>>> gcc-reparent is better, but many (most?) of the release tags are shown
>>> as merge commits with a fake parent back to the gcc-3 branch point,
>>> which is certainly not what happened when the tagging was done at that
>>> time.
>> 
>> I agree with you here.
>> 
>>> 
>>> Both of these factually misrepresent the history at the time of the
>>> release tag being made.
>> 
>> Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the need for reparenting -- we lived with current history for gcc-3 release tags for a long time.  I would argue their continued brokenness is not a show-stopper.
>> 
>> Looking at this from a different perspective, when I posted the initial svn-git scripts back in Summer, the community roughly agreed on a plan to
>> 1. Convert entire SVN history to git.
>> 2. Use the stock git history rewrite tools (git filter-branch) to fixup what we want, e.g., reparent tags and branches or set better author/committer entries.
>> 
>> Gcc-pretty does (1) in entirety.
>> 
>> For reparenting, I tried a 15min fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from old and new parents.  I will drop this and instead use tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty.
>> 
>>> 
>>> As for converting my script to work with your tools, I'm afraid I don't
>>> have time to work on that right now.  I'm still bogged down validating
>>> the incorrect bug ids that the script has identified for some commits.
>>> I'm making good progress (we're down to 160 unreviewed commits now), but
>>> it is still going to take what time I have over the next week to
>>> complete that task.
>>> 
>>> Furthermore, there is no documentation on how your conversion scripts
>>> work, so it is not possible for me to test any work I might do in order
>>> to validate such changes.  Not being able to run the script locally to
>>> test change would be a non-starter.
>>> 
>>> You are welcome, of course, to clone the script I have and attempt to
>>> modify it yourself, it's reasonably well documented.  The sources can be
>>> found in esr's gcc-conversion repository here:
>>> https://gitlab.com/esr/gcc-conversion.git
>> 
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
>>> 
>>> 
>>>> Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones, which don't share history with trunk@1) and restart daily updates of gcc-reparent conversion.
>>>> 
>>>> Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading.  Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot of development has happened and many problems have been fixed since then.  At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.
>>>> 
>>>> 
>>>> === Missed merges ===
>>>> 
>>>> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
>>>> 
>>>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>>>> ----
>>>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>>>> Author: Richard Earnshaw <rearnsha@gcc.gnu.org>
>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>> 
>>>>   Merge trunk through to r149768
>>>> 
>>>>   Legacy-ID: 149804
>>>> 
>>>> COPYING.RUNTIME                                     |    73 +
>>>> ChangeLog                                           |   270 +-
>>>> MAINTAINERS                                         |    19 +-
>>>> <MANY OTHER FILES>
>>>> ----
>>>> 
>>>> at the same time for svn-git scripts we have:
>>>> 
>>>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>>>> ----
>>>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>>>> Merge: 4970119c20da 3a69b1e566a7
>>>> Author: Richard Earnshaw <rearnsha@arm.com>
>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>> 
>>>>   Merge trunk through to r149768
>>>> 
>>>>   git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>>>> ----
>>>> 
>>>> ... which agrees with
>>>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>>>> /trunk:142588-149768
>>>> 
>>>> === Bad author entries ===
>>>> 
>>>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is unlikely to start with a digit.
>>>> 
>>>> === Missed authors ===
>>>> 
>>>> Reposurgeon-6a conversion misses many authors, below is a list of people with names starting with "A".
>>>> 
>>>> Akos Kiss
>>>> Anders Bertelrud
>>>> Andrew Pochinsky
>>>> Anton Hartl
>>>> Arthur Norman
>>>> Aymeric Vincent
>>>> 
>>>> === Conservative author entries ===
>>>> 
>>>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where svn-git conversion manages to extract valid email from commit data.  This happens for hundreds of author entries.
>>>> 
>>>> Regards,
>>>> 
>>>> --
>>>> Maxim Kuvyrkov
>>>> https://www.linaro.org
>>>> 
>>>> 
>>>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>>>>> 
>>>>> 
>>>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>>> 
>>>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>>>> spelling and other mistakes in the commit authors?
>>>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>>>> Jakub Jakub Jelinek (1):
>>>>>> Jakub Jeilnek (1):
>>>>>> Jelinek (1):
>>>>>> entries next to the expected one with most of the commits.
>>>>>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>>>>>> other names and if we have one with many commits and then one with very few
>>>>>> with small edit distance from those, flag it for human review.
>>>>> 
>>>>> This is close to what svn-git-author.sh script is doing in gcc-pretty and gcc-reparent conversions.  It ignores 1-3 character differences in author/committer names and email addresses.  I've audited results for all branches and didn't spot any mistakes.
>>>>> 
>>>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves.  Below are current notes for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>>>> 
>>>>> == Merges on trunk ==
>>>>> 
>>>>> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk.  This brings entire development history from the branch to trunk, which is both good and bad.  The good part is that we get more visibility into how the code evolved.  The bad part is that we get many "noisy" commits from merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready for review/commit quality.  It's common for files to be re-written in large chunks on branches.
>>>>> 
>>>>> Also, reposurgeon's commit logs don't have information on SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit.  Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.
>>>>> 
>>>>> My conversion follows current GCC development policy that trunk history should be linear.  Branch merges to trunk are squashed.  Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>>>>> 
>>>>> == Differences in trees ==
>>>>> 
>>>>> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from current tip and up to SVN's r130805.
>>>>> Here is SVN log of that revision (restoration of deleted trunk):
>>>>> ------------------------------------------------------------------------
>>>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>>>> Changed paths:
>>>>> A /trunk (from /trunk:130802)
>>>>> ------------------------------------------------------------------------
>>>>> 
>>>>> Reposurgeon conversion has:
>>>>> -------------
>>>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>>>> Author: Daniel Berlin <dberlin@gcc.gnu.org>
>>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>> 
>>>>>  Readd trunk
>>>>> 
>>>>>  Legacy-ID: 130805
>>>>> 
>>>>> .gitignore | 17 -----------------
>>>>> 1 file changed, 17 deletions(-)
>>>>> -------------
>>>>> and my conversion has:
>>>>> -------------
>>>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>>>> Author: Daniel Berlin <dberlin@dbrelin.org>
>>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>> 
>>>>>  Readd trunk
>>>>> 
>>>>> 
>>>>>  git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>> -------------
>>>>> 
>>>>> It appears that .gitignore has been added in r1 by reposurgeon and then deleted at r130805.  In SVN repository .gitignore was added in r195087.  I speculate that addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
>>>>> 
>>>>> == Committer entries ==
>>>>> 
>>>>> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects author name from ChangeLog.
>>>>> 
>>>>> reposurgeon-5a:
>>>>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <marxin@gcc.gnu.org>
>>>>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozefl@gcc.gnu.org>
>>>>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@gcc.gnu.org>
>>>>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <gjl@gcc.gnu.org>
>>>>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenth@gcc.gnu.org>
>>>>> 
>>>>> pretty:
>>>>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <mliska@suse.cz>
>>>>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozef.l@mittosystems.com>
>>>>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@codesourcery.com>
>>>>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <avr@gjlay.de>
>>>>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenther@suse.de>
>>>>> 
>>>>> == Bad summary line ==
>>>>> 
>>>>> While looking around r138087, the commit below caught my eye.  Are the contents of the summary line as expected?
>>>>> 
>>>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>>>> Author: Chris Fairles <chris.fairles@gmail.com>
>>>>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>>>> 
>>>>>  acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>> 
>>>>>  2008-07-23  Chris Fairles <chris.fairles@gmail.com>
>>>>> 
>>>>>          * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>          Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>>>          * src/Makefile.am: Use it.
>>>>>          * configure: Regenerate.
>>>>>          * configure.in: Likewise.
>>>>>          * Makefile.in: Likewise.
>>>>>          * src/Makefile.in: Likewise.
>>>>>          * libsup++/Makefile.in: Likewise.
>>>>>          * po/Makefile.in: Likewise.
>>>>>          * doc/Makefile.in: Likewise.
>>>>> 
>>>>>  Legacy-ID: 138087
>>>>> 
>>>>> 
>>>>> --
>>>>> Maxim Kuvyrkov
>>>>> https://www.linaro.org

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 15:49                                           ` Maxim Kuvyrkov
@ 2019-12-30 16:08                                             ` Richard Earnshaw (lists)
  2020-01-02  2:59                                               ` Alexandre Oliva
  2020-01-08 20:46                                               ` Maxim Kuvyrkov
  0 siblings, 2 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-30 16:08 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: GCC Development, Joseph Myers, Alexandre Oliva, Eric S. Raymond,
	Jeff Law, Segher Boessenkool, Mark Wielaard, Jakub Jelinek

On 30/12/2019 15:49, Maxim Kuvyrkov wrote:
>> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>
>> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>>> Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion.
>>>>>
>>>>> I am sure these and whatever other problems I may find in the reposurgeon conversion can be fixed in time.  However, I don't see why we should bother.  My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it hasn't changed in any significant way since then.
>>>>>
>>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready".  Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready".  The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/.
>>>>>
>>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion.  I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help.
>>>>>
>>>>
>>>> I don't think either of these conversions are any more ready to use than
>>>> the reposurgeon one, possibly less so.  In fact, there are still some
>>>> major issues to resolve first before they can be considered.
>>>>
>>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>>> release tags, showing the tags as being made directly from trunk with
>>>> massive deltas representing the roll-up of all the commits that were
>>>> made on the gcc-3 release branch.
>>>
>>> I will clarify the above statement, and please correct me where you think I'm wrong.  Gcc-pretty conversion has the exact right parent information for the gcc-3 era
>>> release tags as recorded in SVN version history.  Gcc-pretty conversion aims to produce an exact copy of SVN history in git.  IMO, it manages to do so just fine.
>>>
>>> It is a different thing that SVN history has a screwed up record of gcc-3 era tags.
>>
>> It's not screwed up in svn.  Svn shows the correct history information for the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.
>>
>> For example, looking at gcc_3_0_release in expr.c with git blame and svn blame shows
> 
> In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in the same commit a bunch of files were replaced from /branches/gcc-3_0-branch/ (and from different revisions of this branch!).
> 
> $ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c \|/tags/gcc_3_0_release/gcc/reload.c "
>    A /tags/gcc_3_0_release (from /trunk:39596)
>    R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
>    R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)
> 

Right, (and wrong).  You have to understand how the release branches and
tags are represented in CVS to understand why the SVN conversion is done
this way.  When a branch was created in CVS a tag was added to each
commit which would then be used in any future revisions along that
branch.  But until a commit is made on that branch, the release branch
is just a placeholder.

When a CVS release tag is created, the tag labels the relevant commit
that is to be used.  If that commit is unchanged from the trunk revision
(no commit on the branch), then that is what gets labelled, and it
*appears* to still come from trunk - but that does not matter, since it
is the same as the version on trunk.

The svn copy operations are formed from this set of information by
copying the SVN revision of trunk that applied at the point the branch
was made, and then overriding the copy information for each file that
was then modified on the branch with information about that copy.  This
is sufficient for svn to fully understand the history information for
each and every file in the tag.

Unfortunately, git-svn mis-interprets this when building its graph of
what happened and while it copies the right *content* into the release
branch, it does not copy the right *history*.  The SVN R operation
copies the history from the named revision, not just the content.  That's
the significant difference between the two.
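
To make the difference concrete, here is a minimal Python sketch of the
per-file rule described above, using the three copy records from the
svn log output quoted earlier in this message.  It is a simplified model
for illustration only, not how svn or git-svn is implemented; the
CopyRecord type, the history_source function and the gcc/toplev.c
example path are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CopyRecord:
    action: str    # "A" (add) or "R" (replace), as printed by `svn log -v`
    path: str      # path created in the tag
    src_path: str  # copy-from path
    src_rev: int   # copy-from revision


def history_source(records: List[CopyRecord],
                   file_path: str) -> Optional[Tuple[str, int]]:
    """Return the (path, revision) whose history file_path inherits.

    The most specific (longest) matching record wins, so a per-file "R"
    overrides the directory-level "A" that created the tag.
    """
    best = None
    for rec in records:
        if file_path == rec.path or file_path.startswith(rec.path + "/"):
            if best is None or len(rec.path) > len(best.path):
                best = rec
    if best is None:
        return None
    suffix = file_path[len(best.path):]
    return best.src_path + suffix, best.src_rev


# Copy records taken from the svn log output quoted above.
RECORDS = [
    CopyRecord("A", "/tags/gcc_3_0_release", "/trunk", 39596),
    CopyRecord("R", "/tags/gcc_3_0_release/gcc/expr.c",
               "/branches/gcc-3_0-branch/gcc/expr.c", 43255),
    CopyRecord("R", "/tags/gcc_3_0_release/gcc/reload.c",
               "/branches/gcc-3_0-branch/gcc/reload.c", 42007),
]

# expr.c was modified on the release branch: history comes from the branch.
print(history_source(RECORDS, "/tags/gcc_3_0_release/gcc/expr.c"))
# A file untouched on the branch (hypothetical path): history comes from trunk.
print(history_source(RECORDS, "/tags/gcc_3_0_release/gcc/toplev.c"))
```

A converter that only looks at the directory-level "A" record would
attribute every file to /trunk@39596, which is the roll-up effect seen
in the git blame output.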

R.
> IMO, from such history (absent external knowledge about better reparenting options) the best choice for parent branch is /trunk@39596, not /branches/gcc-3_0-branch at a random revision from the replaced files.
> 
> Still, I see your point, and I will fix reparenting support.  Whether GCC community opts to reparent or not reparent is a different topic.
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
> 
>> git blame expr.c:
>>
>> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   396)         return temp;
>> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   397)       }
>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   398)     /* Copy the address into a pseudo, so that the returned value
>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   399)        remains correct across calls to emit_queue.  */
>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   400)     XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>> 59f26b7caad9 (Richard Kenner         1994-01-11 00:23:47 +0000   401)     return new;
>>
>> git log 5fbf0b0d5828
>> commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
>> Author: no-author <no-author@gcc.gnu.org>
>> Date:   Sun Jun 17 19:44:25 2001 +0000
>>
>>    This commit was manufactured by cvs2svn to create tag
>>    'gcc_3_0_release'.
>>
>> while svn blame expr.c correctly shows:
>>
>>   386     kenner             return temp;
>>   386     kenner           }
>> 42209     bernds         /* Copy the address into a pseudo, so that the returned value
>> 42209     bernds            remains correct across calls to emit_queue.  */
>> 42209     bernds         XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>>  6375     kenner         return new;
>>
>> svn log -r42209 ^/
>> ------------------------------------------------------------------------
>> r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines
>>
>> Fix queueing-related bugs
>>
>> In other words, svn can correctly track the files that were modified on the release branch, while the git conversion loses that information, rolling up all the diffs on the release branch into a single unattributed commit.
>>
>> As I said, gcc-reparent is better in this regard, but there are still artefacts from conversion, such as incorrect merge records, that show up.
>>
>> R.
>>
>>>
>>>>
>>>> gcc-reparent is better, but many (most?) of the release tags are shown
>>>> as merge commits with a fake parent back to the gcc-3 branch point,
>>>> which is certainly not what happened when the tagging was done at that
>>>> time.
>>>
>>> I agree with you here.
>>>
>>>>
>>>> Both of these factually misrepresent the history at the time of the
>>>> release tag being made.
>>>
>>> Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the need for reparenting -- we lived with current history for gcc-3 release tags for a long time.  I would argue their continued brokenness is not a show-stopper.
>>>
>>> Looking at this from a different perspective, when I posted the initial svn-git scripts back in Summer, the community roughly agreed on a plan to
>>> 1. Convert entire SVN history to git.
>>> 2. Use the stock git history rewrite tools (git filter-branch) to fixup what we want, e.g., reparent tags and branches or set better author/committer entries.
>>>
>>> Gcc-pretty does (1) in entirety.
>>>
>>> For reparenting, I tried a 15min fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from old and new parents.  I will drop this and instead use tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty.
>>>
>>>>
>>>> As for converting my script to work with your tools, I'm afraid I don't
>>>> have time to work on that right now.  I'm still bogged down validating
>>>> the incorrect bug ids that the script has identified for some commits.
>>>> I'm making good progress (we're down to 160 unreviewed commits now), but
>>>> it is still going to take what time I have over the next week to
>>>> complete that task.
>>>>
>>>> Furthermore, there is no documentation on how your conversion scripts
>>>> work, so it is not possible for me to test any work I might do in order
>>>> to validate such changes.  Not being able to run the script locally to
>>>> test change would be a non-starter.
>>>>
>>>> You are welcome, of course, to clone the script I have and attempt to
>>>> modify it yourself, it's reasonably well documented.  The sources can be
>>>> found in esr's gcc-conversion repository here:
>>>> https://gitlab.com/esr/gcc-conversion.git
>>>
>>> --
>>> Maxim Kuvyrkov
>>> https://www.linaro.org
>>>
>>>>
>>>>
>>>>> Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones, which don't share history with trunk@1) and restart daily updates of gcc-reparent conversion.
>>>>>
>>>>> Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading.  Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot of development has happened and many problems have been fixed since then.  At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.
>>>>>
>>>>>
>>>>> === Missed merges ===
>>>>>
>>>>> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
>>>>>
>>>>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>>>>> ----
>>>>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>>>>> Author: Richard Earnshaw <rearnsha@gcc.gnu.org>
>>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>>>
>>>>>   Merge trunk through to r149768
>>>>>
>>>>>   Legacy-ID: 149804
>>>>>
>>>>> COPYING.RUNTIME                                     |    73 +
>>>>> ChangeLog                                           |   270 +-
>>>>> MAINTAINERS                                         |    19 +-
>>>>> <MANY OTHER FILES>
>>>>> ----
>>>>>
>>>>> at the same time for svn-git scripts we have:
>>>>>
>>>>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>>>>> ----
>>>>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>>>>> Merge: 4970119c20da 3a69b1e566a7
>>>>> Author: Richard Earnshaw <rearnsha@arm.com>
>>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>>>
>>>>>   Merge trunk through to r149768
>>>>>
>>>>>   git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>> ----
>>>>>
>>>>> ... which agrees with
>>>>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>>>>> /trunk:142588-149768
>>>>>
>>>>> === Bad author entries ===
>>>>>
>>>>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is unlikely to start with a digit.
>>>>>
>>>>> === Missed authors ===
>>>>>
>>>>> Reposurgeon-6a conversion misses many authors, below is a list of people with names starting with "A".
>>>>>
>>>>> Akos Kiss
>>>>> Anders Bertelrud
>>>>> Andrew Pochinsky
>>>>> Anton Hartl
>>>>> Arthur Norman
>>>>> Aymeric Vincent
>>>>>
>>>>> === Conservative author entries ===
>>>>>
>>>>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where svn-git conversion manages to extract valid email from commit data.  This happens for hundreds of author entries.
>>>>>
>>>>> Regards,
>>>>>
>>>>> --
>>>>> Maxim Kuvyrkov
>>>>> https://www.linaro.org
>>>>>
>>>>>
> 

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 15:36                                             ` Richard Earnshaw (lists)
@ 2019-12-30 22:37                                               ` Segher Boessenkool
  2019-12-30 22:58                                                 ` Joseph Myers
  0 siblings, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-30 22:37 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Joseph Myers, Eric S. Raymond, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	frnchfrgg

On Mon, Dec 30, 2019 at 03:36:42PM +0000, Richard Earnshaw (lists) wrote:
> On 29/12/2019 23:13, Segher Boessenkool wrote:
> > On Sun, Dec 29, 2019 at 11:00:08PM +0000, Joseph Myers wrote:
> >> fixups in bugdb.py - and that way benefit both from reposurgeon making 
> >> choices that are as conservatively safe as possible, which seems a 
> >> desirable property for problem cases that haven't been manually reviewed, 
> > 
> > Problem cases that haven't been manually reviewed should *be* manually
> > reviewed, or the heuristics improved so there are fewer problem cases.
> > 
> 
> Thank you for offering to help with the checking.
> 
> ;-)

I am telling you what you (imo) need to do at a minimum to make your
candidate conversion acceptable, if it has the problems you say it has.

To make it not be super much work, I'd do the second option: better
heuristics.  Those in Maxim's conversion have been great for over half
a year; you could borrow some, or peek for inspiration?

I have no interest in improving another candidate conversion, as I'm sure
you realise.  And I'm supposed to have time off now ;-)

If you guys want to ever finish, you'll need to drop the quest for
perfection, because this leads to a) much more work, and b) worse quality
in the end.  And before you protest, please look at the evidence again.
*Your own* evidence.

HTH, this is supposed to be constructive, not a flame,

Best wishes,

Segher

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 22:37                                               ` Segher Boessenkool
@ 2019-12-30 22:58                                                 ` Joseph Myers
  2019-12-31  0:23                                                   ` Segher Boessenkool
  2019-12-31  3:09                                                   ` Eric S. Raymond
  0 siblings, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-30 22:58 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Earnshaw (lists),
	Eric S. Raymond, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	frnchfrgg

On Mon, 30 Dec 2019, Segher Boessenkool wrote:

> To make it not be super much work, I'd do the second option: better
> heuristics.  Those in Maxim's conversion have been great for over half
> a year; you could borrow some, or peek for inspiration?

Actually, comparing authors between the two conversions shows plenty of 
places where the more aggressive ChangeLog extraction in Maxim's 
conversion has produced less good attributions than reposurgeon (e.g. 
attributing merges to some random author from a ChangeLog modified in the 
merge, rather than to the committer of the merge, or attributing fixes in 
a ChangeLog to the author of a random entry that got fixed), as well as 
places where it's simply failed to extract an author from a ChangeLog that 
reposurgeon has extracted.  So for "great", read "have some good ideas to 
learn from, but plenty of places with problems as well".

I'm working on a more detailed comparison of authors with some more 
heuristics to help identify the most interesting cases for manual 
inspection (those where it's more likely Maxim's heuristics are finding 
valid authors reposurgeon didn't) and separate those from cases where 
different subjective choices were made (e.g. of how to assign an author 
when one person backports another's patch, or multi-author commits where 
one conversion chose one author as the main one and the other conversion 
chose the other author).
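
A rough Python sketch of how such a comparison could be keyed and
sorted into buckets, under stated assumptions: commits are matched by
the SVN revision recorded in the "git-svn-id:" or "Legacy-ID:" trailer
(both formats appear in the commit logs quoted in this thread), and an
address ending in "@gcc.gnu.org" is treated as the conservative default.
The function names are hypothetical, and the sample data are the
committer addresses for r278995 and r278991 from Maxim's comparison,
not author data.

```python
import re

# Trailer formats as they appear in the commit logs quoted in this thread.
GIT_SVN_ID = re.compile(r"^\s*git-svn-id: \S+@(\d+) \S+\s*$", re.MULTILINE)
LEGACY_ID = re.compile(r"^\s*Legacy-ID: (\d+)\s*$", re.MULTILINE)
DEFAULT_SUFFIX = "@gcc.gnu.org"


def svn_revision(message):
    """Return the SVN revision recorded in a commit message, or None."""
    for pattern in (GIT_SVN_ID, LEGACY_ID):
        match = pattern.search(message)
        if match:
            return int(match.group(1))
    return None


def classify(email_a, email_b):
    """Bucket one pair of addresses from conversions A and B."""
    if email_a == email_b:
        return "same"
    a_default = email_a.endswith(DEFAULT_SUFFIX)
    b_default = email_b.endswith(DEFAULT_SUFFIX)
    if a_default and not b_default:
        return "only-b-extracted"   # B found an address, A kept the default
    if b_default and not a_default:
        return "only-a-extracted"   # A found an address, B kept the default
    return "review"                 # both differ in kind: needs a human


def compare(conv_a, conv_b):
    """Classify every revision present in both {revision: email} maps."""
    return {rev: classify(conv_a[rev], conv_b[rev])
            for rev in sorted(conv_a.keys() & conv_b.keys())}


reposurgeon = {278995: "marxin@gcc.gnu.org", 278991: "rguenth@gcc.gnu.org"}
pretty = {278995: "mliska@suse.cz", 278991: "rguenther@suse.de"}
print(compare(reposurgeon, pretty))
```

Only the "review" bucket and a sample of the "only-...-extracted"
buckets would then need manual inspection; whether an extracted address
is actually the right one still has to be judged by a person.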

> If you guys want to ever finish, you'll need to drop the quest for
> perfection, because this leads to a) much more work, and b) worse quality
> in the end.

To me, that indicates that using a conversion tool that is conservative in 
its heuristics, and then selectively applying improvements to the extent 
they can be done safely with manual review in a reasonable time, is better 
than applying a conversion tool with more aggressive heuristics.
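
As a small illustration of that layering (a sketch in Python, not
reposurgeon's actual mechanism): every commit gets the committer's
account address by default, and only entries from a manually reviewed
table override it.  The author_email function and the REVIEWED table
are hypothetical; the r278991 and r278992 addresses come from the
comparison earlier in the thread, and treating r278991 as "reviewed" is
an assumption made for the example.

```python
DEFAULT_DOMAIN = "gcc.gnu.org"


def author_email(revision, username, reviewed):
    """Pick the author email for one commit.

    `reviewed` maps SVN revision -> email and holds only entries a human
    has checked; everything else falls back to the committer's account.
    """
    return reviewed.get(revision, f"{username}@{DEFAULT_DOMAIN}")


# Assumed to have been checked by hand for the purpose of this example.
REVIEWED = {278991: "rguenther@suse.de"}

print(author_email(278991, "rguenth", REVIEWED))  # reviewed override
print(author_email(278992, "gjl", REVIEWED))      # conservative default
```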

The issues with the reposurgeon conversion listed in Maxim's last comments 
were of the form "reposurgeon is being conservative in how it generates 
metadata from SVN information".  I think that's a very good basis for 
adding on a limited set of safe improvements to authors and commit 
messages that can be done reasonably soon and then doing the final 
conversion with reposurgeon.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 22:58                                                 ` Joseph Myers
@ 2019-12-31  0:23                                                   ` Segher Boessenkool
  2019-12-31 12:48                                                     ` Segher Boessenkool
  2019-12-31  3:09                                                   ` Eric S. Raymond
  1 sibling, 1 reply; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-31  0:23 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Earnshaw (lists),
	Eric S. Raymond, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	frnchfrgg

On Mon, Dec 30, 2019 at 10:58:05PM +0000, Joseph Myers wrote:
> > If you guys want to ever finish, you'll need to drop the quest for
> > perfection, because this leads to a) much more work, and b) worse quality
> > in the end.
> 
> To me, that indicates that using a conversion tool that is conservative in 
> its heuristics, and then selectively applying improvements to the extent 
> they can be done safely with manual review in a reasonable time, is better 
> than applying a conversion tool with more aggressive heuristics.

Then you need to just completely drop this, and always use
<username@gcc.gnu.org>, because a large percentage will get that anyway
then.  Which is fine with me, fwiw: it's correct, and it's a little
inconvenient perhaps, but it doesn't really make the result less usable
at all.

Precisely like weird merges on svn tags that aren't even on a branch.
Perfect is the enemy of ever getting a conversion done.

> The issues with the reposurgeon conversion listed in Maxim's last comments 
> were of the form "reposurgeon is being conservative in how it generates 
> metadata from SVN information".  I think that's a very good basis for 
> adding on a limited set of safe improvements to authors and commit 
> messages that can be done reasonably soon and then doing the final 
> conversion with reposurgeon.

No, we want to *see* why it would be better than the alternatives, what
the differences are.


Segher

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 22:58                                                 ` Joseph Myers
  2019-12-31  0:23                                                   ` Segher Boessenkool
@ 2019-12-31  3:09                                                   ` Eric S. Raymond
  1 sibling, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2019-12-31  3:09 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Segher Boessenkool, Richard Earnshaw (lists),
	Maxim Kuvyrkov, GCC Development, Alexandre Oliva, Jeff Law,
	Mark Wielaard, Jakub Jelinek, frnchfrgg

Joseph Myers <joseph@codesourcery.com>:
> To me, that indicates that using a conversion tool that is conservative in 
> its heuristics, and then selectively applying improvements to the extent 
> they can be done safely with manual review in a reasonable time, is better 
> than applying a conversion tool with more aggressive heuristics.

There's a more general point here, which I'm developing in my
book-in-progress.

Clean data-conversion problems can be done algorithmically without a
human in the loop.  Messy data-conversion problems need judgment
amplifiers.

Maxim's scripts try to treat a messy conversion problem as though it
were a clean one. Maxim is pretty sharp, so this almost works. Almost.
But the failure mode is predictable - overinterpreting badly-formed
input leads to plausible garbage on output.  

When this happens, it's the Goddess Eris's way of telling you that
there needs to be human judgment in the loop.  Instead of trying to
automate it out, you should be building tools that partition the process
into things a computer does well, driven by choices a human makes well.

This is a point that needs making because programmers thrown at messy
conversion problems tend to be more fixated on achieving full
automation than they perhaps ought to be.

Elsewhere I have written of Zeno tarpits:
http://esr.ibiblio.org/?p=6772 Subversion dump streams are not quite a
Zeno tarpit - they actually obey something that has the effect of a
formal specification - but ChangeLog parsing is.

> The issues with the reposurgeon conversion listed in Maxim's last comments 
> were of the form "reposurgeon is being conservative in how it generates 
> metadata from SVN information".  I think that's a very good basis for 
> adding on a limited set of safe improvements to authors and commit 
> messages that can be done reasonably soon and then doing the final 
> conversion with reposurgeon.

The flip side of this is that Joseph has been making intelligent and
realistic suggestions for how to improve reposurgeon.  That is
*invaluable* - it captures knowledge that will make future comparisons
easier and better.

Software engineers (outside of a few AI specialists) don't ordinarily
think of themselves as being in the knowledge-capture business. But
it's a useful perspective to cultivate.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-31  0:23                                                   ` Segher Boessenkool
@ 2019-12-31 12:48                                                     ` Segher Boessenkool
  0 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-31 12:48 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Earnshaw (lists),
	Eric S. Raymond, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	frnchfrgg

On Mon, Dec 30, 2019 at 06:23:16PM -0600, Segher Boessenkool wrote:
> > To me, that indicates that using a conversion tool that is conservative in 
> > its heuristics, and then selectively applying improvements to the extent 
> > they can be done safely with manual review in a reasonable time, is better 
> > than applying a conversion tool with more aggressive heuristics.
> 
> Then you need to just completely drop this, and always use
> <username@gcc.gnu.org>, because a large percentage will get that anyway
> then.  Which is fine with me, fwiw: it's correct, and it's a little
> inconvenient perhaps, but it doesn't really make the result less usable
> at all.
> 
> Precisely like weird merges on svn tags that aren't even on a branch.
> Perfect is the enemy of ever getting a conversion done.

Oh, and let me add:

$ date -d "aug 20 2015 + 1600 days"

That is how long this reposurgeon obstinacy has delayed us so far.

Happy turn of the year everyone,


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-16 16:47                   ` Jeff Law
@ 2019-12-31 13:43                     ` Joseph Myers
  2019-12-31 14:13                       ` Richard Earnshaw (lists)
  2019-12-31 17:26                       ` Segher Boessenkool
  0 siblings, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2019-12-31 13:43 UTC (permalink / raw)
  To: Jeff Law
  Cc: esr, Segher Boessenkool, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Mon, 16 Dec 2019, Jeff Law wrote:

> > Joseph Myers has made his choice.  He has said repeatedly that he
> > wants to follow through with the reposurgeon conversion, and he's
> > putting his effort behind that by writing tests and even contributing
> > code to reposurgeon.
> > 
> > We'll get this done faster if nobody is joggling his elbow. Or mine.
> And just to be clear, my preference is for reposurgeon, if it's ready. 
> But if it isn't, then I'm absolutely comfortable dropping back to
> Maxim's conversion or even the existing mirror.

As the remaining changes being made to the reposurgeon conversion are of 
the form "tidy things up where reposurgeon is already making a reasonable 
conservative choice but minor improvements are still possible", I think 
it's very clearly ready and propose doing the actual conversion with 
reposurgeon over the weekend of 11/12 January [*], with whatever 
improvements to commit messages and authors are ready by then.  (People 
would then be free to note issues found afterwards, with the potential to 
address them if some other version control system takes over from git in 
20 years' time, just as some issues from cvs2svn are being addressed in 
this conversion.)

This is explicitly not aiming for perfection, but saying that having some 
improved commit message summaries and authors, based on a combination of 
sufficiently safe heuristics and manual review of cases heuristics suggest 
may be questionable and that can be reviewed in time, without trying to 
have the best possible commit message or author in every case, is better 
than falling back to only the original commit messages and only using the 
committer as the author.

[*] The time taken by a reposurgeon conversion is actually dominated by 
time spent in git (git-fast-import takes about four hours to import the 
repository, git gc --aggressive takes over an hour to repack it 
afterwards) and in validation against SVN, not in reposurgeon itself, so 
doesn't take a whole weekend, but there will be other things such as hook 
setup and testing and documenting usage of the repository.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-31 13:43                     ` Joseph Myers
@ 2019-12-31 14:13                       ` Richard Earnshaw (lists)
  2019-12-31 17:26                       ` Segher Boessenkool
  1 sibling, 0 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2019-12-31 14:13 UTC (permalink / raw)
  To: Joseph Myers, Jeff Law
  Cc: esr, Segher Boessenkool, Mark Wielaard, Maxim Kuvyrkov, gcc

On 31/12/2019 13:42, Joseph Myers wrote:
> On Mon, 16 Dec 2019, Jeff Law wrote:
> 
>>> Joseph Myers has made his choice.  He has said repeatedly that he
>>> wants to follow through with the reposurgeon conversion, and he's
>>> putting his effort behind that by writing tests and even contributing
>>> code to reposurgeon.
>>>
>>> We'll get this done faster if nobody is joggling his elbow. Or mine.
>> And just to be clear, my preference is for reposurgeon, if it's ready. 
>> But if it isn't, then I'm absolutely comfortable dropping back to
>> Maxim's conversion or even the existing mirror.
> 
> As the remaining changes being made to the reposurgeon conversion are of 
> the form "tidy things up where reposurgeon is already making a reasonable 
> conservative choice but minor improvements are still possible", I think 
> it's very clearly ready and propose doing the actual conversion with 
> reposurgeon over the weekend of 11/12 January [*], with whatever 
> improvements to commit messages and authors are ready by then.  (People 
> would then be free to note issues found afterwards, with the potential to 
> address them if some other version control system takes over from git in 
> 20 years' time, just as some issues from cvs2svn are being addressed in 
> this conversion.)
> 
> This is explicitly not aiming for perfection, but saying that having some 
> improved commit message summaries and authors, based on a combination of 
> sufficiently safe heuristics and manual review of cases heuristics suggest 
> may be questionable and that can be reviewed in time, without trying to 
> have the best possible commit message or author in every case, is better 
> than falling back to only the original commit messages and only using the 
> committer as the author.
> 
> [*] The time taken by a reposurgeon conversion is actually dominated by 
> time spent in git (git-fast-import takes about four hours to import the 
> repository, git gc --aggressive takes over an hour to repack it 
> afterwards) and in validation against SVN, not in reposurgeon itself, so 
> doesn't take a whole weekend, but there will be other things such as hook 
> setup and testing and documenting usage of the repository.
> 

We can develop and test the hook setup on one of the trial repositories.
In fact, we could probably open one of them up to allow commits from
the community on the understanding that all such commits are for testing
purposes only and will be lost during the final conversion.

That will give folk an opportunity to test their own local setups so
that when the switch does occur they are well prepared.

R.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-31 13:43                     ` Joseph Myers
  2019-12-31 14:13                       ` Richard Earnshaw (lists)
@ 2019-12-31 17:26                       ` Segher Boessenkool
  1 sibling, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2019-12-31 17:26 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jeff Law, esr, Mark Wielaard, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	gcc

On Tue, Dec 31, 2019 at 01:42:55PM +0000, Joseph Myers wrote:
> As the remaining changes being made to the reposurgeon conversion are of 
> the form "tidy things up where reposurgeon is already making a reasonable 
> conservative choice but minor improvements are still possible", I think 
> it's very clearly ready and propose doing the actual conversion with 
> reposurgeon over the weekend of 11/12 January [*], with whatever 
> improvements to commit messages and authors are ready by then.

I propose following the original plan, instead, and choosing the best
conversion that *exists* today, or at whatever later date we choose.

Or we can use what we had ready over half a year ago, which was a Fine
conversion already.  Or what we had *over ten years ago*, a repo made
as a plain git-svn mirror, which is perfectly serviceable as well.  (We
*know* it is, many of us have used it daily for that long).

Switching to a conversion that is different, in some ways better, sure,
but in some ways worse as well, is not a good idea imo.  *Especially*
since we asked many times for an evaluation where it is worse or better,
but nothing is forthcoming, we are just asked to accept on blind faith
that it is better, all evidence to the contrary notwithstanding.

Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 16:08                                             ` Richard Earnshaw (lists)
@ 2020-01-02  2:59                                               ` Alexandre Oliva
  2020-01-02 10:58                                                 ` Richard Earnshaw (lists)
  2020-01-08 20:46                                               ` Maxim Kuvyrkov
  1 sibling, 1 reply; 198+ messages in thread
From: Alexandre Oliva @ 2020-01-02  2:59 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Maxim Kuvyrkov, GCC Development, Joseph Myers, Eric S. Raymond,
	Jeff Law, Segher Boessenkool, Mark Wielaard, Jakub Jelinek

On Dec 30, 2019, "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> wrote:

> Right, (and wrong).  You have to understand how the release branches and
> tags are represented in CVS to understand why the SVN conversion is done
> this way.

I'm curious and ignorant, is the convoluted representation that Maxim
described what SVN normally uses for tree copies, that any conversion
tool from SVN to GIT thus ought to be able to figure out, or is it just
an unusual artifact of the conversion from CVS to SVN, that we'd like to
fix in the conversion from SVN to GIT with some specialized recovery for
such errors in repos poorly converted from CVS?

Thanks in advance for cluing me in,

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist           Stallman was right, but he's left :(
GNU Toolchain Engineer    FSMatrix: It was he who freed the first of us
FSF & FSFLA board member                The Savior shall return (true);

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-02  2:59                                               ` Alexandre Oliva
@ 2020-01-02 10:58                                                 ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2020-01-02 10:58 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Maxim Kuvyrkov, GCC Development, Joseph Myers, Eric S. Raymond,
	Jeff Law, Segher Boessenkool, Mark Wielaard, Jakub Jelinek

On 02/01/2020 02:58, Alexandre Oliva wrote:
> On Dec 30, 2019, "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> wrote:
> 
>> Right, (and wrong).  You have to understand how the release branches and
>> tags are represented in CVS to understand why the SVN conversion is done
>> this way.
> 
> I'm curious and ignorant, is the convoluted representation that Maxim
> described what SVN normally uses for tree copies, that any conversion
> tool from SVN to GIT thus ought to be able to figure out, or is it just
> an unusual artifact of the conversion from CVS to SVN, that we'd like to
> fix in the conversion from SVN to GIT with some specialized recovery for
> such errors in repos poorly converted from CVS?
> 
> Thanks in advance for cluing me in,
> 

I think it mostly comes from cvs2svn.  You probably could manufacture
something similar directly in SVN, but you'd have to try very hard to
create such brain damage.  Something like

svn cp ^/trunk ^/branches/foo
svn rm -f ^/branches/foo/fred.c
svn cp ^/branches/bar/fred.c ^/branches/foo/fred.c
...
svn ci

Which would create a copy of trunk in foo with a copy of fred.c from the
bar branch etc.

Normal SVN copies to a branch use a simple node copy of the top-level
directory, which is why branching in SVN is cheap (essentially O(1) in
time).
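
The difference between the two kinds of copy shows up in the
changed-path list that `svn log -v` prints.  A minimal sketch in
Python, assuming that output format; the helper names
parse_changed_paths and classify_copy are hypothetical, and the sample
lines are the ones quoted elsewhere in this thread for
tags/gcc_3_0_release:

```python
import re

# One changed-path line from `svn log -v`, for example:
#   R /tags/x/gcc/expr.c (from /branches/b/gcc/expr.c:43255)
# Assumption: paths contain no whitespace (good enough for a sketch).
PATH_LINE = re.compile(
    r"^\s*(?P<action>[AMDR])\s+(?P<path>\S+)"
    r"(?:\s+\(from\s+(?P<src>\S+):(?P<rev>\d+)\))?\s*$"
)

SAMPLE = """\
   A /tags/gcc_3_0_release (from /trunk:39596)
   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)
"""


def parse_changed_paths(text):
    """Return (action, path, source_path, source_rev) for each path line."""
    entries = []
    for line in text.splitlines():
        m = PATH_LINE.match(line)
        if m:
            rev = int(m.group("rev")) if m.group("rev") else None
            entries.append((m.group("action"), m.group("path"),
                            m.group("src"), rev))
    return entries


def classify_copy(entries, root):
    """Classify how `root` was created in one revision.

    'none'   : root was not copied from anywhere in these entries
    'simple' : root is a single node copy (the normal, cheap SVN branch)
    'mixed'  : root was copied, and nodes below it were copied from a
               different tree (the cvs2svn-style layout)
    """
    root_src = None
    for _action, path, src, _rev in entries:
        if path == root and src is not None:
            root_src = src
    if root_src is None:
        return "none", []

    overrides = []
    for _action, path, src, rev in entries:
        if path.startswith(root + "/") and src is not None:
            # Where this node would have come from under a plain copy.
            expected = root_src + path[len(root):]
            # Simplification: only the source path is compared, so a
            # same-path copy from a different revision is not flagged.
            if src != expected:
                overrides.append((path, src, rev))
    return ("mixed" if overrides else "simple"), overrides


if __name__ == "__main__":
    kind, overrides = classify_copy(parse_changed_paths(SAMPLE),
                                    "/tags/gcc_3_0_release")
    print(kind)
    for path, src, rev in overrides:
        print(path, "<-", src, "@", rev)
```

A converter that only looks at the top-level copy would pick trunk as
the parent; the per-file overrides are where the branch history lives.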

R.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2019-12-30 16:08                                             ` Richard Earnshaw (lists)
  2020-01-02  2:59                                               ` Alexandre Oliva
@ 2020-01-08 20:46                                               ` Maxim Kuvyrkov
  2020-01-08 22:11                                                 ` Eric S. Raymond
  1 sibling, 1 reply; 198+ messages in thread
From: Maxim Kuvyrkov @ 2020-01-08 20:46 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: GCC Development, Joseph Myers, Alexandre Oliva, Eric S. Raymond,
	Jeff Law, Segher Boessenkool, Mark Wielaard, Jakub Jelinek

> On Dec 30, 2019, at 7:08 PM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 30/12/2019 15:49, Maxim Kuvyrkov wrote:
>>> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>> 
>>> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>>>> 
>>>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>>>> Below are several more issues I found in reposurgeon-6a conversion comparing it against gcc-reparent conversion.
>>>>>> 
>>>>>> I am sure, these and whatever other problems I may find in the reposurgeon conversion can be fixed in time.  However, I don't see why we should bother.  My conversion has been available since summer 2019, I made it ready in time for GCC Cauldron 2019, and it didn't change in any significant way since then.
>>>>>> 
>>>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon conversion can be considered "ready".  Also, I expected a diligent developer to compare new conversion (aka reposurgeon's) against existing conversion (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even "ready".  The data I'm seeing in differences between my and reposurgeon conversions shows that gcc-reparent conversion is /better/.
>>>>>> 
>>>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent conversion.  I welcome Richard E. to modify his summary scripts to work with svn-git scripts, which should be straightforward, and I'm ready to help.
>>>>>> 
>>>>> 
>>>>> I don't think either of these conversions are any more ready to use than
>>>>> the reposurgeon one, possibly less so.  In fact, there are still some
>>>>> major issues to resolve first before they can be considered.
>>>>> 
>>>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>>>> release tags, showing the tags as being made directly from trunk with
>>>>> massive deltas representing the roll-up of all the commits that were
>>>>> made on the gcc-3 release branch.
>>>> 
>>>> I will clarify the above statement, and please correct me where you think I'm wrong.  Gcc-pretty conversion has the exact right parent information for the gcc-3 era
>>>> release tags as recorded in SVN version history.  Gcc-pretty conversion aims to produce an exact copy of SVN history in git.  IMO, it manages to do so just fine.
>>>> 
>>>> It is a different thing that SVN history has a screwed up record of gcc-3 era tags.
>>> 
>>> It's not screwed up in svn.  Svn shows the correct history information for the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.
>>> 
>>> For example, looking at gcc_3_0_release in expr.c with git blame and svn blame shows
>> 
>> In SVN history tags/gcc_3_0_release has been copied off /trunk:39596 and in the same commit bunch of files were replaced from /branches/gcc-3_0-branch/ (and from different revisions of this branch!).
>> 
>> $ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c \|/tags/gcc_3_0_release/gcc/reload.c "
>>   A /tags/gcc_3_0_release (from /trunk:39596)
>>   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
>>   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)
>> 
> 
> Right, (and wrong).  You have to understand how the release branches and
> tags are represented in CVS to understand why the SVN conversion is done
> this way.  When a branch was created in CVS a tag was added to each
> commit which would then be used in any future revisions along that
> branch.  But until a commit is made on that branch, the release branch
> is just a placeholder.
> 
> When a CVS release tag is created, the tag labels the relevant commit
> that is to be used.  If that commit is unchanged from the trunk revision
> (no commit on the branch), then that is what gets labelled, and it
> *appears* to still come from trunk - but that does not matter, since it
> is the same as the version on trunk.
> 
> The svn copy operations are formed from this set of information by
> copying the SVN revision of trunk that applied at the point the branch
> was made, and then overriding the copy information for each file that
> was then modified on the branch with information about that copy.  This
> is sufficient for svn to fully understand the history information for
> each and every file in the tag.
> 
> Unfortunately, git-svn mis-interprets this when building its graph of
> what happened and while it copies the right *content* into the release
> branch, it does not copy the right *history*.  The SVN R operation
> copies the history from named revision, not just the content.  That's
> the significant difference between the two.
> 
> R
>> IMO, from such history (absent external knowledge about better reparenting options) the best choice for parent branch is /trunk@39596, not /branches/gcc-3_0-branch at a random revision from the replaced files.
>> 
>> Still, I see your point, and I will fix reparenting support.  Whether GCC community opts to reparent or not reparent is a different topic.

I've added proper reparenting support to svn-git scripts, and gcc-reparent will be updated in a day or so.  I've also added a few minor improvements and fixed things that Joseph pointed out in my conversion.

Once gcc-reparent conversion is regenerated, I'll do another round of comparisons between it and whatever the latest reposurgeon version is.

--
Maxim Kuvyrkov
https://www.linaro.org

>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
>> 
>>> git blame expr.c:
>>> 
>>> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   396)         return temp;
>>> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   397)       }
>>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   398)     /* Copy the address into a pseudo, so that the returned value
>>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   399)        remains correct across calls to emit_queue.  */
>>> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   400)     XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>>> 59f26b7caad9 (Richard Kenner         1994-01-11 00:23:47 +0000   401)     return new;
>>> 
>>> git log 5fbf0b0d5828
>>> commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
>>> Author: no-author <no-author@gcc.gnu.org>
>>> Date:   Sun Jun 17 19:44:25 2001 +0000
>>> 
>>>   This commit was manufactured by cvs2svn to create tag
>>>   'gcc_3_0_release'.
>>> 
>>> while svn blame expr.c correctly shows:
>>> 
>>>  386     kenner             return temp;
>>>  386     kenner           }
>>> 42209     bernds         /* Copy the address into a pseudo, so that the returned value
>>> 42209     bernds            remains correct across calls to emit_queue.  */
>>> 42209     bernds         XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>>> 6375     kenner         return new;
>>> 
>>> svn log -r42209 ^/
>>> ------------------------------------------------------------------------
>>> r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines
>>> 
>>> Fix queueing-related bugs
>>> 
>>> In other words, svn can correctly track the files that were modified on the release branch, while the git conversion loses that information, rolling up all the diffs on the release branch into a single unattributed commit.
>>> 
>>> As I said, gcc-reparent is better in this regard, but there are still artefacts from conversion, such as incorrect merge records, that show up.
>>> 
>>> R.
>>> 
>>>> 
>>>>> 
>>>>> gcc-reparent is better, but many (most?) of the release tags are shown
>>>>> as merge commits with a fake parent back to the gcc-3 branch point,
>>>>> which is certainly not what happened when the tagging was done at that
>>>>> time.
>>>> 
>>>> I agree with you here.
>>>> 
>>>>> 
>>>>> Both of these factually misrepresent the history at the time of the
>>>>> release tag being made.
>>>> 
>>>> Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the need for reparenting -- we lived with current history for gcc-3 release tags for a long time.  I would argue their continued brokenness is not a show-stopper.
>>>> 
>>>> Looking at this from a different perspective, when I posted the initial svn-git scripts back in Summer, the community roughly agreed on a plan to
>>>> 1. Convert entire SVN history to git.
>>>> 2. Use the stock git history rewrite tools (git filter-branch) to fixup what we want, e.g., reparent tags and branches or set better author/committer entries.
>>>> 
>>>> Gcc-pretty does (1) in entirety.
>>>> 
>>>> For reparenting, I tried a 15min fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from old and new parents.  I will drop this and instead use tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty.
>>>> 
>>>>> 
>>>>> As for converting my script to work with your tools, I'm afraid I don't
>>>>> have time to work on that right now.  I'm still bogged down validating
>>>>> the incorrect bug ids that the script has identified for some commits.
>>>>> I'm making good progress (we're down to 160 unreviewed commits now), but
>>>>> it is still going to take what time I have over the next week to
>>>>> complete that task.
>>>>> 
>>>>> Furthermore, there is no documentation on how your conversion scripts
>>>>> work, so it is not possible for me to test any work I might do in order
>>>>> to validate such changes.  Not being able to run the script locally to
>>>>> test change would be a non-starter.
>>>>> 
>>>>> You are welcome, of course, to clone the script I have and attempt to
>>>>> modify it yourself, it's reasonably well documented.  The sources can be
>>>>> found in esr's gcc-conversion repository here:
>>>>> https://gitlab.com/esr/gcc-conversion.git
>>>> 
>>>> --
>>>> Maxim Kuvyrkov
>>>> https://www.linaro.org
>>>> 
>>>>> 
>>>>> 
>>>>>> Meanwhile, I'm going to add additional root commits to my gcc-reparent conversion to bring in "missing" branches (the ones, which don't share history with trunk@1) and restart daily updates of gcc-reparent conversion.
>>>>>> 
>>>>>> Finally, with the comparison data I have, I consider statements about git-svn's poor quality to be very misleading.  Git-svn may have had serious bugs years ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot of development has happened and many problems have been fixed since then.  At the moment it is reposurgeon that is producing conversions with obscure mistakes in repository metadata.
>>>>>> 
>>>>>> 
>>>>>> === Missed merges ===
>>>>>> 
>>>>>> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
>>>>>> 
>>>>>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>>>>>> ----
>>>>>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>>>>>> Author: Richard Earnshaw <rearnsha@gcc.gnu.org>
>>>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>>>> 
>>>>>>  Merge trunk through to r149768
>>>>>> 
>>>>>>  Legacy-ID: 149804
>>>>>> 
>>>>>> COPYING.RUNTIME                                     |    73 +
>>>>>> ChangeLog                                           |   270 +-
>>>>>> MAINTAINERS                                         |    19 +-
>>>>>> <MANY OTHER FILES>
>>>>>> ----
>>>>>> 
>>>>>> at the same time for svn-git scripts we have:
>>>>>> 
>>>>>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>>>>>> ----
>>>>>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>>>>>> Merge: 4970119c20da 3a69b1e566a7
>>>>>> Author: Richard Earnshaw <rearnsha@arm.com>
>>>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>>>> 
>>>>>>  Merge trunk through to r149768
>>>>>> 
>>>>>>  git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>>> ----
>>>>>> 
>>>>>> ... which agrees with
>>>>>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>>>>>> /trunk:142588-149768
>>>>>> 
>>>>>> === Bad author entries ===
>>>>>> 
>>>>>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is unlikely to start with a digit.
>>>>>> 
>>>>>> === Missed authors ===
>>>>>> 
>>>>>> Reposurgeon-6a conversion misses many authors, below is a list of people with names starting with "A".
>>>>>> 
>>>>>> Akos Kiss
>>>>>> Anders Bertelrud
>>>>>> Andrew Pochinsky
>>>>>> Anton Hartl
>>>>>> Arthur Norman
>>>>>> Aymeric Vincent
>>>>>> 
>>>>>> === Conservative author entries ===
>>>>>> 
>>>>>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits where svn-git conversion manages to extract valid email from commit data.  This happens for hundreds of author entries.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> --
>>>>>> Maxim Kuvyrkov
>>>>>> https://www.linaro.org
>>>>>> 
>>>>>> 
>>>>>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>>>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>>>>>> spelling and other mistakes in the commit authors?
>>>>>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>>>>>> Jakub Jakub Jelinek (1):
>>>>>>>> Jakub Jeilnek (1):
>>>>>>>> Jelinek (1):
>>>>>>>> entries next to the expected one with most of the commits.
>>>>>>>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>>>>>>>> other names and if we have one with many commits and then one with very few
>>>>>>>> with small edit distance from those, flag it for human review.
>>>>>>> 
>>>>>>> This is close to what svn-git-author.sh script is doing in gcc-pretty and gcc-reparent conversions.  It ignores 1-3 character differences in author/committer names and email addresses.  I've audited results for all branches and didn't spot any mistakes.
>>>>>>> 
>>>>>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and gcc-reposurgeon-5a repos among themselves.  Below are current notes for comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>>>>>> 
>>>>>>> == Merges on trunk ==
>>>>>>> 
>>>>>>> Reposurgeon creates merge entries on trunk when changes from a branch are merged into trunk.  This brings entire development history from the branch to trunk, which is both good and bad.  The good part is that we get more visibility into how the code evolved.  The bad part is that we get many "noisy" commits from merged branch (e.g., "Merge in trunk" every few revisions) and that our SVN branches are work-in-progress quality, not ready for review/commit quality.  It's common for files to be re-written in large chunks on branches.
>>>>>>> 
>>>>>>> Also, reposurgeon's commit logs don't have information on SVN path from which the change came, so there is no easy way to determine that a given commit is from a merged branch, not an original trunk commit.  Git-svn, on the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit logs.
>>>>>>> 
>>>>>>> My conversion follows current GCC development policy that trunk history should be linear.  Branch merges to trunk are squashed.  Merges between non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>>>>>>> 
>>>>>>> == Differences in trees ==
>>>>>>> 
>>>>>>> Git trees (aka filesystem content) match between pretty/trunk and reposurgeon-5a/trunk from current tip and up to SVN's r130805.
>>>>>>> Here is SVN log of that revision (restoration of deleted trunk):
>>>>>>> ------------------------------------------------------------------------
>>>>>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>>>>>> Changed paths:
>>>>>>> A /trunk (from /trunk:130802)
>>>>>>> ------------------------------------------------------------------------
>>>>>>> 
>>>>>>> Reposurgeon conversion has:
>>>>>>> -------------
>>>>>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>>>>>> Author: Daniel Berlin <dberlin@gcc.gnu.org>
>>>>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>>>> 
>>>>>>> Readd trunk
>>>>>>> 
>>>>>>> Legacy-ID: 130805
>>>>>>> 
>>>>>>> .gitignore | 17 -----------------
>>>>>>> 1 file changed, 17 deletions(-)
>>>>>>> -------------
>>>>>>> and my conversion has:
>>>>>>> -------------
>>>>>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>>>>>> Author: Daniel Berlin <dberlin@dbrelin.org>
>>>>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>>>> 
>>>>>>> Readd trunk
>>>>>>> 
>>>>>>> 
>>>>>>> git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>>>> -------------
>>>>>>> 
>>>>>>> It appears that .gitignore has been added in r1 by reposurgeon and then deleted at r130805.  In SVN repository .gitignore was added in r195087.  I speculate that addition of .gitignore at r1 is expected, but its deletion at r130805 is highly suspicious.
>>>>>>> 
>>>>>>> == Committer entries ==
>>>>>>> 
>>>>>>> Reposurgeon uses $user@gcc.gnu.org for committer email addresses even when it correctly detects author name from ChangeLog.
>>>>>>> 
>>>>>>> reposurgeon-5a:
>>>>>>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <marxin@gcc.gnu.org>
>>>>>>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozefl@gcc.gnu.org>
>>>>>>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@gcc.gnu.org>
>>>>>>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <gjl@gcc.gnu.org>
>>>>>>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenth@gcc.gnu.org>
>>>>>>> 
>>>>>>> pretty:
>>>>>>> r278995 Martin Liska <mliska@suse.cz> Martin Liska <mliska@suse.cz>
>>>>>>> r278994 Jozef Lawrynowicz <jozef.l@mittosystems.com> Jozef Lawrynowicz <jozef.l@mittosystems.com>
>>>>>>> r278993 Frederik Harwath <frederik@codesourcery.com> Frederik Harwath <frederik@codesourcery.com>
>>>>>>> r278992 Georg-Johann Lay <avr@gjlay.de> Georg-Johann Lay <avr@gjlay.de>
>>>>>>> r278991 Richard Biener <rguenther@suse.de> Richard Biener <rguenther@suse.de>
>>>>>>> 
>>>>>>> == Bad summary line ==
>>>>>>> 
>>>>>>> While looking around r138087, the commit below caught my eye.  Are the contents of the summary line as expected?
>>>>>>> 
>>>>>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>>>>>> Author: Chris Fairles <chris.fairles@gmail.com>
>>>>>>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>>>>>> 
>>>>>>> acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>>> 
>>>>>>> 2008-07-23  Chris Fairles <chris.fairles@gmail.com>
>>>>>>> 
>>>>>>>         * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>>>         Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>>>>>         * src/Makefile.am: Use it.
>>>>>>>         * configure: Regenerate.
>>>>>>>         * configure.in: Likewise.
>>>>>>>         * Makefile.in: Likewise.
>>>>>>>         * src/Makefile.in: Likewise.
>>>>>>>         * libsup++/Makefile.in: Likewise.
>>>>>>>         * po/Makefile.in: Likewise.
>>>>>>>         * doc/Makefile.in: Likewise.
>>>>>>> 
>>>>>>> Legacy-ID: 138087
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Maxim Kuvyrkov
>>>>>>> https://www.linaro.org
>> 
> 
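For anyone wanting to line up commits between the two conversions quoted above: both record the SVN revision in the commit message, reposurgeon as a "Legacy-ID: <rev>" line and git-svn as a "git-svn-id: <url>@<rev> <uuid>" line.  Below is a minimal Python sketch that pulls the revision out of either form.  The function name svn_revision is made up for illustration, and the patterns are inferred only from the two log excerpts above, so other message layouts may need extra handling.

```python
import re

# Patterns inferred from the two commit-log excerpts quoted above.
LEGACY_ID = re.compile(r"^Legacy-ID:\s*(\d+)\s*$", re.MULTILINE)
GIT_SVN_ID = re.compile(r"^git-svn-id:\s*(\S+)@(\d+)\s+\S+\s*$", re.MULTILINE)


def svn_revision(message):
    """Return (revision, svn_url_or_None) for a commit message, or None."""
    m = GIT_SVN_ID.search(message)
    if m:
        # git-svn records both the SVN URL and the revision.
        return int(m.group(2)), m.group(1)
    m = LEGACY_ID.search(message)
    if m:
        # Legacy-ID carries the revision only, with no SVN path.
        return int(m.group(1)), None
    return None
```

Run over the output of "git log --format=%B" split per commit, this would give a revision-keyed index for each repository, which is one way to compare the two side by side.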

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-08 20:46                                               ` Maxim Kuvyrkov
@ 2020-01-08 22:11                                                 ` Eric S. Raymond
  2020-01-08 23:34                                                   ` Joseph Myers
  0 siblings, 1 reply; 198+ messages in thread
From: Eric S. Raymond @ 2020-01-08 22:11 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: Richard Earnshaw (lists),
	GCC Development, Joseph Myers, Alexandre Oliva, Jeff Law,
	Segher Boessenkool, Mark Wielaard, Jakub Jelinek

Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>:
> Once gcc-reparent conversion is regenerated, I'll do another round of comparisons between it and whatever the latest reposurgeon version is.

Thanks, Maxim. Those comparisons have been very helpful to Joseph and
Richard and to the reposurgeon devteam as well.

They use your feedback to find places where their comment-processing
scripts could be improved; we've used it to learn what additional
oddities in ChangeLogs we need to be able to handle automatically.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-08 22:11                                                 ` Eric S. Raymond
@ 2020-01-08 23:34                                                   ` Joseph Myers
  2020-01-09  2:38                                                     ` Segher Boessenkool
  2020-01-09  5:07                                                     ` Jeff Law
  0 siblings, 2 replies; 198+ messages in thread
From: Joseph Myers @ 2020-01-08 23:34 UTC (permalink / raw)
  To: Eric S. Raymond
  Cc: Maxim Kuvyrkov, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Jeff Law, Segher Boessenkool,
	Mark Wielaard, Jakub Jelinek

On Wed, 8 Jan 2020, Eric S. Raymond wrote:

> They use your feedback to find places where their comment-processing
> scripts could be improved; we've used it to learn what additional
> oddities in ChangeLogs we need to be able to handle automatically.

I've used comparisons of authors in the two conversions - in cases where 
they get different human identities for the author, not just different 
email addresses or name variants - to identify cases for manual review, 
since ChangeLog parsing is the most subjective part of doing a conversion 
and cases where different heuristics produce different results indicate 
those worthy of manual review.
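
The selection step described above can be sketched roughly as follows.  This is only an illustration in Python, not the scripts actually used: the input format (a dict from SVN revision to a (name, email) pair for each conversion) and the crude name normalisation are assumptions.

```python
import unicodedata


def normalise_name(name):
    """Fold case, accents, punctuation and word order so that simple
    name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    words = "".join(c if c.isalnum() else " " for c in ascii_only.lower()).split()
    return " ".join(sorted(words))


def needs_manual_review(authors_a, authors_b):
    """Return the revisions where the two conversions name different people.

    Each argument maps an SVN revision number to a (name, email) pair.
    A difference in email address alone is ignored.
    """
    flagged = []
    for rev in sorted(authors_a.keys() & authors_b.keys()):
        name_a, _ = authors_a[rev]
        name_b, _ = authors_b[rev]
        if normalise_name(name_a) != normalise_name(name_b):
            flagged.append(rev)
    return flagged
```

Anything such a filter flags still needs a human decision; it only narrows down which commits are worth looking at.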

Apart from about 1600 with no changes to ChangeLog files but a ChangeLog 
entry in the commit message, which I reviewed mostly automatically to make 
sure I agreed with Maxim's author extraction with only limited manual 
checks on those that looked like suspect cases, that involved reviewing 
around 3000 commits manually; I've now completed that review.  Some of 
those are also subjective cases even after review (for example, where the 
commit involved one person backporting another person's patch).

In the set of around 1200 commits with both ChangeLog and non-ChangeLog 
files being changed, which did not look like backports, for example, I 
arrived at around 400 author improvements from this review (not all of 
them the same authors as in Maxim's conversion), while for around 800 
commits I concluded the reposurgeon author was preferable.  (The typical 
case where reposurgeon does better is where successive commits add new 
ChangeLog entries under an existing ChangeLog header.  The typical case 
where I added fixes was where a commit made nonsubstantive changes under 
an existing header, as well as adding new entries, which is hard to 
distinguish automatically from a multi-author commit so reposurgeon 
conservatively treats as a multi-author commit.)

In the case of ChangeLog-only commits, where reposurgeon assumes they are 
likely to be fixing typos or similar and so does not extract an 
attribution from ChangeLog files in such commits, manual review identified 
many cases (especially in the earlier parts of the history) where the 
ChangeLog was committed separately from the substantive parts of the patch 
and so a better attribution could be assigned to those substantive 
commits.

I consider the reposurgeon-based conversion machinery to be in essentially 
its final state now; I don't have any further authors to review, Richard 
doesn't have any further Bugzilla-based commit summaries to review and we 
don't know of any relevant reposurgeon bugs or missing features.  I'm 
running a conversion now to verify both the current state of the fixups 
and the Makefile integration of the conversion and subsequent automated 
validation, and will make that converted repository available for final 
checks if this succeeds.  Compared to the previous converted repository, 
this one has many author fixups, a fix for a bug in the author fixups 
where they broke commit dates, and reposurgeon improvements to avoid 
producing unidiomatic empty git commits in the converted repository for 
things such as branch and tag creation.

This converted repository uses the ref rearrangements along the lines 
proposed by Richard (so dead branches and vendor branches are available 
but not fetched by default); the objects from the existing git mirror will 
also be included in the repository (so existing gitweb links to such 
objects in list archives continue to work, for example, as long as they 
aren't links to objects that were made unreachable at some point in the 
mirror's history), but again under ref names that are not fetched by 
default.
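
To illustrate what "not fetched by default" means in practice: a fresh clone's default fetch refspec is "+refs/heads/*:refs/remotes/origin/*", so only refs under refs/heads/ are copied unless further refspecs are configured.  The Python sketch below mimics that matching for the simple one-wildcard form.  The helper name map_ref is hypothetical, and the exact ref names used by the converted repository are not given in this message.

```python
DEFAULT_FETCH = "+refs/heads/*:refs/remotes/origin/*"


def map_ref(refspec, ref):
    """Return the local name `ref` is fetched to under `refspec`, or None.

    Handles only the simple form with at most one trailing '*' on each side.
    """
    src, dst = refspec.lstrip("+").split(":", 1)
    if src.endswith("*"):
        prefix = src[:-1]
        if ref.startswith(prefix):
            # Replace the source prefix with the destination prefix.
            return dst[:-1] + ref[len(prefix):]
        return None
    return dst if ref == src else None
```

Under this model any ref outside refs/heads/ maps to None with the default refspec, and only becomes visible once a matching refspec is added to the remote's fetch configuration.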

As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
and change the SVN hooks to make SVN readonly, then disable gccadmin's 
cron jobs that build snapshots and update online documentation until they 
are ready to run with the git repository.  Once the existing git mirror 
has picked up the last changes I'll make that read-only and disable that 
cron job as well, and start the conversion process with a view to having 
the converted repository in place this weekend (it could either be made 
writable as soon as I think it's ready, or left read-only until people 
have had time to do any final checks on Monday).  Before then, I'll work 
on hooks, documentation and maintainer-scripts updates.

As well as having objects from the existing git mirror available under 
refs that are not fetched by default, that mirror will remain available 
read-only at git://gcc.gnu.org/git/gcc-old.git (which already exists, 
currently a symlink to the mirror).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-08 23:34                                                   ` Joseph Myers
@ 2020-01-09  2:38                                                     ` Segher Boessenkool
  2020-01-09 12:12                                                       ` Richard Earnshaw (lists)
  2020-01-10  7:33                                                       ` Maxim Kuvyrkov
  2020-01-09  5:07                                                     ` Jeff Law
  1 sibling, 2 replies; 198+ messages in thread
From: Segher Boessenkool @ 2020-01-09  2:38 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Eric S. Raymond, Maxim Kuvyrkov, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Jeff Law, Mark Wielaard,
	Jakub Jelinek

On Wed, Jan 08, 2020 at 11:34:32PM +0000, Joseph Myers wrote:
> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
> and change the SVN hooks to make SVN readonly, then disable gccadmin's 
> cron jobs that build snapshots and update online documentation until they 
> are ready to run with the git repository.  Once the existing git mirror 
> has picked up the last changes I'll make that read-only and disable that 
> cron job as well, and start the conversion process with a view to having 
> the converted repository in place this weekend (it could either be made 
> writable as soon as I think it's ready, or left read-only until people 
> have had time to do any final checks on Monday).  Before then, I'll work 
> on hooks, documentation and maintainer-scripts updates.

Where and when and by who was it decided to use this conversion?

Will it at least be *tested* first?


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-08 23:34                                                   ` Joseph Myers
  2020-01-09  2:38                                                     ` Segher Boessenkool
@ 2020-01-09  5:07                                                     ` Jeff Law
  2020-01-09 12:30                                                       ` Joseph Myers
  1 sibling, 1 reply; 198+ messages in thread
From: Jeff Law @ 2020-01-09  5:07 UTC (permalink / raw)
  To: Joseph Myers, Eric S. Raymond
  Cc: Maxim Kuvyrkov, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Segher Boessenkool,
	Mark Wielaard, Jakub Jelinek

On Wed, 2020-01-08 at 23:34 +0000, Joseph Myers wrote:
> 
> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
> and change the SVN hooks to make SVN readonly, then disable gccadmin's 
> cron jobs that build snapshots and update online documentation until they 
> are ready to run with the git repository.  Once the existing git mirror 
> has picked up the last changes I'll make that read-only and disable that 
> cron job as well, and start the conversion process with a view to having 
> the converted repository in place this weekend (it could either be made 
> writable as soon as I think it's ready, or left read-only until people 
> have had time to do any final checks on Monday).  Before then, I'll work 
> on hooks, documentation and maintainer-scripts updates.
Is there any chance we could get one more trunk snapshot before the
conversion starts -- even if that means firing up the snapshot process
Friday?  It'd be quite useful for the ongoing Fedora build testing.

If it's a significant hassle, then don't worry, I'll create one
manually.

Jeff
> 

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-09  2:38                                                     ` Segher Boessenkool
@ 2020-01-09 12:12                                                       ` Richard Earnshaw (lists)
  2020-01-09 14:01                                                         ` Eric S. Raymond
  2020-01-11 11:30                                                         ` Segher Boessenkool
  2020-01-10  7:33                                                       ` Maxim Kuvyrkov
  1 sibling, 2 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2020-01-09 12:12 UTC (permalink / raw)
  To: Segher Boessenkool, Joseph Myers
  Cc: Eric S. Raymond, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek

On 09/01/2020 02:38, Segher Boessenkool wrote:
> On Wed, Jan 08, 2020 at 11:34:32PM +0000, Joseph Myers wrote:
>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16
>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk
>> and change the SVN hooks to make SVN readonly, then disable gccadmin's
>> cron jobs that build snapshots and update online documentation until they
>> are ready to run with the git repository.  Once the existing git mirror
>> has picked up the last changes I'll make that read-only and disable that
>> cron job as well, and start the conversion process with a view to having
>> the converted repository in place this weekend (it could either be made
>> writable as soon as I think it's ready, or left read-only until people
>> have had time to do any final checks on Monday).  Before then, I'll work
>> on hooks, documentation and maintainer-scripts updates.
> 
> Where and when and by who was it decided to use this conversion?
> 
> Will it at least be *tested* first?
> 
> 
> Segher
> 

Tested for what?  We run many tests on the conversion, for example to 
check that the branch tips are all sane, and many other things as well.

Additionally, Joseph has made many trial conversions available for 
public examination as we've been iterating towards the final result.

FWIW, I now support using reposurgeon for the final conversion.

I want to also take this opportunity to thank Maxim for the work he has 
done.  Having that fallback option has meant that we could press harder 
for a timely solution and has also driven several significant 
improvements to the overall result.  I do not think we would have 
achieved as good a result overall if he hadn't developed his scripts.

R.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-09  5:07                                                     ` Jeff Law
@ 2020-01-09 12:30                                                       ` Joseph Myers
  2020-01-10 15:27                                                         ` Joseph Myers
                                                                           ` (2 more replies)
  0 siblings, 3 replies; 198+ messages in thread
From: Joseph Myers @ 2020-01-09 12:30 UTC (permalink / raw)
  To: Jeff Law
  Cc: Eric S. Raymond, Maxim Kuvyrkov, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Segher Boessenkool,
	Mark Wielaard, Jakub Jelinek

On Wed, 8 Jan 2020, Jeff Law wrote:

> Is there any chance we could get one more trunk snapshot before the
> conversion starts -- even if that means firing up the snapshot process
> Friday?  It'd be quite useful for the ongoing Fedora build testing.

I could run a snapshot manually.  I was planning to run at least one 
snapshot (for some branch) manually *after* the conversion to test the 
conversion of the gcc_release script to use git (in snapshot mode that 
doesn't make any commits so could be done while the git repository is 
still read-only for checking).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-09 12:12                                                       ` Richard Earnshaw (lists)
@ 2020-01-09 14:01                                                         ` Eric S. Raymond
  2020-01-11 11:30                                                         ` Segher Boessenkool
  1 sibling, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2020-01-09 14:01 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Segher Boessenkool, Joseph Myers, Maxim Kuvyrkov,
	GCC Development, Alexandre Oliva, Jeff Law, Mark Wielaard,
	Jakub Jelinek

Richard Earnshaw (lists) <Richard.Earnshaw@arm.com>:
> I want to also take this opportunity to thank Maxim for the work he has
> done.  Having that fallback option has meant that we could press harder for
> a timely solution and has also driven several significant improvements to
> the overall result.  I do not think we would have achieved as good a result
> overall if he hadn't developed his scripts.

Yes. Reposurgeon's ChangeLog processing, in particular, was significantly
improved using lessons learned from Maxim's scripts.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-09  2:38                                                     ` Segher Boessenkool
  2020-01-09 12:12                                                       ` Richard Earnshaw (lists)
@ 2020-01-10  7:33                                                       ` Maxim Kuvyrkov
  2020-01-10  9:49                                                         ` Richard Earnshaw (lists)
                                                                           ` (2 more replies)
  1 sibling, 3 replies; 198+ messages in thread
From: Maxim Kuvyrkov @ 2020-01-10  7:33 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Eric S. Raymond, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Jeff Law, Mark Wielaard,
	Jakub Jelinek, Segher Boessenkool

> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Wed, Jan 08, 2020 at 11:34:32PM +0000, Joseph Myers wrote:
>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
>> and change the SVN hooks to make SVN readonly, then disable gccadmin's 
>> cron jobs that build snapshots and update online documentation until they 
>> are ready to run with the git repository.  Once the existing git mirror 
>> has picked up the last changes I'll make that read-only and disable that 
>> cron job as well, and start the conversion process with a view to having 
>> the converted repository in place this weekend (it could either be made 
>> writable as soon as I think it's ready, or left read-only until people 
>> have had time to do any final checks on Monday).  Before then, I'll work 
>> on hooks, documentation and maintainer-scripts updates.
> 
> Where and when and by who was it decided to use this conversion?

Joseph, please point to message on gcc@ mailing list that expresses consensus of GCC community to use reposurgeon conversion.  Otherwise, it is not appropriate to substitute one's opinion for community consensus.

I want the GCC community to get the best possible conversion, be it mine or reposurgeon's.  To this end I'm comparing the two conversions and will post my results later today.

Unfortunately, the comparison is complicated by the fact that you uploaded only "b" version of gcc-reposurgeon-8 repository, which uses modified branch layout (or confirm that there are no substantial differences between "7" and "8" reposurgeon conversions).

--
Maxim Kuvyrkov
https://www.linaro.org

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10  7:33                                                       ` Maxim Kuvyrkov
@ 2020-01-10  9:49                                                         ` Richard Earnshaw (lists)
  2020-01-10 11:38                                                           ` Richard Biener
  2020-01-11 11:52                                                           ` Segher Boessenkool
  2020-01-10 13:31                                                         ` Bernd Schmidt
  2020-01-10 15:09                                                         ` Maxim Kuvyrkov
  2 siblings, 2 replies; 198+ messages in thread
From: Richard Earnshaw (lists) @ 2020-01-10  9:49 UTC (permalink / raw)
  To: Maxim Kuvyrkov, Joseph Myers
  Cc: Eric S. Raymond, GCC Development, Alexandre Oliva, Jeff Law,
	Mark Wielaard, Jakub Jelinek, Segher Boessenkool

On 10/01/2020 07:33, Maxim Kuvyrkov wrote:
>> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>
>> On Wed, Jan 08, 2020 at 11:34:32PM +0000, Joseph Myers wrote:
>>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16
>>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk
>>> and change the SVN hooks to make SVN readonly, then disable gccadmin's
>>> cron jobs that build snapshots and update online documentation until they
>>> are ready to run with the git repository.  Once the existing git mirror
>>> has picked up the last changes I'll make that read-only and disable that
>>> cron job as well, and start the conversion process with a view to having
>>> the converted repository in place this weekend (it could either be made
>>> writable as soon as I think it's ready, or left read-only until people
>>> have had time to do any final checks on Monday).  Before then, I'll work
>>> on hooks, documentation and maintainer-scripts updates.
>>
>> Where and when and by who was it decided to use this conversion?
> 
> Joseph, please point to message on gcc@ mailing list that expresses consensus of GCC community to use reposurgeon conversion.  Otherwise, it is not appropriate to substitute one's opinion for community consensus.
> 

I've gone back through this thread (if I've missed, or misrepresented, 
anybody who's expressed an opinion I apologize now).

Segher Boessenkool <segher@kernel.crashing.org>
"If Joseph and Richard agree a candidate is good, then I will agree as
well.  All that can be left is nit-picking, and that is not worth it
anyway:"

Jeff Law <law@redhat.com>
"When Richard and I spoke we generally agreed that we felt a reposurgeon
conversion, if it could be made to work was the preferred solution,
followed by Maxim's approach and lastly the existing git-svn mirror."

Richard Earnshaw (lists) <Richard.Earnshaw@arm.com>
FWIW, I now support using reposurgeon for the final conversion.

And, of course, I'm taking Joseph's opinion as read :-)

So I don't see any clear dissent and most folks just want to get this 
done.

> I want the GCC community to get the best possible conversion, be it mine or reposurgeon's.  To this end I'm comparing the two conversions and will post my results later today.
> 

> Unfortunately, the comparison is complicated by the fact that you uploaded only "b" version of gcc-reposurgeon-8 repository, which uses modified branch layout (or confirm that there are no substantial differences between "7" and "8" reposurgeon conversions).

The main differences are

a) more revisions added due to commits upstream
b) release tags from SVN era now on the main release branch rather than 
in sidings
c) more author fixups from Joseph's cross validation against your 
repository and reposurgeon's own reports of suspect attributions


R.
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 


^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10  9:49                                                         ` Richard Earnshaw (lists)
@ 2020-01-10 11:38                                                           ` Richard Biener
  2020-01-10 12:09                                                             ` Iain Sandoe
                                                                               ` (2 more replies)
  2020-01-11 11:52                                                           ` Segher Boessenkool
  1 sibling, 3 replies; 198+ messages in thread
From: Richard Biener @ 2020-01-10 11:38 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Maxim Kuvyrkov, Joseph Myers, Eric S. Raymond, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool

On Fri, Jan 10, 2020 at 10:49 AM Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 10/01/2020 07:33, Maxim Kuvyrkov wrote:
> >> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >>
> >> On Wed, Jan 08, 2020 at 11:34:32PM +0000, Joseph Myers wrote:
> >>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16
> >>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk
> >>> and change the SVN hooks to make SVN readonly, then disable gccadmin's
> >>> cron jobs that build snapshots and update online documentation until they
> >>> are ready to run with the git repository.  Once the existing git mirror
> >>> has picked up the last changes I'll make that read-only and disable that
> >>> cron job as well, and start the conversion process with a view to having
> >>> the converted repository in place this weekend (it could either be made
> >>> writable as soon as I think it's ready, or left read-only until people
> >>> have had time to do any final checks on Monday).  Before then, I'll work
> >>> on hooks, documentation and maintainer-scripts updates.
> >>
> >> Where and when and by who was it decided to use this conversion?
> >
> > Joseph, please point to message on gcc@ mailing list that expresses consensus of GCC community to use reposurgeon conversion.  Otherwise, it is not appropriate to substitute one's opinion for community consensus.
> >
>
> I've gone back through this thread (if I've missed, or misrepresented,
> anybody who's expressed an opinion I apologize now).
>
> Segher Boessenkool <segher@kernel.crashing.org>
> "If Joseph and Richard agree a candidate is good, then I will agree as
> well.  All that can be left is nit-picking, and that is not worth it
> anyway:"
>
> Jeff Law <law@redhat.com>
> "When Richard and I spoke we generally agreed that we felt a reposurgeon
> conversion, if it could be made to work was the preferred solution,
> followed by Maxim's approach and lastly the existing git-svn mirror."
>
> Richard Earnshaw (lists) <Richard.Earnshaw@arm.com>
> FWIW, I now support using reposurgeon for the final conversion.
>
> And, of course, I'm taking Joseph's opinion as read :-)
>
> So I don't see any clear dissent and most folks just want to get this
> done.

Just to chime in I also just want to get it done (well, I can handle
SVN as well :P).
I trust Joseph, too, but then from my POV anything not worse than the current
mirror works for me.  Thanks to Maxim anyway for all the work - without that
we'd not switch in 10 other years...

Btw, "consensus" among the quiet doesn't usually work and "consensus" among
the most vocal isn't really "consensus".  I think GCC (and FOSS) works by
giving power to those who actually do the work.  Doesn't make it easier when
there are two, of course ;)

Richard.

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 11:38                                                           ` Richard Biener
@ 2020-01-10 12:09                                                             ` Iain Sandoe
  2020-01-10 13:11                                                               ` Joseph Myers
  2020-01-10 12:53                                                             ` Nathan Sidwell
  2020-01-11 11:57                                                             ` Segher Boessenkool
  2 siblings, 1 reply; 198+ messages in thread
From: Iain Sandoe @ 2020-01-10 12:09 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Earnshaw (lists),
	Maxim Kuvyrkov, Eric S. Raymond, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool, Richard Biener

Richard Biener <richard.guenther@gmail.com> wrote:

> On Fri, Jan 10, 2020 at 10:49 AM Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>> On 10/01/2020 07:33, Maxim Kuvyrkov wrote:
>>>> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool  
>>>> <segher@kernel.crashing.org> wrote:
>>>>
>>>> On Wed, Jan 08, 2020 at 11:34:32PM +0000, Joseph Myers wrote:
>>>>> As noted on overseers, once Saturday's DATESTAMP update has run at  
>>>>> 00:16
>>>>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN  
>>>>> trunk
>>>>> and change the SVN hooks to make SVN readonly, then disable gccadmin's
>>>>> cron jobs that build snapshots and update online documentation until  
>>>>> they
>>>>> are ready to run with the git repository.  Once the existing git mirror
>>>>> has picked up the last changes I'll make that read-only and disable  
>>>>> that
>>>>> cron job as well, and start the conversion process with a view to  
>>>>> having
>>>>> the converted repository in place this weekend (it could either be made
>>>>> writable as soon as I think it's ready, or left read-only until people
>>>>> have had time to do any final checks on Monday).  Before then, I'll  
>>>>> work
>>>>> on hooks, documentation and maintainer-scripts updates.
>>>>
>>>> Where and when and by who was it decided to use this conversion?
>>>
>>> Joseph, please point to message on gcc@ mailing list that expresses  
>>> consensus of GCC community to use reposurgeon conversion.  Otherwise,  
>>> it is not appropriate to substitute one's opinion for community  
>>> consensus.
>>
>> I've gone back through this thread (if I've missed, or misrepresented,
>> anybody who's expressed an opinion I apologize now).
>>
>> Segher Boessenkool <segher@kernel.crashing.org>
>> "If Joseph and Richard agree a candidate is good, then I will agree as
>> well.  All that can be left is nit-picking, and that is not worth it
>> anyway:"
>>
>> Jeff Law <law@redhat.com>
>> "When Richard and I spoke we generally agreed that we felt a reposurgeon
>> conversion, if it could be made to work was the preferred solution,
>> followed by Maxim's approach and lastly the existing git-svn mirror."
>>
>> Richard Earnshaw (lists) <Richard.Earnshaw@arm.com>
>> FWIW, I now support using reposurgeon for the final conversion.
>>
>> And, of course, I'm taking Joseph's opinion as read :-)
>>
>> So I don't see any clear dissent and most folks just want to get this
>> done.
>
> Just to chime in I also just want to get it done (well, I can handle
> SVN as well :P).
> I trust Joseph, too, but then from my POV anything not worse than the current
> mirror works for me.  Thanks to Maxim anyway for all the work - without that
> we'd not switch in 10 other years...
>
> Btw, "consensus" among the quiet doesn't usually work and "consensus" among
> the most vocal isn't really "consensus".  I think GCC (and FOSS) works by
> giving power to those who actually do the work.  Doesn't make it easier when
> there are two, of course ;)

Thanks to all those who've put (a lot of) effort into doing this work and
those who've challenged and tested the conversions.  For my part, I am also
happy to take Joseph's recommendation.

One minor nit (and accepted that this might be too late).

mail commit messages like this:
[gcc-reposurgeon-8(refs/users/jsm28/heads/test-branch)] Test git hooks  
interaction with Bugzilla.

seem to have a title stretched by redundant information;
at least "users/jsm28/test-branch" would seem to contain all the necessary
information

will commits in the user namespace appear on the mailing list in the end?

thanks again
Iain

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 11:38                                                           ` Richard Biener
  2020-01-10 12:09                                                             ` Iain Sandoe
@ 2020-01-10 12:53                                                             ` Nathan Sidwell
  2020-01-10 14:13                                                               ` Martin Liška
  2020-01-11 11:57                                                             ` Segher Boessenkool
  2 siblings, 1 reply; 198+ messages in thread
From: Nathan Sidwell @ 2020-01-10 12:53 UTC (permalink / raw)
  To: Richard Biener, Richard Earnshaw (lists)
  Cc: Maxim Kuvyrkov, Joseph Myers, Eric S. Raymond, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool

On 1/10/20 6:38 AM, Richard Biener wrote:

>> So I don't see any clear dissent and most folks just want to get this
>> done.
> 
> Just to chime in I also just want to get it done (well, I can handle
> SVN as well :P).
> I trust Joseph, too, but then from my POV anything not worse than the current
> mirror works for me.  Thanks to Maxim anyway for all the work - without that
> we'd not switch in 10 other years...

Joseph's conversion please

nathan

-- 
Nathan Sidwell

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 12:09                                                             ` Iain Sandoe
@ 2020-01-10 13:11                                                               ` Joseph Myers
  0 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2020-01-10 13:11 UTC (permalink / raw)
  To: Iain Sandoe
  Cc: Richard Earnshaw (lists),
	Maxim Kuvyrkov, Eric S. Raymond, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]

On Fri, 10 Jan 2020, Iain Sandoe wrote:

> One minor nit (and accepted that this might be too late).

Hooks can always be changed after the conversion is live.

> mail commit messages like this:
> [gcc-reposurgeon-8(refs/users/jsm28/heads/test-branch)] Test git hooks
> interaction with Bugzilla.
> 
> seem to have a title stretched by redundant information;
> at least "users/jsm28/test-branch" would seem to contain all the necessary
> information

I guess this is something to consider for any cleaner upstream support in 
the hooks for custom branch namespaces.

> will commits in the user namespace appear on the mailing list in the end?

Right now the configuration is for all commits to appear on the mailing 
list, just as all SVN commits do.  I think user namespace commits are 
interesting to see, but we can set hooks.no-emails to refs/users/ in the 
hook configuration if we don't want them to appear on the list - that 
configuration option already exists.  https://github.com/AdaCore/git-hooks 
documents the available configuration options.
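
As a rough illustration of what such a setting would do (this is not the hooks' actual code), the sketch below filters a list of updated refs against a prefix-style regular expression.  The helper name refs_to_email is hypothetical, and anchoring the pattern at the start of the ref name with re.match is an assumption; the git-hooks documentation remains the authority on how hooks.no-emails is really matched.

```python
import re

def refs_to_email(updated_refs, no_email_patterns):
    """Return the refs whose commits would still be mailed.

    Hypothetical helper: a ref is skipped when any pattern matches
    at the start of its name (an assumed matching rule).
    """
    compiled = [re.compile(p) for p in no_email_patterns]
    return [ref for ref in updated_refs
            if not any(rx.match(ref) for rx in compiled)]

# The user ref is the one quoted in this thread; the other is a
# generic example ref.
refs = [
    "refs/heads/master",
    "refs/users/jsm28/heads/test-branch",
]
print(refs_to_email(refs, ["refs/users/"]))
```

With the pattern refs/users/ in place, only the non-user ref would remain in the list of refs that generate mail.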

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10  7:33                                                       ` Maxim Kuvyrkov
  2020-01-10  9:49                                                         ` Richard Earnshaw (lists)
@ 2020-01-10 13:31                                                         ` Bernd Schmidt
  2020-01-10 15:27                                                           ` Eric S. Raymond
  2020-01-10 15:09                                                         ` Maxim Kuvyrkov
  2 siblings, 1 reply; 198+ messages in thread
From: Bernd Schmidt @ 2020-01-10 13:31 UTC (permalink / raw)
  To: Maxim Kuvyrkov, Joseph Myers
  Cc: Eric S. Raymond, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Jeff Law, Mark Wielaard,
	Jakub Jelinek, Segher Boessenkool

On 1/10/20 8:33 AM, Maxim Kuvyrkov wrote:
> Joseph, please point to message on gcc@ mailing list that expresses consensus of GCC community to use reposurgeon conversion.  Otherwise, it is not appropriate to substitute one's opinion for community consensus.

I was on the fence for a long time, since I felt that the rewritten 
reposurgeon was still somewhat unproven. However, I think the 
reposurgeon conversion is probably the best choice at this point, given 
that an actual problem caused by the use of git-svn was demonstrated by 
Richard E, indicating that the scripts do not have an inherent 
reliability advantage. I also think Joseph has a very thorough pair of 
eyeballs.

I have no opinion one way or another which method should be used to 
identify author names, since I have not looked into that.

Bernd

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 12:53                                                             ` Nathan Sidwell
@ 2020-01-10 14:13                                                               ` Martin Liška
  0 siblings, 0 replies; 198+ messages in thread
From: Martin Liška @ 2020-01-10 14:13 UTC (permalink / raw)
  To: Nathan Sidwell, Richard Biener, Richard Earnshaw (lists)
  Cc: Maxim Kuvyrkov, Joseph Myers, Eric S. Raymond, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool

On 1/10/20 1:53 PM, Nathan Sidwell wrote:
> On 1/10/20 6:38 AM, Richard Biener wrote:
> 
>>> So I don't see any clear dissent and most folks just want to get this
>>> done.
>>
>> Just to chime in I also just want to get it done (well, I can handle
>> SVN as well :P).
>> I trust Joseph, too, but then from my POV anything not worse than the current
>> mirror works for me.  Thanks to Maxim anyway for all the work - without that
>> we'd not switch in 10 other years...
> 
> Joseph's conversion please

+ 1

I would like to thank all the people involved in the GIT conversion!

Martin

> 
> nathan
> 

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10  7:33                                                       ` Maxim Kuvyrkov
  2020-01-10  9:49                                                         ` Richard Earnshaw (lists)
  2020-01-10 13:31                                                         ` Bernd Schmidt
@ 2020-01-10 15:09                                                         ` Maxim Kuvyrkov
  2020-01-10 15:16                                                           ` Joseph Myers
  2 siblings, 1 reply; 198+ messages in thread
From: Maxim Kuvyrkov @ 2020-01-10 15:09 UTC (permalink / raw)
  To: GCC Development
  Cc: Joseph Myers, Eric S. Raymond, Richard Earnshaw (lists),
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool

> On Jan 10, 2020, at 10:33 AM, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
> 
>> On Jan 9, 2020, at 5:38 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> 
>> On Wed, Jan 08, 2020 at 11:34:32PM +0000, Joseph Myers wrote:
>>> As noted on overseers, once Saturday's DATESTAMP update has run at 00:16 
>>> UTC on Saturday, I intend to add a README.MOVED_TO_GIT file on SVN trunk 
>>> and change the SVN hooks to make SVN readonly, then disable gccadmin's 
>>> cron jobs that build snapshots and update online documentation until they 
>>> are ready to run with the git repository.  Once the existing git mirror 
>>> has picked up the last changes I'll make that read-only and disable that 
>>> cron job as well, and start the conversion process with a view to having 
>>> the converted repository in place this weekend (it could either be made 
>>> writable as soon as I think it's ready, or left read-only until people 
>>> have had time to do any final checks on Monday).  Before then, I'll work 
>>> on hooks, documentation and maintainer-scripts updates.
>> 
>> Where and when and by who was it decided to use this conversion?
> 
> Joseph, please point to message on gcc@ mailing list that expresses consensus of GCC community to use reposurgeon conversion.  Otherwise, it is not appropriate to substitute one's opinion for community consensus.
> 
> I want GCC community to get the best possible conversion, be it mine or reposurgeon's.  To this end I'm comparing the two conversions and will post my results later today.
> 
> Unfortunately, the comparison is complicated by the fact that you uploaded only "b" version of gcc-reposurgeon-8 repository, which uses modified branch layout (or confirm that there are no substantial differences between "7" and "8" reposurgeon conversions).

There are plenty of differences between the reposurgeon and svn-git conversions; today I've ignored subjective differences like author and committer entries and focused on comparing histories of branches.

Redhat's branches are among the most complicated, and the analysis below is difficult to follow.  It took me most of today to untangle it.  Let's look at redhat/gcc-9-branch.

TL;DR:
1. Reposurgeon conversion has extra history (more commits than intended) of redhat/gcc-4_7-branch@182541 merged into redhat/gcc-4_8-branch, which is then propagated into all following branches including redhat/gcc-9-branch.
2. Svn-git conversion has redhat/gcc-4_8-branch with history corresponding to SVN history, with no less and no more commits.
3. Other branches are likely to have similar issues, I didn't check.
4. I consider history of reposurgeon conversion to be incorrect.
5. The only history artifact (extra merges in reparented branches/tags) of svn-git conversion has been fixed.
6. I can appreciate that GCC community is tired of this discussion and wants it to go away.

Analysis:
Commit histories for redhat/gcc-9-branch match up to history inherited from redhat/gcc-4_8-branch (yes, redhat's branch history goes into ancient branches).  So now we are looking at redhat/gcc-4_8-branch, and the two conversions have different commit histories for redhat/gcc-4_8-branch.  This is relevant because it shows up in current development branch.  The histories diverge at r194477:
------------------------------------------------------------------------
r194477 | jakub | 2012-12-13 13:34:44 +0000 (Thu, 13 Dec 2012) | 3 lines

svn merge -r182540:182541 svn+ssh://gcc.gnu.org/svn/gcc/branches/redhat/gcc-4_7-branch
svn merge -r182546:182547 svn+ssh://gcc.gnu.org/svn/gcc/branches/redhat/gcc-4_7-branch
------------------------------------------------------------------------
Added: svn:mergeinfo
## -0,0 +0,4 ##
   Merged /branches/redhat/gcc-4_4-branch:r143377,143388,144574,144578,155228
   Merged /branches/redhat/gcc-4_5-branch:r161595
   Merged /branches/redhat/gcc-4_6-branch:r168425
   Merged /branches/redhat/gcc-4_7-branch:r182541,182547
------------------------------------------------------------------------

To me this looks like cherry-picks of r182541 and r182547 from redhat/gcc-4_7-branch into redhat/gcc-4_8-branch.

[1] Note that commit r182541 is itself a merge of redhat/gcc-4_6-branch@168425 into redhat/gcc-4_7-branch and cherry-picks from the other branches.  It is an actual merge (not cherry-pick) from redhat/gcc-4_6-branch@168425 because r168425 is the only commit to redhat/gcc-4_6-branch@168425 not present on trunk.  The other branches had more commits to their histories, so they can't be represented as git merges.

Reposurgeon commit for r194477 (e601ffdd860b0deed6d7ce78da61e8964c287b0b) merges in commit for r182541 from redhat/gcc-4_7-branch bringing *full* history of redhat/gcc-4_7-branch into redhat/gcc-4_8-branch.

Svn-git commit for r194477 (98d65ca0b53332e7c9cb62dfe85936a0e71d077e) cherry-picks commits r182541 and r182547 from redhat/gcc-4_7-branch.  As part of cherry-picking commit r182541 svn-git conversion merges redhat/gcc-4_6-branch@168425 into redhat/gcc-4_8-branch.  Merge from redhat/gcc-4_6-branch@168425 is expected, see [1].
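
For readers who want to inspect the svn:mergeinfo delta quoted above programmatically, here is a minimal sketch that turns the "Merged" lines into a mapping from source branch to revision numbers.  The function name parse_mergeinfo is hypothetical, and the parser handles only the comma-separated single revisions shown in this log; real mergeinfo can also contain revision ranges, which this sketch does not handle.

```python
def parse_mergeinfo(lines):
    """Map each source branch path to its list of merged revisions."""
    merged = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("Merged "):
            continue
        # Split "path:r1,2,3" at the last colon.
        path, _, revs = line[len("Merged "):].rpartition(":")
        merged[path] = [int(r.lstrip("r")) for r in revs.split(",")]
    return merged

# Lines copied from the r194477 property change quoted above.
log_lines = [
    "   Merged /branches/redhat/gcc-4_4-branch:r143377,143388,144574,144578,155228",
    "   Merged /branches/redhat/gcc-4_5-branch:r161595",
    "   Merged /branches/redhat/gcc-4_6-branch:r168425",
    "   Merged /branches/redhat/gcc-4_7-branch:r182541,182547",
]
info = parse_mergeinfo(log_lines)
print(info["/branches/redhat/gcc-4_7-branch"])
```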

--
Maxim Kuvyrkov
https://www.linaro.org

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 15:09                                                         ` Maxim Kuvyrkov
@ 2020-01-10 15:16                                                           ` Joseph Myers
  2020-01-10 15:33                                                             ` Maxim Kuvyrkov
  0 siblings, 1 reply; 198+ messages in thread
From: Joseph Myers @ 2020-01-10 15:16 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: GCC Development, Eric S. Raymond, Richard Earnshaw (lists),
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool

On Fri, 10 Jan 2020, Maxim Kuvyrkov wrote:

> To me this looks like cherry-picks of r182541 and r182547 from 
> redhat/gcc-4_7-branch into redhat/gcc-4_8-branch.

r182541 is the first commit on /branches/redhat/gcc-4_7-branch after it 
was created as a copy of trunk.  I.e., merging and cherry-picking it are 
indistinguishable, and it's entirely correct for reposurgeon to consider a 
commit merging it as a merge from r182541 (together with a cherry-pick of 
r182547).
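
The distinction can be sketched as follows.  This is not reposurgeon's actual algorithm, only an illustration of the reasoning: if the merged revisions form an unbroken run starting at the first commit made on the source branch, the end of that run can be recorded as a merge parent, and anything left over is a cherry-pick.  The function name classify_merge is hypothetical, the sketch assumes the trunk history below the branch point is already in the target branch, and revision 182544 is a made-up placeholder for whatever commits lie between r182541 and r182547.

```python
def classify_merge(branch_commits, merged_revs):
    """Split merged revisions into a merge tip and cherry-picks.

    branch_commits: revisions committed on the source branch after it
    was copied from trunk, oldest first.
    merged_revs: revisions recorded as merged into the target branch.
    Returns (merge_tip, cherry_picks); merge_tip is None when the
    first commit on the branch was not among the merged revisions.
    """
    wanted = set(merged_revs)
    merge_tip = None
    for rev in branch_commits:
        if rev not in wanted:
            break  # the unbroken run from the branch start ends here
        merge_tip = rev
        wanted.discard(rev)
    return merge_tip, sorted(wanted)

# 182544 is a placeholder, not a real revision from the repository.
result = classify_merge([182541, 182544, 182547], [182541, 182547])
print(result)
```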

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-09 12:30                                                       ` Joseph Myers
@ 2020-01-10 15:27                                                         ` Joseph Myers
  2020-01-11  7:06                                                         ` Gerald Pfeifer
  2020-01-14  8:21                                                         ` Jeff Law
  2 siblings, 0 replies; 198+ messages in thread
From: Joseph Myers @ 2020-01-10 15:27 UTC (permalink / raw)
  To: Jeff Law
  Cc: Eric S. Raymond, Maxim Kuvyrkov, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Segher Boessenkool,
	Mark Wielaard, Jakub Jelinek

On Thu, 9 Jan 2020, Joseph Myers wrote:

> On Wed, 8 Jan 2020, Jeff Law wrote:
> 
> > Is there any chance we could get one more trunk snapshot before the
> > conversion starts -- even if that means firing up the snapshot process
> > Friday?  It'd be quite useful for the ongoing Fedora build testing.
> 
> I could run a snapshot manually.  I was planning to run at least one 
> snapshot (for some branch) manually *after* the conversion to test the 
> conversion of the gcc_release script to use git (in snapshot mode that 
> doesn't make any commits so could be done while the git repository is 
> still read-only for checking).

This gcc-10-20200110 snapshot is now available.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 13:31                                                         ` Bernd Schmidt
@ 2020-01-10 15:27                                                           ` Eric S. Raymond
  0 siblings, 0 replies; 198+ messages in thread
From: Eric S. Raymond @ 2020-01-10 15:27 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Maxim Kuvyrkov, Joseph Myers, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Jeff Law, Mark Wielaard,
	Jakub Jelinek, Segher Boessenkool

Bernd Schmidt <bernds_cb1@t-online.de>:
> I was on the fence for a long time, since I felt that the rewritten
> reposurgeon was still somewhat unproven.

And that was a fair criticism for a short while, until the first compare-all
verification on the GCC history came back clean.

The most difficult point in the whole process for me was in late
November.  That was when I faced up to the fact that, while I had a
Subversion dump reader that was 95% good, (1) that 5% could
disqualify it for this complex a history, and (2) I wasn't going to
be able to solve that last 5% without tearing down most of the reader
and rebuilding it.

The problem was that I'd been patching the dump reader to fix edge
cases for too long, and the code had rigidified. Too many auxiliary
data structures with partially overlapping semantics - I couldn't
change anything without breaking everything. Which is the universe's
way of telling you it's time for a rewrite.

Of course the risk was that I wouldn't get that rewrite done in time
for deadline. But I had two assets that mitigated the risk. One was
a couple of very sharp collaborators, Julien Rivaud and Daniel Brooks
(and later another, Edward Cree). The other was having a really good
test suite, and a well-established procedure for integrating new
tests that jsm and rearnshaw were able to use.

It was (as the Duke of Wellington famously said) a damned near-run
thing. With all those advantages, if I had waited even a week longer
to make the crucial scrap-and-rebuild decision, the new reader might
have landed too late.

There's a lesson in here somewhere. When I figure out what it is, I'll
put it in my next book.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 15:16                                                           ` Joseph Myers
@ 2020-01-10 15:33                                                             ` Maxim Kuvyrkov
  2020-01-11  7:04                                                               ` Gerald Pfeifer
  0 siblings, 1 reply; 198+ messages in thread
From: Maxim Kuvyrkov @ 2020-01-10 15:33 UTC (permalink / raw)
  To: Joseph Myers
  Cc: GCC Development, Eric S. Raymond, Richard Earnshaw (lists),
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool

> On Jan 10, 2020, at 6:15 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> 
> On Fri, 10 Jan 2020, Maxim Kuvyrkov wrote:
> 
>> To me this looks like cherry-picks of r182541 and r182547 from 
>> redhat/gcc-4_7-branch into redhat/gcc-4_8-branch.
> 
> r182541 is the first commit on /branches/redhat/gcc-4_7-branch after it 
> was created as a copy of trunk.  I.e., merging and cherry-picking it are 
> indistinguishable, and it's entirely correct for reposurgeon to consider a 
> commit merging it as a merge from r182541 (together with a cherry-pick of 
> r182547).

I was wrong re. r182541, I didn't notice that it is the first commit on branch.  This renders the analysis in favor of reposurgeon conversion, not svn-git.

--
Maxim Kuvyrkov
https://www.linaro.org

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 15:33                                                             ` Maxim Kuvyrkov
@ 2020-01-11  7:04                                                               ` Gerald Pfeifer
  0 siblings, 0 replies; 198+ messages in thread
From: Gerald Pfeifer @ 2020-01-11  7:04 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: Joseph Myers, GCC Development, Eric S. Raymond,
	Richard Earnshaw (lists),
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek,
	Segher Boessenkool

On Fri, 10 Jan 2020, Maxim Kuvyrkov wrote:
> I was wrong re. r182541, I didn't notice that it is the first commit on 
> branch.  This renders the analysis in favor of reposurgeon conversion, 
> not svn-git.

Kudos for that statement, Maxim.

And thanks a bunch for all the work you have been doing, even if
the other conversion was picked in the end.  Like others have said,
without that we would not be where we are now.

Thank you,
Gerald

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-09 12:30                                                       ` Joseph Myers
  2020-01-10 15:27                                                         ` Joseph Myers
@ 2020-01-11  7:06                                                         ` Gerald Pfeifer
  2020-01-14  8:21                                                         ` Jeff Law
  2 siblings, 0 replies; 198+ messages in thread
From: Gerald Pfeifer @ 2020-01-11  7:06 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jeff Law, Eric S. Raymond, Maxim Kuvyrkov,
	Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Segher Boessenkool,
	Mark Wielaard, Jakub Jelinek

On Thu, 9 Jan 2020, Joseph Myers wrote:
>> Is there any chance we could get one more trunk snapshot before the
>> conversion starts -- even if that means firing up the snapshot process
>> Friday?  It'd be quite useful for the ongoing Fedora build testing.
> I could run a snapshot manually.  I was planning to run at least one 
> snapshot (for some branch) manually *after* the conversion to test the 
> conversion of the gcc_release script to use git (in snapshot mode that 
> doesn't make any commits so could be done while the git repository is 
> still read-only for checking).

GCC 9 snapshots run on Saturdays and GCC 10 on Sundays, so with a
GCC 10 snapshot out yesterday, perhaps run a GCC 9 snapshot today
or tomorrow and then fall back to the regular cadence?

Gerald

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-09 12:12                                                       ` Richard Earnshaw (lists)
  2020-01-09 14:01                                                         ` Eric S. Raymond
@ 2020-01-11 11:30                                                         ` Segher Boessenkool
  1 sibling, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2020-01-11 11:30 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Joseph Myers, Eric S. Raymond, Maxim Kuvyrkov, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek

On Thu, Jan 09, 2020 at 12:12:49PM +0000, Richard Earnshaw (lists) wrote:
> On 09/01/2020 02:38, Segher Boessenkool wrote:
> >Where and when and by who was it decided to use this conversion?
> >
> >Will it at least be *tested* first?
> 
> Tested for what?

Acceptance test, of course, the only test that matters.  I.e. the GCC
community gets to decide if this conversion is acceptable to them, instead
of being confronted with it as a fait accompli.

> I want to also take this opportunity to thank Maxim for the work he has 
> done.  Having that fallback option has meant that we could press harder 
> for a timely solution and has also driven several significant 
> improvements to the overall result.  I do not think we would have 
> achieved as good a result overall if he hadn't developed his scripts.

And my thanks go to you and everyone else who tried to make this result
in a git conversion that is the most useful for us, the GCC developers
(and other consumers of our repo)!


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10  9:49                                                         ` Richard Earnshaw (lists)
  2020-01-10 11:38                                                           ` Richard Biener
@ 2020-01-11 11:52                                                           ` Segher Boessenkool
  1 sibling, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2020-01-11 11:52 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Maxim Kuvyrkov, Joseph Myers, Eric S. Raymond, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek

On Fri, Jan 10, 2020 at 09:49:41AM +0000, Richard Earnshaw (lists) wrote:
> On 10/01/2020 07:33, Maxim Kuvyrkov wrote:
> >>On Jan 9, 2020, at 5:38 AM, Segher Boessenkool 
> >><segher@kernel.crashing.org> wrote:
> >>Where and when and by who was it decided to use this conversion?
> >
> >Joseph, please point to message on gcc@ mailing list that expresses 
> >consensus of GCC community to use reposurgeon conversion.  Otherwise, it 
> >is not appropriate to substitute one's opinion for community consensus.
> 
> I've gone back through this thread (if I've missed, or misrepresented, 
> anybody who's expressed an opinion I apologize now).
> 
> Segher Boessenkool <segher@kernel.crashing.org>
> "If Joseph and Richard agree a candidate is good, then I will agree as
> well.  All that can be left is nit-picking, and that is not worth it
> anyway:"

That is not saying I agree the reposurgeon conversion is best if you two
agree.  It says that if you think that is a good conversion, then I agree.
However I do still think it is the worst of the three options, in some
regards.

> So I don't see any clear dissent and most folks just want to get this 
> done.

Yes.  After the GCC community took over five years to decide to switch
to git, and then we were delayed by another almost five years because it
just *had* to be done using reposurgeon, we just want it *done*, and even
the reposurgeon option is acceptable, in my book.

I don't look at old commit messages *at all* (*).  Mangled patch authors
can be harder, but I do have old trees as well, worst case.  We'll
survive, the info in the changelogs is still there.  And hopefully new
patches will eventually have good author info and commit messages.

To a gitty future, onwards and upwards, etc.,

Segher

(*) That's a lie: I look at it a lot, but only to extract the SVN
revision number from it!

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-10 11:38                                                           ` Richard Biener
  2020-01-10 12:09                                                             ` Iain Sandoe
  2020-01-10 12:53                                                             ` Nathan Sidwell
@ 2020-01-11 11:57                                                             ` Segher Boessenkool
  2 siblings, 0 replies; 198+ messages in thread
From: Segher Boessenkool @ 2020-01-11 11:57 UTC (permalink / raw)
  To: Richard Biener
  Cc: Richard Earnshaw (lists),
	Maxim Kuvyrkov, Joseph Myers, Eric S. Raymond, GCC Development,
	Alexandre Oliva, Jeff Law, Mark Wielaard, Jakub Jelinek

On Fri, Jan 10, 2020 at 12:38:10PM +0100, Richard Biener wrote:
> Just to chime in I also just want to get it done (well, I can handle
> SVN as well :P).

I will never have to learn it!  I'm so happy!

> I trust Joseph, too, but then from my POV anything not worse than the current
> mirror works for me.  Thanks to Maxim anyway for all the work - without that
> we'd not switch in 10 other years...

Absolutely agreed!  Thank you, Maxim.


Segher

^ permalink raw reply	[flat|nested] 198+ messages in thread

* Re: Proposal for the transition timetable for the move to GIT
  2020-01-09 12:30                                                       ` Joseph Myers
  2020-01-10 15:27                                                         ` Joseph Myers
  2020-01-11  7:06                                                         ` Gerald Pfeifer
@ 2020-01-14  8:21                                                         ` Jeff Law
  2 siblings, 0 replies; 198+ messages in thread
From: Jeff Law @ 2020-01-14  8:21 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Eric S. Raymond, Maxim Kuvyrkov, Richard Earnshaw (lists),
	GCC Development, Alexandre Oliva, Segher Boessenkool,
	Mark Wielaard, Jakub Jelinek

On Thu, 2020-01-09 at 12:30 +0000, Joseph Myers wrote:
> On Wed, 8 Jan 2020, Jeff Law wrote:
> 
> > Is there any chance we could get one more trunk snapshot before the
> > conversion starts -- even if that means firing up the snapshot process
> > Friday?  It'd be quite useful for the ongoing Fedora build testing.
> 
> I could run a snapshot manually.  I was planning to run at least one 
> snapshot (for some branch) manually *after* the conversion to test the 
> conversion of the gcc_release script to use git (in snapshot mode that 
> doesn't make any commits so could be done while the git repository is 
> still read-only for checking).
Thanks.  It was greatly appreciated.

Jeff

^ permalink raw reply	[flat|nested] 198+ messages in thread

end of thread, other threads:[~2020-01-13 22:18 UTC | newest]

Thread overview: 198+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-17 12:02 Proposal for the transition timetable for the move to GIT Richard Earnshaw (lists)
2019-09-17 12:24 ` Richard Biener
2019-09-17 13:50   ` Richard Earnshaw (lists)
2019-09-17 16:35   ` Joseph Myers
2019-09-17 17:51     ` Richard Earnshaw (lists)
2019-09-17 16:33 ` Joseph Myers
2019-09-19 12:04 ` Janne Blomqvist
2019-09-19 14:43   ` Damian Rouson
2019-09-19 15:30     ` Janne Blomqvist
2019-10-25 14:10     ` Richard Earnshaw (lists)
2019-10-25 16:32       ` Jeff Law
2019-09-19 15:30   ` Richard Earnshaw (lists)
2019-09-19 15:49     ` Damian Rouson
2019-09-19 15:35 ` Maxim Kuvyrkov
2019-12-06 14:44   ` Maxim Kuvyrkov
2019-12-06 17:21     ` Eric S. Raymond
2019-12-06 17:39       ` Richard Biener
2019-12-06 19:46         ` Eric S. Raymond
2019-12-06 20:43           ` Sandra Loosemore
2019-12-07  2:57           ` Segher Boessenkool
2019-12-09 18:19           ` Joseph Myers
2019-12-09 18:40             ` Bernd Schmidt
2019-12-09 20:45               ` Joseph Myers
2019-12-09 22:12               ` Eric S. Raymond
2019-12-09 19:28             ` Eric S. Raymond
2019-12-11 14:40             ` Maxim Kuvyrkov
2019-12-11 15:03               ` Richard Earnshaw (lists)
2019-12-11 15:19                 ` Jonathan Wakely
2019-12-11 15:21                   ` Richard Earnshaw (lists)
2019-12-11 15:36                     ` Joseph Myers
2019-12-11 16:02                       ` Jonathan Wakely
2019-12-11 17:47                         ` Eric S. Raymond
2019-12-16  2:19                       ` Joseph Myers
2019-12-11 15:30                   ` Dennis Luehring
2019-12-11 15:36                     ` Richard Earnshaw
2019-12-11 17:36                   ` Eric S. Raymond
2019-12-06 20:49       ` Bernd Schmidt
2019-12-16  9:53     ` Mark Wielaard
2019-12-16 11:29       ` Joseph Myers
2019-12-16 12:43         ` Mark Wielaard
2019-12-16 13:36           ` Segher Boessenkool
2019-12-16 13:54             ` Eric S. Raymond
2019-12-16 14:05               ` Segher Boessenkool
2019-12-16 14:13                 ` Joseph Myers
2019-12-16 15:37                   ` Segher Boessenkool
2019-12-16 16:36                     ` Joseph Myers
2019-12-16 17:40                     ` Jeff Law
2019-12-25  8:12                       ` Alexandre Oliva
2019-12-25 12:07                         ` Eric S. Raymond
2019-12-25 12:24                           ` Segher Boessenkool
2019-12-25 14:16                             ` Joseph Myers
2019-12-25 18:50                             ` Eric S. Raymond
2019-12-25 19:18                               ` Segher Boessenkool
2019-12-26  6:09                           ` Alexandre Oliva
2019-12-26 11:04                             ` Joseph Myers
2019-12-26 11:17                               ` Jakub Jelinek
2019-12-26 12:10                                 ` Joseph Myers
2019-12-26 16:11                                 ` Maxim Kuvyrkov
2019-12-26 16:58                                   ` Joseph Myers
2019-12-26 18:36                                     ` Jakub Jelinek
2019-12-26 18:59                                       ` Joseph Myers
2019-12-27 11:21                                         ` Richard Earnshaw (lists)
2019-12-27 11:33                                           ` Andrew Pinski
2019-12-27 13:35                                             ` Segher Boessenkool
2019-12-27 11:35                                           ` Joseph Myers
2019-12-27 12:37                                             ` Richard Earnshaw (lists)
2019-12-28  2:27                                               ` Eric S. Raymond
2019-12-28 11:23                                                 ` Joseph Myers
2019-12-28 12:19                                             ` Segher Boessenkool
2019-12-28 17:11                                               ` Richard Earnshaw (lists)
2019-12-28 20:28                                                 ` Segher Boessenkool
2019-12-29  1:45                                                   ` Julien "FrnchFrgg" Rivaud
2019-12-29 10:41                                                     ` Segher Boessenkool
2019-12-29 11:02                                                       ` Richard Biener
2019-12-29 11:47                                                         ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 13:31                                                           ` Segher Boessenkool
2019-12-29 13:51                                                             ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 12:15                                                         ` Segher Boessenkool
2019-12-29 16:32                                                           ` Richard Earnshaw
2019-12-29 16:37                                                             ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 11:42                                                       ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 13:26                                                         ` Segher Boessenkool
2019-12-29 13:48                                                           ` Julien '_FrnchFrgg_' RIVAUD
2019-12-29 15:01                                                             ` Segher Boessenkool
2019-12-29 17:31                                                             ` Ian Lance Taylor via gcc
2019-12-30  0:31                                                               ` Julien "FrnchFrgg" Rivaud
2019-12-29 21:31                                                           ` Thomas Koenig
2019-12-29 23:57                                                             ` Jeff Law
2019-12-27 13:29                                           ` Segher Boessenkool
2019-12-26 20:31                                     ` Richard Biener
2019-12-27  1:32                                     ` Joseph Myers
2019-12-27 10:14                                       ` Maxim Kuvyrkov
2019-12-28  1:55                                         ` Eric S. Raymond
2019-12-29 18:31                                   ` Maxim Kuvyrkov
2019-12-29 18:55                                     ` Joseph Myers
2019-12-29 22:47                                       ` Eric S. Raymond
2019-12-29 23:00                                         ` Joseph Myers
2019-12-29 23:13                                           ` Segher Boessenkool
2019-12-30 15:36                                             ` Richard Earnshaw (lists)
2019-12-30 22:37                                               ` Segher Boessenkool
2019-12-30 22:58                                                 ` Joseph Myers
2019-12-31  0:23                                                   ` Segher Boessenkool
2019-12-31 12:48                                                     ` Segher Boessenkool
2019-12-31  3:09                                                   ` Eric S. Raymond
2019-12-29 22:24                                     ` Richard Earnshaw (lists)
2019-12-30  0:18                                       ` Joseph Myers
2019-12-30  0:44                                         ` Julien "FrnchFrgg" Rivaud
2019-12-30 12:39                                         ` Maxim Kuvyrkov
2019-12-30 13:01                                       ` Maxim Kuvyrkov
2019-12-30 15:31                                         ` Richard Earnshaw (lists)
2019-12-30 15:49                                           ` Maxim Kuvyrkov
2019-12-30 16:08                                             ` Richard Earnshaw (lists)
2020-01-02  2:59                                               ` Alexandre Oliva
2020-01-02 10:58                                                 ` Richard Earnshaw (lists)
2020-01-08 20:46                                               ` Maxim Kuvyrkov
2020-01-08 22:11                                                 ` Eric S. Raymond
2020-01-08 23:34                                                   ` Joseph Myers
2020-01-09  2:38                                                     ` Segher Boessenkool
2020-01-09 12:12                                                       ` Richard Earnshaw (lists)
2020-01-09 14:01                                                         ` Eric S. Raymond
2020-01-11 11:30                                                         ` Segher Boessenkool
2020-01-10  7:33                                                       ` Maxim Kuvyrkov
2020-01-10  9:49                                                         ` Richard Earnshaw (lists)
2020-01-10 11:38                                                           ` Richard Biener
2020-01-10 12:09                                                             ` Iain Sandoe
2020-01-10 13:11                                                               ` Joseph Myers
2020-01-10 12:53                                                             ` Nathan Sidwell
2020-01-10 14:13                                                               ` Martin Liška
2020-01-11 11:57                                                             ` Segher Boessenkool
2020-01-11 11:52                                                           ` Segher Boessenkool
2020-01-10 13:31                                                         ` Bernd Schmidt
2020-01-10 15:27                                                           ` Eric S. Raymond
2020-01-10 15:09                                                         ` Maxim Kuvyrkov
2020-01-10 15:16                                                           ` Joseph Myers
2020-01-10 15:33                                                             ` Maxim Kuvyrkov
2020-01-11  7:04                                                               ` Gerald Pfeifer
2020-01-09  5:07                                                     ` Jeff Law
2020-01-09 12:30                                                       ` Joseph Myers
2020-01-10 15:27                                                         ` Joseph Myers
2020-01-11  7:06                                                         ` Gerald Pfeifer
2020-01-14  8:21                                                         ` Jeff Law
2019-12-26 22:33                                 ` Joseph Myers
2019-12-26 19:16                             ` Eric S. Raymond
2019-12-26 20:08                               ` Alexandre Oliva
2019-12-26 20:28                                 ` Joseph Myers
2019-12-27 12:06                                   ` Alexandre Oliva
2019-12-27 12:21                                     ` Joseph Myers
2019-12-28  2:33                                       ` Eric S. Raymond
2019-12-26 21:19                                 ` Eric S. Raymond
2019-12-25 12:10                         ` Segher Boessenkool
2019-12-25 14:13                           ` Joseph Myers
2019-12-29 16:47                           ` Mark Wielaard
2019-12-29 22:42                             ` Joseph Myers
2019-12-16 16:27                   ` Eric S. Raymond
2019-12-16 16:47                     ` Segher Boessenkool
2019-12-16 16:04               ` Jeff Law
2019-12-16 16:37                 ` Eric S. Raymond
2019-12-16 16:47                   ` Jeff Law
2019-12-31 13:43                     ` Joseph Myers
2019-12-31 14:13                       ` Richard Earnshaw (lists)
2019-12-31 17:26                       ` Segher Boessenkool
2019-12-16 13:56             ` Joseph Myers
2019-12-16 14:17               ` Mark Wielaard
2019-12-16 16:29                 ` Joseph Myers
2019-12-16 13:53           ` Joseph Myers
2019-12-16 16:39             ` Jeff Law
2019-12-16 17:57               ` Richard Biener
2019-12-16 16:55         ` Jeff Law
2019-12-16 17:08           ` Joseph Myers
2019-12-16 19:15             ` Eric S. Raymond
2019-12-16 21:59             ` Segher Boessenkool
2019-12-16 22:14               ` Jeff Law
2019-12-16 22:42                 ` Segher Boessenkool
2019-12-16 23:26                   ` Joseph Myers
2019-12-16 23:44                     ` Eric S. Raymond
2019-12-18 18:07                   ` Jeff Law
2019-12-18 18:24                     ` Joseph Myers
2019-12-19  0:57                       ` Eric S. Raymond
2019-12-18 19:50                     ` Segher Boessenkool
2019-12-18 20:43                       ` Jeff Law
2019-12-20 16:28                         ` Segher Boessenkool
2019-12-19  2:34                       ` Unix philosopy vs. poor semantic locality Eric S. Raymond
2019-12-19  3:16                         ` Joseph Myers
2019-12-19  5:46                           ` Eric S. Raymond
2019-12-19  0:46                     ` Proposal for the transition timetable for the move to GIT Eric S. Raymond
2019-12-16 23:34                 ` Eric S. Raymond
2019-12-16 23:18               ` Joseph Myers
2019-12-16 23:19               ` Eric S. Raymond
2019-12-18 17:27                 ` Segher Boessenkool
2019-12-16 13:33       ` Segher Boessenkool
2019-09-19 17:04 ` Paul Koning
2019-10-25 14:02   ` Richard Earnshaw (lists)
2019-09-20 15:49 ` Jeff Law
2019-09-21  9:11   ` Segher Boessenkool
2019-09-21  9:39     ` Andreas Schwab
2019-09-21  9:51       ` Segher Boessenkool
2019-09-21 10:04         ` Andreas Schwab
2019-09-21  9:26 ` Segher Boessenkool
