public inbox for gcc@gcc.gnu.org
* Repo conversion troubles.
@ 2018-07-09 19:19 Eric S. Raymond
  2018-07-09 19:40 ` Jeff Law
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Eric S. Raymond @ 2018-07-09 19:19 UTC (permalink / raw)
  To: GCC Development, fallenpegasus

Last time I did a comparison between SVN head and the git conversion
tip they matched exactly.  This time I have mismatches in the following
files.

libtool.m4
libvtv/ChangeLog
libvtv/configure
libvtv/testsuite/lib/libvtv.exp
ltmain.sh
lto-plugin/ChangeLog
lto-plugin/configure
lto-plugin/lto-plugin.c
MAINTAINERS
maintainer-scripts/ChangeLog
maintainer-scripts/crontab
maintainer-scripts/gcc_release
Makefile.def
Makefile.in
Makefile.tpl
zlib/configure
zlib/configure.ac
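
A tree comparison of the kind described at the top of this message can be sketched in Python. This is an illustration under stated assumptions, not reposurgeon's own check. It assumes an SVN working copy and a git checkout of the same tip already sit on disk. It hashes every file with SHA-256, skips the `.svn` and `.git` metadata directories, and reports paths whose contents differ or that exist on only one side. It ignores Subversion properties such as svn:keywords and svn:eol-style, which a real comparison would have to account for.

```python
import hashlib
import os

# Version-control metadata directories that should not be compared.
SKIP_DIRS = {".svn", ".git"}


def file_digest(path, chunk_size=1 << 16):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file path (relative to root, '/'-separated) to its digest."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digests[rel] = file_digest(full)
    return digests


def mismatched_files(svn_root, git_root):
    """Paths whose contents differ, or which exist in only one of the trees."""
    a = tree_digests(svn_root)
    b = tree_digests(git_root)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

For example, `mismatched_files("gcc-svn-trunk", "gcc-git")` (both directory names hypothetical) would return a sorted list of relative paths that disagree between the two trees.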

Now I'll explain what this means and why it's a serious problem.

Reposurgeon is never confused by linear history, branching, or
tagging; I have lots of regression tests for those cases.  When it
screws up it is invariably around branch copy operations, because
there are cases near those where the data model of Subversion stream
files is underspecified. That model was in fact entirely undocumented
before I reverse-engineered it and wrote the description that now
lives in the Subversion source tree.  But that description is not
complete; nobody, not even Subversion's designers, knows how to fill
in all the corner cases.

Thus, a content mismatch like this means there was some recent branch
merge to trunk in the gcc history that reposurgeon is not interpreting
as intended, or more likely an operator error such as a non-Subversion
directory copy followed by a commit; my analyzer can recover from
most such cases but not all.

There are brute-force ways to pin down such malformations, but none of
them are practical at the huge scale of this repository.  The main
problem here wouldn't be reposurgeon itself but the fact that Subversion
checkouts on a repo this large are very slow. I've seen a single one
take 12 hours; an attempt at a whole bisection run to pin down the
divergence point on trunk would therefore probably cost log2 of the
commit length times that, or about 18 days.
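
The bisection estimate above can be made concrete with a short Python sketch. It assumes a hypothetical `matches(rev)` predicate that checks out revision `rev` and compares it against the conversion (the expensive step). It also assumes that a mismatch, once introduced, persists at every later revision, which bisection requires. `checkouts_needed` gives the worst-case number of such checks, ceil(log2(bad - good)).

```python
import math
from typing import Callable


def first_bad_revision(good: int, bad: int, matches: Callable[[int], bool]) -> int:
    """Earliest revision in (good, bad] at which the conversion stops matching.

    Assumes matches(good) is True, matches(bad) is False, and that a mismatch,
    once introduced, persists at every later revision (bisection needs this).
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if matches(mid):  # the expensive step: checkout plus comparison
            good = mid
        else:
            bad = mid
    return bad


def checkouts_needed(good: int, bad: int) -> int:
    """Worst-case number of matches() calls for the range (good, bad]."""
    return math.ceil(math.log2(bad - good))
```

Taking r262451 (the newest revision mentioned in this thread) as a stand-in for head, `checkouts_needed(0, 262451)` is 19. At 12 hours per checkout that comes to about 228 hours, or roughly 9.5 days. By this arithmetic, the ~18-day figure corresponds to roughly a day per checkout. Narrowing the search to r256000..r262451, as proposed later in the thread, needs at most 13 checks.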

So...does that list of changed files look familiar to anyone?  If we can
identify the revision number of the bad commit, the odds of being able
to unscramble this mess go way up.  They still aren't good, not when
merely loading the repository for examination takes over four hours,
but they would be way better than if I were starting from zero.

This is serious. I have produced demonstrably correct history
conversions of the gcc repo in the past.  We may now be in a situation
where I will never again be able to do that.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

The real point of audits is to instill fear, not to extract revenue;
the IRS aims at winning through intimidation and (thereby) getting
maximum voluntary compliance
	-- Paul Strassel, former IRS Headquarters Agent Wall St. Journal 1980

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Repo conversion troubles.
  2018-07-09 19:19 Repo conversion troubles Eric S. Raymond
@ 2018-07-09 19:40 ` Jeff Law
  2018-07-09 19:57   ` Eric S. Raymond
  2018-07-09 19:46 ` Bernd Schmidt
  2018-07-09 20:04 ` Richard Biener
  2 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2018-07-09 19:40 UTC (permalink / raw)
  To: Eric S. Raymond, GCC Development, fallenpegasus

On 07/09/2018 01:19 PM, Eric S. Raymond wrote:
> Last time I did a comparison between SVN head and the git conversion
> tip they matched exactly.  This time I have mismatches in the following
> files.
> 
> libtool.m4
> libvtv/ChangeLog
> libvtv/configure
> libvtv/testsuite/lib/libvtv.exp
> ltmain.sh
> lto-plugin/ChangeLog
> lto-plugin/configure
> lto-plugin/lto-plugin.c
> MAINTAINERS
> maintainer-scripts/ChangeLog
> maintainer-scripts/crontab
> maintainer-scripts/gcc_release
> Makefile.def
> Makefile.in
> Makefile.tpl
> zlib/configure
> zlib/configure.ac
> 
> Now I'll explain what this means and why it's a serious problem.
[ ... ]
That's weird -- let's take maintainer-scripts/crontab as our victim.
That file (according to the git mirror) has only changed on the trunk 3
times in the last year.  They're all changes from Jakub and none look
unusual at all.  Just trivial-looking updates.

libvtv.exp is another interesting file.  It changed twice in early May
of this year.  Prior to that it hadn't changed since 2015.


[ ... ]

> 
> There are brute-force ways to pin down such malformations, but none of
> them are practical at the huge scale of this repository.  The main
> problem here wouldn't be reposurgeon itself but the fact that Subversion
> checkouts on a repo this large are very slow. I've seen a single one
> take 12 hours; an attempt at a whole bisection run to pin down the
> divergence point on trunk would therefore probably cost log2 of the
> commit length times that, or about 18 days.
I'm not aware of any such merges, but any that occurred most likely
happened after mid-April when the trunk was re-opened for development.

I'm assuming that it's only work that merges onto the trunk that's
potentially problematical here.

> 
> So...does that list of changed files look familiar to anyone?  If we can
> identify the revision number of the bad commit, the odds of being able
> to unscramble this mess go way up.  They still aren't good, not when
> merely loading the repository for examination takes over four hours,
> but they would be way better than if I were starting from zero.
They're familiar only in the sense that I know what those files are :-)

Jeff

* Re: Repo conversion troubles.
  2018-07-09 19:19 Repo conversion troubles Eric S. Raymond
  2018-07-09 19:40 ` Jeff Law
@ 2018-07-09 19:46 ` Bernd Schmidt
  2018-07-09 19:59   ` Eric S. Raymond
  2018-07-09 20:04 ` Richard Biener
  2 siblings, 1 reply; 20+ messages in thread
From: Bernd Schmidt @ 2018-07-09 19:46 UTC (permalink / raw)
  To: Eric S. Raymond, GCC Development, fallenpegasus

On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> Last time I did a comparison between SVN head and the git conversion
> tip they matched exactly.  This time I have mismatches in the following
> files.

So what are the diffs? Are we talking about small differences (like one
change missing) or large-scale mismatches?


Bernd


* Re: Repo conversion troubles.
  2018-07-09 19:40 ` Jeff Law
@ 2018-07-09 19:57   ` Eric S. Raymond
  2018-07-09 20:01     ` Jeff Law
  0 siblings, 1 reply; 20+ messages in thread
From: Eric S. Raymond @ 2018-07-09 19:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Development, fallenpegasus

Jeff Law <law@redhat.com>:
> > There are brute-force ways to pin down such malformations, but none of
> > them are practical at the huge scale of this repository.  The main
> > problem here wouldn't be reposurgeon itself but the fact that Subversion
> > checkouts on a repo this large are very slow. I've seen a single one
> > take 12 hours; an attempt at a whole bisection run to pin down the
> > divergence point on trunk would therefore probably cost log2 of the
> > commit length times that, or about 18 days.
>
> I'm not aware of any such merges, but any that occurred most likely
> happened after mid-April when the trunk was re-opened for development.

I agree it can't have been earlier than that, or I'd have hit this rock
sooner.  I'd bet on the problem having arisen within the last six weeks.

> I'm assuming that it's only work that merges onto the trunk that's
> potentially problematical here.

Yes.  It is possible there are also content mismatches on branches (I
haven't run that check yet; it takes an absurd amount of time to complete),
but there's not much point in worrying about that if we can't get trunk right.

I'm pretty certain things were still good at r256000.  I've started that
check running.  Not expecting results in less than twelve hours.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.


* Re: Repo conversion troubles.
  2018-07-09 19:46 ` Bernd Schmidt
@ 2018-07-09 19:59   ` Eric S. Raymond
  2018-07-10  1:13     ` Alexandre Oliva
  2018-07-10  8:20     ` Jonathan Wakely
  0 siblings, 2 replies; 20+ messages in thread
From: Eric S. Raymond @ 2018-07-09 19:59 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Development, fallenpegasus

Bernd Schmidt <bernds_cb1@t-online.de>:
> On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> > Last time I did a comparison between SVN head and the git conversion
> > tip they matched exactly.  This time I have mismatches in the following
> > files.
> 
> So what are the diffs? Are we talking about small differences (like one
> change missing) or large-scale mismatches?

Large-scale, I'm afraid.  The context diff is about a GLOC.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

* Re: Repo conversion troubles.
  2018-07-09 19:57   ` Eric S. Raymond
@ 2018-07-09 20:01     ` Jeff Law
  2018-07-09 20:06       ` Eric S. Raymond
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2018-07-09 20:01 UTC (permalink / raw)
  To: esr; +Cc: GCC Development, fallenpegasus

On 07/09/2018 01:57 PM, Eric S. Raymond wrote:
> Jeff Law <law@redhat.com>:
>>> There are brute-force ways to pin down such malformations, but none of
>>> them are practical at the huge scale of this repository.  The main
>>> problem here wouldn't be reposurgeon itself but the fact that Subversion
>>> checkouts on a repo this large are very slow. I've seen a single one
>>> take 12 hours; an attempt at a whole bisection run to pin down the
>>> divergence point on trunk would therefore probably cost log2 of the
>>> commit length times that, or about 18 days.
>>
>> I'm not aware of any such merges, but any that occurred most likely
>> happened after mid-April when the trunk was re-opened for development.
> 
> I agree it can't have been earlier than that, or I'd have hit this rock
> sooner.  I'd bet on the problem having arisen within the last six weeks.
> 
>> I'm assuming that it's only work that merges onto the trunk that's
>> potentially problematical here.
> 
> Yes.  It is possible there are also content mismatches on branches (I
> haven't run that check yet; it takes an absurd amount of time to complete),
> but there's not much point in worrying about that if we can't get trunk right.
> 
> I'm pretty certain things were still good at r256000.  I've started that
> check running.  Not expecting results in less than twelve hours.
r256000 would be roughly Christmas 2017.  I'd be very surprised if any
merges to the trunk happened between that point and early April.  We're
essentially in regression bugfixes only during that timeframe.  Not a
time for branch->trunk merging :-)

jeff

* Re: Repo conversion troubles.
  2018-07-09 19:19 Repo conversion troubles Eric S. Raymond
  2018-07-09 19:40 ` Jeff Law
  2018-07-09 19:46 ` Bernd Schmidt
@ 2018-07-09 20:04 ` Richard Biener
  2018-07-09 20:20   ` Eric S. Raymond
  2 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2018-07-09 20:04 UTC (permalink / raw)
  To: gcc, esr, GCC Development, fallenpegasus

On July 9, 2018 9:19:11 PM GMT+02:00, esr@thyrsus.com wrote:
>Last time I did a comparison between SVN head and the git conversion
>tip they matched exactly.  This time I have mismatches in the following
>files.
>
>libtool.m4
>libvtv/ChangeLog
>libvtv/configure
>libvtv/testsuite/lib/libvtv.exp
>ltmain.sh
>lto-plugin/ChangeLog
>lto-plugin/configure
>lto-plugin/lto-plugin.c
>MAINTAINERS
>maintainer-scripts/ChangeLog
>maintainer-scripts/crontab
>maintainer-scripts/gcc_release
>Makefile.def
>Makefile.in
>Makefile.tpl
>zlib/configure
>zlib/configure.ac
>
>Now I'll explain what this means and why it's a serious problem.
>
>Reposurgeon is never confused by linear history, branching, or
>tagging; I have lots of regression tests for those cases.  When it
>screws up it is invariably around branch copy operations, because
>there are cases near those where the data model of Subversion stream
>files is underspecified. That model was in fact entirely undocumented
>before I reverse-engineered it and wrote the description that now
>lives in the Subversion source tree.  But that description is not
>complete; nobody, not even Subversion's designers, knows how to fill
>in all the corner cases.
>
>Thus, a content mismatch like this means there was some recent branch
>merge to trunk in the gcc history that reposurgeon is not interpreting
>as intended, or more likely an operator error such as a non-Subversion
>directory copy followed by a commit; my analyzer can recover from
>most such cases but not all.
>
>There are brute-force ways to pin down such malformations, but none of
>them are practical at the huge scale of this repository.  The main
>problem here wouldn't be reposurgeon itself but the fact that Subversion
>checkouts on a repo this large are very slow. I've seen a single one
>take 12 hours; an attempt at a whole bisection run to pin down the
>divergence point on trunk would therefore probably cost log2 of the
>commit length times that, or about 18 days.

12 hours from remote I guess? The Subversion repository is available through rsync, so you can create a local mirror to work from (we've been doing that at SUSE for years).

Richard. 

>
>So...does that list of changed files look familiar to anyone?  If we can
>identify the revision number of the bad commit, the odds of being able
>to unscramble this mess go way up.  They still aren't good, not when
>merely loading the repository for examination takes over four hours,
>but they would be way better than if I were starting from zero.
>
>This is serious. I have produced demonstrably correct history
>conversions of the gcc repo in the past.  We may now be in a situation
>where I will never again be able to do that.

* Re: Repo conversion troubles.
  2018-07-09 20:01     ` Jeff Law
@ 2018-07-09 20:06       ` Eric S. Raymond
  0 siblings, 0 replies; 20+ messages in thread
From: Eric S. Raymond @ 2018-07-09 20:06 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Development, fallenpegasus

Jeff Law <law@redhat.com>:
> > I'm pretty certain things were still good at r256000.  I've started that
> > check running.  Not expecting results in less than twelve hours.

> r256000 would be roughly Christmas 2017.  I'd be very surprised if any
> merges to the trunk happened between that point and early April.  We're
> essentially in regression bugfixes only during that timeframe.  Not a
> time for branch->trunk merging :-)

Thanks, that's useful to know.  That means if the r256000 check passes
I can jump forward to 1 Apr reasonably expecting that one to pass too.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

* Re: Repo conversion troubles.
  2018-07-09 20:04 ` Richard Biener
@ 2018-07-09 20:20   ` Eric S. Raymond
  2018-07-10  4:57     ` Richard Biener
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Eric S. Raymond @ 2018-07-09 20:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, fallenpegasus

Richard Biener <richard.guenther@gmail.com>:
> 12 hours from remote I guess? The Subversion repository is available through rsync, so you can create a local mirror to work from (we've been doing that at SUSE for years).

I'm saying I see rsync plus local checkout take 10-12 hours.  I asked Jason
about this and his response was basically "Well...we don't do that often."

You probably never see this case.  Update from a remote is much faster.

I'm trying to do a manual correctness check via update to commit 256000 now.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

* Re: Repo conversion troubles.
  2018-07-09 19:59   ` Eric S. Raymond
@ 2018-07-10  1:13     ` Alexandre Oliva
  2018-07-20 21:48       ` Joseph Myers
  2018-07-10  8:20     ` Jonathan Wakely
  1 sibling, 1 reply; 20+ messages in thread
From: Alexandre Oliva @ 2018-07-10  1:13 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: Bernd Schmidt, GCC Development, fallenpegasus

On Jul  9, 2018, Jeff Law <law@redhat.com> wrote:

> On 07/09/2018 01:57 PM, Eric S. Raymond wrote:
>> Jeff Law <law@redhat.com>:
>>> I'm not aware of any such merges, but any that occurred most likely
>>> happened after mid-April when the trunk was re-opened for development.

>> I'm pretty certain things were still good at r256000.  I've started that
>> check running.  Not expecting results in less than twelve hours.

> r256000 would be roughly Christmas 2017.

When was the RAID/LVM disk corruption incident?  Could it possibly have
left any of our svn repo metadata corrupted in a way that confuses
reposurgeon, and that leads to such huge differences?

On Jul  9, 2018, "Eric S. Raymond" <esr@thyrsus.com> wrote:

> Bernd Schmidt <bernds_cb1@t-online.de>:
>> So what are the diffs? Are we talking about small differences (like one
>> change missing) or large-scale mismatches?

> Large-scale, I'm afraid.  The context diff is about a GLOC.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free!         FSF Latin America board member
GNU Toolchain Engineer                Free Software Evangelist

* Re: Repo conversion troubles.
  2018-07-09 20:20   ` Eric S. Raymond
@ 2018-07-10  4:57     ` Richard Biener
  2018-07-10 11:22     ` Philip Martin
  2018-07-20 21:43     ` Joseph Myers
  2 siblings, 0 replies; 20+ messages in thread
From: Richard Biener @ 2018-07-10  4:57 UTC (permalink / raw)
  To: esr, Eric S. Raymond; +Cc: gcc, fallenpegasus

On July 9, 2018 10:20:39 PM GMT+02:00, "Eric S. Raymond" <esr@thyrsus.com> wrote:
>Richard Biener <richard.guenther@gmail.com>:
>> 12 hours from remote I guess? The Subversion repository is available
>> through rsync, so you can create a local mirror to work from (we've been
>> doing that at SUSE for years).
>
>I'm saying I see rsync plus local checkout take 10-12 hours. 

For a fresh rsync I can guess that's true. But incremental updates work just fine and quickly for me...

>I asked Jason about this and his response was basically "Well...we
>don't do that often."
>
>You probably never see this case.  Update from a remote is much faster.
>
>I'm trying to do a manual correctness check via update to commit 256000 now.

* Re: Repo conversion troubles.
  2018-07-09 19:59   ` Eric S. Raymond
  2018-07-10  1:13     ` Alexandre Oliva
@ 2018-07-10  8:20     ` Jonathan Wakely
  2018-07-10  8:34       ` Jonathan Wakely
  2018-07-20 22:06       ` Joseph Myers
  1 sibling, 2 replies; 20+ messages in thread
From: Jonathan Wakely @ 2018-07-10  8:20 UTC (permalink / raw)
  To: Eric Raymond; +Cc: Bernd Schmidt, gcc, fallenpegasus

On Mon, 9 Jul 2018 at 21:00, Eric S. Raymond <esr@thyrsus.com> wrote:
>
> Bernd Schmidt <bernds_cb1@t-online.de>:
> > On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> > > Last time I did a comparison between SVN head and the git conversion
> > > tip they matched exactly.  This time I have mismatches in the following
> > > files.
> >
> > So what are the diffs? Are we talking about small differences (like one
> > change missing) or large-scale mismatches?
>
> Large-scale, I'm afraid.  The context diff is about a GLOC.

I don't see how that's possible. Most of those files are tiny, or
change very rarely, so I don't see how that large a diff can happen.

Take zlib/configure.ac and zlib/configure, there's only been one
change in the past 18 months: https://gcc.gnu.org/r261739
That change didn't touch the other files in the list.

libtool.m4 has one change in the past 2 years (just a few days ago):
https://gcc.gnu.org/r262451
That was also tiny, and didn't touch the other files.

maintainer-scripts/crontab only has one change in the past 6 months:
https://gcc.gnu.org/r259637
That was a tiny change, and didn't touch any other files.

None of those were merges from any other branch.

* Re: Repo conversion troubles.
  2018-07-10  8:20     ` Jonathan Wakely
@ 2018-07-10  8:34       ` Jonathan Wakely
  2018-07-10 10:48         ` Eric S. Raymond
  2018-07-20 22:06       ` Joseph Myers
  1 sibling, 1 reply; 20+ messages in thread
From: Jonathan Wakely @ 2018-07-10  8:34 UTC (permalink / raw)
  To: Eric Raymond; +Cc: Bernd Schmidt, gcc, fallenpegasus

On Tue, 10 Jul 2018 at 09:19, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>
> On Mon, 9 Jul 2018 at 21:00, Eric S. Raymond <esr@thyrsus.com> wrote:
> >
> > Bernd Schmidt <bernds_cb1@t-online.de>:
> > > On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> > > > Last time I did a comparison between SVN head and the git conversion
> > > > tip they matched exactly.  This time I have mismatches in the following
> > > > files.
> > >
> > > So what are the diffs? Are we talking about small differences (like one
> > > change missing) or large-scale mismatches?
> >
> > Large-scale, I'm afraid.  The context diff is about a GLOC.
>
> I don't see how that's possible. Most of those files are tiny, or
> change very rarely, so I don't see how that large a diff can happen.
>
> Take zlib/configure.ac and zlib/configure, there's only been one
> change in the past 18 months: https://gcc.gnu.org/r261739
> That change didn't touch the other files in the list.
>
> libtool.m4 has one change in the past 2 years (just a few days ago):
> https://gcc.gnu.org/r262451
> That was also tiny, and didn't touch the other files.
>
> maintainer-scripts/crontab only has one change in the past 6 months:
> https://gcc.gnu.org/r259637
> That was a tiny change, and didn't touch any other files.
>
> None of those were merges from any other branch.

libtool.m4
ltmain.sh

Changed by https://gcc.gnu.org/r262451

libvtv/ChangeLog
libvtv/configure
libvtv/testsuite/lib/libvtv.exp

Changed by https://gcc.gnu.org/r257809 https://gcc.gnu.org/r259462
https://gcc.gnu.org/r259487 https://gcc.gnu.org/r259837
https://gcc.gnu.org/r259838 (but mostly one line changes).

lto-plugin/ChangeLog
lto-plugin/configure
lto-plugin/lto-plugin.c

Changed by https://gcc.gnu.org/r259462 and https://gcc.gnu.org/r260960

MAINTAINERS

This file sees a fair bit of churn, but all one line changes.
https://gcc.gnu.org/viewcvs/gcc/trunk/MAINTAINERS?view=log

maintainer-scripts/ChangeLog
maintainer-scripts/crontab
maintainer-scripts/gcc_release

Changed by https://gcc.gnu.org/r257045 and https://gcc.gnu.org/r259637
and https://gcc.gnu.org/r259881

Makefile.def
Makefile.in
Makefile.tpl

Changed by https://gcc.gnu.org/r261717 (which didn't touch any other
files) but also by some large changes, which might have been merges:
https://gcc.gnu.org/r255195 (large removal of feature)
https://gcc.gnu.org/r259669 https://gcc.gnu.org/r259755
https://gcc.gnu.org/r261304 (another large feature removal)
https://gcc.gnu.org/r262267

zlib/configure
zlib/configure.ac

Changed by https://gcc.gnu.org/r261739

There's no single change that touched all of them. Not even two or
three changes that seem to have anything in common, except for
autoconf regeneration, which happens frequently throughout GCC's
history.
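
The cross-check above can be reproduced mechanically. The Python sketch below transcribes the revision lists from this message, grouped by file set as they are listed. MAINTAINERS is left out because only a log link was given for it. The sketch then computes which revisions appear in every group and which touch more than one group. Grouping by file set rather than by individual file is a simplification.

```python
from collections import defaultdict

# Revision lists transcribed from the message above, one entry per file group.
# MAINTAINERS is omitted: only a log URL was given for it.
CHANGES = {
    "libtool.m4, ltmain.sh": {262451},
    "libvtv/*": {257809, 259462, 259487, 259837, 259838},
    "lto-plugin/*": {259462, 260960},
    "maintainer-scripts/*": {257045, 259637, 259881},
    "Makefile.{def,in,tpl}": {255195, 259669, 259755, 261304, 261717, 262267},
    "zlib/*": {261739},
}


def common_revisions(changes):
    """Revisions that appear in every group (an empty set if there are none)."""
    groups = list(changes.values())
    return set.intersection(*groups) if groups else set()


def shared_revisions(changes):
    """Map each revision that appears in two or more groups to those groups."""
    touched = defaultdict(list)
    for group, revs in changes.items():
        for rev in revs:
            touched[rev].append(group)
    return {rev: sorted(gs) for rev, gs in touched.items() if len(gs) > 1}


if __name__ == "__main__":
    print("in every group:", sorted(common_revisions(CHANGES)))
    for rev, groups in sorted(shared_revisions(CHANGES).items()):
        print(f"r{rev}: {', '.join(groups)}")
```

On these lists no revision appears in every group. The only revision shared between groups is r259462, which shows up under both libvtv and lto-plugin.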

* Re: Repo conversion troubles.
  2018-07-10  8:34       ` Jonathan Wakely
@ 2018-07-10 10:48         ` Eric S. Raymond
  0 siblings, 0 replies; 20+ messages in thread
From: Eric S. Raymond @ 2018-07-10 10:48 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Bernd Schmidt, gcc, fallenpegasus

Jonathan Wakely <jwakely.gcc@gmail.com>:
> On Tue, 10 Jul 2018 at 09:19, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> >
> > On Mon, 9 Jul 2018 at 21:00, Eric S. Raymond <esr@thyrsus.com> wrote:
> > >
> > > Bernd Schmidt <bernds_cb1@t-online.de>:
> > > > On 07/09/2018 09:19 PM, Eric S. Raymond wrote:
> > > > > Last time I did a comparison between SVN head and the git conversion
> > > > > tip they matched exactly.  This time I have mismatches in the following
> > > > > files.
> > > >
> > > > So what are the diffs? Are we talking about small differences (like one
> > > > change missing) or large-scale mismatches?
> > >
> > > Large-scale, I'm afraid.  The context diff is about a GLOC.
> >
> > I don't see how that's possible. Most of those files are tiny, or
> > change very rarely, so I don't see how that large a diff can happen.
> >
> > Take zlib/configure.ac and zlib/configure, there's only been one
> > change in the past 18 months: https://gcc.gnu.org/r261739
> > That change didn't touch the other files in the list.
> >
> > libtool.m4 has one change in the past 2 years (just a few days ago):
> > https://gcc.gnu.org/r262451
> > That was also tiny, and didn't touch the other files.
> >
> > maintainer-scripts/crontab only has one change in the past 6 months:
> > https://gcc.gnu.org/r259637
> > That was a tiny change, and didn't touch any other files.
> >
> > None of those were merges from any other branch.
> 
> libtool.m4
> ltmain.sh
> 
> Changed by https://gcc.gnu.org/r262451
> 
> libvtv/ChangeLog
> libvtv/configure
> libvtv/testsuite/lib/libvtv.exp
> 
> Changed by https://gcc.gnu.org/r257809 https://gcc.gnu.org/r259462
> https://gcc.gnu.org/r259487 https://gcc.gnu.org/r259837
> https://gcc.gnu.org/r259838 (but mostly one line changes).
> 
> lto-plugin/ChangeLog
> lto-plugin/configure
> lto-plugin/lto-plugin.c
> 
> Changed by https://gcc.gnu.org/r259462 and https://gcc.gnu.org/r260960
> 
> MAINTAINERS
> 
> This file sees a fair bit of churn, but all one line changes.
> https://gcc.gnu.org/viewcvs/gcc/trunk/MAINTAINERS?view=log
> 
> maintainer-scripts/ChangeLog
> maintainer-scripts/crontab
> maintainer-scripts/gcc_release
> 
> Changed by https://gcc.gnu.org/r257045 and https://gcc.gnu.org/r259637
> and https://gcc.gnu.org/r259881
> 
> Makefile.def
> Makefile.in
> Makefile.tpl
> 
> Changed by https://gcc.gnu.org/r261717 (which didn't touch any other
> files) but also by some large changes, which might have been merges:
> https://gcc.gnu.org/r255195 (large removal of feature)
> https://gcc.gnu.org/r259669 https://gcc.gnu.org/r259755
> https://gcc.gnu.org/r261304 (another large feature removal)
> https://gcc.gnu.org/r262267
> 
> zlib/configure
> zlib/configure.ac
> 
> Changed by https://gcc.gnu.org/r261739
> 
> There's no single change that touched all of them. Not even two or
> three changes that seem to have anything in common, except for
> autoconf regeneration, which happens frequently throughout GCC's
> history.

I don't know what's going on either, yet.  I'm trying to identify the
earliest point of content mismatch now.

Thanks for all this data.  It may help a lot.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

* Re: Repo conversion troubles.
  2018-07-09 20:20   ` Eric S. Raymond
  2018-07-10  4:57     ` Richard Biener
@ 2018-07-10 11:22     ` Philip Martin
  2018-07-20 21:43     ` Joseph Myers
  2 siblings, 0 replies; 20+ messages in thread
From: Philip Martin @ 2018-07-10 11:22 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: Richard Biener, gcc, fallenpegasus

"Eric S. Raymond" <esr@thyrsus.com> writes:

> I'm saying I see rsync plus local checkout take 10-12 hours.

The rsync is a one-off cost.  Once you have the repository locally you
can checkout any individual revision much more quickly.  I have a local
copy of the gcc repository and a checkout of gcc trunk from localhost
takes about 40 seconds.  I'm not using fancy hardware.  I can even check
it out across my very average WiFi in just over 60 seconds.

-- 
Philip

* Re: Repo conversion troubles.
  2018-07-09 20:20   ` Eric S. Raymond
  2018-07-10  4:57     ` Richard Biener
  2018-07-10 11:22     ` Philip Martin
@ 2018-07-20 21:43     ` Joseph Myers
  2018-07-20 23:48       ` Eric S. Raymond
  2 siblings, 1 reply; 20+ messages in thread
From: Joseph Myers @ 2018-07-20 21:43 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: Richard Biener, gcc, fallenpegasus

On Mon, 9 Jul 2018, Eric S. Raymond wrote:

> Richard Biener <richard.guenther@gmail.com>:
> > 12 hours from remote I guess? The Subversion repository is available through rsync, so you can create a local mirror to work from (we've been doing that at SUSE for years).
> 
> I'm saying I see rsync plus local checkout take 10-12 hours.  I asked Jason
> about this and his response was basically "Well...we don't do that often."

Isn't that a local checkout *of the top level of the repository*, i.e. 
checking out all branches and tags?  Which is indeed something developers 
would never normally do - they'd just check out the particular branches 
they're working on.
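
The difference between the two kinds of checkout discussed here comes down to which URL is handed to `svn checkout`. One is trunk only, as in the 40-second timing from a local mirror. The other is the repository root, with every branch and tag. The Python sketch below only builds the command line. The mirror path `/srv/gcc-svn` is a hypothetical location for a local rsync copy. The `trunk` default assumes the usual trunk/branches/tags layout.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def svn_checkout_cmd(mirror: str, dest: str, subpath: str = "trunk",
                     rev: Optional[int] = None) -> List[str]:
    """Build an `svn checkout` command against a local repository mirror.

    mirror:  absolute path of the rsync'd repository (hypothetical location)
    subpath: directory inside the repository; "" means the repository root,
             i.e. every branch and tag at once
    rev:     revision to check out, or None for HEAD
    """
    url = PurePosixPath(mirror).as_uri()  # e.g. file:///srv/gcc-svn
    if subpath:
        url += "/" + subpath.strip("/")
    cmd = ["svn", "checkout"]
    if rev is not None:
        cmd += ["-r", str(rev)]
    return cmd + [url, dest]
```

For example, `subprocess.run(svn_checkout_cmd("/srv/gcc-svn", "gcc-trunk", rev=256000), check=True)` would check out trunk at r256000 from the local mirror. Passing `subpath=""` targets the repository root instead.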

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Repo conversion troubles.
  2018-07-10  1:13     ` Alexandre Oliva
@ 2018-07-20 21:48       ` Joseph Myers
  2018-07-21  2:04         ` Eric S. Raymond
  0 siblings, 1 reply; 20+ messages in thread
From: Joseph Myers @ 2018-07-20 21:48 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: Eric S. Raymond, Bernd Schmidt, GCC Development, fallenpegasus

On Mon, 9 Jul 2018, Alexandre Oliva wrote:

> On Jul  9, 2018, Jeff Law <law@redhat.com> wrote:
> 
> > On 07/09/2018 01:57 PM, Eric S. Raymond wrote:
> >> Jeff Law <law@redhat.com>:
> >>> I'm not aware of any such merges, but any that occurred most likely
> >>> happened after mid-April when the trunk was re-opened for development.
> 
> >> I'm pretty certain things were still good at r256000.  I've started that
> >> check running.  Not expecting results in less than twelve hours.
> 
> > r256000 would be roughly Christmas 2017.
> 
> When was the RAID/LVM disk corruption incident?  Could it possibly have
> left any of our svn repo metadata in a corrupted way that confuses
> reposurgeon, and that leads to such huge differences?

That was 14/15 Aug 2017, and all the SVN revision data up to r251080 were 
restored from backup within 24 hours or so.  I found no signs of damage to 
revisions from the 24 hours or so between r251080 and the time of the 
corruption when I examined diffs for all those revisions by hand at that 
time.

(If anyone rsynced corrupted old revisions from the repository during the 
window of corruption, those corrupted old revisions might remain in their 
rsynced repository copy because the restoration preserved file times and 
size, just fixing corrupted contents.)
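
The caveat above follows from how rsync decides what to transfer. By default it skips a file whose size and modification time already match on both sides. Only `--checksum` (`-c`) makes it compare contents. The Python sketch below models that rule using hypothetical file metadata. It shows how a restored file with unchanged size and mtime but different contents would be left stale in a mirror.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileInfo:
    size: int     # bytes
    mtime: int    # modification time, seconds since the epoch
    sha256: str   # content digest


def quick_check_skips(local: FileInfo, remote: FileInfo) -> bool:
    """rsync's default quick check: same size and mtime means no transfer."""
    return local.size == remote.size and local.mtime == remote.mtime


def stale_after_sync(local_tree: Dict[str, FileInfo],
                     remote_tree: Dict[str, FileInfo]) -> List[str]:
    """Paths whose local contents differ but which a quick check would skip."""
    return sorted(
        path for path, remote in remote_tree.items()
        if path in local_tree
        and quick_check_skips(local_tree[path], remote)
        and local_tree[path].sha256 != remote.sha256
    )
```

Any path this reports would only be repaired by a sync that compares contents, such as one run with `--checksum`.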

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Repo conversion troubles.
  2018-07-10  8:20     ` Jonathan Wakely
  2018-07-10  8:34       ` Jonathan Wakely
@ 2018-07-20 22:06       ` Joseph Myers
  1 sibling, 0 replies; 20+ messages in thread
From: Joseph Myers @ 2018-07-20 22:06 UTC
  To: Jonathan Wakely; +Cc: Eric Raymond, Bernd Schmidt, gcc, fallenpegasus

On Tue, 10 Jul 2018, Jonathan Wakely wrote:

> > Large-scale, I'm afraid.  The context diff is about a GLOC.
> 
> I don't see how that's possible. Most of those files are tiny, or
> change very rarely, so I don't see how that large a diff can happen.

Concretely, the *complete GCC source tree* (trunk, that is) is under 1 GB.  
A complete diff generating the whole source tree from nothing would only 
be about 15 MLOC, so a GLOC-sized diff between two versions of trunk is 
implausible.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Repo conversion troubles.
  2018-07-20 21:43     ` Joseph Myers
@ 2018-07-20 23:48       ` Eric S. Raymond
  0 siblings, 0 replies; 20+ messages in thread
From: Eric S. Raymond @ 2018-07-20 23:48 UTC
  To: Joseph Myers; +Cc: Richard Biener, gcc, fallenpegasus

Joseph Myers <joseph@codesourcery.com>:
> On Mon, 9 Jul 2018, Eric S. Raymond wrote:
> 
> > Richard Biener <richard.guenther@gmail.com>:
> > > 12 hours from remote I guess? The subversion repository is available through rsync so you can create a local mirror to work from (we've been doing that at suse for years) 
> > 
> > I'm saying I see rsync plus local checkout take 10-12 hours.  I asked Jason
> > about this and his response was basically "Well...we don't do that often."
> 
> Isn't that a local checkout *of top-level of the repository*, i.e. 
> checking out all branches and tags?  Which is indeed something developers 
> would never normally do - they'd just check out the particular branches 
> they're working on.

It is.  I have to check out all tags and branches to validate the conversion.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>




* Re: Repo conversion troubles.
  2018-07-20 21:48       ` Joseph Myers
@ 2018-07-21  2:04         ` Eric S. Raymond
  0 siblings, 0 replies; 20+ messages in thread
From: Eric S. Raymond @ 2018-07-21  2:04 UTC
  To: Joseph Myers
  Cc: Alexandre Oliva, Bernd Schmidt, GCC Development, fallenpegasus

Joseph Myers <joseph@codesourcery.com>:
> On Mon, 9 Jul 2018, Alexandre Oliva wrote:
> 
> > On Jul  9, 2018, Jeff Law <law@redhat.com> wrote:
> > 
> > > On 07/09/2018 01:57 PM, Eric S. Raymond wrote:
> > >> Jeff Law <law@redhat.com>:
> > >>> I'm not aware of any such merges, but any that occurred most likely
> > >>> happened after mid-April when the trunk was re-opened for development.
> > 
> > >> I'm pretty certain things were still good at r256000.  I've started that
> > >> check running.  Not expecting results in less than twelve hours.
> > 
> > > r256000 would be roughly Christmas 2017.
> > 
> > When was the RAID/LVM disk corruption incident?  Could it possibly have
> > left any of our svn repo metadata in a corrupted way that confuses
> > reposurgeon, and that leads to such huge differences?
> 
> That was 14/15 Aug 2017, and all the SVN revision data up to r251080 were 
> restored from backup within 24 hours or so.  I found no signs of damage to 
> revisions from the 24 hours or so between r251080 and the time of the 
> corruption when I examined diffs for all those revisions by hand at that 
> time.

Agreed. I don't think that incident is at the root of the problems.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>




end of thread

Thread overview: 20+ messages
2018-07-09 19:19 Repo conversion troubles Eric S. Raymond
2018-07-09 19:40 ` Jeff Law
2018-07-09 19:57   ` Eric S. Raymond
2018-07-09 20:01     ` Jeff Law
2018-07-09 20:06       ` Eric S. Raymond
2018-07-09 19:46 ` Bernd Schmidt
2018-07-09 19:59   ` Eric S. Raymond
2018-07-10  1:13     ` Alexandre Oliva
2018-07-20 21:48       ` Joseph Myers
2018-07-21  2:04         ` Eric S. Raymond
2018-07-10  8:20     ` Jonathan Wakely
2018-07-10  8:34       ` Jonathan Wakely
2018-07-10 10:48         ` Eric S. Raymond
2018-07-20 22:06       ` Joseph Myers
2018-07-09 20:04 ` Richard Biener
2018-07-09 20:20   ` Eric S. Raymond
2018-07-10  4:57     ` Richard Biener
2018-07-10 11:22     ` Philip Martin
2018-07-20 21:43     ` Joseph Myers
2018-07-20 23:48       ` Eric S. Raymond
