public inbox for gcc@gcc.gnu.org
* Repo conversion troubles.
@ 2018-07-09 19:19 Eric S. Raymond
From: Eric S. Raymond @ 2018-07-09 19:19 UTC
  To: GCC Development, fallenpegasus

Last time I did a comparison between SVN head and the git conversion
tip they matched exactly.  This time I have mismatches in the following
files.

libtool.m4
libvtv/ChangeLog
libvtv/configure
libvtv/testsuite/lib/libvtv.exp
ltmain.sh
lto-plugin/ChangeLog
lto-plugin/configure
lto-plugin/lto-plugin.c
MAINTAINERS
maintainer-scripts/ChangeLog
maintainer-scripts/crontab
maintainer-scripts/gcc_release
Makefile.def
Makefile.in
Makefile.tpl
zlib/configure
zlib/configure.ac
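
A comparison of this kind, between a Subversion checkout of head and a
checkout of the git conversion tip, can be sketched roughly as below.
This is a minimal illustration, not the tooling actually used for this
conversion; the names `tree_digests` and `mismatched_files` are
hypothetical, and it assumes both trees are already checked out on
local disk.

```python
import hashlib
import os

# Version-control metadata directories that should not take part in the
# content comparison.
SKIP_DIRS = {".svn", ".git"}


def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def mismatched_files(svn_root, git_root):
    """Return sorted relative paths that differ or exist on only one side."""
    a = tree_digests(svn_root)
    b = tree_digests(git_root)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Calling `mismatched_files("gcc-svn", "gcc-git")`, where both directory
names are placeholders, would return a sorted list of relative paths in
the same form as the list above.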

Now I'll explain what this means and why it's a serious problem.

Reposurgeon is never confused by linear history, branching, or
tagging; I have lots of regression tests for those cases.  When it
screws up it is invariably around branch copy operations, because
there are cases near those where the data model of Subversion stream
files is underspecified. That model was in fact entirely undocumented
before I reverse-engineered it and wrote the description that now
lives in the Subversion source tree.  But that description is not
complete; nobody, not even Subversion's designers, knows how to fill
in all the corner cases.

Thus, a content mismatch like this means there was some recent branch
merge to trunk in the gcc history that reposurgeon is not interpreting
as intended, or more likely an operator error such as a non-Subversion
directory copy followed by a commit - my analyzer can recover from
most such cases but not all.

There are brute-force ways to pin down such malformations, but none of
them are practical at the huge scale of this repository.  The main
problem here wouldn't be reposurgeon itself but the fact that Subversion
checkouts on a repo this large are very slow. I've seen a single one
take 12 hours; an attempt at a whole bisection run to pin down the
divergence point on trunk would therefore probably cost log2 of the
number of commits times that, or about 18 days.
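
The cost estimate is the number of bisection steps multiplied by the
time per checkout. A minimal sketch of that arithmetic follows; the
function name, the `checkouts_per_step` knob, and the commit count used
in the example are all assumptions, since the message does not state
how many commits are on trunk or how many checkouts each step needs.

```python
import math


def bisection_cost_days(num_commits, hours_per_checkout, checkouts_per_step=1):
    """Estimate wall-clock days for a bisection over num_commits revisions.

    Bisection halves the candidate range each step, so it needs about
    ceil(log2(num_commits)) steps; each step pays for its checkouts.
    """
    steps = math.ceil(math.log2(num_commits))
    return steps * checkouts_per_step * hours_per_checkout / 24.0


# Hypothetical commit count (2**18), chosen only to make the arithmetic
# easy to follow; it is not a figure taken from the message.
print(bisection_cost_days(2**18, 12))                        # 18 steps, one checkout each
print(bisection_cost_days(2**18, 12, checkouts_per_step=2))  # 18 steps, two checkouts each
```

At 12 hours per checkout, every checkout adds half a day, so a total of
about 18 days corresponds to roughly 36 checkouts over the whole run.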

So...does that list of changed files look familiar to anyone?  If we can
identify the revision number of the bad commit, the odds of being able
to unscramble this mess go way up.  They still aren't good, not when
merely loading the repository for examination takes over four hours,
but they would be way better than if I were starting from zero.

This is serious. I have produced demonstrably correct history
conversions of the gcc repo in the past.  We may now be in a situation
where I will never again be able to do that.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

The real point of audits is to instill fear, not to extract revenue;
the IRS aims at winning through intimidation and (thereby) getting
maximum voluntary compliance
	-- Paul Strassel, former IRS Headquarters Agent Wall St. Journal 1980


Thread overview: 20+ messages
2018-07-09 19:19 Repo conversion troubles Eric S. Raymond
2018-07-09 19:40 ` Jeff Law
2018-07-09 19:57   ` Eric S. Raymond
2018-07-09 20:01     ` Jeff Law
2018-07-09 20:06       ` Eric S. Raymond
2018-07-09 19:46 ` Bernd Schmidt
2018-07-09 19:59   ` Eric S. Raymond
2018-07-10  1:13     ` Alexandre Oliva
2018-07-20 21:48       ` Joseph Myers
2018-07-21  2:04         ` Eric S. Raymond
2018-07-10  8:20     ` Jonathan Wakely
2018-07-10  8:34       ` Jonathan Wakely
2018-07-10 10:48         ` Eric S. Raymond
2018-07-20 22:06       ` Joseph Myers
2018-07-09 20:04 ` Richard Biener
2018-07-09 20:20   ` Eric S. Raymond
2018-07-10  4:57     ` Richard Biener
2018-07-10 11:22     ` Philip Martin
2018-07-20 21:43     ` Joseph Myers
2018-07-20 23:48       ` Eric S. Raymond
