public inbox for gcc@gcc.gnu.org
From: esr@thyrsus.com (Eric S. Raymond)
To: GCC Development <gcc@gcc.gnu.org>, fallenpegasus@gmail.com
Subject: Repo conversion troubles.
Date: Mon, 09 Jul 2018 19:19:00 -0000
Message-ID: <20180709191911.648443A4AA7@snark.thyrsus.com>

Last time I did a comparison between SVN head and the git conversion
tip they matched exactly.  This time I have mismatches in the following
files.

libtool.m4
libvtv/ChangeLog
libvtv/configure
libvtv/testsuite/lib/libvtv.exp
ltmain.sh
lto-plugin/ChangeLog
lto-plugin/configure
lto-plugin/lto-plugin.c
MAINTAINERS
maintainer-scripts/ChangeLog
maintainer-scripts/crontab
maintainer-scripts/gcc_release
Makefile.def
Makefile.in
Makefile.tpl
zlib/configure
zlib/configure.ac
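
A comparison of this kind can be sketched in a few lines of Python.
This is a minimal illustration, not the procedure actually used for the
conversion: the function names file_digests and mismatches are
hypothetical, and it assumes both trees are already checked out on local
disk with keyword expansion and line endings set up identically.

```python
import hashlib
import os


def file_digests(root, skip_dirs=(".svn", ".git")):
    """Map each relative file path under root to a SHA-256 of its content."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so only tracked content is compared.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Compare symlinks by target rather than following them.
                data = os.readlink(path).encode()
            else:
                with open(path, "rb") as handle:
                    data = handle.read()
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digests[rel] = hashlib.sha256(data).hexdigest()
    return digests


def mismatches(svn_root, git_root):
    """Return sorted paths that differ or exist in only one of the trees."""
    svn = file_digests(svn_root)
    git = file_digests(git_root)
    return sorted(p for p in svn.keys() | git.keys() if svn.get(p) != git.get(p))
```

Calling mismatches() on the two checkout directories returns a list
shaped like the one above.  An empty list means the trees match exactly.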

Now I'll explain what this means and why it's a serious problem.

Reposurgeon is never confused by linear history, branching, or
tagging; I have lots of regression tests for those cases.  When it
screws up it is invariably around branch copy operations, because
there are cases near those where the data model of Subversion stream
files is underspecified. That model was in fact entirely undocumented
before I reverse-engineered it and wrote the description that now
lives in the Subversion source tree.  But that description is not
complete; nobody, not even Subversion's designers, knows how to fill
in all the corner cases.

Thus, a content mismatch like this means there was some recent branch
merge to trunk in the gcc history that reposurgeon is not interpreting
as intended, or more likely an operator error such as a non-Subversion
directory copy followed by a commit - my analyzer can recover from
most such cases but not all.

There are brute-force ways to pin down such malformations, but none of
them are practical at the huge scale of this repository.  The main
problem here wouldn't be reposurgeon itself but the fact that Subversion
checkouts on a repo this large are very slow. I've seen a single one
take 12 hours; an attempt at a whole bisection run to pin down the
divergence point on trunk would therefore probably cost log2 of the
number of commits times that, or about 18 days.
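
The bisection estimate can be worked through as a small calculation.
This is a sketch under stated assumptions: the message gives neither the
revision count nor the number of checkouts each bisection step needs, so
the figure of 260,000 revisions and the checkouts_per_step parameter
below are illustrative, and bisection_cost is a hypothetical helper.

```python
import math


def bisection_cost(num_revisions, hours_per_checkout, checkouts_per_step=1):
    """Return (steps, days) for a worst-case bisection over num_revisions."""
    # Each step halves the candidate range, so about log2(N) steps are needed.
    steps = math.ceil(math.log2(num_revisions))
    days = steps * checkouts_per_step * hours_per_checkout / 24
    return steps, days


# Assumed revision count; 12 hours per checkout is the figure from the text.
print(bisection_cost(260_000, 12))     # (18, 9.0)
print(bisection_cost(260_000, 12, 2))  # (18, 18.0)
```

Under these assumptions a bisection takes 18 steps.  One twelve-hour
checkout per step comes to 9 days; the 18-day figure is reproduced if
each step costs about a full day, for example two such checkouts.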

So...does that list of changed files look familiar to anyone?  If we can
identify the revision number of the bad commit, the odds of being able
to unscramble this mess go way up.  They still aren't good, not when
merely loading the repository for examination takes over four hours,
but they would be way better than if I were starting from zero.

This is serious. I have produced demonstrably correct history
conversions of the gcc repo in the past.  We may now be in a situation
where I will never again be able to do that.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

The real point of audits is to instill fear, not to extract revenue;
the IRS aims at winning through intimidation and (thereby) getting
maximum voluntary compliance
	-- Paul Strassel, former IRS Headquarters Agent Wall St. Journal 1980
