public inbox for overseers@sourceware.org
* git pack-objects run amok
@ 2012-02-27  6:03 Christopher Faylor
  2012-02-28 19:28 ` Frank Ch. Eigler
  0 siblings, 1 reply; 10+ messages in thread
From: Christopher Faylor @ 2012-02-27  6:03 UTC (permalink / raw)
  To: overseers

The load average on gcc.gnu.org was in the 300-400 range for a while
tonight.  The culprit was apparently a lot of these:

git pack-objects --revs --thin --stdout --delta-base-offset

I renamed /usr/local/bin/git to /usr/local/bin/git-saf and killed a
bunch of these to get the load average down so that I could do
something.  Then, eventually, I just killed the rest with 'SIGHUP'.

Looking at the logs, it looks like we had at least two git abusers:

    760 130.161.158.181
    303 195.113.20.142

So I blocked them via iptables.  When I renamed git back, we started
to get flooded again from 169.237.4.230, so I blocked them too.
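
For anyone repeating this triage, a per-client tally like the one above
can be produced along these lines.  This is only a sketch: the thread
does not say which log was examined, so the function name and the
assumption that the client address is the first field of each log line
are illustrative.

```shell
#!/bin/bash
# Count log lines per client address, busiest first.
# ASSUMPTION: the client address is the first whitespace-separated
# field of each line; adjust the awk field for the real log format.
# Reads the files named as arguments, or stdin if none are given.
top_clients() {
  awk '{ print $1 }' "$@" | sort | uniq -c | sort -rn | head
}

# An address found this way can then be blocked with, for example:
#   iptables -I INPUT -s 192.0.2.1 -j DROP
# (192.0.2.1 is a documentation address, not one from the logs.)
```

Running "top_clients /path/to/log" prints one "count address" line per
client, highest count first.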

I have no idea what the fallout from this is going to be but I'll
leave that to the morning shift.  I have to be at the airport at
7AM and it's 1AM here so I can't do much more tonight.

cgf

* Re: git pack-objects run amok
  2012-02-27  6:03 git pack-objects run amok Christopher Faylor
@ 2012-02-28 19:28 ` Frank Ch. Eigler
  2012-02-28 19:52   ` Jim Meyering
  2012-02-29  4:16   ` Andrew Pinski
  0 siblings, 2 replies; 10+ messages in thread
From: Frank Ch. Eigler @ 2012-02-28 19:28 UTC (permalink / raw)
  To: overseers

Hi -

> The load average on gcc.gnu.org was in the 300-400 range for a while
> tonight.  The culprit was apparently a lot of these [...]
> git pack-objects --revs --thin --stdout --delta-base-offset [...]

I added some /etc/xinetd.d/git-daemon constraints in an attempt to
preclude a recurrence, kind of like the anon-cvs limits.
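
The thread does not show the actual file, so the following is only a
sketch of what such constraints can look like.  "instances",
"per_source" and "max_load" are standard xinetd attributes; every value
here, the --base-path, and the user are assumptions chosen for
illustration, not the settings used on sourceware.

```
# /etc/xinetd.d/git-daemon (illustrative values only)
service git
{
        disable         = no
        socket_type     = stream
        protocol        = tcp
        wait            = no
        user            = nobody
        server          = /usr/local/bin/git
        server_args     = daemon --inetd --export-all --base-path=/git
        # At most this many git-daemon servers running at once.
        instances       = 20
        # At most this many simultaneous connections per source address.
        per_source      = 2
        # Refuse new connections while the one-minute load average
        # is at or above this value.
        max_load        = 10
}
```

Note that max_load only takes effect if xinetd was built with load
average support.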

- FChE

* Re: git pack-objects run amok
  2012-02-28 19:28 ` Frank Ch. Eigler
@ 2012-02-28 19:52   ` Jim Meyering
  2012-02-28 20:00     ` Frank Ch. Eigler
  2012-02-29  4:16   ` Andrew Pinski
  1 sibling, 1 reply; 10+ messages in thread
From: Jim Meyering @ 2012-02-28 19:52 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: overseers

Frank Ch. Eigler wrote:
>> The load average on gcc.gnu.org was in the 300-400 range for a while
>> tonight.  The culprit was apparently a lot of these [...]
>> git pack-objects --revs --thin --stdout --delta-base-offset [...]
>
> I added some /etc/xinetd.d/git-daemon constraints in an attempt to
> preclude a recurrence, kind of like the anon-cvs limits.

Hi Frank,

You can reduce the effect of the requests that provoked that by
running "git gc" on repositories for which "git count-objects"
reports more than a few megabytes.  That is something I do every
month or two on all of the git repositories at savannah.gnu.org.

For example, I see that glibc has 34MB (of 135MB) of unpacked objects:

  $ git count-objects
  2483 objects, 34872 kilobytes

After your "git gc" run, that will be 0.
This makes an especially big difference when the repository
is stored on spinning rust: far fewer seeks.  Of course, with
SSDs the seek effect is negligible, but the smaller footprint
helps even there.
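
The check described above could be scripted roughly as follows.  This
is a sketch, not what is run on savannah: the function names and the
5000-kilobyte default threshold (a stand-in for "more than a few
megabytes") are assumptions.

```shell
#!/bin/bash
# Extract the kilobyte figure from "git count-objects" output read on
# stdin, e.g. "2483 objects, 34872 kilobytes" gives 34872.
loose_kb() {
  awk '{ print $3 }'
}

# Run "git gc" on the repository at git dir $1 only when its loose
# objects exceed a threshold in kilobytes ($2, default 5000, which is
# an arbitrary illustrative value).
gc_if_needed() {
  local d=$1 limit=${2:-5000} kb
  kb=$(git --git-dir="$d" count-objects | loose_kb) || return 1
  if [ "$kb" -gt "$limit" ]; then
    git --git-dir="$d" gc
  fi
}
```

For example, "gc_if_needed /git/glibc.git" would leave a repository
with little loose data untouched and gc one like the glibc case above.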

Sometimes it's good to run the much more time/resource-consuming repack
operation, too.  I use this bash function:

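git-repo-compress()
{
  # Aggressively repack the repository at git dir $1, printing its
  # size before and after, the start and end times, and the cost.
  local d=$1 start
  du -sh "$d"; start=$(date); /usr/bin/time \
    git --git-dir="$d" repack -afd --window=250 --depth=250
  echo "started $start"; date; du -sh "$d"
}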

To give you an idea, on my local glibc clone, "git gc" reduced
"du -sh .git" from 134M to 92M.  Just to see, I ran the above
git-repo-compress and it shaved off only 3MB resulting in 89MB,
so the git-repo-compress run is not worth it, considering the cost.

* Re: git pack-objects run amok
  2012-02-28 19:52   ` Jim Meyering
@ 2012-02-28 20:00     ` Frank Ch. Eigler
  2012-02-28 20:11       ` Jim Meyering
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Ch. Eigler @ 2012-02-28 20:00 UTC (permalink / raw)
  To: Jim Meyering; +Cc: overseers

Hi, Jim -

> You can reduce the effect of the requests that provoked that by
> running "git gc" on repositories for which "git count-objects"
> reports more than a few megabytes.  [...]

One problem we've encountered in the past when doing git gc/repack was
corruption of the gcc-svn repository for gcc, so we've shied away from
that.

- FChE

* Re: git pack-objects run amok
  2012-02-28 20:00     ` Frank Ch. Eigler
@ 2012-02-28 20:11       ` Jim Meyering
  2012-02-28 20:40         ` Frank Ch. Eigler
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Meyering @ 2012-02-28 20:11 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: overseers

Frank Ch. Eigler wrote:
>> You can reduce the effect of the requests that provoked that by
>> running "git gc" on repositories for which "git count-objects"
>> reports more than a few megabytes.  [...]
>
> One problem we've encountered in the past when doing git gc/repack was
> corruption of the gcc-svn repository for gcc, so we've shied away from
> that.

Oh yeah...  gcc:

    sourceware$ cd /git/gcc.git && git count-objects
    4414 objects, 119956 kilobytes

Being an svn convert and with 120MB of slop, it would
see a dramatic improvement.

I remember hearing about the corruption you had to deal with.
From what I recall, that anecdote was already rather old when you
first mentioned it to me many months (even a year?) ago.
So it might have happened with a version of git from two or
more years ago.  In git dev. terms, that's ancient.

What would convince you that it's no longer a problem?
Successful "git gc", a passed "git fsck" and enumeration and
comparison of all commits on all branches that people care about?
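
As a rough sketch of that check: record every ref and every reachable
commit, run gc and fsck, and confirm the record is unchanged.  The
function names are made up for illustration, and this looks only at
plain git data; it does not examine git-svn's own metadata.

```shell
#!/bin/bash
# Print every ref and every reachable commit of the repository at
# git dir $1, in a stable order.
snapshot() {
  git --git-dir="$1" for-each-ref
  git --git-dir="$1" rev-list --all | sort
}

# Succeed only if "git gc" and "git fsck --full" both pass and the
# refs and commits are identical before and after.
verify_gc() {
  local d=$1 before after rc
  before=$(mktemp) && after=$(mktemp) || return 1
  snapshot "$d" > "$before"
  if git --git-dir="$d" gc --quiet && git --git-dir="$d" fsck --full; then
    snapshot "$d" > "$after"
    cmp -s "$before" "$after"; rc=$?
  else
    rc=1
  fi
  rm -f "$before" "$after"
  return "$rc"
}
```

It would be safest to try this on a copy of the repository first, e.g.
"verify_gc /path/to/copy-of-gcc.git".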

* Re: git pack-objects run amok
  2012-02-28 20:11       ` Jim Meyering
@ 2012-02-28 20:40         ` Frank Ch. Eigler
  2012-03-01 15:15           ` Jim Meyering
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Ch. Eigler @ 2012-02-28 20:40 UTC (permalink / raw)
  To: Jim Meyering; +Cc: Frank Ch. Eigler, overseers

Hi -

On Tue, Feb 28, 2012 at 09:10:52PM +0100, Jim Meyering wrote:
> [...]
> From what I recall, that anecdote was already rather old when you
> first mentioned it to me many months (even a year?) ago.

Yeah, though it happened more than once.

> [...] What would convince you that it's no longer a problem?
> Successful "git gc", a passed "git fsck" and enumeration and
> comparison of all commits on all branches that people care about?

I'm not familiar enough with git-svn to know whether that would
be enough.  ISTR some failures after a successful git fsck, during
subsequent git-svn ops.

- FChE

* Re: git pack-objects run amok
  2012-02-28 19:28 ` Frank Ch. Eigler
  2012-02-28 19:52   ` Jim Meyering
@ 2012-02-29  4:16   ` Andrew Pinski
  2012-02-29  4:51     ` Jonathan Larmour
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Pinski @ 2012-02-29  4:16 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: overseers

On Tue, Feb 28, 2012 at 11:28 AM, Frank Ch. Eigler <fche@elastic.org> wrote:
> Hi -
>
>> The load average on gcc.gnu.org was in the 300-400 range for a while
>> tonight.  The culprit was apparently a lot of these [...]
>> git pack-objects --revs --thin --stdout --delta-base-offset [...]
>
> I added some /etc/xinetd.d/git-daemon constraints in an attempt to
> preclude a recurrence, kind of like the anon-cvs limits.

I am getting:
fatal: The remote end hung up unexpectedly

Is this because the load average is too high?

Thanks,
Andrew Pinski

* Re: git pack-objects run amok
  2012-02-29  4:16   ` Andrew Pinski
@ 2012-02-29  4:51     ` Jonathan Larmour
  0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Larmour @ 2012-02-29  4:51 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Frank Ch. Eigler, overseers

On 29/02/12 04:16, Andrew Pinski wrote:
> On Tue, Feb 28, 2012 at 11:28 AM, Frank Ch. Eigler <fche@elastic.org> wrote:
>> Hi -
>>
>>> The load average on gcc.gnu.org was in the 300-400 range for a while
>>> tonight.  The culprit was apparently a lot of these [...]
>>> git pack-objects --revs --thin --stdout --delta-base-offset [...]
>>
>> I added some /etc/xinetd.d/git-daemon constraints in an attempt to
>> preclude a recurrence, kind of like the anon-cvs limits.
> 
> I am getting:
> fatal: The remote end hung up unexpectedly
> 
> Is this because the load average is too high?

Frank has set the limit at 10. But right now there are quite a few people
sucking from gcc's svn both directly and via http.

It's a bit dated now (mrtg now has ways of presenting things more
legibly), but did you know about http://sourceware.org/mrtg/summary.html ?
You can check load average there.

It's possible that to be fairer to the git users, more limits need to be
placed on the anonsvn users. Currently the svnserve service is allowed to
have up to 1000 simultaneous connections which seems high to me, bearing
in mind that maintainers will be using ssh.

It might be possible to limit the svn served by httpd as well, using
mod_bw. But I think to do that would need a new virtual host.

Jifl

* Re: git pack-objects run amok
  2012-02-28 20:40         ` Frank Ch. Eigler
@ 2012-03-01 15:15           ` Jim Meyering
  2012-03-02 18:47             ` Frank Ch. Eigler
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Meyering @ 2012-03-01 15:15 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Frank Ch. Eigler, overseers

Frank Ch. Eigler wrote:
> On Tue, Feb 28, 2012 at 09:10:52PM +0100, Jim Meyering wrote:
>> [...]
>> From what I recall, that anecdote was already rather old when you
>> first mentioned it to me many months (even a year?) ago.
>
> Yeah, though it happened more than once.
>
>> [...] What would convince you that it's no longer a problem?
>> Successful "git gc", a passed "git fsck" and enumeration and
>> comparison of all commits on all branches that people care about?
>
> I'm not familiar enough with git-svn to know whether that would
> be enough.  ISTR some failures after a successful git fsck, during
> subsequent git-svn ops.

It's little more than regular git, and everything I read suggests
that git gc should "just work".  There's also a "git svn gc".

I've copied it (1.2GiB) to a system where I can operate
on it efficiently without hosing sourceware.org.
Note that "git gc" did next to nothing, probably because
git svn now does that automatically when needed.

Repacked, it occupies 210MiB less space (940MiB).  I used this command:
git --git-dir=$d repack -afd --window=250 --depth=250

Is there much in gcc.git that is not derivable from /svn/gcc?
I.e., what would be lost if the entire repository were somehow
to explode with no backup?

Oh, and I also ran "git svn gc", which removed an additional 50MB,
leaving a new total size of 889MiB.

I suppose that removing only 20% of the size of such a large
repository is not worth the risk to you?

* Re: git pack-objects run amok
  2012-03-01 15:15           ` Jim Meyering
@ 2012-03-02 18:47             ` Frank Ch. Eigler
  0 siblings, 0 replies; 10+ messages in thread
From: Frank Ch. Eigler @ 2012-03-02 18:47 UTC (permalink / raw)
  To: Jim Meyering; +Cc: overseers

Hi, Jim -

> [...]
> Repacked, it occupies 210MiB less space (940MiB).  I used this command:
> git --git-dir=$d repack -afd --window=250 --depth=250

OK, thanks for checking.

> Is there much in gcc.git that is not derivable from /svn/gcc?
> I.e., what would be lost if the entire repository were somehow
> to explode with no backup?

I believe some people maintain private development branches in
gcc.git.

> [...]  I suppose that removing only 20% of the size of such a large
> repository is not worth the risk to you?

Not really.  We're not materially short of storage.  (CPU is far more
scarce.)

- FChE
