Date: Sat, 11 Jun 2022 21:58:32 +0200
From: Mark Wielaard
To: devel@buildbot.net
Cc: buildbot@sourceware.org
Subject: Re: [PATCH] docker: call docker_client.close() to prevent connection leaks
References: <20220604214235.650978-1-mark@klomp.org>
In-Reply-To: <20220604214235.650978-1-mark@klomp.org>

Hi,

On Sat, Jun 04, 2022 at 11:42:35PM +0200, Mark Wielaard wrote:
> In DockerLatentWorker both _thd_start_instance and _thd_stop_instance
> use a new docker_client instance that is never cleaned up. This can
> cause (ssh) connections to the docker socket to linger for a long
> time. Explicitly call docker_client.close() when done with the client.
>
> Also add a mock close def to test/fake/docker.py.
> ---

It looks like the devel@buildbot.net mailing list never got the patch.
But someone on IRC was nice enough to forward it so that it could be
integrated upstream: https://github.com/buildbot/buildbot/pull/6538

It has been merged, so we should get the fix with the next buildbot
release when we upgrade.

Cheers,

Mark

> Patch can also be found at:
> https://code.wildebeest.org/git/user/mjw/buildbot/commit/?h=docker-py-close
>
> We are running a buildbot instance at https://builder.sourceware.org/
> that also has a couple of DockerLatentWorkers. These workers use
> docker_host connection strings like "ssh://builder@bb.wildebeest.org:2021"
>
> However after a couple of weeks we found hundreds of lingering network
> connections to the container host machines. This is caused by the
> DockerLatentWorker creating a new connection each time a container is
> initialized or stopped, but never closing the client connection.
>
> This patch makes sure the docker client is always closed and we are
> not seeing any lingering network connections anymore.
>
> Our full configuration can be found in this git repository:
> https://sourceware.org/git/builder.git
>
>  master/buildbot/test/fake/docker.py | 4 ++++
>  master/buildbot/worker/docker.py    | 5 +++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/master/buildbot/test/fake/docker.py b/master/buildbot/test/fake/docker.py
> index 46aab1cc5..a46920d93 100644
> --- a/master/buildbot/test/fake/docker.py
> +++ b/master/buildbot/test/fake/docker.py
> @@ -115,6 +115,10 @@ class Client:
>      def remove_container(self, id, **kwargs):
>          del self._containers[id]
>
> +    def close(self):
> +        # dummy close, no connection to cleanup
> +        pass
> +
>
>  class APIClient(Client):
>      pass
> diff --git a/master/buildbot/worker/docker.py b/master/buildbot/worker/docker.py
> index 49a08843e..c2ef707d6 100644
> --- a/master/buildbot/worker/docker.py
> +++ b/master/buildbot/worker/docker.py
> @@ -299,6 +299,7 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
>          if not self._image_exists(docker_client, image):
>              msg = f'Image "{image}" not found on docker host.'
>              log.msg(msg)
> +            docker_client.close()
>              raise LatentWorkerCannotSubstantiate(msg)
>
>          volumes, binds = self._thd_parse_volumes(volumes)
> @@ -319,6 +320,7 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
>
>          if instance.get('Id') is None:
>              log.msg('Failed to create the container')
> +            docker_client.close()
>              raise LatentWorkerFailedToSubstantiate(
>                  'Failed to start container'
>              )
> @@ -331,6 +333,7 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
>          try:
>              docker_client.start(instance)
>          except docker.errors.APIError as e:
> +            docker_client.close()
>              # The following was noticed in certain usage of Docker on Windows
>              if 'The container operating system does not match the host operating system' in str(e):
>                  msg = f'Image used for build is wrong: {str(e)}'
> @@ -346,6 +349,7 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
>                  if self.conn:
>                      break
>              del logs
> +        docker_client.close()
>          return [instance['Id'], image]
>
>      def stop_instance(self, fast=False):
> @@ -374,3 +378,4 @@ class DockerLatentWorker(CompatibleLatentWorkerMixin,
>                  docker_client.remove_image(image=instance['image'])
>              except docker.errors.APIError as e:
>                  log.msg('Error while removing the image: %s', e)
> +        docker_client.close()
> --
> 2.30.2
>
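
As an aside, the patch above adds an explicit docker_client.close() call
in front of each exit path. The same idea can be sketched with
contextlib.closing from the Python standard library, which calls close()
on every return and on every exception. This is only an illustration and
not what was merged upstream: FakeDockerClient, image_exists and
start_instance are hypothetical names made up for the sketch and are not
part of buildbot or docker-py.

```python
from contextlib import closing


class FakeDockerClient:
    """Hypothetical stand-in for a docker client; it only records close()."""

    def __init__(self, images):
        self.images = set(images)
        self.closed = False

    def image_exists(self, image):
        return image in self.images

    def close(self):
        # A real client would release its (ssh) connection here.
        self.closed = True


def start_instance(client, image):
    # closing() calls client.close() when the block is left, whether by
    # return or by exception, so no exit path can skip the cleanup.
    with closing(client):
        if not client.image_exists(image):
            raise RuntimeError(f'Image "{image}" not found on docker host.')
        return image


# Success path: the client is closed after the normal return.
ok_client = FakeDockerClient(["debian"])
result = start_instance(ok_client, "debian")

# Failure path: the client is closed even though an exception is raised.
failed_client = FakeDockerClient([])
try:
    start_instance(failed_client, "fedora")
    raised = False
except RuntimeError:
    raised = True
```

Compared with one close() call per exit path, this keeps the cleanup in
a single place, at the cost of indenting the function body one level.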