From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 71054 invoked by alias); 18 Nov 2019 14:17:30 -0000 Mailing-List: contact elfutils-devel-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Post: List-Help: List-Subscribe: Sender: elfutils-devel-owner@sourceware.org Received: (qmail 71040 invoked by uid 89); 18 Nov 2019 14:17:30 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Checked: by ClamAV 0.100.3 on sourceware.org X-Virus-Found: No X-Spam-SWARE-Status: No, score=-9.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_1,SPF_PASS autolearn=ham version=3.3.1 spammy=considerable, MANAGEMENT, flood, pondering X-Spam-Status: No, score=-9.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_1,SPF_PASS autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on sourceware.org X-Spam-Level: X-HELO: gnu.wildebeest.org Received: from wildebeest.demon.nl (HELO gnu.wildebeest.org) (212.238.236.112) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 18 Nov 2019 14:17:27 +0000 Received: from tarox.wildebeest.org (tarox.wildebeest.org [172.31.17.39]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by gnu.wildebeest.org (Postfix) with ESMTPSA id C54B3300099F; Mon, 18 Nov 2019 15:17:24 +0100 (CET) Received: by tarox.wildebeest.org (Postfix, from userid 1000) id 77AAA413CEAA; Mon, 18 Nov 2019 15:17:24 +0100 (CET) Message-ID: Subject: Re: patch 2/2 debuginfod server etc. From: Mark Wielaard To: "Frank Ch. Eigler" Cc: elfutils-devel@sourceware.org, amerey@redhat.com Date: Mon, 18 Nov 2019 14:17:00 -0000 In-Reply-To: <20191114122953.GC873@redhat.com> References: <20191028190438.GC14349@redhat.com> <20191028190602.GD14349@redhat.com> <20191028190726.GE14349@redhat.com> <8d0b26865cc18838c24ea57c09f4ee5af713af16.camel@klomp.org> <20191114122953.GC873@redhat.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Evolution 3.28.5 (3.28.5-5.el7) Mime-Version: 1.0 X-Spam-Flag: NO X-IsSubscribed: yes X-SW-Source: 2019-q4/txt/msg00164.txt.bz2 Hi, On Thu, 2019-11-14 at 07:29 -0500, Frank Ch. Eigler wrote: > > > -bin_PROGRAMS =3D debuginfod-find > > > +bin_PROGRAMS =3D debuginfod debuginfod-find > > > +debuginfod_SOURCES =3D debuginfod.cxx > > > +debuginfod_cxx_CXXFLAGS =3D -Wno-unused-functions > >=20 > > Should that be debuginfod_CXXFLAGS? > > Why are there unused functions are there? >=20 > I don't think there are in debuginfod itself. OK, lets get rid of this then. > > > +# g++ 4.8 on rhel7 otherwise errors gthr-default.h / > > > __gthrw2(__gthrw_(__pthread_key_create) undefs > >=20 > > Could you explain this comment a bit more? >=20 > Not sure, these might have been notes from diagnosing > the argp.h bug that messed with function attributes, > or something else. Lets remove it then. It is somewhat confusing. > > [...] > > So you give it directories with executables and directories with debug > > files. They are indexed if they have a build-id. You never provide a > > sources dir, but if something is indexed as a debug file then any > > source file referenced from it can be requested. >=20 > Yes. >=20 > > In theory you only need to provide the executable dirs, from that you > > can normally get the separate .debug files. >=20 > That's if those debug files are installed in the proper system=20 > location. A build tree mid-strippinng is not like that. And > in the case of RPMs, it's not an option at all. >=20 > > Why didn't you do that (only providing executable dirs)? > > And why do you allow sources to be found indirectly? >=20 > Because there's no need to search/index source files. We can already > tell instantly whether they exist (when we parse the dwarf file) via a > stat(2). We index debuginfo/source rpms for their content because > that instant check is not possible. >=20 > > Would it make sense to restrict the sources (prefix) dir? >=20 > I don't see any reason, because ... >=20 > > The questions above are simply because it makes me nervous that some, > > but not all, files that can be requested can indirectly be found > > anywhere. If you have an explicit debug dir, then it might make sense > > to also have an explicit sources dir. >=20 > If the executables we find are trusted, then the source file paths > inside are trustworthy too. (We don't have an explicit debug dir.) > > > In general I think it would be good to think about a selinux security > > context for debuginfod. This is not urgent, but I think we should see > > if we can get someone to do a security audit. >=20 > Will keep it in mind. > > > One concern I have is that if a process is run in a restricted security > > context in which it might not be able to access certain files, but it > > might have access to a local socket to debuginfod, then it can get > > access to files it normally wouldn't.=20 >=20 > Note that the systemd service runs under an unprivileged userid. >=20 > > And if it could somehow write into any directory that debuginfod > > reads then it could construct a .debug file that points to any file > > on the system by adding it to the DWARF debug_lines file list. All > > this might be unlikely because this is all locally running > > processes. But it would be good to have someone who is a bit > > paranoid take a look if we aren't accidentally "leaking" files. >=20 > I see where you're going with it, it's a reasonable concern. For now, > we can consider it covered by the "debuginfod should be given > trustworthy binaries" general rule. This does keep me slightly worried. Even "trustworthy binaries" could be produced by buggy compilers. I really think we should have some way to restrict which files on the file system can be served up. IMHO the default should be that no files outside directories explicit given to debuginfod should be allowed to be provided to the client. So it makes sense providing any extra source dirs, so you can check that any references to source files are actually correct/intended. > > > +.B "\-I REGEX" "\-\-include=3DREGEX" "\-X REGEX" "\-\-exclude=3DR= EGEX" > > > +Govern the inclusion and exclusion of file names under the search > > > +paths. [...] > >=20 > > Could/Should this default to -I/all/paths/given/* so only files from > > under those paths are included and no files from outside the search > > paths? >=20 > fullpath(3) canonicalization (and in a later patch, symlink following > mode) would mess that up. I know you're probably thinking about the > evil-source-file-escape problem, but I wouldn't handle that with an > index-time constraint like this. I envisioned using this more for > optimization purposes, if for example one wants to index only .x86_64. > rpms in a tree that has other architectures. Aha, yes, this is at a different level/time than what I was thinking of. > > > +and means that only one initial scan should performed. The default > > > +rescan time is 300 seconds. Receiving a SIGUSR1 signal triggers a n= ew > > > +scan, independent of the rescan time (including if it was zero). > >=20 > > 300 seconds seem somewhat arbitrary. But I don't have a better number > > :) >=20 > I was thinkinng it's about right for a developer's live build tree. OK, but those aren't included by default. It isn't that I think it is a bad default, it just seems a bit short if the default is just for installed binaries/debuginfo, which will not change that often. > > > +.TP > > > +.B "\-G" > > > +Run an extraordinary maximal-grooming pass at debuginfod startup. > > > +This pass can take considerable time, because it tries to remove any > > > +debuginfo-unrelated content from the RPM-related parts of the index. > > > +It should not be run if any recent RPM-related indexing operations > > > +were aborted early. It can take considerable space, because it > > > +finishes up with an sqlite "vacuum" operation, which repacks the > > > +database file by triplicating it temporarily. The default is not to > > > +do maximal-grooming. See also the \fIDATA MANAGEMENT\fP section. > >=20 > > So basically never, ever use this? :) > > Can you add a hint when you should use it? >=20 > See also the DATA MANAGEMENT section. :-) Which says: There is also an optional one-shot \fImaximal grooming\fP pass is available. It removes information debuginfo-unrelated data from the RPM content index, but it is slow and temporarily requires up to twice the database size as free space. Worse: it may result in missing source-code info if the RPM traversals were interrupted. Use it rarely to polish a complete index. Which suggest to use it to polish by removing debuginfo-unrelated data. I am still not sure what that unrelated data is or when I really want to do this. It doesn't really list any quantifiable benefits. > > > [...] Relative path names commonly appear in the DWARF > > > +file's source directory, but these paths are relative to > > > +individual compilation unit AT_comp_dir paths, and yet an executable > > > +is made up of multiple CUs. Therefore, to disambiguate, debuginfod > > > +expects source queries to prefix relative path names with the CU > > > +compilation-directory. > >=20 > > Note a subtlety with DWARF4 vs DWARF5. In both cases debug line index > > zero refers to the compile unit directory. But in DWARF5 that directory > > should actually be in the line table (so you don't have to look it up > > through the associated CU in the debug_info DIE tree). >=20 > Right, but do you think this subtlety should be elaborated to > debuginfod clients or administrators? As far as a debugger tool is > concerned, the compilation directory is the compilation directory, no > matter which line table and which slot it is in. Can you suggest a > wording tweak? I think I would simply say something like "Relative path should always be prefixed with the (absolute) directory table entry or compile unit directory they are relative to." And then maybe some advise on how to combine them when the directory path ends with a slash (see also below). > > I agree this is needed because it is how the data/paths are represented > > in DWARF. But it does concern me there is no /../ processing done to > > make sure something isn't requested outside the source paths. >=20 > I love your security mindset. In this case, the /../ stuff doesn't > matter because the resulting string is only used as search key in the > database, not used verbatim against the filesystem. Yes, thanks. I am confusing various paths in my head. Still pondering whether we would need/want to canonicalize everything. We kind of depend on any client just passing through what is the DW_AT_comp_dir and line table dir/sources lists unaltered. The documentation currently explicitly says: Note: you should not elide \fB../\fP or \fB/./\fP sorts of relative path components in the directory names, because if this is how those names appear in the DWARF files, that is what debuginfod needs to see too. That is good, but maybe give guidance on how to combine a dir and (relative) path. e.g. should a / always be added in between, even when the dir already ends with a /? > > > +You should ensure that ample disk space remains available. (The flo= od > > > +of error messages on -ENOSPC is ugly and nagging. But, like for most > > > +other errors, debuginfod will resume when resources permit.) If > > > +necessary, debuginfod can be stopped, the database file moved or > > > +removed, and debuginfod restarted. > >=20 > > Do we need and/or can we add some option to limit the disk space used? >=20 > We can't really limit it. The groom cycle messages include a > database-size report field, so ... I suppose we could quit when the > database gets large enough, but that wouldn't help anyone. Or could > stop indexing early ... but really the Proper solution is to rely on a > size-quota'd cachedir which is harmless if it hits ENOSPC, if it is > brought to the sysadmin's attention by our errors and by system > monitoring tools. Yeah. OK. I guess it isn't different from any other cache. > > Same question as with the client. Does it use the standard trust > > database for certificate verification with HTTPS? >=20 > The text says that debuginfod does not provide https service at all, > so this is not relevant. A front-end https reverse-proxy would have > to deal with this stuff. I plan to cover this in a blog post later > on, and probably in the form of software baked into a container image. But debuginfod might use HTTPS services itself for fallback. I think it is important to describe how/if those https ssl/tls connections are authenticated. Cheers, Mark