From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 67890 invoked by alias); 19 Jul 2019 13:24:47 -0000 Mailing-List: contact elfutils-devel-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Post: List-Help: List-Subscribe: Sender: elfutils-devel-owner@sourceware.org Received: (qmail 67749 invoked by uid 89); 19 Jul 2019 13:24:22 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Checked: by ClamAV 0.100.3 on sourceware.org X-Virus-Found: No X-Spam-SWARE-Status: No, score=-6.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 spammy=settled, Processing, fragile X-Spam-Status: No, score=-6.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on sourceware.org X-Spam-Level: X-HELO: gnu.wildebeest.org Received: from wildebeest.demon.nl (HELO gnu.wildebeest.org) (212.238.236.112) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 19 Jul 2019 13:24:20 +0000 Received: from tarox.wildebeest.org (tarox.wildebeest.org [172.31.17.39]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by gnu.wildebeest.org (Postfix) with ESMTPSA id 803CE300073F; Fri, 19 Jul 2019 15:24:14 +0200 (CEST) Received: by tarox.wildebeest.org (Postfix, from userid 1000) id 13E644000AA8; Fri, 19 Jul 2019 15:24:14 +0200 (CEST) Message-ID: <0212a4c7f14306ddda93ac6f689ece4a458560f3.camel@klomp.org> Subject: Re: [PATCH] elfclassify tool From: Mark Wielaard To: Florian Weimer Cc: elfutils-devel@sourceware.org, Panu Matilainen Date: Fri, 19 Jul 2019 13:24:00 -0000 In-Reply-To: <871s22yybt.fsf@oldenburg2.str.redhat.com> References: <87k1fz8c9q.fsf@oldenburg2.str.redhat.com> <2e6a27c552ae5e365db54ca6b432c77c9ad5b041.camel@klomp.org> <871s22yybt.fsf@oldenburg2.str.redhat.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Evolution 3.28.5 (3.28.5-2.el7) Mime-Version: 1.0 X-Spam-Flag: NO X-IsSubscribed: yes X-SW-Source: 2019-q3/txt/msg00063.txt.bz2 Hi, Some answers to this older discussion to explain some of my recent commits suggested for elfclassify. On Tue, 2019-04-16 at 13:38 +0200, Florian Weimer wrote: > * Mark Wielaard: > > --elf PATH return 0 whenever the file can be opened and a minimal ELF > > header can be read (it might not be a completely valid ELF file). Do we > > want or need to do any more verification (e.g. try to get the full ELF > > header, walk through all phdrs and shdrs)? > If we ever need that, I think we should add it as separate options, > detecting both separate debuginfo and regular ELF files. You are probably right that a real verification isn't the task of elfclassify. We already have elflint. But I did make --elf required by default (with as check simply that libelf could open the file without marking it as ELF_K_NONE). The user can disable it again with --not- elf. It felt slightly odd to not have any filter active by default. I did add two "sub" classifications --elf-file and --elf-archive because I believe people often make a distinction between "normal" ELF files and ELF archives (containers of ELF objects). I also added detection of --debug-only ELF file using the heuristic that Frank also came up with for the dbgserver (contains .debug_* sections, but no allocated SHT_PROGBIT sections). > > --unstripped (not yet implemented) would be a classification that > > indicates whether the ELF file can be stripped (further), that is has a > > .symtab (symbol table), .debug_* sections (and possibly any non- > > loadable sections -- "file" only detects the first two). >=20 > Some non-allocated sections are expected in stripped binaries: > .gnu_debuglink, .shstrtab, .gnu.build.attributes look relevant in this > context. I'm not sure if we should flag any other non-allocated section > in this way. I don't think those sections need special flagging. > > I am not sure --file=3DPATH is a useful option. >=20 > It's useful for determining if the file exists and can be mapped. >=20 > > But maybe we need some way to indicate whether a file is a real file or > > a symlink? But the current implementation returns 0 even for symlinks. > > As do every other option (if the file is a symlink to an ELF file of > > the requested classification). Is this what we want? I would suggest > > that we return 1 for anything that is not a regular file. But that > > would mean that for example eu-elfclassify --executable=3D/proc/$$/exe > > would also return 1 (currently it returns 0, which might be helpful in > > some cases). >=20 > I don't know what RPM needs in this context. I expect that it can > easily filter out non-regular files. My problem with symbolic link > detection is that it is so inconsistent=E2=80=94it generally applies to t= he > final pathname component, and that does not look useful to me. Since --file was available again I reused it to indicate that the final pathname component should be a regular file (not a symlink or special device). I think that is useful in general. But you are right there are other tools to filter out symlinks/special device files. It was useful for quick and dirty testing though. > > --loadable basically checks whether the given ELF file is not an object > > (ET_REL) file, so it will return 0 for either an executable, a shared > > object or core file, but not check whether any other attribute (like > > whether it has program headers and/or loadable segments). Personally I > > would like it if this at least included a check for a PT_LOAD segment. >=20 > Is a PT_LOAD segment required to make the PT_DYNAMIC segment visible? > It is possible to have mostly empty objects, after all. I did add this extra check, because I am a little paranoid about having to deal with totally broken ELF files. My reasoning was basically that without a PT_LOAD there is nothing in memory to load/execute. > > This does not classify kernel modules as loadable objects. > > rpm does contain a check for that, it might make sense to include that > > as a separate classification in elfclassify --kernel-module. > >=20 > > Kernel modules are also special because they can be compressed ELF > > files. Do we want to support that? (It is easy with gelf_elf_begin). > > That could for example be an flag/option like --compressed which can be > > combined with any other classification option? >=20 > How relevant are kernel modules to eu-elfclassify? >=20 > Is path-based detection feasible for kernel modules? Sadly kernel modules often need special treatment that you wouldn't need for "normal" ET_REL files. path-based detection is only partially possible (rpm used to use the extension name, but that was fragile). So I added an option --linux-kernel-module that detects them. I also added a -z option to try to detect ELF files inside compressed images because it was easy and because these days kernel modules are often compressed. > > I think another useful classification would be --debugfile which > > succeeds if the primary function of the given ELF file is being a > > separete debug file (basically .debug, .dwo or dwz .multi file) which > > cannot be linked and loaded on its own >=20 > That is very difficult to detect reliably, unfortunately, and would best > be implemented in lib(g)elf itself because it would be generally useful, > for all kinds of tools which expect to process real ELF files only. I think the heuristic mentioned above for --debug-only works pretty well. I haven't spotted false positives yet. It is hard to do this in libelf because the public interface is mostly locked and libelf treats the segment and section views completely independently. > > But I think you don't want to use > > ARGP_NO_EXIT. That causes standard options like --version and -- > > help to > > not exit (with success). Which is generally what we want. > > We do want to want --version and --help to not return an error > > indicator (this is actually checked by make distcheck). >=20 > I want to exit with status 2 on usage error. I couldn't make that > happen without ARGP_NO_EXIT. I'm open to different suggestions. There is one patch in my tree that does remove the ARGP_NO_EXIT (because it breaks --help and friends). It documents that usage errors produce an status code of 64. Processing errors cause a status of 2. And matching results produce a status exit of 0 or 1. If you really want to produce 2 for usage errors then we could try to check things during ARGP_KEY_FINI. But I don't think it is too bad for usage errors to produce a different status code from process errors. Another change I made is that process errors (either opening/reading files or parsing the ELF structures) exit the elfclassify directly. The status is recorded and the next file is processed first. Only once all inputs have been classified is the exit status set. > > That is odd, I assumed !S_ISREG would by true for symlinks. >=20 > No, the open followed the symbolic link. This is needed for rejecting > directories. I've added a comment. I finally settled on treating directories special (and immediately "reject" them). I think this matches mostly how you treated them originally. > > > + if (verbose) > > > + { > > > + fprintf (stderr, "info: ELF type: %d\n", elf_type); > > > + if (has_program_interpreter) > > > + fputs ("info: program interpreter found\n", stderr); > >=20 > > You might want to print the program interpreter here. >=20 > I can't do that without detecting first if the file is separated > debuginfo. (Separated debuginfo has a program header that points > nowhere.) Yes. That is a bit of a pain :{ I didn't try this myself. It does seem too tricky. > > > + if (has_dynamic) > > > + fputs ("info: dynamic segment found\n", stderr); > > > + if (has_soname) > > > + fputs ("info: soname found\n", stderr); > >=20 > > You might want to print the soname found here. >=20 > This needs access to the dynamic string table. I don't know how easy > this is to implement. I also didn't do this. It shouldn't be that hard. We should probably use the PT headers instead of the sections. But it was indeed more work than I had. > > If you want to only allow one classification at a time you should check > > whether command is already set and call something like: > > argp_error (state, N_("Can only use one classification at a time.")); >=20 > I tried that, but I ran into issues with that. It also breaks --help > with multiple/conflicting flags. >=20 > I'm trying to come up with a different command line syntax anyway. I like the new command line syntax. Especially the --stdin[0] is very powerful! I tried briefly to detect "conflicting" classification options. But it isn't easy to do so consistently. So just let the user do whatever they want, even if that means nothing will ever match all classifications requested. Cheers, Mark