public inbox for elfutils@sourceware.org
From: Mark Wielaard <mark@klomp.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: elfutils-devel@sourceware.org,
	Panu Matilainen <pmatilai@laiskiainen.org>
Subject: Re: [PATCH] elfclassify tool
Date: Sun, 11 Aug 2019 23:38:00 -0000
Message-ID: <a2dd6b0becb9c886f03b66a067c0d25417e4ec16.camel@klomp.org>
In-Reply-To: <20190729142419.GB2881@wildebeest.org>

[-- Attachment #1: Type: text/plain, Size: 3401 bytes --]

On Mon, 2019-07-29 at 16:24 +0200, Mark Wielaard wrote:
> On Mon, Jul 29, 2019 at 10:43:56AM +0200, Florian Weimer wrote:
> > * Mark Wielaard:
> > 
> > > +  if (elf == NULL)
> > > +    {
> > > +      /* This likely means it just isn't an ELF file, probably not a
> > > +	 real issue, but warn if verbose reporting.  */
> > > +      if (verbose > 0)
> > > +	fprintf (stderr, "warning: %s: %s\n", current_path, elf_errmsg (-1));
> > > +      return false;
> > > +    }
> > 
> > Is it possible to distinguish the error from a memory allocation error?
> > It would be wrong to mis-classify a file just because the system is low
> > on memory.
> 
> You are right this is not the proper way to report the issue.
> Normally, when just using elf_begin, a NULL return should be reported
> through elf_issue (which will set the issues flag).
> 
> But, because I added -z, we are using either elf_begin or
> dwelf_elf_begin. dwelf_elf_begin will return NULL (instead of an
> empty (ELF_K_NONE) Elf descriptor) when there is an issue, or the
> (decompressed) file wasn't an ELF file.
> 
> So we should split the error reporting. If we called elf_begin and get
> NULL we should call elf_issue to report the proper issue.
> 
> If we called dwelf_elf_begin and we get NULL, I am not sure yet what
> the proper way is to detect whether it is a real issue, or "just" not
> a (decompressed) ELF file. I am afraid the current handling is the
> best we can do.
> 
> Maybe we can fix dwelf_elf_begin to return an empty (ELF_K_NONE) Elf
> descriptor if there was no issue, but the (decompressed) file wasn't
> an ELF file.

Sorry this took so long. And this is indeed the last issue holding up
the release. But this is a tricky problem.

We made a mistake when we wrote the contract for dwelf_elf_begin by
saying it would never return ELF_K_NONE. That made it different from
elf_begin and made it impossible to distinguish a real (file or
decompression) error from a file that simply wasn't an ELF file and
also wasn't a compressed ELF file.

I think we should fix the contract. Technically it would be an API
break, but I think no user is really relying on the fact that the Elf
handle returned is never ELF_K_NONE. Users still need to distinguish
between ELF_K_ELF and ELF_K_AR (and theoretically any other ELF_K_type,
like COFF, which we currently don't support, but we do define it).

So that is what the attached patch does. I also audited all
decompression code to make sure it returns error codes consistently.
The decompression will either decompress successfully and return
DWFL_E_NOERROR, or if the file wasn't compressed (or an embedded image)
it will return DWFL_E_BADELF. In all other cases (file or decompression
error) it will set a different DWFL_E error.
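
To make those three outcomes concrete, here is a small self-contained
model. It is only a sketch, not the libdwfl code: the enums and both
function names are made up for illustration, and it simplifies by
assuming a successfully decompressed file yields a usable handle.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the DWFL_E_* codes; not the libdwfl enum.  */
enum model_error { MODEL_NOERROR, MODEL_BADELF, MODEL_OTHER_ERROR };

/* Hypothetical description of what the decompressor was given.  */
enum model_input
{
  INPUT_COMPRESSED_OK,    /* compressed, decompressed fine */
  INPUT_NOT_COMPRESSED,   /* not a compressed stream at all */
  INPUT_IO_OR_CORRUPT     /* file error or broken compressed data */
};

/* Decompression step: success, "not compressed", or a real error.  */
enum model_error
model_decompress (enum model_input in)
{
  switch (in)
    {
    case INPUT_COMPRESSED_OK:
      return MODEL_NOERROR;
    case INPUT_NOT_COMPRESSED:
      return MODEL_BADELF;
    default:
      return MODEL_OTHER_ERROR;
    }
}

/* Open step: with bad_elf_ok set, "not an ELF file" is no longer an
   error, so the caller keeps an ELF_K_NONE-like handle instead of
   getting NULL.  Returns true when a handle would be handed back.  */
bool
model_returns_handle (enum model_error e, bool bad_elf_ok)
{
  if (bad_elf_ok && e == MODEL_BADELF)
    e = MODEL_NOERROR;
  return e == MODEL_NOERROR;
}
```

In this model only the "not compressed" case changes behaviour when
bad_elf_ok is set; file and decompression errors still produce no handle.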

This "only" leaves the problem that we don't have a good way to
translate those errors into "real" libelf error codes. So for now we
just fake one if it wasn't an elf_errno value. I don't intend to try to
solve this error translation issue before the release (I don't know how
to do it yet).

What do you think about this change to dwelf_elf_begin?
The change would make it possible to detect real errors in the
elfclassify code, whether elf_begin or dwelf_elf_begin was used. So we
would not misclassify files (but return an error status of 2).
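
Roughly, the caller side could then look like the sketch below. This is
a model under stated assumptions, not the elfclassify source: the enum
and function are hypothetical, only the error status of 2 comes from
the text above, and the 0/1 statuses for "matched"/"did not match" as
well as treating only plain ELF files as a match are assumptions.

```c
#include <stdbool.h>

/* Hypothetical model of the handle kind; not the libelf Elf_Kind.  */
enum sketch_kind { SKETCH_K_NONE, SKETCH_K_ELF, SKETCH_K_AR };

/* is_null models a NULL Elf pointer, which under the new contract
   means a real file or decompression error.  kind is only looked at
   when is_null is false.  */
int
sketch_exit_status (bool is_null, enum sketch_kind kind)
{
  if (is_null)
    return 2;            /* real error: report it, do not classify */
  if (kind != SKETCH_K_ELF)
    return 1;            /* assumed: not an ELF file, so no match */
  return 0;              /* assumed: file was classified */
}
```

The point is that the NULL check and the ELF_K_NONE check are separate
branches, so a low-memory or I/O failure can no longer be reported as
"not an ELF file".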

Thanks,

Mark

[-- Attachment #2: 0001-libdwelf-Make-dwelf_elf_begin-return-NULL-only-when-.patch --]
[-- Type: text/x-patch, Size: 7796 bytes --]

From 648837a9f1be7628e9ceee6818bf56c80b9d3fa1 Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Mon, 12 Aug 2019 00:43:22 +0200
Subject: [PATCH] libdwelf: Make dwelf_elf_begin return NULL only when there is
 an error.

dwelf_elf_begin was slightly different from elf_begin in case the file
turned out to not be an ELF file. elf_begin would return an Elf handle
with ELF_K_NONE. But dwelf_elf_begin would return NULL. This made it
impossible to tell the difference between a file or decompression error
and a (decompressed) file not being an ELF file.

Since dwelf_elf_begin could still return different kinds of ELF files
(ELF_K_ELF or ELF_K_AR - and theoretically ELF_K_COFF) this was not
really useful anyway. So make it so that dwelf_elf_begin always returns
an Elf handle unless there was a real error reading or decompressing
the file. Otherwise return NULL to make clear there was a real error.

Make sure that the decompression function returns DWFL_E_BADELF only
when the file isn't compressed. In which case the Elf handle won't
be replaced and can be returned (as ELF_K_NONE).

Add a new version to dwelf_elf_begin so programs can rely on it
returning NULL only for real errors.

Signed-off-by: Mark Wielaard <mark@klomp.org>
---
 libdw/ChangeLog            |  4 ++++
 libdw/libdw.map            |  4 ++++
 libdwelf/ChangeLog         |  6 ++++++
 libdwelf/dwelf_elf_begin.c | 12 +++++++-----
 libdwelf/libdwelf.h        |  9 ++++++---
 libdwfl/ChangeLog          |  8 ++++++++
 libdwfl/gzip.c             |  5 +++--
 libdwfl/open.c             | 10 +++++++---
 8 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index 6b779e77..bf1f4857 100644
--- a/libdw/ChangeLog
+++ b/libdw/ChangeLog
@@ -1,3 +1,7 @@
+2019-08-12  Mark Wielaard  <mark@klomp.org>
+
+	* libdw.map (ELFUTILS_0.177): Add new version of dwelf_elf_begin.
+
 2019-06-28  Mark Wielaard  <mark@klomp.org>
 
 	* libdw.map (ELFUTILS_0.177): New section. Add
diff --git a/libdw/libdw.map b/libdw/libdw.map
index 2e1c0e9e..decac05c 100644
--- a/libdw/libdw.map
+++ b/libdw/libdw.map
@@ -365,4 +365,8 @@ ELFUTILS_0.175 {
 ELFUTILS_0.177 {
   global:
     dwelf_elf_e_machine_string;
+    # Replaced ELFUTILS_0.175 versions.  Both versions point to the
+    # same implementation, but users of the new symbol version can
+    # presume that NULL is only returned on error (otherwise ELF_K_NONE).
+    dwelf_elf_begin;
 } ELFUTILS_0.175;
diff --git a/libdwelf/ChangeLog b/libdwelf/ChangeLog
index 29f9a509..5b48ed8f 100644
--- a/libdwelf/ChangeLog
+++ b/libdwelf/ChangeLog
@@ -1,3 +1,9 @@
+2019-08-12  Mark Wielaard  <mark@klomp.org>
+
+	* libdwelf.h (dwelf_elf_begin): Update documentation.
+	* dwelf_elf_begin.c (dwelf_elf_begin): Don't suppress ELF_K_NONE.
+	Mark old and new version.
+
 2019-06-28  Mark Wielaard  <mark@klomp.org>
 
 	* Makefile.am (libdwelf_a_SOURCES): Add dwelf_elf_e_machine_string.c.
diff --git a/libdwelf/dwelf_elf_begin.c b/libdwelf/dwelf_elf_begin.c
index 79825338..c7d63a1c 100644
--- a/libdwelf/dwelf_elf_begin.c
+++ b/libdwelf/dwelf_elf_begin.c
@@ -41,13 +41,13 @@ dwelf_elf_begin (int fd)
 {
   Elf *elf = NULL;
   Dwfl_Error e = __libdw_open_elf (fd, &elf);
-  if (elf != NULL && elf_kind (elf) != ELF_K_NONE)
+  if (e == DWFL_E_NOERROR)
     return elf;
 
-  /* Elf wasn't usable.  Make sure there is a proper elf error message.  */
-
-  if (elf != NULL)
-    elf_end (elf);
+  /* Elf wasn't usable.  Make sure there is a proper elf error
+     message.  This is probably not the real error, because there is
+     no good way to propagate errnos or decompression errors, but
+     better than nothing.  */
 
   if (e != DWFL_E_LIBELF)
     {
@@ -60,3 +60,5 @@ dwelf_elf_begin (int fd)
 
   return NULL;
 }
+OLD_VERSION (dwelf_elf_begin, ELFUTILS_0.175)
+NEW_VERSION (dwelf_elf_begin, ELFUTILS_0.177)
diff --git a/libdwelf/libdwelf.h b/libdwelf/libdwelf.h
index cb7ea091..dbb8f08c 100644
--- a/libdwelf/libdwelf.h
+++ b/libdwelf/libdwelf.h
@@ -128,9 +128,12 @@ extern void dwelf_strtab_free (Dwelf_Strtab *st)
 /* Creates a read-only Elf handle from the given file handle.  The
    file may be compressed and/or contain a linux kernel image header,
    in which case it is eagerly decompressed in full and the Elf handle
-   is created as if created with elf_memory ().  On error NULL is
-   returned.  The Elf handle should be closed with elf_end ().  The
-   file handle will not be closed.  Does not return ELF_K_NONE handles.  */
+   is created as if created with elf_memory ().  On decompression or
+   file errors NULL is returned (and elf_errno will be set).  If there
+   was no error, but the file is not an ELF file, then an ELF_K_NONE
+   Elf handle is returned (just like with elf_begin).  The Elf handle
+   should be closed with elf_end ().  The file handle will not be
+   closed.  */
 extern Elf *dwelf_elf_begin (int fd);
 
 /* Returns a human readable string for the given ELF header e_machine
diff --git a/libdwfl/ChangeLog b/libdwfl/ChangeLog
index 8cbe90c9..04a39637 100644
--- a/libdwfl/ChangeLog
+++ b/libdwfl/ChangeLog
@@ -1,3 +1,11 @@
+2019-08-12  Mark Wielaard  <mark@klomp.org>
+
+	* gzip.c (open_stream): Return DWFL_E_ERRNO on bad file operation.
+	* open.c (libdw_open_elf): New argument bad_elf_ok. Check it and
+	return DWFL_E_NOERROR in case it is set and error was DWFL_E_BADELF.
+	(__libdw_open_file): Call libdw_open_elf with bad_elf_ok false.
+	(__libdw_open_elf): Call libdw_open_elf with bad_elf_ok true.
+
 2019-08-05  Omar Sandoval  <osandov@fb.com>
 
 	* dwfl_segment_report_module.c (dwfl_segment_report_module): Assign
diff --git a/libdwfl/gzip.c b/libdwfl/gzip.c
index c2c13baf..043d0b6e 100644
--- a/libdwfl/gzip.c
+++ b/libdwfl/gzip.c
@@ -139,14 +139,14 @@ open_stream (int fd, off_t start_offset, struct unzip_state *state)
 {
     int d = dup (fd);
     if (unlikely (d < 0))
-      return DWFL_E_BADELF;
+      return DWFL_E_ERRNO;
     if (start_offset != 0)
       {
 	off_t off = lseek (d, start_offset, SEEK_SET);
 	if (off != start_offset)
 	  {
 	    close (d);
-	    return DWFL_E_BADELF;
+	    return DWFL_E_ERRNO;
 	  }
       }
     state->zf = gzdopen (d, "r");
@@ -288,6 +288,7 @@ unzip (int fd, off_t start_offset,
   if (result == DWFL_E_NOERROR && gzdirect (state.zf))
     {
       gzclose (state.zf);
+      /* Not a compressed stream after all.  */
       return fail (&state, DWFL_E_BADELF);
     }
 
diff --git a/libdwfl/open.c b/libdwfl/open.c
index 74367359..35fc5283 100644
--- a/libdwfl/open.c
+++ b/libdwfl/open.c
@@ -118,7 +118,7 @@ what_kind (int fd, Elf **elfp, Elf_Kind *kind, bool *may_close_fd)
 
 static Dwfl_Error
 libdw_open_elf (int *fdp, Elf **elfp, bool close_on_fail, bool archive_ok,
-		bool never_close_fd)
+		bool never_close_fd, bool bad_elf_ok)
 {
   bool may_close_fd = false;
 
@@ -164,6 +164,10 @@ libdw_open_elf (int *fdp, Elf **elfp, bool close_on_fail, bool archive_ok,
       && !(archive_ok && kind == ELF_K_AR))
     error = DWFL_E_BADELF;
 
+  /* This basically means, we keep a ELF_K_NONE Elf handle and return it.  */
+  if (bad_elf_ok && error == DWFL_E_BADELF)
+    error = DWFL_E_NOERROR;
+
   if (error != DWFL_E_NOERROR)
     {
       elf_end (elf);
@@ -184,11 +188,11 @@ libdw_open_elf (int *fdp, Elf **elfp, bool close_on_fail, bool archive_ok,
 Dwfl_Error internal_function
 __libdw_open_file (int *fdp, Elf **elfp, bool close_on_fail, bool archive_ok)
 {
-  return libdw_open_elf (fdp, elfp, close_on_fail, archive_ok, false);
+  return libdw_open_elf (fdp, elfp, close_on_fail, archive_ok, false, false);
 }
 
 Dwfl_Error internal_function
 __libdw_open_elf (int fd, Elf **elfp)
 {
-  return libdw_open_elf (&fd, elfp, false, true, true);
+  return libdw_open_elf (&fd, elfp, false, true, true, true);
 }
-- 
2.18.1

