On Mon, 2019-07-29 at 16:24 +0200, Mark Wielaard wrote:
> On Mon, Jul 29, 2019 at 10:43:56AM +0200, Florian Weimer wrote:
> > * Mark Wielaard:
> > 
> > > +  if (elf == NULL)
> > > +    {
> > > +      /* This likely means it just isn't an ELF file, probably not a
> > > +	 real issue, but warn if verbose reporting.  */
> > > +      if (verbose > 0)
> > > +	fprintf (stderr, "warning: %s: %s\n", current_path, elf_errmsg (-1));
> > > +      return false;
> > > +    }
> > 
> > Is it possible to distinguish the error from a memory allocation error?
> > It would be wrong to mis-classify a file just because the system is low
> > on memory.
> 
> You are right this is not the proper way to report the issue.
> Normally, when just using elf_begin, a NULL return should be reported
> through elf_issue (which will set the issues flag).
> 
> But, because I added -z, we are using either elf_begin or
> dwelf_elf_begin. dwelf_elf_begin will return NULL (instead of a an
> empty (ELF_K_NONE) Elf descriptor when there is an issue, or the
> (decompressed) file wasn't an ELF file.
> 
> So we should split the error reporting. If we called elf_begin and get
> NULL we should call elf_issue to report the proper issue.
> 
> If we called dwefl_elf_begin and we get NULL, I am not sure yet what
> the proper way is to detect whether it is a real issue, or "just" not
> a (decompressed) ELF file. I am afraid the current handling is the
> best we can do.
> 
> Maybe we can fix dwelf_elf_begin to return an empty (ELF_K_NONE) Elf
> descriptor if there was no issue, but the (decompressed) file wasn't
> an ELF file.

Sorry this took so long. And this is indeed the last issue holding up
the release. But this is a tricky problem.

We made a mistake when we wrote the contract for dwelf_elf_begin by
saying it would never return ELF_K_NONE. That made it different from
elf_begin and made it impossible to distinguish between a real (file or
decompression) error and whether the file simply wasn't an ELF file and
also wasn't a compressed ELF file.

I think we should fix the contract. Technically it would be an API
break, but I think no user is really relying on the fact that the Elf
handle returned is never ELF_K_NONE. Users still need to distinguish
between ELF_K_ELF and ELF_K_AR (and theoretically any other ELF_K_type,
like COFF, which we currently don't support, but we do define it).

So that is what the attached patch does. I also audited all
decompression code to make sure it returns error codes consistently.
The decompression will either decompress successfully and return
DWFL_E_NOERROR, or if the file wasn't compressed (or an embedded image)
it will return DWFL_E_BADELF. In all other cases (file or decompression
error) it will set a a different DWFL_E error.

This "only" leaves the problem that we don't have a good way to
translate those errors into "real" libelf error codes. So for now we
just fake one if it wasn't an elf_errno value. I don't intent to try to
solve this error translation issue before the release (I don't know how
to do it yet).

What do you think about this change to dwelf_elf_begin?
The change would make it possible to detect real errors in the
elfclassify code, whether elf_begin or dwelf_elf_begin was used. So we
would not misclassify files (but return an error status of 2).

Thanks,

Mark