public inbox for elfutils@sourceware.org
* runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Milian Wolff @ 2022-07-11 16:40 UTC (permalink / raw)
  To: elfutils-devel

Hey there,

in heaptrack I have code to runtime attach to a program and then rewrite the 
various rel / rela / jmprel tables to intercept calls to malloc & friends.

This works, but now I have received a crash report for what seems to be an 
invalid DSO file: The jmprel table contains an invalid entry which points to 
an out-of-bounds symbol, leading to a crash when we try to look at the 
symbol's name.

I would like to protect against this crash by detecting the invalid symbols. 
But to do that, I would need to know the size of the symbol table, which is 
much harder than I would have hoped:

We have:

```
#define DT_SYMTAB	6		/* Address of symbol table */
#define DT_SYMENT	11		/* Size of one symbol table entry */
```

But there is no `DT_SYMSZ` or similar, which we would need to validate symbol 
indices. Am I overlooking something or is that really missing? Does anyone 
know why? The other tables have that, e.g.:

```
#define DT_PLTRELSZ	2		/* Size in bytes of PLT relocs */
#define DT_RELASZ	8		/* Total size of Rela relocs */
#define DT_STRSZ	10		/* Size of string table */
#define DT_RELSZ	18		/* Total size of Rel relocs */
```

Why is this missing for the symtab?
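
To make the gap concrete, here is a minimal sketch (an illustration, not heaptrack's actual code) of what a runtime interposer can collect by walking a module's `DT_NULL`-terminated dynamic array. It assumes Linux with `<link.h>`, whose `ElfW()` macro selects the native 32- or 64-bit ELF types; `struct dyn_tables` and `read_dyn_tables` are hypothetical names. Byte sizes exist for the string table and the PLT relocations, but for the symbol table there is only the size of one entry.

```c
#include <link.h>   /* ElfW(): native-width ELF types; also pulls in <elf.h> */
#include <stddef.h>

/* Hypothetical struct: what the dynamic array tells us about the tables.
 * DT_STRSZ and DT_PLTRELSZ are byte sizes, but for the symbol table there
 * is only DT_SYMENT, the size of ONE entry, so no entry count. */
struct dyn_tables {
    ElfW(Addr) symtab;   /* DT_SYMTAB: address of the symbol table */
    size_t syment;       /* DT_SYMENT: size of one symbol entry */
    ElfW(Addr) strtab;   /* DT_STRTAB: address of the string table */
    size_t strsz;        /* DT_STRSZ: string table size in bytes */
    ElfW(Addr) jmprel;   /* DT_JMPREL: address of the PLT relocations */
    size_t pltrelsz;     /* DT_PLTRELSZ: PLT relocation size in bytes */
};

/* Walk a DT_NULL-terminated dynamic array and record the raw tag values. */
static struct dyn_tables read_dyn_tables(const ElfW(Dyn) *dyn)
{
    struct dyn_tables t = {0};
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        switch (dyn->d_tag) {
        case DT_SYMTAB:   t.symtab = dyn->d_un.d_ptr; break;
        case DT_SYMENT:   t.syment = dyn->d_un.d_val; break;
        case DT_STRTAB:   t.strtab = dyn->d_un.d_ptr; break;
        case DT_STRSZ:    t.strsz = dyn->d_un.d_val; break;
        case DT_JMPREL:   t.jmprel = dyn->d_un.d_ptr; break;
        case DT_PLTRELSZ: t.pltrelsz = dyn->d_un.d_val; break;
        default:          break; /* there is no DT_SYMSZ case to add */
        }
    }
    return t;
}
```

The sketch only records raw `d_ptr` values; turning them into usable pointers (accounting for the load bias) is left out.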

The only viable alternative seems to be to mmap the file completely to access 
the Elf header and then iterate over the Elf sections to query the size of the 
SHT_DYNSYM section. This is pretty complicated, and costly. Does anyone have a 
better solution that would allow me to validate symbol indices?
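
The section-header route described above can be sketched as follows. It assumes the ELF header and section header table are in a buffer of the native ELF class and byte order; the function name is hypothetical. Only those two parts of the file are read, so a caller could `pread` them instead of mapping the whole file. Extended section numbering (`e_shnum == 0`) is not handled, and the approach only works while the file on disk still has section headers, which the loader itself does not need.

```c
#include <link.h>     /* ElfW(); includes <elf.h> */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of .dynsym entries according to the section
 * headers of an ELF image in memory, or 0 if it cannot be determined.
 * Assumes native ELF class and byte order; extended numbering not handled. */
static size_t dynsym_count_from_sections(const unsigned char *image, size_t size)
{
    ElfW(Ehdr) eh;
    if (size < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);  /* memcpy avoids alignment assumptions */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32))
        return 0;
    if (eh.e_shoff == 0 || eh.e_shentsize != sizeof(ElfW(Shdr)))
        return 0;

    for (size_t i = 0; i < eh.e_shnum; ++i) {
        size_t off = eh.e_shoff + i * sizeof(ElfW(Shdr));
        ElfW(Shdr) sh;
        if (off > size || size - off < sizeof sh)
            return 0;               /* section header table is truncated */
        memcpy(&sh, image + off, sizeof sh);
        if (sh.sh_type == SHT_DYNSYM && sh.sh_entsize != 0)
            return sh.sh_size / sh.sh_entsize;
    }
    return 0;
}
```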

Thanks

PS: eu-elflint reports this for the broken DSOs e.g.:
```
$ eu-elflint libQt5Qml.so.5.12
section [ 3] '.dynsym': symbol 1272: st_value out of bounds
section [ 3] '.dynsym': symbol 3684: st_value out of bounds
section [29] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got section size 18340
section [29] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 336
section [29] '.symtab': symbol 25720: st_value out of bounds
section [29] '.symtab': symbol 27227: st_value out of bounds
```

Does anyone know how this can happen? Is this a bug in the toolchain?
-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Mark Wielaard @ 2022-07-26 15:28 UTC (permalink / raw)
  To: Milian Wolff, elfutils-devel

Hi Milian,

On Mon, 2022-07-11 at 18:40 +0200, Milian Wolff wrote:
> in heaptrack I have code to runtime attach to a program and then rewrite the
> various rel / rela / jmprel tables to intercept calls to malloc & friends.
> 
> This works, but now I have received a crash report for what seems to be an
> invalid DSO file: The jmprel table contains an invalid entry which points to
> an out-of-bounds symbol, leading to a crash when we try to look at the
> symbol's name.
> 
> I would like to protect against this crash by detecting the invalid symbols.
> But to do that, I would need to know the size of the symbol table, which is
> much harder than I would have hoped:
> 
> We have:
> 
> ```
> #define DT_SYMTAB	6		/* Address of symbol table */
> #define DT_SYMENT	11		/* Size of one symbol table entry */
> ```
> 
> But there is no `DT_SYMSZ` or similar, which we would need to validate symbol
> indices. Am I overlooking something or is that really missing? Does anyone
> know why? The other tables have that, e.g.:
> 
> ```
> #define DT_PLTRELSZ	2		/* Size in bytes of PLT relocs */
> #define DT_RELASZ	8		/* Total size of Rela relocs */
> #define DT_STRSZ	10		/* Size of string table */
> #define DT_RELSZ	18		/* Total size of Rel relocs */
> ```
> 
> Why is this missing for the symtab?
> 
> The only viable alternative seems to be to mmap the file completely to access
> the Elf header and then iterate over the Elf sections to query the size of the
> SHT_DYNSYM section. This is pretty complicated, and costly. Does anyone have a
> better solution that would allow me to validate symbol indices?

I don't know why it is missing, but it is indeed a tricky issue. You
really want to know the number of elements (or the size) of the symbol
table, but it takes a little gymnastics to get that. Di Chen recently
(or actually not that recently, I just still haven't reviewed, sorry!)
posted a patch for 
https://sourceware.org/bugzilla/show_bug.cgi?id=28873 to print out the
symbols from the dynamic segment 
https://sourceware.org/pipermail/elfutils-devel/2022q2/005086.html
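
One form of that gymnastics, not discussed elsewhere in this thread, goes through the symbol hash tables, which are reachable from the dynamic segment. Per the System V gABI, the `DT_HASH` table starts with `nbucket` and `nchain`, and `nchain` equals the number of symbol table entries. Many toolchains now emit only `DT_GNU_HASH` (`--hash-style=gnu`); there the count has to be derived by finding the highest symbol index referenced from the buckets and following that chain to its end marker (bit 0 set). The sketch below assumes the table pointers are already valid in-process addresses and that the tables themselves are well formed; the function names are hypothetical.

```c
#include <link.h>     /* ElfW(): native-width ELF types */
#include <stddef.h>
#include <stdint.h>

/* DT_HASH (System V gABI): words are nbucket, nchain, buckets..., chains...
 * and nchain equals the number of symbol table entries. */
static size_t sym_count_from_sysv_hash(const uint32_t *hash)
{
    return hash[1];
}

/* DT_GNU_HASH: header words nbuckets, symoffset, bloom_size, bloom_shift,
 * then bloom_size ElfW(Addr)-sized bloom words, nbuckets bucket words and
 * the chain array for symbols symoffset, symoffset + 1, ...
 * Trusts the table: a corrupt table can still make this read out of bounds. */
static size_t sym_count_from_gnu_hash(const uint32_t *gnu_hash)
{
    uint32_t nbuckets = gnu_hash[0];
    uint32_t symoffset = gnu_hash[1];
    uint32_t bloom_size = gnu_hash[2];
    const uint32_t *buckets =
        gnu_hash + 4 + bloom_size * (sizeof(ElfW(Addr)) / sizeof(uint32_t));
    const uint32_t *chain = buckets + nbuckets;

    /* Highest symbol index that starts a hash chain (0 = empty bucket). */
    uint32_t last = 0;
    for (uint32_t i = 0; i < nbuckets; ++i)
        if (buckets[i] > last)
            last = buckets[i];
    if (last == 0 || last < symoffset)
        return symoffset;   /* no hashed symbols, only the unhashed prefix */

    /* Follow that chain; bit 0 of a chain value marks its last symbol. */
    while ((chain[last - symoffset] & 1u) == 0)
        ++last;
    return (size_t)last + 1;
}
```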

> PS: eu-elflint reports this for the broken DSOs e.g.:
> ```
> $ eu-elflint libQt5Qml.so.5.12
> section [ 3] '.dynsym': symbol 1272: st_value out of bounds
> section [ 3] '.dynsym': symbol 3684: st_value out of bounds
> section [29] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got section size 18340
> section [29] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 336
> section [29] '.symtab': symbol 25720: st_value out of bounds
> section [29] '.symtab': symbol 27227: st_value out of bounds
> ```
> 
> Does anyone know how this can happen? Is this a bug in the toolchain?

Try with eu-elflint --gnu which suppresses some known issues.
Also could you show those symbol values (1272, 3684, 25720, 27227) they
might have a special type, so their st_value isn't really an address?

Cheers,

Mark


* Re: runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Milian Wolff @ 2022-07-27 11:38 UTC (permalink / raw)
  To: elfutils-devel, Mark Wielaard

On Tuesday, 26 July 2022 17:28:11 CEST Mark Wielaard wrote:
> Hi Milian,
> 
> On Mon, 2022-07-11 at 18:40 +0200, Milian Wolff wrote:
> > in heaptrack I have code to runtime attach to a program and then rewrite
> > the various rel / rela / jmprel tables to intercept calls to malloc &
> > friends.
> > 
> > This works, but now I have received a crash report for what seems to be
> > an invalid DSO file: The jmprel table contains an invalid entry which
> > points to an out-of-bounds symbol, leading to a crash when we try to look
> > at the symbol's name.
> > 
> > I would like to protect against this crash by detecting the invalid
> > symbols. But to do that, I would need to know the size of the symbol
> > table, which is much harder than I would have hoped:
> > 
> > We have:
> > 
> > ```
> > #define DT_SYMTAB	6		/* Address of symbol table */
> > #define DT_SYMENT	11		/* Size of one symbol table entry */
> > ```
> > 
> > But there is no `DT_SYMSZ` or similar, which we would need to validate
> > symbol indices. Am I overlooking something or is that really missing?
> > Does anyone know why? The other tables have that, e.g.:
> > 
> > ```
> > #define DT_PLTRELSZ	2		/* Size in bytes of PLT relocs */
> > #define DT_RELASZ	8		/* Total size of Rela relocs */
> > #define DT_STRSZ	10		/* Size of string table */
> > #define DT_RELSZ	18		/* Total size of Rel relocs */
> > ```
> > 
> > Why is this missing for the symtab?
> > 
> > The only viable alternative seems to be to mmap the file completely to
> > access the Elf header and then iterate over the Elf sections to query the
> > size of the SHT_DYNSYM section. This is pretty complicated, and costly.
> > Does anyone have a better solution that would allow me to validate symbol
> > indices?
> 
> I don't know why it is missing, but it is indeed a tricky issue. You
> really want to know the number of elements (or the size) of the symbol
> table, but it takes a little gymnastics to get that.

Thanks for confirming that this isn't available currently. Would it be 
possible to add this? What's the process for standardization here? I guess it 
would take a very long time, yet this seems to me as if it would be beneficial 
in the long term.

> Di Chen recently
> (or actually not that recently, I just still haven't reviewed, sorry!)
> posted a patch for
> https://sourceware.org/bugzilla/show_bug.cgi?id=28873 to print out the
> symbols from the dynamic segment
> https://sourceware.org/pipermail/elfutils-devel/2022q2/005086.html

Interesting. But from what I can tell, this patch has access to the full Elf 
object and thus can access segments which are not normally loaded at runtime?

> > PS: eu-elflint reports this for the broken DSOs e.g.:
> > ```
> > $ eu-elflint libQt5Qml.so.5.12
> > section [ 3] '.dynsym': symbol 1272: st_value out of bounds
> > section [ 3] '.dynsym': symbol 3684: st_value out of bounds
> > section [29] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got section size 18340
> > section [29] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 336
> > section [29] '.symtab': symbol 25720: st_value out of bounds
> > section [29] '.symtab': symbol 27227: st_value out of bounds
> > ```
> > 
> > Does anyone know how this can happen? Is this a bug in the toolchain?
> 
> Try with eu-elflint --gnu which suppresses some known issues.

Indeed, with `--gnu` the tool reports `No errors`.

> Also could you show those symbol values (1272, 3684, 25720, 27227) they
> might have a special type, so their st_value isn't really an address?

```
$ eu-readelf -s libQt5Qml.so.5.12.0 | grep -E "^\s*(1272|3684|25720|27227):"
 1272: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start__@@Qt_5
 3684: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start@@Qt_5
 1272: 003ccc4c      0 NOTYPE  LOCAL  DEFAULT       17 $d
 3684: 003cbfec      0 NOTYPE  LOCAL  DEFAULT       17 $d
25720: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start
27227: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start__
```

The first two matches come from the `.dynsym`, the last four come from 
`.symtab`.

Can anyone tell me how `eu-readelf` resolves these symbol names?

Thanks

-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Mark Wielaard @ 2022-07-28 16:41 UTC (permalink / raw)
  To: Milian Wolff, elfutils-devel

Hi Milian,

On Wed, 2022-07-27 at 13:38 +0200, Milian Wolff wrote:
> Thanks for confirming that this isn't available currently. Would it be
> possible to add this? What's the process for standardization here? I guess
> it would take a very long time, yet this seems to me as if it would be
> beneficial in the long term.

Standardization of the ELF gABI takes place on Google Groups (sorry, I
know, sigh): https://groups.google.com/g/generic-abi. You should be able
to subscribe by mailing generic-abi+subscribe@googlegroups.com, so you
don't have to go through the web GUI mess.

There is also https://sourceware.org/gnu-gabi/, but that is more for GNU
extensions, and I think you want something generic.

> > Di Chen recently
> > (or actually not that recently, I just still haven't reviewed, sorry!)
> > posted a patch for
> > https://sourceware.org/bugzilla/show_bug.cgi?id=28873 to print out the
> > symbols from the dynamic segment
> > https://sourceware.org/pipermail/elfutils-devel/2022q2/005086.html
> 
> Interesting. But from what I can tell, this patch has access to the full
> Elf object and thus can access segments which are not normally loaded at
> runtime?

Yes, it could, but it doesn't use anything that isn't referenced from
the phdrs or the dynamic segment, so it only uses those parts that are
normally loaded at runtime. If you go through the dynamic segment, then
everything it references (.dynsym in this case) is in a loaded segment.
So it is fine to go through the phdrs to check where it is loaded and
how long that segment is.
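
A sketch of the phdr-based check described above: find the `PT_LOAD` segment that contains `DT_SYMTAB` and bound the symbol index by the end of that segment. This yields an upper bound rather than the exact `.dynsym` size, but it does keep a corrupt index from reading outside the mapping. The function name is hypothetical. `symtab_vaddr` is assumed to be in the same (unrelocated) address space as `p_vaddr`; at runtime, with program headers from `dl_iterate_phdr`, subtract `dlpi_addr` from an already relocated address first.

```c
#include <link.h>     /* ElfW(); dl_iterate_phdr() also lives here */
#include <stddef.h>

/* Hypothetical helper: upper bound on the number of symbols starting at
 * symtab_vaddr, derived from the PT_LOAD segment that contains it.
 * Indices below the bound stay inside that segment; it is NOT the exact
 * .dynsym size. Returns 0 if no PT_LOAD segment contains symtab_vaddr. */
static size_t sym_bound_from_phdrs(const ElfW(Phdr) *phdr, size_t phnum,
                                   ElfW(Addr) symtab_vaddr, size_t syment)
{
    if (syment == 0)
        return 0;
    for (size_t i = 0; i < phnum; ++i) {
        const ElfW(Phdr) *p = &phdr[i];
        if (p->p_type != PT_LOAD)
            continue;
        /* Is symtab_vaddr inside [p_vaddr, p_vaddr + p_memsz)? */
        if (symtab_vaddr >= p->p_vaddr &&
            symtab_vaddr - p->p_vaddr < p->p_memsz) {
            ElfW(Addr) end = p->p_vaddr + p->p_memsz;
            return (end - symtab_vaddr) / syment;
        }
    }
    return 0;
}
```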

> > Try with eu-elflint --gnu which suppresses some known issues.
> 
> Indeed, with `--gnu` the tool reports `No errors`.
> 
> > Also could you show those symbol values (1272, 3684, 25720, 27227) they
> > might have a special type, so their st_value isn't really an address?
> 
> ```
> $ eu-readelf -s libQt5Qml.so.5.12.0 | grep -E "^\s*(1272|3684|25720|27227):"
>  1272: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start__@@Qt_5
>  3684: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start@@Qt_5
>  1272: 003ccc4c      0 NOTYPE  LOCAL  DEFAULT       17 $d
>  3684: 003cbfec      0 NOTYPE  LOCAL  DEFAULT       17 $d
> 25720: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start
> 27227: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start__
> ```
> 
> The first two matches come from the `.dynsym`, the last four come from
> `.symtab`.
> 
> Can anyone tell me how `eu-readelf` resolves these symbol names?

Currently through the section headers: the symbol table section's
sh_link points to the string table section it uses. Di Chen's patch
would change that for .dynsym by going through the dynamic segment and
phdrs to find its string table (it will of course still need to go
through the sections for the .symtab symbols, since those aren't
accessible through the phdrs).
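
For `.dynsym`, the string table reached through the dynamic segment is `DT_STRTAB`, bounded by `DT_STRSZ`. Combined with a symbol count obtained some other way, the name lookup that crashed in heaptrack could be guarded as in this sketch. The function and macro names are hypothetical, and `nsyms` is assumed to come from section headers, a hash table, or a similar source.

```c
#include <link.h>     /* ElfW(); includes <elf.h> for ELF{32,64}_R_SYM */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pick the native-width relocation symbol-index macro. */
#if UINTPTR_MAX > 0xffffffffu
#define REL_SYM(info) ELF64_R_SYM(info)
#else
#define REL_SYM(info) ELF32_R_SYM(info)
#endif

/* Hypothetical helper: name of the symbol a relocation refers to, or NULL
 * if the symbol index or its st_name offset is out of bounds. */
static const char *reloc_sym_name(uint64_t r_info,
                                  const ElfW(Sym) *symtab, size_t nsyms,
                                  const char *strtab, size_t strsz)
{
    size_t idx = REL_SYM(r_info);
    if (idx >= nsyms)
        return NULL;            /* the out-of-bounds case from the report */
    ElfW(Word) name = symtab[idx].st_name;
    if (name >= strsz)
        return NULL;            /* name offset lies outside DT_STRSZ */
    /* The name must also be NUL-terminated inside the string table. */
    if (memchr(strtab + name, '\0', strsz - name) == NULL)
        return NULL;
    return strtab + name;
}
```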

Cheers,

Mark

* Re: runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Jacob Burkholder @ 2022-08-28  6:41 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Milian Wolff, elfutils-devel

I've always used the fact that the symbol table and string table are
present in the binary in that order and adjacent to each other to find
the number of symbol table entries, i.e.
`(strtab - (char *)symtab) / sizeof(*symtab)`, although clearly this is
not required and may just be a GNU ld convention. DT_SYMSZ would be a
useful addition to the ELF standard IMO.
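
Jacob's heuristic, sketched with integer addresses instead of pointer subtraction (in C, subtracting pointers that do not point into the same object is undefined behavior). As he notes, the adjacency of the symbol and string tables is a linker convention rather than an ELF requirement, so the result is a guess that would ideally be cross-checked against another bound. The function name is hypothetical.

```c
#include <link.h>     /* ElfW(): native-width ELF types */
#include <stddef.h>

/* Hypothetical helper: if the string table directly follows the symbol
 * table (a common GNU ld layout, NOT guaranteed by the ELF spec), the
 * symbol count is the gap between them divided by the entry size.
 * Returns 0 when the layout assumption visibly does not hold. */
static size_t sym_count_from_adjacent_strtab(ElfW(Addr) symtab,
                                             ElfW(Addr) strtab,
                                             size_t syment)
{
    if (syment == 0 || strtab <= symtab)
        return 0;               /* string table does not follow symtab */
    return (strtab - symtab) / syment;  /* floors, like the original formula */
}
```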
