* runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Milian Wolff @ 2022-07-11 16:40 UTC
To: elfutils-devel

Hey there,

in heaptrack I have code to runtime attach to a program and then rewrite the
various rel / rela / jmprel tables to intercept calls to malloc & friends.

This works, but now I have received a crash report for what seems to be an
invalid DSO file: The jmprel table contains an invalid entry which points to
an out-of-bounds symbol, leading to a crash when we try to look at the
symbol's name.

I would like to protect against this crash by detecting the invalid symbols.
But to do that, I would need to know the size of the symbol table, which is
much harder than I would have hoped:

We have:

```
#define DT_SYMTAB       6               /* Address of symbol table */
#define DT_SYMENT       11              /* Size of one symbol table entry */
```

But there is no `DT_SYMSZ` or similar, which we would need to validate symbol
indices. Am I overlooking something or is that really missing? Does anyone
know why? The other tables have that, e.g.:

```
#define DT_PLTRELSZ     2               /* Size in bytes of PLT relocs */
#define DT_RELASZ       8               /* Total size of Rela relocs */
#define DT_STRSZ        10              /* Size of string table */
#define DT_RELSZ        18              /* Total size of Rel relocs */
```

Why is this missing for the symtab?

The only viable alternative seems to be to mmap the file completely to access
the Elf header and then iterate over the Elf sections to query the size of the
SHT_DYNSYM section. This is pretty complicated, and costly. Does anyone have a
better solution that would allow me to validate symbol indices?

Thanks

PS: eu-elflint reports this for the broken DSOs e.g.:

```
$ eu-elflint libQt5Qml.so.5.12
section [ 3] '.dynsym': symbol 1272: st_value out of bounds
section [ 3] '.dynsym': symbol 3684: st_value out of bounds
section [29] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got section size 18340
section [29] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 336
section [29] '.symtab': symbol 25720: st_value out of bounds
section [29] '.symtab': symbol 27227: st_value out of bounds
```

Does anyone know how this can happen? Is this a bug in the toolchain?

--
Milian Wolff
mail@milianw.de
http://milianw.de
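
The validation asked for in the message above can be sketched roughly as follows. This is only an illustration, not heaptrack's code: `reloc_symbol_name` is a hypothetical helper, it uses the 64-bit types from `<elf.h>` for brevity, and it assumes `sym_count` is already known, which is exactly the value the dynamic section does not provide.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: return the name of the symbol referenced by a
 * relocation entry, or NULL if the entry points outside the symbol table
 * or outside the string table.
 * Assumption: sym_count is known from somewhere; ELF has no DT_SYMSZ. */
static const char *reloc_symbol_name(const Elf64_Rela *rela,
                                     const Elf64_Sym *symtab, size_t sym_count,
                                     const char *strtab, size_t strtab_size)
{
    assert(rela != NULL && symtab != NULL && strtab != NULL);

    /* The symbol index is stored in the upper half of r_info. */
    size_t index = ELF64_R_SYM(rela->r_info);
    if (index >= sym_count)
        return NULL; /* out-of-bounds symbol index */

    /* st_name is an offset into the string table, bounded by DT_STRSZ. */
    size_t name_offset = symtab[index].st_name;
    if (name_offset >= strtab_size)
        return NULL; /* name offset beyond the string table */

    return strtab + name_offset;
}
```

A caller would skip any relocation for which this returns NULL instead of dereferencing the symbol. The second check only needs `DT_STRSZ`, so it works even when the symbol count is merely an estimate.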

* Re: runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Mark Wielaard @ 2022-07-26 15:28 UTC
To: Milian Wolff, elfutils-devel

Hi Milian,

On Mon, 2022-07-11 at 18:40 +0200, Milian Wolff wrote:
> in heaptrack I have code to runtime attach to a program and then rewrite the
> various rel / rela / jmprel tables to intercept calls to malloc & friends.
>
> This works, but now I have received a crash report for what seems to be an
> invalid DSO file: The jmprel table contains an invalid entry which points to
> an out-of-bounds symbol, leading to a crash when we try to look at the
> symbol's name.
>
> I would like to protect against this crash by detecting the invalid symbols.
> But to do that, I would need to know the size of the symbol table, which is
> much harder than I would have hoped:
>
> We have:
>
> ```
> #define DT_SYMTAB       6               /* Address of symbol table */
> #define DT_SYMENT       11              /* Size of one symbol table entry */
> ```
>
> But there is no `DT_SYMSZ` or similar, which we would need to validate symbol
> indices. Am I overlooking something or is that really missing? Does anyone
> know why? The other tables have that, e.g.:
>
> ```
> #define DT_PLTRELSZ     2               /* Size in bytes of PLT relocs */
> #define DT_RELASZ       8               /* Total size of Rela relocs */
> #define DT_STRSZ        10              /* Size of string table */
> #define DT_RELSZ        18              /* Total size of Rel relocs */
> ```
>
> Why is this missing for the symtab?
>
> The only viable alternative seems to be to mmap the file completely to access
> the Elf header and then iterate over the Elf sections to query the size of the
> SHT_DYNSYM section. This is pretty complicated, and costly. Does anyone have a
> better solution that would allow me to validate symbol indices?

I don't know why it is missing, but it is indeed a tricky issue. You
really want to know the number of elements (or the size) of the symbol
table, but it takes a little gymnastics to get that. Di Chen recently
(or actually not that recently, I just still haven't reviewed, sorry!)
posted a patch for
https://sourceware.org/bugzilla/show_bug.cgi?id=28873 to print out the
symbols from the dynamic segment
https://sourceware.org/pipermail/elfutils-devel/2022q2/005086.html

> PS: eu-elflint reports this for the broken DSOs e.g.:
> ```
> $ eu-elflint libQt5Qml.so.5.12
> section [ 3] '.dynsym': symbol 1272: st_value out of bounds
> section [ 3] '.dynsym': symbol 3684: st_value out of bounds
> section [29] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got section size 18340
> section [29] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 336
> section [29] '.symtab': symbol 25720: st_value out of bounds
> section [29] '.symtab': symbol 27227: st_value out of bounds
> ```
>
> Does anyone know how this can happen? Is this a bug in the toolchain?

Try with eu-elflint --gnu which suppresses some known issues.

Also could you show those symbol values (1272, 3684, 25720, 27227)?
They might have a special type, so their st_value isn't really an
address.

Cheers,

Mark

* Re: runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Milian Wolff @ 2022-07-27 11:38 UTC
To: elfutils-devel, Mark Wielaard

On Tuesday, 26 July 2022 17:28:11 CEST Mark Wielaard wrote:
> Hi Milian,
>
> On Mon, 2022-07-11 at 18:40 +0200, Milian Wolff wrote:
> > in heaptrack I have code to runtime attach to a program and then rewrite the
> > various rel / rela / jmprel tables to intercept calls to malloc & friends.
> >
> > This works, but now I have received a crash report for what seems to be an
> > invalid DSO file: The jmprel table contains an invalid entry which points to
> > an out-of-bounds symbol, leading to a crash when we try to look at the
> > symbol's name.
> >
> > I would like to protect against this crash by detecting the invalid symbols.
> > But to do that, I would need to know the size of the symbol table, which is
> > much harder than I would have hoped:
> >
> > We have:
> >
> > ```
> > #define DT_SYMTAB       6               /* Address of symbol table */
> > #define DT_SYMENT       11              /* Size of one symbol table entry */
> > ```
> >
> > But there is no `DT_SYMSZ` or similar, which we would need to validate symbol
> > indices. Am I overlooking something or is that really missing? Does anyone
> > know why? The other tables have that, e.g.:
> >
> > ```
> > #define DT_PLTRELSZ     2               /* Size in bytes of PLT relocs */
> > #define DT_RELASZ       8               /* Total size of Rela relocs */
> > #define DT_STRSZ        10              /* Size of string table */
> > #define DT_RELSZ        18              /* Total size of Rel relocs */
> > ```
> >
> > Why is this missing for the symtab?
> >
> > The only viable alternative seems to be to mmap the file completely to access
> > the Elf header and then iterate over the Elf sections to query the size of the
> > SHT_DYNSYM section. This is pretty complicated, and costly. Does anyone have a
> > better solution that would allow me to validate symbol indices?
>
> I don't know why it is missing, but it is indeed a tricky issue. You
> really want to know the number of elements (or the size) of the symbol
> table, but it takes a little gymnastics to get that.

Thanks for confirming that this isn't available currently. Would it be
possible to add this? What's the process for standardization here? I guess it
would take a very long time, yet this seems to me as if it would be beneficial
in the long term.

> Di Chen recently
> (or actually not that recently, I just still haven't reviewed, sorry!)
> posted a patch for
> https://sourceware.org/bugzilla/show_bug.cgi?id=28873 to print out the
> symbols from the dynamic segment
> https://sourceware.org/pipermail/elfutils-devel/2022q2/005086.html

Interesting. But from what I can tell, this patch has access to the full Elf
object and thus can access segments which are not normally loaded at runtime?

> > PS: eu-elflint reports this for the broken DSOs e.g.:
> > ```
> > $ eu-elflint libQt5Qml.so.5.12
> > section [ 3] '.dynsym': symbol 1272: st_value out of bounds
> > section [ 3] '.dynsym': symbol 3684: st_value out of bounds
> > section [29] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got section size 18340
> > section [29] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 336
> > section [29] '.symtab': symbol 25720: st_value out of bounds
> > section [29] '.symtab': symbol 27227: st_value out of bounds
> > ```
> >
> > Does anyone know how this can happen? Is this a bug in the toolchain?
>
> Try with eu-elflint --gnu which suppresses some known issues.

Indeed, with `--gnu` the tool reports `No errors`.

> Also could you show those symbol values (1272, 3684, 25720, 27227) they
> might have a special type, so their st_value isn't really an address?

```
$ eu-readelf -s libQt5Qml.so.5.12.0 | grep -E "^\s*(1272|3684|25720|27227):"
 1272: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start__@@Qt_5
 3684: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start@@Qt_5
 1272: 003ccc4c      0 NOTYPE  LOCAL  DEFAULT       17 $d
 3684: 003cbfec      0 NOTYPE  LOCAL  DEFAULT       17 $d
25720: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start
27227: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start__
```

The first two matches come from the `.dynsym`, the last four come from
`.symtab`.

Can anyone tell me how `eu-readelf` resolves these symbol names?

Thanks

--
Milian Wolff
mail@milianw.de
http://milianw.de

* Re: runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Mark Wielaard @ 2022-07-28 16:41 UTC
To: Milian Wolff, elfutils-devel

Hi Milian,

On Wed, 2022-07-27 at 13:38 +0200, Milian Wolff wrote:
> Thanks for confirming that this isn't available currently. Would it be
> possible to add this? What's the process for standardization here? I guess it
> would take a very long time, yet this seems to me as if it would be beneficial
> in the long term.

Standardization of the ELF gabi takes place on (sorry google groups, I
know, sigh): https://groups.google.com/g/generic-abi
You should be able to subscribe with
generic-abi+subscribe@googlegroups.com so you don't have to go through
the webgui mess.

There is also https://sourceware.org/gnu-gabi/ but that is more for GNU
extensions and I think you want something generic.

> > Di Chen recently
> > (or actually not that recently, I just still haven't reviewed, sorry!)
> > posted a patch for
> > https://sourceware.org/bugzilla/show_bug.cgi?id=28873 to print out the
> > symbols from the dynamic segment
> > https://sourceware.org/pipermail/elfutils-devel/2022q2/005086.html
>
> Interesting. But from what I can tell, this patch has access to the full Elf
> object and thus can access segments which are not normally loaded at runtime?

Yes it could, but it doesn't use anything that isn't referenced from
the phdrs or dynamic segment, so it only uses those parts that are
normally loaded at runtime.

If you go through the dynamic segment then everything it references
(.dynsym in this case) is from a loaded segment. So going through phdrs
to check where it is loaded and the length is fine.

> > Try with eu-elflint --gnu which suppresses some known issues.
>
> Indeed, with `--gnu` the tool reports `No errors`.
>
> > Also could you show those symbol values (1272, 3684, 25720, 27227) they
> > might have a special type, so their st_value isn't really an address?
>
> ```
> $ eu-readelf -s libQt5Qml.so.5.12.0 | grep -E "^\s*(1272|3684|25720|27227):"
>  1272: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start__@@Qt_5
>  3684: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start@@Qt_5
>  1272: 003ccc4c      0 NOTYPE  LOCAL  DEFAULT       17 $d
>  3684: 003cbfec      0 NOTYPE  LOCAL  DEFAULT       17 $d
> 25720: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start
> 27227: 003f9974      0 NOTYPE  GLOBAL DEFAULT       25 __bss_start__
> ```
>
> The first two matches come from the `.dynsym`, the last four come from
> `.symtab`.
>
> Can anyone tell me how `eu-readelf` resolves these symbol names?

Currently through the section tables, which point to the string table
section used. But Di Chen's patch would change that by going through
the dynamic segment and phdrs to find the strtab for the dynsym segment
(but will of course still need to go through the sections for the
.symtab symbols since those aren't accessible through the phdrs).

Cheers,

Mark
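
The "go through the phdrs to check where it is loaded and the length" step can be sketched like this. It is a minimal illustration and not what Di Chen's patch or elfutils does: `range_is_loaded` is a hypothetical helper, it uses the 64-bit `Elf64_Phdr` for brevity, and it assumes the caller already has the program headers and the load base of the object.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [addr, addr + len) lies entirely inside a
 * single PT_LOAD segment of an object mapped at load_base. */
static bool range_is_loaded(const Elf64_Phdr *phdrs, size_t phnum,
                            uintptr_t load_base, uintptr_t addr, size_t len)
{
    assert(phdrs != NULL || phnum == 0);

    for (size_t i = 0; i < phnum; ++i) {
        if (phdrs[i].p_type != PT_LOAD)
            continue; /* only loadable segments are mapped at runtime */

        uintptr_t start = load_base + phdrs[i].p_vaddr;
        uintptr_t size = phdrs[i].p_memsz;

        /* Compare via offsets so that addr + len cannot overflow. */
        if (addr >= start && addr - start <= size && len <= size - (addr - start))
            return true;
    }
    return false;
}
```

In a running process on glibc, the program headers and load base could come from `dl_iterate_phdr` (the `dlpi_phdr`, `dlpi_phnum` and `dlpi_addr` fields). Note that this only tells you whether a candidate symbol address is mapped; it does not yield the symbol count by itself.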

* Re: runtime validation of DT_SYMTAB lookups - why is there no DT_SYMSZ?
From: Jacob Burkholder @ 2022-08-28 6:41 UTC
To: Mark Wielaard
Cc: Milian Wolff, elfutils-devel

I've always used the fact that the symbol table and string table are
present in the binary in that order and adjacent to each other to find
the number of symbol table entries, i.e.
(strtab - (char *)symtab) / sizeof(*symtab), although clearly this is
not required and may just be gnu ld convention. DT_SYMSZ would be a
useful addition to the ELF standard IMO.
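
The adjacency trick described above can be written down as a small function. This is a sketch under the stated assumption that the string table directly follows the symbol table, which the ELF specification does not guarantee; `guess_dynsym_count` is a hypothetical name, and the extra sanity checks are an addition that goes beyond the one-line formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: estimate the number of dynamic symbols from the
 * DT_SYMTAB and DT_STRTAB addresses and the DT_SYMENT entry size.
 * Assumption: the string table starts right after the symbol table (a
 * linker layout convention, not an ELF requirement).
 * Returns 0 when the layout does not match that assumption. */
static size_t guess_dynsym_count(const void *symtab, const void *strtab,
                                 size_t syment)
{
    uintptr_t sym = (uintptr_t)symtab;
    uintptr_t str = (uintptr_t)strtab;

    if (syment == 0 || str <= sym)
        return 0; /* string table does not follow the symbol table */

    size_t bytes = str - sym;
    if (bytes % syment != 0)
        return 0; /* gap is not a whole number of symbol entries */

    return bytes / syment;
}
```

Returning 0 on a mismatch is a deliberately conservative choice: a caller validating relocation entries would then treat every symbol index as unverifiable rather than trust a wrong count.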