On Tuesday, 26 July 2022 17:28:11 CEST Mark Wielaard wrote:
> Hi Milian,
>
> On Mon, 2022-07-11 at 18:40 +0200, Milian Wolff wrote:
> > in heaptrack I have code to runtime attach to a program and then rewrite the
> > various rel / rela / jmprel tables to intercept calls to malloc & friends.
> >
> > This works, but now I have received a crash report for what seems to be an
> > invalid DSO file: The jmprel table contains an invalid entry which points to
> > an out-of-bounds symbol, leading to a crash when we try to look at the
> > symbol's name.
> >
> > I would like to protect against this crash by detecting the invalid symbols.
> > But to do that, I would need to know the size of the symbol table, which is
> > much harder than I would have hoped:
> >
> > We have:
> >
> > ```
> > #define DT_SYMTAB 6 /* Address of symbol table */
> > #define DT_SYMENT 11 /* Size of one symbol table entry */
> > ```
> >
> > But there is no `DT_SYMSZ` or similar, which we would need to validate symbol
> > indices. Am I overlooking something or is that really missing? Does anyone
> > know why? The other tables have that, e.g.:
> >
> > ```
> > #define DT_PLTRELSZ 2 /* Size in bytes of PLT relocs */
> > #define DT_RELASZ 8 /* Total size of Rela relocs */
> > #define DT_STRSZ 10 /* Size of string table */
> > #define DT_RELSZ 18 /* Total size of Rel relocs */
> > ```
> >
> > Why is this missing for the symtab?
> >
> > The only viable alternative seems to be to mmap the file completely to access
> > the Elf header and then iterate over the Elf sections to query the size of the
> > SHT_DYNSYM section. This is pretty complicated, and costly. Does anyone have a
> > better solution that would allow me to validate symbol indices?
>
> I don't know why it is missing, but it is indeed a tricky issue. You
> really want to know the number of elements (or the size) of the symbol
> table, but it takes a little gymnastics to get that.
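
One workaround sometimes used at runtime is to derive the number of `.dynsym` entries from the hash table referenced by `DT_HASH` or `DT_GNU_HASH`, since both are part of the loaded image. This is not something proposed in this thread. The following is only a minimal sketch in C under stated assumptions: hash words are 32 bits wide (a few 64-bit targets use 64-bit `DT_HASH` words), the caller has already resolved the table address, and the hash table itself is intact. All function names are hypothetical and are not heaptrack or elfutils API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DT_HASH layout: word 0 is nbucket, word 1 is nchain.
 * nchain equals the number of symbol table entries. */
static uint32_t sysv_hash_symbol_count(const uint32_t *hash)
{
    assert(hash != NULL);
    return hash[1];
}

/* DT_GNU_HASH layout: nbuckets, symoffset, bloom_size, bloom_shift,
 * then bloom_size bloom words (4 bytes on ELF32, 8 bytes on ELF64),
 * then nbuckets bucket words, then the chain words.
 * bloom_word_bytes is an assumption supplied by the caller. */
static uint32_t gnu_hash_symbol_count(const uint32_t *hash,
                                      size_t bloom_word_bytes)
{
    assert(hash != NULL);
    assert(bloom_word_bytes == 4 || bloom_word_bytes == 8);

    const uint32_t nbuckets = hash[0];
    const uint32_t symoffset = hash[1];
    const uint32_t bloom_size = hash[2];
    /* hash[3] is the bloom shift, which is not needed here. */

    const uint32_t *buckets =
        hash + 4 + (size_t)bloom_size * (bloom_word_bytes / sizeof(uint32_t));
    const uint32_t *chain = buckets + nbuckets;

    /* The highest bucket value is the first symbol of the last chain. */
    uint32_t last = 0;
    for (uint32_t i = 0; i < nbuckets; ++i) {
        if (buckets[i] > last)
            last = buckets[i];
    }

    /* No hashed symbols: only the unhashed ones before symoffset exist. */
    if (last < symoffset)
        return symoffset;

    /* Walk that chain to its end; the low bit marks the final entry. */
    while ((chain[last - symoffset] & 1u) == 0)
        ++last;

    return last + 1;
}

/* Bounds check to run before dereferencing symtab[index]. */
static int symbol_index_is_valid(uint32_t index, uint32_t symbol_count)
{
    return index < symbol_count;
}
```

With a count obtained this way, a relocation entry whose symbol index fails `symbol_index_is_valid` could be skipped instead of dereferenced. The caveat is that a DSO with a corrupt symbol table may well have a corrupt hash table too, so this narrows the problem rather than eliminating it.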

Thanks for confirming that this isn't available currently. Would it be possible
to add this? What's the process for standardization here? I guess it would take
a very long time, yet this seems to me as if it would be beneficial in the long
term.

> Di Chen recently
> (or actually not that recently, I just still haven't reviewed, sorry!)
> posted a patch for
> https://sourceware.org/bugzilla/show_bug.cgi?id=28873 to print out the
> symbols from the dynamic segment
> https://sourceware.org/pipermail/elfutils-devel/2022q2/005086.html

Interesting. But from what I can tell, this patch has access to the full Elf
object and thus can access segments which are not normally loaded at runtime?

> > PS: eu-elflint reports this for the broken DSOs e.g.:
> > ```
> > $ eu-elflint libQt5Qml.so.5.12
> > section [ 3] '.dynsym': symbol 1272: st_value out of bounds
> > section [ 3] '.dynsym': symbol 3684: st_value out of bounds
> > section [29] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got section size 18340
> > section [29] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 336
> > section [29] '.symtab': symbol 25720: st_value out of bounds
> > section [29] '.symtab': symbol 27227: st_value out of bounds
> > ```
> >
> > Does anyone know how this can happen? Is this a bug in the toolchain?
>
> Try with eu-elflint --gnu which suppresses some known issues.

Indeed, with `--gnu` the tool reports `No errors`.

> Also could you show those symbol values (1272, 3684, 25720, 27227) they
> might have a special type, so their st_value isn't really an address?

```
$ eu-readelf -s libQt5Qml.so.5.12.0 | grep -E "^\s*(1272|3684|25720|27227):"
 1272: 003f9974 0 NOTYPE GLOBAL DEFAULT 25 __bss_start__@@Qt_5
 3684: 003f9974 0 NOTYPE GLOBAL DEFAULT 25 __bss_start@@Qt_5
 1272: 003ccc4c 0 NOTYPE LOCAL DEFAULT 17 $d
 3684: 003cbfec 0 NOTYPE LOCAL DEFAULT 17 $d
25720: 003f9974 0 NOTYPE GLOBAL DEFAULT 25 __bss_start
27227: 003f9974 0 NOTYPE GLOBAL DEFAULT 25 __bss_start__
```

The first two matches come from the `.dynsym`, the last four come from
`.symtab`. Can anyone tell me how `eu-readelf` resolves these symbol names?

Thanks

--
Milian Wolff
mail@milianw.de
http://milianw.de