public inbox for libc-help@sourceware.org
* LD_PRELOAD wrappers for system calls and stdio
From: Dennis Filder @ 2021-09-04 19:00 UTC
  To: libc-help

Hi,

I'm trying to write an LD_PRELOAD hack for the purpose of
tracing/logging.  The goal was to get a time-stamped copy of
everything that is ever read/written over a selected set of file
descriptors and log it to another file descriptor.  A second goal was
achieving a high degree of portability.

My naive hope was that I could implement this by wrapping just a
handful of system call wrappers and be done with it.  Imagine my
surprise when in the process of coding that up (under Linux) I came to
notice that not all stdio functions use write() internally to actually
write data to a file descriptor.  Some, e.g. fwrite(), do something
esoteric which involves book-keeping with a FILE object and also an
apparently in-lined invocation of syscall(SYS_write, ...).  If my
understanding is correct then this means that it is literally
impossible to attain my goal by wrapping only the system call wrappers
which would leave me (and thus basically everyone in a similar
position) with these options:

 a) also wrap essentially /every/ stdio function that is not
    guaranteed to only call already wrapped functions,

 b) use the Linux Auditing System instead, or

 c) use ptrace() instead and reimplement 3/4 of ltrace/strace.

None of these prospects has me rejoicing, as they involve a ton of
work and/or sacrificing portability.

What am I supposed to do?

I'm currently examining what it would take for option a), but I'm
running into a steady stream of roadblocks.  A major one I'm stuck on
is the variadic functions (printf and friends).  One way out seems to
be to use GCC's __builtin_apply and calculate its size argument using
a function that would have to be similar to glibc's
parse_printf_format, but which would only return the number of bytes
the arguments occupy on the stack (Would it be too much to ask to
provide such a function as part of glibc?).  But I don't know if
register-involving calling conventions will harmonize with that
approach.  Will they?  Also what makes me reluctant to explore this
further is the fear that I will eventually have to implement not just
wrappers, but full-on replacements.  And I'd probably have to do the
same for libstdc++, too.

Thanks in advance for any help/clarification.

P.S.: Solutions that involve installing a specially built version of
glibc (e.g. with INLINE_SYSCALL undefined) are less than ideal because
this project is not for personal, but public use, and having a
custom-built libc as a dependency is thus effectively a showstopper.
But maybe it would be possible to transplant a subset of routines from
such a libc into my library.  But how would I even do that?  Close
study of build logs tells me one of stdio-common/stamp.o and
libc_pic.{a,os,os.clean} probably contains what I want, but I'm not
sure what will break if I just copy that over.


* Re: LD_PRELOAD wrappers for system calls and stdio
From: Adhemerval Zanella @ 2021-09-09 10:51 UTC
  To: Dennis Filder, Libc-help



On 04/09/2021 16:00, Dennis Filder via Libc-help wrote:
> Hi,
> 
> I'm trying to write an LD_PRELOAD hack for the purpose of
> tracing/logging.  The goal was to get a time-stamped copy of
> everything that is ever read/written over a selected set of file
> descriptors and log it to another file descriptor.  A second goal was
> achieving a high degree of portability.
> 
> My naive hope was that I could implement this by wrapping just a
> handful of system call wrappers and be done with it.  Imagine my
> surprise when in the process of coding that up (under Linux) I came to
> notice that not all stdio functions use write() internally to actually
> write data to a file descriptor.  Some, e.g. fwrite(), do something
> esoteric which involves book-keeping with a FILE object and also an
> apparently in-lined invocation of syscall(SYS_write, ...).  If my
> understanding is correct then this means that it is literally
> impossible to attain my goal by wrapping only the system call wrappers
> which would leave me (and thus basically everyone in a similar
> position) with these options:

Yes, glibc's internal calls to functions like read() and write() are
*not* made through the PLT.  This means that symbol interposition
does not work for such cases.
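
To make the interposition point concrete, here is a minimal sketch of
an LD_PRELOAD-style wrapper for write().  It assumes glibc on Linux;
the names write_fn and wrapped_write_calls are made up for this
example and are not part of any library.

```c
#define _GNU_SOURCE             /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter, only here so the effect can be observed. */
unsigned long wrapped_write_calls;

typedef ssize_t (*write_fn)(int, const void *, size_t);

/* Interposed write(): record the call, then forward to the next
   definition of write() in symbol search order (normally libc's). */
ssize_t write(int fd, const void *buf, size_t count)
{
    static write_fn real_write;

    if (real_write == NULL) {
        real_write = (write_fn)dlsym(RTLD_NEXT, "write");
        /* Sketch only: a real library should fail more gracefully. */
        assert(real_write != NULL);
    }

    wrapped_write_calls++;      /* a tracer would log buf/count here */
    return real_write(fd, buf, count);
}
```

Built as a shared object (for example with gcc -shared -fPIC, plus
-ldl on older glibc) and preloaded, a wrapper like this sees the
application's own calls to write().  It does not see the writes that
stdio performs internally, e.g. when fwrite() flushes its buffer, for
the reason given above.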

> 
>  a) also wrap essentially /every/ stdio function that is not
>     guaranteed to only call already wrapped functions,
> 
>  b) use the Linux Auditing System instead, or
> 
>  c) use ptrace() instead and reimplement 3/4 of ltrace/strace.
> 
> None of these prospects has me rejoicing, as they involve a ton of
> work and/or sacrificing portability.
> 
> What am I supposed to do?

For Linux you have seccomp filters [1], and with 3.5+ you can optimize
it a bit by selecting only the syscalls you are interested in.  Mike
Frysinger discussed some options in a previous thread [2].
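
One possible reading of the seccomp suggestion, as a rough sketch: a
BPF filter that returns SECCOMP_RET_TRACE for read and write only and
allows everything else, so that a ptrace tracer using
PTRACE_O_TRACESECCOMP is stopped just for those two syscalls.  The
names trace_rw_filter and install_trace_rw_filter are invented for
this example, and a production filter should also check
seccomp_data.arch, which is left out here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Load the syscall number; read and write yield SECCOMP_RET_TRACE,
   every other syscall is allowed without involving the tracer. */
struct sock_filter trace_rw_filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 1, 0),   /* read  -> TRACE */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 1),  /* write -> TRACE */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

static_assert(sizeof trace_rw_filter / sizeof trace_rw_filter[0]
              <= BPF_MAXINSNS, "filter too long");

/* Install the filter in the calling process (the tracee would do this
   before exec).  Returns 0 on success, -1 with errno set on failure. */
int install_trace_rw_filter(void)
{
    struct sock_fprog prog = {
        .len = sizeof trace_rw_filter / sizeof trace_rw_filter[0],
        .filter = trace_rw_filter,
    };

    /* Needed so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Note that if no tracer is attached, a syscall that hits
SECCOMP_RET_TRACE is not executed and fails with ENOSYS, so the filter
only makes sense together with the ptrace side.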

> 
> I'm currently examining what it would take for option a), but I'm
> running into a steady stream of roadblocks.  A major one I'm stuck on
> is the variadic functions (printf and friends).  One way out seems to
> be to use GCC's __builtin_apply and calculate its size argument using
> a function that would have to be similar to glibc's
> parse_printf_format, but which would only return the number of bytes
> the arguments occupy on the stack (Would it be too much to ask to
> provide such a function as part of glibc?).  But I don't know if
> register-involving calling conventions will harmonize with that
> approach.  Will they?  Also what makes me reluctant to explore this
> further is the fear that I will eventually have to implement not just
> wrappers, but full-on replacements.  And I'd probably have to do the
> same for libstdc++, too.

Depending on what you intend to catch, you will indeed need *a lot* of
boilerplate for this approach.  I haven't explored a way to
interpose variadic functions, but afaik you can't really do it in
a *portable* way (you will need to resort to either a compiler or
an ABI extension).
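
For the printf family specifically there is a narrower route that
stays within standard C: each variadic function has a va_list
counterpart (printf/vprintf, fprintf/vfprintf, ...), so a wrapper can
forward its arguments without knowing their size.  This does not help
for variadic functions that lack such a counterpart.  A minimal sketch
follows; wrap_printf and last_logged are made-up names, and the
function is deliberately not called printf here so that the example
compiles next to <stdio.h> in any build configuration.  In a preload
library it would be exported under the real name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log buffer standing in for the real logging target. */
char last_logged[256];

/* Wrapper sketch: keep a formatted copy for the log, then forward the
   untouched argument list to the va_list counterpart of printf. */
int wrap_printf(const char *fmt, ...)
{
    va_list ap, ap_log;
    int ret;

    assert(fmt != NULL);

    va_start(ap, fmt);
    va_copy(ap_log, ap);        /* a va_list may only be walked once */
    vsnprintf(last_logged, sizeof last_logged, fmt, ap_log);
    va_end(ap_log);

    ret = vprintf(fmt, ap);     /* the actual output */
    va_end(ap);
    return ret;
}
```

Even then the boilerplate concern stands: every entry point needs its
own wrapper, and compilers may turn a printf() call into a call to a
different function (puts() for simple strings, or a fortified variant),
which a wrapper for printf alone would not see.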

> 
> Thanks in advance for any help/clarification.
> 
> P.S.: Solutions that involve installing a specially built version of
> glibc (e.g. with INLINE_SYSCALL undefined) are less than ideal because
> this project is not for personal, but public use, and having a
> custom-built libc as a dependency is thus effectively a showstopper.
> But maybe it would be possible to transplant a subset of routines from
> such a libc into my library.  But how would I even do that?  Close
> study of build logs tells me one of stdio-common/stamp.o and
> libc_pic.{a,os,os.clean} probably contains what I want, but I'm not
> sure what will break if I just copy that over.
> 

This was suggested some time ago, and it is the idea behind libOS [3].
That thread has some discussion of the pros and cons of this
approach.  You can also check how libOS has done it.


[1] https://sourceware.org/pipermail/libc-help/2021-August/006002.html
[2] https://sourceware.org/pipermail/libc-help/2021-August/006002.html
[3] https://sourceware.org/legacy-ml/libc-alpha/2019-09/msg00188.html
